Linux Tegra architecture development


* Re: [PATCH v3 2/3] iommu/arm-smmu-v3: Detect Tegra264 erratum
From: Ashish Mhetre @ 2026-06-09  7:07 UTC
  To: Jason Gunthorpe
  Cc: Will Deacon, robin.murphy, joro, nicolinc, linux-arm-kernel,
	iommu, linux-kernel, linux-tegra
In-Reply-To: <20260605141053.GF2487554@ziepe.ca>



On 6/5/2026 7:40 PM, Jason Gunthorpe wrote:
> On Fri, Jun 05, 2026 at 07:35:35PM +0530, Ashish Mhetre wrote:
>>>> +{
>>>> +     if (!(smmu->options & ARM_SMMU_OPT_TLBI_TWICE))
>>>> +             return false;
>>> Maybe we should make this a static key?
>> Okay. Shall I add just static key and remove option bit, or
>> have static key alongside existing option bit such that
>> static_branch_unlikely will precede the option bit check?
> You'd have the static key and the options. Keep it simple, enable the
> static key once if any driver probes to set TWICE. Check the key
> before options to get the best code gen

Okay, I'll incorporate this in V4 and send.

> But IDK if it is really worth it, there are already lots of branches
> on the performance tlbi flow, and we didn't do this for other tlbi
> affecting errata..
>
> IDK if we really care about branches we should also be doing things
> like disabling the range/non-range paths and ATC based on what is
> actually in use..
>
> Jason
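
The check ordering discussed above can be modelled in plain userspace C. This is only a sketch: a global bool stands in for the static key (in the kernel this would presumably be DEFINE_STATIC_KEY_FALSE() tested with static_branch_unlikely()), and struct smmu_model, MODEL_OPT_TLBI_TWICE and the function names are hypothetical, not taken from the driver.

```c
#include <stdbool.h>

#define MODEL_OPT_TLBI_TWICE	(1u << 0)	/* hypothetical bit position */

/* Hypothetical stand-in for the per-instance driver state. */
struct smmu_model {
	unsigned int options;
};

/* Stand-in for the static key: stays false until an affected instance probes. */
static bool tlbi_twice_enabled;

static void smmu_model_probe(struct smmu_model *smmu, bool has_erratum)
{
	smmu->options = 0;
	if (has_erratum) {
		smmu->options |= MODEL_OPT_TLBI_TWICE;
		tlbi_twice_enabled = true;	/* enabled once, never cleared */
	}
}

static bool smmu_model_tlbi_twice(const struct smmu_model *smmu)
{
	/* Global check first, so unaffected systems never test the option bit. */
	if (!tlbi_twice_enabled)
		return false;
	return smmu->options & MODEL_OPT_TLBI_TWICE;
}

/* Packs three checks into one value: before any affected probe, then after. */
static int tlbi_twice_demo(void)
{
	struct smmu_model plain, affected;
	int result = 0;

	smmu_model_probe(&plain, false);
	result |= smmu_model_tlbi_twice(&plain) << 2;	/* key off */
	smmu_model_probe(&affected, true);
	result |= smmu_model_tlbi_twice(&plain) << 1;	/* key on, bit clear */
	result |= smmu_model_tlbi_twice(&affected);	/* key on, bit set */
	return result;
}
```

The per-instance option bit is still needed after the global check, because one affected instance turns the key on for every instance in the system.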


* Re: [PATCH v2 00/14] list: Prepare entry iterators to cache cursor state
From: Andy Shevchenko @ 2026-06-09  7:05 UTC
  To: Kaitao Cheng
  Cc: Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai, Laurent Pinchart, Jonas Karlman,
	Jernej Skrabec, Matthew Auld, Matthew Brost, Waiman Long,
	drbd-dev, linux-block, linux1394-devel, dri-devel, intel-gfx,
	linux-spi, linux-stm32, linux-arm-kernel, linux-tegra,
	linux-sound, linux-kernel, Andrew Morton, Randy Dunlap,
	Christian Brauner, David Howells, Luca Ceresoli, Kaito Cheng
In-Reply-To: <aie299WveL1utNya@ashevche-desk.local>

On Tue, Jun 09, 2026 at 09:47:34AM +0300, Andy Shevchenko wrote:
> On Tue, Jun 09, 2026 at 02:13:33PM +0800, Kaitao Cheng wrote:
> > 
> > This series prepares for, and then updates, the list_for_each_entry()
> > family so the common entry iterators cache their next or previous cursor
> > before the loop body runs.

While the code looks okay, this doesn't explain the "why?" aspect.

> > The first 13 patches open-code loops that intentionally depend on the
> > old "derive the next entry from the current cursor at the end of the
> > iteration" behaviour.  These loops append work to the list being walked,
> > restart traversal after dropping a lock, skip an entry consumed by the
> > current iteration, or otherwise adjust the cursor in the loop body.
> > 
> > The final patch changes include/linux/list.h to keep a private cursor in
> > the common entry iterators while preserving the public macro interface.
> > The safe variants remain available when callers need the temporary
> > cursor explicitly or have stronger mutation requirements.
> 
> Something is really wrong with the patch series email chaining.
> Patches 3, 10, and 13 start the subthreads. Please, check your
> tools and fix them accordingly.
> 
> Note, `git format-patch ...` should not have this "side-effect"
> when used correctly.

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v2 07/14] spi: fsi: Open-code message transfer walk
From: Andy Shevchenko @ 2026-06-09  7:02 UTC
  To: Kaitao Cheng
  Cc: Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai, Laurent Pinchart, Jonas Karlman,
	Jernej Skrabec, Matthew Auld, Matthew Brost, Waiman Long,
	drbd-dev, linux-block, linux1394-devel, dri-devel, intel-gfx,
	linux-spi, linux-stm32, linux-arm-kernel, linux-tegra,
	linux-sound, linux-kernel, Andrew Morton, Randy Dunlap,
	Christian Brauner, David Howells, Luca Ceresoli, Kaitao Cheng
In-Reply-To: <20260609062526.94907-5-kaitao.cheng@linux.dev>

On Tue, Jun 09, 2026 at 02:25:19PM +0800, Kaitao Cheng wrote:
>
> A later change will make list_for_each_entry() cache the next element
> before entering the loop body. fsi_spi_transfer_one_message() can combine
> the current transfer with the following transfer and then advance the
> cursor to that consumed entry.
> 
> Keep the transfer walk open-coded so the loop step observes that cursor
> update and skips the consumed transfer. This preserves the existing
> message sequencing semantics and prepares the code for the list iterator
> update.

...

> -	list_for_each_entry(transfer, &mesg->transfers, transfer_list) {
> +	for (transfer = list_first_entry(&mesg->transfers,
> +					 typeof(*transfer), transfer_list);

You can keep this on a single line for a more logical split.

	for (transfer = list_first_entry(&mesg->transfers, typeof(*transfer), transfer_list);

It's within the relaxed line-length limit.

> +	     !list_entry_is_head(transfer, &mesg->transfers, transfer_list);
> +	     transfer = list_next_entry(transfer, transfer_list)) {

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v2 04/14] drm/i915/gt: Open-code active timeline walk
From: Andy Shevchenko @ 2026-06-09  7:00 UTC
  To: Kaitao Cheng
  Cc: Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai, Laurent Pinchart, Jonas Karlman,
	Jernej Skrabec, Matthew Auld, Matthew Brost, Waiman Long,
	drbd-dev, linux-block, linux1394-devel, dri-devel, intel-gfx,
	linux-spi, linux-stm32, linux-arm-kernel, linux-tegra,
	linux-sound, linux-kernel, Andrew Morton, Randy Dunlap,
	Christian Brauner, David Howells, Luca Ceresoli, Kaitao Cheng
In-Reply-To: <20260609062526.94907-2-kaitao.cheng@linux.dev>

On Tue, Jun 09, 2026 at 02:25:16PM +0800, Kaitao Cheng wrote:
> 
> A later change will make list_for_each_entry() cache the next element
> before entering the loop body. __intel_gt_unset_wedged() drops
> timelines->lock while waiting on a fence and then restarts the walk from
> the list head after the lock is reacquired.
> 
> Keep the loop open-coded so the next timeline is selected after that
> restart logic has run. This preserves the existing lock-drop traversal
> semantics and prepares the code for the list iterator update.

...

>  	spin_lock(&timelines->lock);
> -	list_for_each_entry(tl, &timelines->active_list, link) {
> +	for (tl = list_first_entry(&timelines->active_list, typeof(*tl), link);
> +	     !list_entry_is_head(tl, &timelines->active_list, link);
> +	     tl = list_next_entry(tl, link)) {

Yeah, these cases should rather be converted to a do {} while or a while-loop.
This will make the intention clearer and reduce the possibility that someone
mistakenly changes these back to list_for_each_entry().

See, for example, deferred_probe_work_func() implementation.
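
A rough userspace sketch of what such a while-loop form could look like for a walk that restarts from the head. The list helpers are minimal stand-ins for include/linux/list.h, and struct timeline with its pending flag is a hypothetical model of an entry that must be waited on with the lock dropped; it is not the i915 code.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Minimal stand-ins for the kernel helpers of the same names. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_entry_is_head(pos, head, member) (&(pos)->member == (head))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical stand-in for a timeline with an outstanding fence. */
struct timeline {
	bool pending;
	struct list_head link;
};

/* Returns how many times the loop body ran before every timeline settled. */
static int settle_timelines(struct list_head *head)
{
	struct timeline *tl = list_first_entry(head, __typeof__(*tl), link);
	int visits = 0;

	while (!list_entry_is_head(tl, head, link)) {
		visits++;
		if (tl->pending) {
			/* Real code would drop the lock, wait, and retake it here. */
			tl->pending = false;
			/* The list may have changed meanwhile: restart from the head. */
			tl = list_first_entry(head, __typeof__(*tl), link);
			continue;
		}
		tl = list_next_entry(tl, link);
	}
	return visits;
}

/* Three timelines, the last two pending. */
static int settle_timelines_demo(void)
{
	struct list_head head = { &head, &head };
	struct timeline tls[3] = { { false }, { true }, { true } };

	for (int i = 0; i < 3; i++)
		list_add_tail(&tls[i].link, &head);
	return settle_timelines(&head);
}
```

With the advance written out in the body, the restart and the normal step are visibly two different paths, which a for-loop header tends to hide.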

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v2 00/14] list: Prepare entry iterators to cache cursor state
From: Andy Shevchenko @ 2026-06-09  6:47 UTC
  To: Kaitao Cheng
  Cc: Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai, Laurent Pinchart, Jonas Karlman,
	Jernej Skrabec, Matthew Auld, Matthew Brost, Waiman Long,
	drbd-dev, linux-block, linux1394-devel, dri-devel, intel-gfx,
	linux-spi, linux-stm32, linux-arm-kernel, linux-tegra,
	linux-sound, linux-kernel, Andrew Morton, Randy Dunlap,
	Christian Brauner, David Howells, Luca Ceresoli, Kaito Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

On Tue, Jun 09, 2026 at 02:13:33PM +0800, Kaitao Cheng wrote:
> 
> This series prepares for, and then updates, the list_for_each_entry()
> family so the common entry iterators cache their next or previous cursor
> before the loop body runs.
> 
> The first 13 patches open-code loops that intentionally depend on the
> old "derive the next entry from the current cursor at the end of the
> iteration" behaviour.  These loops append work to the list being walked,
> restart traversal after dropping a lock, skip an entry consumed by the
> current iteration, or otherwise adjust the cursor in the loop body.
> 
> The final patch changes include/linux/list.h to keep a private cursor in
> the common entry iterators while preserving the public macro interface.
> The safe variants remain available when callers need the temporary
> cursor explicitly or have stronger mutation requirements.

Something is really wrong with the patch series email chaining.
Patches 3, 10, and 13 start the subthreads. Please, check your
tools and fix them accordingly.

Note, `git format-patch ...` should not have this "side-effect"
when used correctly.

-- 
With Best Regards,
Andy Shevchenko




* [PATCH v2 14/14] list: Cache cursors in entry iterators
From: Kaitao Cheng @ 2026-06-09  6:41 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609064122.95825-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

The non-safe list_for_each_entry() family advances by deriving the next
element from the current cursor in the loop step. If the loop body
unlinks the current entry, the step can no longer rely on the current
entry's list pointers.

Callers can use the _safe variants today, but those interfaces require a
temporary cursor to be declared outside the macro. That is necessary when
the caller actually needs the temporary cursor, but it looks redundant
and awkward when the cursor is only there to satisfy the macro and is
never otherwise used.

Add private next and previous cursors for the common entry iterators and
use unique internal names so callers keep the same interface. This lets
the loop step use a cursor captured before the body runs, while callers
that need to alter traversal state can still open-code the walk.

The safe variants remain useful when the caller needs access to the
temporary cursor or has stronger mutation requirements. Update their
comments to steer users toward the simpler iterators when that temporary
cursor is not needed.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 include/linux/list.h | 46 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/include/linux/list.h b/include/linux/list.h
index 09d979976b3b..9df84a56a789 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -809,6 +809,29 @@ static inline size_t list_count_nodes(struct list_head *head)
 #define list_entry_is_head(pos, head, member)				\
 	list_is_head(&pos->member, (head))
 
+#define __list_for_each_entry(pos, next, head, member)			\
+	for (typeof(pos) next = list_next_entry(pos =			\
+		list_first_entry(head, typeof(*pos), member), member);	\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = next, next = list_next_entry(next, member))
+
+#define __list_for_each_entry_reverse(pos, prev, head, member)		\
+	for (typeof(pos) prev = list_prev_entry(pos =			\
+		list_last_entry(head, typeof(*pos), member), member);	\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = prev, prev = list_prev_entry(prev, member))
+
+#define __list_for_each_entry_continue(pos, next, head, member)		\
+	for (typeof(pos) next = list_next_entry(pos =			\
+		list_next_entry(pos, member), member);			\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = next, next = list_next_entry(next, member))
+
+#define __list_for_each_entry_from(pos, next, head, member)		\
+	for (typeof(pos) next = list_next_entry(pos, member);		\
+	     !list_entry_is_head(pos, head, member);			\
+	     pos = next, next = list_next_entry(next, member))
+
 /**
  * list_for_each_entry	-	iterate over list of given type
  * @pos:	the type * to use as a loop cursor.
@@ -816,9 +839,7 @@ static inline size_t list_count_nodes(struct list_head *head)
  * @member:	the name of the list_head within the struct.
  */
 #define list_for_each_entry(pos, head, member)				\
-	for (pos = list_first_entry(head, typeof(*pos), member);	\
-	     !list_entry_is_head(pos, head, member);			\
-	     pos = list_next_entry(pos, member))
+	__list_for_each_entry(pos, __UNIQUE_ID(next), head, member)
 
 /**
  * list_for_each_entry_reverse - iterate backwards over list of given type.
@@ -827,9 +848,7 @@ static inline size_t list_count_nodes(struct list_head *head)
  * @member:	the name of the list_head within the struct.
  */
 #define list_for_each_entry_reverse(pos, head, member)			\
-	for (pos = list_last_entry(head, typeof(*pos), member);		\
-	     !list_entry_is_head(pos, head, member); 			\
-	     pos = list_prev_entry(pos, member))
+	__list_for_each_entry_reverse(pos, __UNIQUE_ID(prev), head, member)
 
 /**
  * list_prepare_entry - prepare a pos entry for use in list_for_each_entry_continue()
@@ -852,9 +871,7 @@ static inline size_t list_count_nodes(struct list_head *head)
  * the current position.
  */
 #define list_for_each_entry_continue(pos, head, member) 		\
-	for (pos = list_next_entry(pos, member);			\
-	     !list_entry_is_head(pos, head, member);			\
-	     pos = list_next_entry(pos, member))
+	__list_for_each_entry_continue(pos, __UNIQUE_ID(next), head, member)
 
 /**
  * list_for_each_entry_continue_reverse - iterate backwards from the given point
@@ -879,8 +896,7 @@ static inline size_t list_count_nodes(struct list_head *head)
  * Iterate over list of given type, continuing from current position.
  */
 #define list_for_each_entry_from(pos, head, member) 			\
-	for (; !list_entry_is_head(pos, head, member);			\
-	     pos = list_next_entry(pos, member))
+	__list_for_each_entry_from(pos, __UNIQUE_ID(next), head, member)
 
 /**
  * list_for_each_entry_from_reverse - iterate backwards over list of given type
@@ -901,6 +917,8 @@ static inline size_t list_count_nodes(struct list_head *head)
  * @n:		another type * to use as temporary storage
  * @head:	the head for your list.
  * @member:	the name of the list_head within the struct.
+ *
+ * Prefer list_for_each_entry() unless the temporary cursor is needed.
  */
 #define list_for_each_entry_safe(pos, n, head, member)			\
 	for (pos = list_first_entry(head, typeof(*pos), member),	\
@@ -917,6 +935,8 @@ static inline size_t list_count_nodes(struct list_head *head)
  *
  * Iterate over list of given type, continuing after current point,
  * safe against removal of list entry.
+ *
+ * Prefer list_for_each_entry_continue() unless the temporary cursor is needed.
  */
 #define list_for_each_entry_safe_continue(pos, n, head, member) 		\
 	for (pos = list_next_entry(pos, member), 				\
@@ -933,6 +953,8 @@ static inline size_t list_count_nodes(struct list_head *head)
  *
  * Iterate over list of given type from current point, safe against
  * removal of list entry.
+ *
+ * Prefer list_for_each_entry_from() unless the temporary cursor is needed.
  */
 #define list_for_each_entry_safe_from(pos, n, head, member) 			\
 	for (n = list_next_entry(pos, member);					\
@@ -948,6 +970,8 @@ static inline size_t list_count_nodes(struct list_head *head)
  *
  * Iterate backwards over list of given type, safe against removal
  * of list entry.
+ *
+ * Prefer list_for_each_entry_reverse() unless the temporary cursor is needed.
  */
 #define list_for_each_entry_safe_reverse(pos, n, head, member)		\
 	for (pos = list_last_entry(head, typeof(*pos), member),		\
-- 
2.43.0
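
The cached-cursor walk in the patch above can be tried in userspace with a small model. This is a sketch under assumptions: the list helpers are minimal stand-ins for include/linux/list.h, the macro name list_for_each_entry_cached is made up, and the cursor name is passed in by the caller because __UNIQUE_ID() is kernel-only.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Minimal stand-ins for the kernel helpers of the same names. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_entry_is_head(pos, head, member) (&(pos)->member == (head))

/* Same shape as __list_for_each_entry() in the patch; the name is made up. */
#define list_for_each_entry_cached(pos, nxt, head, member)		\
	for (__typeof__(pos) nxt = list_next_entry(pos =		\
		list_first_entry(head, __typeof__(*pos), member), member); \
	     !list_entry_is_head(pos, head, member);			\
	     pos = nxt, nxt = list_next_entry(nxt, member))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;	/* poison the unlinked entry */
}

struct item {
	int val;
	struct list_head link;
};

/* Unlink even values during the walk, then sum what is left on the list. */
static int cached_walk_demo(void)
{
	struct list_head head = { &head, &head };
	struct item items[5];
	struct item *it;
	int sum = 0;

	for (int i = 0; i < 5; i++) {
		items[i].val = i + 1;
		list_add_tail(&items[i].link, &head);
	}
	list_for_each_entry_cached(it, nxt, &head, link)
		if (it->val % 2 == 0)
			list_del(&it->link);	/* the step never reads it->link */
	list_for_each_entry_cached(it, nxt, &head, link)
		sum += it->val;
	return sum;
}
```

The model only covers unlinking the current entry. If the body unlinked the cached successor instead, the step would follow the poisoned pointers, which is one case where an explicit temporary cursor or an open-coded walk is still needed.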



* [PATCH v2 13/14] ASoC: dapm: Open-code widget invalidation walk
From: Kaitao Cheng @ 2026-06-09  6:41 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. dapm_widget_invalidate_paths() appends
newly reached widgets to the temporary work list while walking it.

Keep the work-list walk open-coded so the next widget is looked up after
new widgets have been appended. This preserves the existing invalidation
traversal semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 sound/soc/soc-dapm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c
index d6192204e613..5bd921fca132 100644
--- a/sound/soc/soc-dapm.c
+++ b/sound/soc/soc-dapm.c
@@ -255,7 +255,9 @@ static __always_inline void dapm_widget_invalidate_paths(
 	list_add_tail(&w->work_list, &list);
 	w->endpoints[dir] = -1;
 
-	list_for_each_entry(w, &list, work_list) {
+	for (w = list_first_entry(&list, typeof(*w), work_list);
+	     !list_entry_is_head(w, &list, work_list);
+	     w = list_next_entry(w, work_list)) {
 		snd_soc_dapm_widget_for_each_path(w, dir, p) {
 			if (p->is_supply || !p->connect)
 				continue;
-- 
2.43.0
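
Why an append-while-walking loop depends on the old step can be seen in a small userspace model. This is a sketch with minimal stand-ins for the include/linux/list.h helpers; struct widget and the fixed pool are hypothetical and only mimic a work list that grows while it is walked.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Minimal stand-ins for the kernel helpers of the same names. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_entry_is_head(pos, head, member) (&(pos)->member == (head))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical widget; only the work-list linkage matters here. */
struct widget {
	struct list_head work_list;
};

#define POOL_SIZE 3

/* Old semantics: the step reads w->work_list.next after the body ran. */
static int visits_open_coded(void)
{
	struct list_head list = { &list, &list };
	struct widget pool[POOL_SIZE];
	struct widget *w;
	int queued = 0, visits = 0;

	list_add_tail(&pool[queued++].work_list, &list);
	for (w = list_first_entry(&list, __typeof__(*w), work_list);
	     !list_entry_is_head(w, &list, work_list);
	     w = list_next_entry(w, work_list)) {
		visits++;
		if (queued < POOL_SIZE)	/* "reach" one more widget */
			list_add_tail(&pool[queued++].work_list, &list);
	}
	return visits;
}

/* Cached semantics: the successor is captured before the body runs. */
static int visits_cached(void)
{
	struct list_head list = { &list, &list };
	struct widget pool[POOL_SIZE];
	struct widget *w, *nxt;
	int queued = 0, visits = 0;

	list_add_tail(&pool[queued++].work_list, &list);
	for (w = list_first_entry(&list, __typeof__(*w), work_list),
	     nxt = list_next_entry(w, work_list);
	     !list_entry_is_head(w, &list, work_list);
	     w = nxt, nxt = list_next_entry(nxt, work_list)) {
		visits++;
		if (queued < POOL_SIZE)
			list_add_tail(&pool[queued++].work_list, &list);
	}
	return visits;
}
```

In this model the open-coded walk reaches all three widgets, while the cached walk stops after the first one because its successor was captured while the list still held a single entry.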



* [PATCH v2 12/14] locking/ww_mutex: Open-code stress reorder list walk
From: Kaitao Cheng @ 2026-06-09  6:38 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609063855.95710-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. stress_reorder_work() can move list
entries while handling wound/wait locking conflicts and then continue
from the adjusted cursor.

Keep the list walk open-coded so the loop step observes the cursor
selected by the body. This preserves the existing stress-test traversal
semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/locking/test-ww_mutex.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 838d631544ed..08a6ab5ac041 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -519,7 +519,9 @@ static void stress_reorder_work(struct work_struct *work)
 	do {
 		ww_acquire_init(&ctx, stress->class);
 
-		list_for_each_entry(ll, &locks, link) {
+		for (ll = list_first_entry(&locks, typeof(*ll), link);
+		     !list_entry_is_head(ll, &locks, link);
+		     ll = list_next_entry(ll, link)) {
 			err = ww_mutex_lock(ll->lock, &ctx);
 			if (!err)
 				continue;
-- 
2.43.0



* [PATCH v2 11/14] locking/locktorture: Open-code ww mutex list walk
From: Kaitao Cheng @ 2026-06-09  6:38 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609063855.95710-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. The ww-mutex torture path can move list
entries while it resolves a wound/wait conflict and then continue from
the adjusted cursor.

Keep the list walk open-coded so the loop step observes the cursor
selected by the body. This preserves the existing stress-test traversal
semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/locking/locktorture.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index e618bcf75e2d..0eb75e9bccaa 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -644,7 +644,9 @@ __acquires(torture_ww_mutex_2)
 
 	ww_acquire_init(ctx, &torture_ww_class);
 
-	list_for_each_entry(ll, &list, link) {
+	for (ll = list_first_entry(&list, typeof(*ll), link);
+	     !list_entry_is_head(ll, &list, link);
+	     ll = list_next_entry(ll, link)) {
 		int err;
 
 		err = ww_mutex_lock(ll->lock, ctx);
-- 
2.43.0



* [PATCH v2 10/14] spi: tegra210-quad: Open-code message transfer walk
From: Kaitao Cheng @ 2026-06-09  6:38 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. tegra_qspi_non_combined_seq_xfer() can
consume the following transfer as part of the current operation and then
advance the loop cursor to that entry.

Keep the transfer walk open-coded so the loop step observes that cursor
update and skips the consumed transfer. This preserves the existing
message sequencing semantics and prepares the code for the list iterator
update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/spi/spi-tegra210-quad.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index db28dd556484..42dd5cf53c67 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -1302,7 +1302,9 @@ static int tegra_qspi_non_combined_seq_xfer(struct tegra_qspi *tqspi,
 	if (tqspi->soc_data->supports_tpm)
 		val &= ~QSPI_TPM_WAIT_POLL_EN;
 	tegra_qspi_writel(tqspi, val, QSPI_GLOBAL_CONFIG);
-	list_for_each_entry(transfer, &msg->transfers, transfer_list) {
+	for (transfer = list_first_entry(&msg->transfers, typeof(*transfer), transfer_list);
+	     !list_entry_is_head(transfer, &msg->transfers, transfer_list);
+	     transfer = list_next_entry(transfer, transfer_list)) {
 		struct spi_transfer *xfer = transfer;
 		u8 dummy_bytes = 0;
 		u32 cmd1;
-- 
2.43.0
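
The "consume the following transfer, then advance the cursor" pattern described in the SPI patches can be modelled in userspace. This is a sketch with minimal stand-ins for the include/linux/list.h helpers; struct xfer, its is_cmd flag and the folding rule are hypothetical and much simpler than what the drivers do.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Minimal stand-ins for the kernel helpers of the same names. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_entry_is_head(pos, head, member) (&(pos)->member == (head))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical transfer: a command may be folded with the transfer after it. */
struct xfer {
	bool is_cmd;
	struct list_head transfer_list;
};

/* Returns the number of bus operations issued for the message. */
static int issue_message(struct list_head *transfers)
{
	struct xfer *transfer;
	int ops = 0;

	for (transfer = list_first_entry(transfers, __typeof__(*transfer), transfer_list);
	     !list_entry_is_head(transfer, transfers, transfer_list);
	     transfer = list_next_entry(transfer, transfer_list)) {
		struct xfer *following = list_next_entry(transfer, transfer_list);

		/* Fold a command and the transfer after it into one operation. */
		if (transfer->is_cmd &&
		    !list_entry_is_head(following, transfers, transfer_list))
			transfer = following;	/* consumed: the step skips it */
		ops++;
	}
	return ops;
}

/* Message layout: command, data, command, data. */
static int issue_message_demo(void)
{
	struct list_head transfers = { &transfers, &transfers };
	struct xfer x[4] = { { true }, { false }, { true }, { false } };

	for (int i = 0; i < 4; i++)
		list_add_tail(&x[i].transfer_list, &transfers);
	return issue_message(&transfers);
}
```

The four transfers are issued as two operations because the step derives the next entry from the cursor the body left behind. A step that used a successor cached before the body ran would visit each consumed transfer a second time.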



* [PATCH v2 09/14] spi: stm32-qspi: Open-code message transfer walk
From: Kaitao Cheng @ 2026-06-09  6:25 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609062526.94907-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. stm32_qspi_transfer_one_message() can
consume the following transfer as part of the current operation and then
advance the loop cursor to that entry.

Keep the transfer walk open-coded so the loop step observes that cursor
update and skips the consumed transfer. This preserves the existing
message sequencing semantics and prepares the code for the list iterator
update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/spi/spi-stm32-qspi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-stm32-qspi.c b/drivers/spi/spi-stm32-qspi.c
index df1bbacec90a..27d82a578c9f 100644
--- a/drivers/spi/spi-stm32-qspi.c
+++ b/drivers/spi/spi-stm32-qspi.c
@@ -577,7 +577,10 @@ static int stm32_qspi_transfer_one_message(struct spi_controller *ctrl,
 
 	gpiod_set_value_cansleep(spi_get_csgpiod(spi, 0), true);
 
-	list_for_each_entry(transfer, &msg->transfers, transfer_list) {
+	for (transfer = list_first_entry(&msg->transfers,
+					 typeof(*transfer), transfer_list);
+	     !list_entry_is_head(transfer, &msg->transfers, transfer_list);
+	     transfer = list_next_entry(transfer, transfer_list)) {
 		u8 dummy_bytes = 0;
 
 		memset(&op, 0, sizeof(op));
-- 
2.43.0



* [PATCH v2 08/14] spi: stm32-ospi: Open-code message transfer walk
From: Kaitao Cheng @ 2026-06-09  6:25 UTC
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609062526.94907-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. stm32_ospi_transfer_one_message() can
consume the following transfer as part of the current operation and then
advance the loop cursor to that entry.

Keep the transfer walk open-coded so the loop step observes that cursor
update and skips the consumed transfer. This preserves the existing
message sequencing semantics and prepares the code for the list iterator
update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/spi/spi-stm32-ospi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-stm32-ospi.c b/drivers/spi/spi-stm32-ospi.c
index 4461c6e24b9e..4dc2b56b4c20 100644
--- a/drivers/spi/spi-stm32-ospi.c
+++ b/drivers/spi/spi-stm32-ospi.c
@@ -675,7 +675,9 @@ static int stm32_ospi_transfer_one_message(struct spi_controller *ctrl,
 
 	gpiod_set_value_cansleep(cs_gpiod, true);
 
-	list_for_each_entry(transfer, &msg->transfers, transfer_list) {
+	for (transfer = list_first_entry(&msg->transfers, typeof(*transfer), transfer_list);
+	     !list_entry_is_head(transfer, &msg->transfers, transfer_list);
+	     transfer = list_next_entry(transfer, transfer_list)) {
 		u8 dummy_bytes = 0;
 
 		memset(&op, 0, sizeof(op));
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 07/14] spi: fsi: Open-code message transfer walk
From: Kaitao Cheng @ 2026-06-09  6:25 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609062526.94907-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. fsi_spi_transfer_one_message() can combine
the current transfer with the following transfer and then advance the
cursor to that consumed entry.

Keep the transfer walk open-coded so the loop step observes that cursor
update and skips the consumed transfer. This preserves the existing
message sequencing semantics and prepares the code for the list iterator
update.
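
The skip-consumed pattern described above can be sketched in userspace C.
This is only an illustration under assumptions, not the driver code:
struct xfer, xfer_add_tail() and max_handled() are hypothetical names, and
the "cached" mode simply saves the next pointer before the body runs, in
the manner of list_for_each_entry_safe().

```c
#include <stddef.h>

/* Userspace stand-in for a list of SPI transfers (not the kernel structs). */
struct xfer {
	struct xfer *prev, *next;
	int pair_with_next;	/* body also consumes the following entry */
	int handled;		/* how many times the body touched this entry */
};

/* Insert x just before head, i.e. at the tail of the circular list. */
static void xfer_add_tail(struct xfer *x, struct xfer *head)
{
	x->prev = head->prev;
	x->next = head;
	head->prev->next = x;
	head->prev = x;
}

/*
 * Walk four transfers where the first one is paired with the second.
 * cache_next == 0: the step reads pos->next after the body (open-coded walk).
 * cache_next != 0: the step uses a next pointer saved before the body.
 * Returns the largest per-entry handled count: 1 means every entry was
 * handled once, 2 means the consumed entry was visited a second time.
 */
static int max_handled(int cache_next)
{
	struct xfer head;
	struct xfer t[4] = { { 0 } };
	struct xfer *pos, *saved = NULL;
	int i, max = 0;

	head.prev = head.next = &head;
	head.pair_with_next = 0;
	head.handled = 0;

	for (i = 0; i < 4; i++)
		xfer_add_tail(&t[i], &head);
	t[0].pair_with_next = 1;

	for (pos = head.next; pos != &head;
	     pos = cache_next ? saved : pos->next) {
		saved = pos->next;
		pos->handled++;
		if (pos->pair_with_next && pos->next != &head) {
			pos->next->handled++;	/* consume the following entry */
			pos = pos->next;	/* advance the cursor past it */
		}
	}

	for (i = 0; i < 4; i++)
		if (t[i].handled > max)
			max = t[i].handled;
	return max;
}
```

In this sketch max_handled(0) leaves every entry handled once, while
max_handled(1) revisits the consumed entry, which is the behaviour the
open-coded loop is meant to avoid.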

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/spi/spi-fsi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsi.c b/drivers/spi/spi-fsi.c
index f6a75f0184c4..44999f00f5f6 100644
--- a/drivers/spi/spi-fsi.c
+++ b/drivers/spi/spi-fsi.c
@@ -434,7 +434,10 @@ static int fsi_spi_transfer_one_message(struct spi_controller *ctlr,
 	if (rc)
 		goto error;
 
-	list_for_each_entry(transfer, &mesg->transfers, transfer_list) {
+	for (transfer = list_first_entry(&mesg->transfers,
+					 typeof(*transfer), transfer_list);
+	     !list_entry_is_head(transfer, &mesg->transfers, transfer_list);
+	     transfer = list_next_entry(transfer, transfer_list)) {
 		struct fsi_spi_sequence seq;
 		struct spi_transfer *next = NULL;
 
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 06/14] drm/ttm: Open-code reservation list walk
From: Kaitao Cheng @ 2026-06-09  6:25 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609062526.94907-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. ttm_eu_reserve_buffers() may move the
current validation buffer to the duplicates list and then rewind the
cursor before continuing.

Keep the reservation walk open-coded so the loop step uses the cursor
selected by that duplicate handling. This preserves the existing
traversal semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/gpu/drm/ttm/ttm_execbuf_util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index bc7a83a9fe44..8072f07d5557 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -86,7 +86,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 	if (ticket)
 		ww_acquire_init(ticket, &reservation_ww_class);
 
-	list_for_each_entry(entry, list, head) {
+	for (entry = list_first_entry(list, typeof(*entry), head);
+	     !list_entry_is_head(entry, list, head);
+	     entry = list_next_entry(entry, head)) {
 		struct ttm_buffer_object *bo = entry->bo;
 		unsigned int num_fences;
 
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 05/14] drm/i915: Open-code DFS dependency list walk
From: Kaitao Cheng @ 2026-06-09  6:25 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609062526.94907-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. __i915_schedule() builds its DFS work list
while walking it by moving newly discovered dependencies to the tail.

Keep the DFS walk open-coded so the next dependency is resolved after any
tail moves performed by the body. This preserves the existing traversal
semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index aec1342402ca..da1f60282df8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -190,7 +190,9 @@ static void __i915_schedule(struct i915_sched_node *node,
 	 * end result is a topological list of requests in reverse order, the
 	 * last element in the list is the request we must execute first.
 	 */
-	list_for_each_entry(dep, &dfs, dfs_link) {
+	for (dep = list_first_entry(&dfs, typeof(*dep), dfs_link);
+	     !list_entry_is_head(dep, &dfs, dfs_link);
+	     dep = list_next_entry(dep, dfs_link)) {
 		struct i915_sched_node *node = dep->signaler;
 
 		/* If we are already flying, we know we have no signalers */
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 04/14] drm/i915/gt: Open-code active timeline walk
From: Kaitao Cheng @ 2026-06-09  6:25 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609062526.94907-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. __intel_gt_unset_wedged() drops
timelines->lock while waiting on a fence and then restarts the walk from
the list head after the lock is reacquired.

Keep the loop open-coded so the next timeline is selected after that
restart logic has run. This preserves the existing lock-drop traversal
semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index adff482a6c9c..fe0d87e248a7 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -1077,7 +1077,9 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 	 * No more can be submitted until we reset the wedged bit.
 	 */
 	spin_lock(&timelines->lock);
-	list_for_each_entry(tl, &timelines->active_list, link) {
+	for (tl = list_first_entry(&timelines->active_list, typeof(*tl), link);
+	     !list_entry_is_head(tl, &timelines->active_list, link);
+	     tl = list_next_entry(tl, link)) {
 		struct dma_fence *fence;
 
 		fence = i915_active_fence_get(&tl->last_request);
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 03/14] drm/bridge: Open-code bridge chain list walks
From: Kaitao Cheng @ 2026-06-09  6:25 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry_from() and
list_for_each_entry_reverse() cache the next or previous element before
entering the loop body. The bridge enable and disable ordering code
adjusts its cursor to skip ranges that have already been handled.

Keep those walks open-coded so the loop step observes the cursor
selected by the body. This preserves the existing bridge ordering
semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/gpu/drm/drm_bridge.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c
index d6f512b73389..a538aabc4e0b 100644
--- a/drivers/gpu/drm/drm_bridge.c
+++ b/drivers/gpu/drm/drm_bridge.c
@@ -868,7 +868,8 @@ void drm_atomic_bridge_chain_post_disable(struct drm_bridge *bridge,
 
 	encoder = bridge->encoder;
 
-	list_for_each_entry_from(bridge, &encoder->bridge_chain, chain_node) {
+	for (; !list_entry_is_head(bridge, &encoder->bridge_chain, chain_node);
+	     bridge = list_next_entry(bridge, chain_node)) {
 		limit = NULL;
 
 		if (!list_is_last(&bridge->chain_node, &encoder->bridge_chain)) {
@@ -962,7 +963,9 @@ void drm_atomic_bridge_chain_pre_enable(struct drm_bridge *bridge,
 
 	encoder = bridge->encoder;
 
-	list_for_each_entry_reverse(iter, &encoder->bridge_chain, chain_node) {
+	for (iter = list_last_entry(&encoder->bridge_chain, typeof(*iter), chain_node);
+	     !list_entry_is_head(iter, &encoder->bridge_chain, chain_node);
+	     iter = list_prev_entry(iter, chain_node)) {
 		if (iter->pre_enable_prev_first) {
 			next = iter;
 			limit = bridge;
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 02/14] firewire: core: Open-code topology list walk
From: Kaitao Cheng @ 2026-06-09  6:13 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. for_each_fw_node() intentionally appends
newly discovered child nodes to the temporary walk list while the list is
being traversed.

Keep the loop open-coded so the next node is looked up only after
children have been appended. This preserves the current breadth-first
traversal semantics and prepares the code for the list iterator update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/firewire/core-topology.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/firewire/core-topology.c b/drivers/firewire/core-topology.c
index bb2d2db30795..df2ac0dab106 100644
--- a/drivers/firewire/core-topology.c
+++ b/drivers/firewire/core-topology.c
@@ -272,7 +272,9 @@ static void for_each_fw_node(struct fw_card *card, struct fw_node *root,
 	fw_node_get(root);
 	list_add_tail(&root->link, &list);
 	parent = NULL;
-	list_for_each_entry(node, &list, link) {
+	for (node = list_first_entry(&list, typeof(*node), link);
+	     !list_entry_is_head(node, &list, link);
+	     node = list_next_entry(node, link)) {
 		node->color = card->color;
 
 		for (i = 0; i < node->port_count; i++) {
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 01/14] drbd: Open-code transfer log list walk
From: Kaitao Cheng @ 2026-06-09  6:13 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng
In-Reply-To: <20260609061347.93688-1-kaitao.cheng@linux.dev>

From: Kaitao Cheng <chengkaitao@kylinos.cn>

A later change will make list_for_each_entry() cache the next element
before entering the loop body. That is the desired behaviour for the
common case, but this transfer log walk temporarily drops
resource->req_lock and revalidates the cursor before continuing.

Keep the loop open-coded so the next request is derived after the body
has completed and after the cursor has been adjusted. This preserves the
existing traversal semantics and prepares the code for the list iterator
update.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 drivers/block/drbd/drbd_debugfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index 12460b584bcb..e90cead90e9d 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -308,7 +308,9 @@ static void seq_print_resource_transfer_log_summary(struct seq_file *m,
 
 	seq_puts(m, "n\tdevice\tvnr\t" RQ_HDR);
 	spin_lock_irq(&resource->req_lock);
-	list_for_each_entry(req, &connection->transfer_log, tl_requests) {
+	for (req = list_first_entry(&connection->transfer_log, typeof(*req), tl_requests);
+	     !list_entry_is_head(req, &connection->transfer_log, tl_requests);
+	     req = list_next_entry(req, tl_requests)) {
 		unsigned int tmp = 0;
 		unsigned int s;
 		++count;
-- 
2.43.0


^ permalink raw reply related

* [PATCH v2 00/14] list: Prepare entry iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-09  6:13 UTC (permalink / raw)
  To: Andy Shevchenko, Muchun Song, Philipp Reisner, Lars Ellenberg,
	Christoph Böhmwalder, Jens Axboe, Takashi Sakamoto,
	Andrzej Hajda, Neil Armstrong, Robert Foss, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	Christian Koenig, Huang Rui, Eddie James, Mark Brown,
	Maxime Coquelin, Alexandre Torgue, Laxman Dewangan,
	Thierry Reding, Jonathan Hunter, Sowjanya Komatineni,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Liam Girdwood,
	Jaroslav Kysela, Takashi Iwai
  Cc: Laurent Pinchart, Jonas Karlman, Jernej Skrabec, Matthew Auld,
	Matthew Brost, Waiman Long, drbd-dev, linux-block,
	linux1394-devel, dri-devel, intel-gfx, linux-spi, linux-stm32,
	linux-arm-kernel, linux-tegra, linux-sound, linux-kernel,
	Andrew Morton, Randy Dunlap, Christian Brauner, David Howells,
	Luca Ceresoli, Kaitao Cheng, Kaitao Cheng

From: Kaitao Cheng <chengkaitao@kylinos.cn>

This series prepares for, and then updates, the list_for_each_entry()
family so the common entry iterators cache their next or previous cursor
before the loop body runs.

The first 13 patches open-code loops that intentionally depend on the
old "derive the next entry from the current cursor at the end of the
iteration" behaviour.  These loops append work to the list being walked,
restart traversal after dropping a lock, skip an entry consumed by the
current iteration, or otherwise adjust the cursor in the loop body.

The final patch changes include/linux/list.h to keep a private cursor in
the common entry iterators while preserving the public macro interface.
The safe variants remain available when callers need the temporary
cursor explicitly or have stronger mutation requirements.
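
To make the difference concrete, here is a minimal userspace sketch of a
loop that appends to the list it is walking. It is not the kernel macro:
struct node, list_init(), list_add_tail() and walk_and_append() are
stand-ins written for this example, and the "cached" mode assumes the next
pointer is saved before the body runs, as list_for_each_entry_safe()
already does.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (userspace sketch). */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Insert n just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Walk a one-element list whose body appends a second element during the
 * first iteration.
 * cache_next == 0: the step reads pos->next after the body has run.
 * cache_next != 0: the step uses a next pointer saved before the body.
 * Returns the number of entries the body visited.
 */
static int walk_and_append(int cache_next)
{
	struct node head, first, extra;
	struct node *pos, *saved = NULL;
	int appended = 0, visits = 0;

	list_init(&head);
	list_add_tail(&first, &head);

	for (pos = head.next; pos != &head;
	     pos = cache_next ? saved : pos->next) {
		saved = pos->next;	/* what a cached cursor would hold */
		visits++;
		if (!appended) {
			list_add_tail(&extra, &head);	/* grow the list mid-walk */
			appended = 1;
		}
	}
	return visits;
}
```

With cache_next == 0 the appended entry is visited as well; with
cache_next != 0 the walk ends after the first entry because the saved
cursor already pointed at the list head.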

Changes in v2 (Muchun Song, Andy Shevchenko):
 - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
   cursor change directly in the existing list_for_each_entry*() helpers.
 - Open-code special list walks that rely on updating the loop cursor in
   the body, preserving their existing traversal semantics.

Link to v1:
https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/

Kaitao Cheng (14):
  drbd: Open-code transfer log list walk
  firewire: core: Open-code topology list walk
  drm/bridge: Open-code bridge chain list walks
  drm/i915/gt: Open-code active timeline walk
  drm/i915: Open-code DFS dependency list walk
  drm/ttm: Open-code reservation list walk
  spi: fsi: Open-code message transfer walk
  spi: stm32-ospi: Open-code message transfer walk
  spi: stm32-qspi: Open-code message transfer walk
  spi: tegra210-quad: Open-code message transfer walk
  locking/locktorture: Open-code ww mutex list walk
  locking/ww_mutex: Open-code stress reorder list walk
  ASoC: dapm: Open-code widget invalidation walk
  list: Cache cursors in entry iterators

 drivers/block/drbd/drbd_debugfs.c      |  4 ++-
 drivers/firewire/core-topology.c       |  4 ++-
 drivers/gpu/drm/drm_bridge.c           |  7 ++--
 drivers/gpu/drm/i915/gt/intel_reset.c  |  4 ++-
 drivers/gpu/drm/i915/i915_scheduler.c  |  4 ++-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c |  4 ++-
 drivers/spi/spi-fsi.c                  |  5 ++-
 drivers/spi/spi-stm32-ospi.c           |  4 ++-
 drivers/spi/spi-stm32-qspi.c           |  5 ++-
 drivers/spi/spi-tegra210-quad.c        |  4 ++-
 include/linux/list.h                   | 46 ++++++++++++++++++++------
 kernel/locking/locktorture.c           |  4 ++-
 kernel/locking/test-ww_mutex.c         |  4 ++-
 sound/soc/soc-dapm.c                   |  4 ++-
 14 files changed, 78 insertions(+), 25 deletions(-)

-- 
2.43.0


^ permalink raw reply

* Re: [PATCH v2 0/4] i2c: tegra: Improve DMA mapping, latency, and power management
From: Akhil R @ 2026-06-09  5:18 UTC (permalink / raw)
  To: andi.shyti
  Cc: akhilrajeev, digetx, jonathanh, kkartik, ldewangan, linux-i2c,
	linux-kernel, linux-tegra, thierry.reding, wsa
In-Reply-To: <aicYDVct2VElN4IN@zenone.zhora.eu>

Hi Andi,

On Mon, 8 Jun 2026 21:30:59 +0200, Andi Shyti wrote:
>> Akhil R (4):
>>   i2c: tegra: use dmaengine_get_dma_device() for DMA buffer allocation
>>   i2c: tegra: Disable fair arbitration for non-MCTP buses
> 
> I merged these two patches in i2c/i2c-host.
> 
>>   i2c: tegra: Update Tegra410 I2C timing parameters
>>   i2c: tegra: Fix NOIRQ suspend/resume
> 
> I merged these two patches in i2c/i2c-host-fixes and they will
> hit mainline first.
> 
> I haven't seen any link between the two groups. Let me know if I
> missed something.

You are right. They are independent patches and do not have any link.
I should have put the fixes first in the series.

Thanks for merging the changes.

Best Regards,
Akhil

^ permalink raw reply

* RE: [PATCH v4] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
From: Besar Wicaksono @ 2026-06-09  0:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland@arm.com, james.clark@linaro.org,
	yangyccccc@gmail.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
	Thierry Reding, Jon Hunter, Vikram Sethi, Rich Wiley,
	Shanker Donthineni, Matt Ochs, Nirmoy Das, Sean Kelley
In-Reply-To: <agxMXsznrU3mvcfE@willie-the-truck>

Hi Will,

My apologies for taking a while to respond.
Please see my reply inline.

> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: Tuesday, May 19, 2026 6:41 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>
> Cc: mark.rutland@arm.com; james.clark@linaro.org; yangyccccc@gmail.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; Thierry Reding <treding@nvidia.com>; Jon Hunter
> <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>; Rich Wiley
> <rwiley@nvidia.com>; Shanker Donthineni <sdonthineni@nvidia.com>; Matt
> Ochs <mochs@nvidia.com>; Nirmoy Das <nirmoyd@nvidia.com>; Sean Kelley
> <skelley@nvidia.com>
> Subject: Re: [PATCH v4] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA
> Olympus
> 
> On Mon, May 04, 2026 at 05:52:04PM +0000, Besar Wicaksono wrote:
> > PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while
> the
> > PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
> > counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
> > give results that differ from the programmable counter path.
> >
> > Extend the existing PMCCNTR avoidance decision from the SMT case to
> > also cover Olympus. Store the result in the common arm_pmu state at
> > registration time, so arm_pmuv3 can keep using a single flag when
> > deciding whether CPU_CYCLES may use PMCCNTR_EL0.
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> >
> > Changes from v1:
> >   * add CONFIG_ARM64 check to fix build error found by kernel test robot
> >   * add explicit include of <asm/cputype.h>
> > v1: https://lore.kernel.org/linux-arm-kernel/20260406232034.2566133-1-
> bwicaksono@nvidia.com/
> >
> > Changes from v2:
> >   * Move the Olympus PMCCNTR avoidance check from arm_pmuv3.c to the
> >     common arm_pmu registration path.
> >   * Replace the PMUv3-only has_smt flag with avoid_pmccntr, covering both
> >     the existing SMT restriction and the Olympus MIDR restriction.
> >   * Use the cached per-CPU MIDR from cpu_data instead of calling
> >     is_midr_in_range_list() from armv8pmu_can_use_pmccntr().
> >   * Add the required asm/cpu.h include for cpu_data.
> > v2: https://lore.kernel.org/linux-arm-kernel/20260421203856.3539186-1-
> bwicaksono@nvidia.com/#t
> >
> > Changes from v3:
> >   * Move avoidance check based on MIDR to __armv8pmu_probe_pmu() to
> make sure
> >     the MIDR is retrieved from the correct online CPU.
> > v3: https://lore.kernel.org/linux-arm-kernel/20260429215614.1793131-1-
> bwicaksono@nvidia.com/
> >
> > ---
> >  drivers/perf/arm_pmu.c       |  7 ++++-
> >  drivers/perf/arm_pmuv3.c     | 51
> +++++++++++++++++++++++++++++++-----
> >  include/linux/perf/arm_pmu.h |  2 +-
> >  3 files changed, 51 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> > index 939bcbd433aa..aa1dac0b440f 100644
> > --- a/drivers/perf/arm_pmu.c
> > +++ b/drivers/perf/arm_pmu.c
> > @@ -931,8 +931,13 @@ int armpmu_register(struct arm_pmu *pmu)
> >       /*
> >        * By this stage we know our supported CPUs on either DT/ACPI
> platforms,
> >        * detect the SMT implementation.
> > +      * On SMT CPUs, the PMCCNTR_EL0 increments from the processor clock
> rather
> > +      * than the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll
> continue
> > +      * counting on a WFI PE if one of its SMT sibling is not idle on a
> > +      * multi-threaded implementation. So don't use it on SMT cores.
> >        */
> > -     pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu-
> >supported_cpus));
> > +     pmu->avoid_pmccntr |=
> > +             topology_core_has_smt(cpumask_first(&pmu->supported_cpus));
> >
> >       if (!pmu->set_event_filter)
> >               pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
> > diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> > index 8014ff766cff..1ee4a09d0dcc 100644
> > --- a/drivers/perf/arm_pmuv3.c
> > +++ b/drivers/perf/arm_pmuv3.c
> > @@ -8,6 +8,7 @@
> >   * This code is based heavily on the ARMv7 perf event code.
> >   */
> >
> > +#include <asm/cputype.h>
> >  #include <asm/irq_regs.h>
> >  #include <asm/perf_event.h>
> >  #include <asm/virt.h>
> > @@ -1002,13 +1003,7 @@ static bool armv8pmu_can_use_pmccntr(struct
> pmu_hw_events *cpuc,
> >       if (has_branch_stack(event))
> >               return false;
> >
> > -     /*
> > -      * The PMCCNTR_EL0 increments from the processor clock rather than
> > -      * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
> > -      * counting on a WFI PE if one of its SMT sibling is not idle on a
> > -      * multi-threaded implementation. So don't use it on SMT cores.
> > -      */
> > -     if (cpu_pmu->has_smt)
> > +     if (cpu_pmu->avoid_pmccntr)
> >               return false;
> >
> >       return true;
> > @@ -1299,6 +1294,41 @@ static int armv8_vulcan_map_event(struct
> perf_event *event)
> >                                      &armv8_vulcan_perf_cache_map);
> >  }
> >
> > +#ifdef CONFIG_ARM64
> > +/*
> > + * List of CPUs that should avoid using PMCCNTR_EL0.
> > + */
> > +static struct midr_range armv8pmu_avoid_pmccntr_cpus[] = {
> > +     /*
> > +      * The PMCCNTR_EL0 in Olympus CPU may still increment while in
> WFI/WFE state.
> > +      * This is an implementation specific behavior and not an erratum.
> > +      *
> > +      * From ARM DDI0487 D14.4:
> > +      *   It is IMPLEMENTATION SPECIFIC whether CPU_CYCLES and PMCCNTR
> count
> > +      *   when the PE is in WFI or WFE state, even if the clocks are not stopped.
> 
> So surely the weird part here is that Olympus chose one behaviour for
> PMCCNTR and another for the CPU_CYCLES event? The Arm ARM text isn't

That is correct.

> clear to me as to whether that's permitted but I think we should call
> it out here.
> 

Sure, I will call it out explicitly.

> > +      * From ARM DDI0487 D24.5.2:
> > +      *   All counters are subject to any changes in clock frequency, including
> > +      *   clock stopping caused by the WFI and WFE instructions.
> > +      *   This means that it is CONSTRAINED UNPREDICTABLE whether or not
> > +      *   PMCCNTR_EL0 continues to increment when clocks are stopped by
> WFI and
> > +      *   WFE instructions.
> > +      */
> > +     MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
> > +     {}
> > +};
> > +
> > +static bool armv8pmu_is_in_avoid_pmccntr_cpus(void)
> > +{
> > +     return is_midr_in_range_list(armv8pmu_avoid_pmccntr_cpus);
> > +}
> > +#else
> > +static bool armv8pmu_is_in_avoid_pmccntr_cpus(void)
> > +{
> > +     return false;
> > +}
> > +#endif
> > +
> >  struct armv8pmu_probe_info {
> >       struct arm_pmu *pmu;
> >       bool present;
> > @@ -1348,6 +1378,13 @@ static void __armv8pmu_probe_pmu(void
> *info)
> >       else
> >               cpu_pmu->reg_pmmir = 0;
> >
> > +     /*
> > +      * On some CPUs, PMCCNTR_EL0 does not match the behavior of CPU_CYCLES
> > +      * programmable counter, so avoid routing cycles through PMCCNTR_EL0 to
> > +      * prevent inconsistency in the results.
> > +      */
> > +     cpu_pmu->avoid_pmccntr |= armv8pmu_is_in_avoid_pmccntr_cpus();
> 
> Do we also want to hide the cycle counter from userspace? It sounds like
> it's going to get very confused if it tries to use it...
> 

Makes sense. I have made that change in v5.
Please check https://lore.kernel.org/linux-arm-kernel/20260608234135.1856911-1-bwicaksono@nvidia.com/T/#u

Thanks,
Besar


^ permalink raw reply

* [PATCH v5] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
From: Besar Wicaksono @ 2026-06-08 23:41 UTC (permalink / raw)
  To: will, mark.rutland, james.clark, yangyccccc
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, treding, jonathanh,
	vsethi, rwiley, sdonthineni, mochs, nirmoyd, skelley,
	Besar Wicaksono

PMCCNTR_EL0 on the NVIDIA Olympus CPU may keep incrementing while
the PE is in WFI/WFE, whereas a programmable counter counting
CPU_CYCLES may not. Add a MIDR range entry and refuse PMCCNTR_EL0
for cycle events on affected parts so that perf does not mix the
two behaviors.

Also keep PMCCNTR_EL0 unavailable to EL0 direct counter reads
on affected CPUs. When userspace counter access is enabled,
avoid setting PMUSERENR_EL0.CR for PMUs that must avoid
PMCCNTR_EL0, while still allowing direct reads from programmable
event counters. For 64-bit userspace CPU_CYCLES events on PMUs
without native long event counters, reject the event if the only
valid direct-read path would be PMCCNTR_EL0.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---

Changes from v1:
  * add CONFIG_ARM64 check to fix build error found by kernel test robot
  * add explicit include of <asm/cputype.h>
v1: https://lore.kernel.org/linux-arm-kernel/20260406232034.2566133-1-bwicaksono@nvidia.com/

Changes from v2:
  * Move the Olympus PMCCNTR avoidance check from arm_pmuv3.c to the
    common arm_pmu registration path.
  * Replace the PMUv3-only has_smt flag with avoid_pmccntr, covering both
    the existing SMT restriction and the Olympus MIDR restriction.
  * Use the cached per-CPU MIDR from cpu_data instead of calling
    is_midr_in_range_list() from armv8pmu_can_use_pmccntr().
  * Add the required asm/cpu.h include for cpu_data.
v2: https://lore.kernel.org/linux-arm-kernel/20260421203856.3539186-1-bwicaksono@nvidia.com/#t

Changes from v3:
  * Move avoidance check based on MIDR to __armv8pmu_probe_pmu() to make sure
    the MIDR is retrieved from the correct online CPU.
v3: https://lore.kernel.org/linux-arm-kernel/20260429215614.1793131-1-bwicaksono@nvidia.com/

Changes from v4:
  * Avoid granting PMCCNTR_EL0 direct userspace access by leaving
    PMUSERENR_EL0.CR clear on PMUs that must avoid PMCCNTR_EL0.
  * Keep direct userspace access available for programmable event counters.
  * Reject 64-bit userspace CPU_CYCLES events on PMUs without native long
    counters when the only valid direct-read path would be PMCCNTR_EL0.
  * Expand the Olympus comment to describe the mismatch with programmable
    CPU_CYCLES counters.
v4: https://lore.kernel.org/linux-arm-kernel/20260504175204.3122979-1-bwicaksono@nvidia.com/
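
Illustrative sketch (not part of the patch): the two userspace-facing
rules added in v5 can be modeled outside the kernel as plain C. The
helper names sketch_userenr() and sketch_reject_user_event() are
hypothetical, and the bit positions are assumed to match the kernel's
ARMV8_PMU_USERENR_CR/ER/UEN definitions. It only mirrors the logic of
the hunks in armv8pmu_enable_user_access() and
__armv8_pmuv3_map_event() below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PMUSERENR_EL0 bit positions (mirroring ARMV8_PMU_USERENR_*). */
#define USERENR_CR  (UINT64_C(1) << 2) /* EL0 may read the cycle counter */
#define USERENR_ER  (UINT64_C(1) << 3) /* EL0 may read event counters */
#define USERENR_UEN (UINT64_C(1) << 4) /* fine-grained per-counter access */

/*
 * Hypothetical helper: the value that would be programmed when
 * userspace counter access is enabled. CR is left clear on PMUs that
 * must avoid PMCCNTR_EL0; programmable counters stay readable.
 */
static uint64_t sketch_userenr(bool avoid_pmccntr)
{
	uint64_t userenr = USERENR_ER | USERENR_UEN;

	if (!avoid_pmccntr)
		userenr |= USERENR_CR;

	return userenr;
}

/*
 * Hypothetical helper: true when a userspace-access event would be
 * rejected with -EOPNOTSUPP. A 64-bit event needs either a native long
 * event counter or, for CPU_CYCLES only, a usable PMCCNTR_EL0.
 */
static bool sketch_reject_user_event(bool is_64bit, bool is_cpu_cycles,
				     bool avoid_pmccntr, bool has_long_event)
{
	return is_64bit &&
	       (!is_cpu_cycles || avoid_pmccntr) &&
	       !has_long_event;
}
```

Under these assumptions, a 64-bit CPU_CYCLES event on a PMU without
long event counters is accepted when PMCCNTR_EL0 is usable and rejected
when avoid_pmccntr is set, while 32-bit events are unaffected.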

---
 drivers/perf/arm_pmu.c       |  7 +++-
 drivers/perf/arm_pmuv3.c     | 64 +++++++++++++++++++++++++++++++-----
 include/linux/perf/arm_pmu.h |  2 +-
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 939bcbd433aa..aa1dac0b440f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -931,8 +931,13 @@ int armpmu_register(struct arm_pmu *pmu)
 	/*
 	 * By this stage we know our supported CPUs on either DT/ACPI platforms,
 	 * detect the SMT implementation.
+	 * On SMT CPUs, the PMCCNTR_EL0 increments from the processor clock rather
+	 * than the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
+	 * counting on a WFI PE if one of its SMT siblings is not idle on a
+	 * multi-threaded implementation. So don't use it on SMT cores.
 	 */
-	pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu->supported_cpus));
+	pmu->avoid_pmccntr |=
+		topology_core_has_smt(cpumask_first(&pmu->supported_cpus));
 
 	if (!pmu->set_event_filter)
 		pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8014ff766cff..6d4d57342352 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -8,6 +8,7 @@
  * This code is based heavily on the ARMv7 perf event code.
  */
 
+#include <asm/cputype.h>
 #include <asm/irq_regs.h>
 #include <asm/perf_event.h>
 #include <asm/virt.h>
@@ -795,6 +796,7 @@ static void armv8pmu_disable_user_access(void)
 static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 {
 	int i;
+	u64 userenr = ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_UEN;
 	struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
 
 	if (is_pmuv3p9(cpu_pmu->pmuver)) {
@@ -817,7 +819,10 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 		}
 	}
 
-	update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_UEN);
+	if (!cpu_pmu->avoid_pmccntr)
+		userenr |= ARMV8_PMU_USERENR_CR;
+
+	update_pmuserenr(userenr);
 }
 
 static void armv8pmu_enable_event(struct perf_event *event)
@@ -1002,13 +1007,7 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
 	if (has_branch_stack(event))
 		return false;
 
-	/*
-	 * The PMCCNTR_EL0 increments from the processor clock rather than
-	 * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
-	 * counting on a WFI PE if one of its SMT sibling is not idle on a
-	 * multi-threaded implementation. So don't use it on SMT cores.
-	 */
-	if (cpu_pmu->has_smt)
+	if (cpu_pmu->avoid_pmccntr)
 		return false;
 
 	return true;
@@ -1250,7 +1249,8 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
 		if (!(event->attach_state & PERF_ATTACH_TASK))
 			return -EINVAL;
 		if (armv8pmu_event_is_64bit(event) &&
-		    (hw_event_id != ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
+		    (hw_event_id != ARMV8_PMUV3_PERFCTR_CPU_CYCLES ||
+		     armpmu->avoid_pmccntr) &&
 		    !armv8pmu_has_long_event(armpmu))
 			return -EOPNOTSUPP;
 
@@ -1299,6 +1299,45 @@ static int armv8_vulcan_map_event(struct perf_event *event)
 				       &armv8_vulcan_perf_cache_map);
 }
 
+#ifdef CONFIG_ARM64
+/*
+ * List of CPUs that should avoid using PMCCNTR_EL0.
+ */
+static struct midr_range armv8pmu_avoid_pmccntr_cpus[] = {
+	/*
+	 * NVIDIA Olympus may expose different WFI/WFE behaviour between the
+	 * PMCCNTR_EL0 and the CPU_CYCLES event on programmable counters.
+	 * While the CPU is in WFI/WFE state, the PMCCNTR_EL0 may still increment
+	 * but the programmable counter may not. This is an implementation specific
+	 * behavior and not an erratum. Perf assumes those two paths are
+	 * interchangeable, so avoid using PMCCNTR_EL0 for CPU_CYCLES event.
+	 *
+	 * From ARM DDI0487 D14.4:
+	 *   It is IMPLEMENTATION SPECIFIC whether CPU_CYCLES and PMCCNTR count
+	 *   when the PE is in WFI or WFE state, even if the clocks are not stopped.
+	 *
+	 * From ARM DDI0487 D24.5.2:
+	 *   All counters are subject to any changes in clock frequency, including
+	 *   clock stopping caused by the WFI and WFE instructions.
+	 *   This means that it is CONSTRAINED UNPREDICTABLE whether or not
+	 *   PMCCNTR_EL0 continues to increment when clocks are stopped by WFI and
+	 *   WFE instructions.
+	 */
+	MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
+	{}
+};
+
+static bool armv8pmu_is_in_avoid_pmccntr_cpus(void)
+{
+	return is_midr_in_range_list(armv8pmu_avoid_pmccntr_cpus);
+}
+#else
+static bool armv8pmu_is_in_avoid_pmccntr_cpus(void)
+{
+	return false;
+}
+#endif
+
 struct armv8pmu_probe_info {
 	struct arm_pmu *pmu;
 	bool present;
@@ -1348,6 +1387,13 @@ static void __armv8pmu_probe_pmu(void *info)
 	else
 		cpu_pmu->reg_pmmir = 0;
 
+	/*
+	 * On some CPUs, PMCCNTR_EL0 does not match the behavior of CPU_CYCLES
+	 * programmable counter, so avoid routing cycles through PMCCNTR_EL0 to
+	 * prevent inconsistency in the results.
+	 */
+	cpu_pmu->avoid_pmccntr |= armv8pmu_is_in_avoid_pmccntr_cpus();
+
 	brbe_probe(cpu_pmu);
 }
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 52b37f7bdbf9..02d2c7f45b52 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -119,7 +119,7 @@ struct arm_pmu {
 
 	/* PMUv3 only */
 	int		pmuver;
-	bool		has_smt;
+	bool		avoid_pmccntr;
 	u64		reg_pmmir;
 	u64		reg_brbidr;
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS		0x40
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers
From: Danilo Krummrich @ 2026-06-08 22:24 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: Suzuki K Poulose, James Clark, Alexander Shishkin,
	Greg Kroah-Hartman, Rafael J . Wysocki, Danilo Krummrich,
	Miguel Ojeda, Boqun Feng, Gary Guo, Björn Roy Baron,
	Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Jonathan Corbet, Shuah Khan, Luis Chamberlain, Petr Pavlu,
	Daniel Gomez, Sami Tolvanen, Aaron Tomlin, Mike Leach, Leo Yan,
	Thierry Reding, Jonathan Hunter, Rahul Bukte, linux-kernel,
	coresight, linux-arm-kernel, driver-core, rust-for-linux,
	linux-doc, Daniel Palmer, Tim Bird, linux-modules, linux-tegra,
	Sumit Gupta
In-Reply-To: <20260518-acpi_mod_name-v5-0-705ccc430885@sony.com>

On Mon, 18 May 2026 19:19:56 +0900, Shashank Balaji wrote:
> [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers

Applied, thanks!

  Branch: driver-core-testing
  Tree:   git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git

[1/4] soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
      commit: cd6e95e7ab29
[2/4] kernel: param: initialize module_kset in a pure_initcall
      commit: c82dfce47833
[3/4] coresight: pass THIS_MODULE implicitly through a macro
      commit: efc22b3f89a3
[4/4] driver core: platform: set mod_name in driver registration
      commit: a7a7dc5c46a0

The patches will appear in the next linux-next integration (typically within 24
hours on weekdays).

The patches are in the driver-core-testing branch and will be promoted to
driver-core-next after validation.

^ permalink raw reply

* Re: [PATCH v5 0/4] Enable sysfs module symlink for more built-in drivers
From: Danilo Krummrich @ 2026-06-08 22:16 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: Suzuki K Poulose, James Clark, Alexander Shishkin,
	Greg Kroah-Hartman, Rafael J. Wysocki, Miguel Ojeda, Boqun Feng,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Jonathan Corbet, Shuah Khan,
	Luis Chamberlain, Petr Pavlu, Daniel Gomez, Sami Tolvanen,
	Aaron Tomlin, Mike Leach, Leo Yan, Thierry Reding,
	Jonathan Hunter, Rahul Bukte, linux-kernel, coresight,
	linux-arm-kernel, driver-core, rust-for-linux, linux-doc,
	Daniel Palmer, Tim Bird, linux-modules, linux-tegra, Sumit Gupta
In-Reply-To: <20260518-acpi_mod_name-v5-0-705ccc430885@sony.com>

On Mon May 18, 2026 at 12:19 PM CEST, Shashank Balaji wrote:
> Shashank Balaji (4):
>       soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
>       kernel: param: initialize module_kset in a pure_initcall
>       coresight: pass THIS_MODULE implicitly through a macro
>       driver core: platform: set mod_name in driver registration

Picking this up now, so it can still make it for 7.2-rc1 and get some time in
linux-next.

Suzuki, since I haven't heard back I figured it should be fine to also pick the
coresight change as it is purely mechanical and driver-core motivated, but please
let me know if you have any concerns.

Thanks,
Danilo

^ permalink raw reply
