public inbox for linux-modules@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v17 28/47] dept: add documentation for dept
From: Bagas Sanjaya @ 2025-10-03  2:44 UTC (permalink / raw)
  To: Byungchul Park, linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-29-byungchul@sk.com>

On Thu, Oct 02, 2025 at 05:12:28PM +0900, Byungchul Park wrote:
> This document describes the concept and APIs of dept.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> ---
>  Documentation/dependency/dept.txt     | 735 ++++++++++++++++++++++++++
>  Documentation/dependency/dept_api.txt | 117 ++++
>  2 files changed, 852 insertions(+)
>  create mode 100644 Documentation/dependency/dept.txt
>  create mode 100644 Documentation/dependency/dept_api.txt

What about writing dept docs in reST (like the rest of kernel documentation)?

---- >8 ----
diff --git a/Documentation/dependency/dept.txt b/Documentation/locking/dept.rst
similarity index 92%
rename from Documentation/dependency/dept.txt
rename to Documentation/locking/dept.rst
index 5dd358b96734e6..7b90a0d95f0876 100644
--- a/Documentation/dependency/dept.txt
+++ b/Documentation/locking/dept.rst
@@ -8,7 +8,7 @@ How lockdep works
 
 Lockdep detects a deadlock by checking lock acquisition order. For
 example, a graph to track acquisition order built by lockdep might look
-like:
+like::
 
    A -> B -
            \
@@ -16,12 +16,12 @@ like:
            /
    C -> D -
 
-   where 'A -> B' means that acquisition A is prior to acquisition B
-   with A still held.
+where 'A -> B' means that acquisition A is prior to acquisition B
+with A still held.
 
 Lockdep keeps adding each new acquisition order into the graph in
 runtime. For example, 'E -> C' will be added when the two locks have
-been acquired in the order, E and then C. The graph will look like:
+been acquired in the order, E and then C. The graph will look like::
 
        A -> B -
                \
@@ -32,10 +32,10 @@ been acquired in the order, E and then C. The graph will look like:
    \                  /
     ------------------
 
-   where 'A -> B' means that acquisition A is prior to acquisition B
-   with A still held.
+where 'A -> B' means that acquisition A is prior to acquisition B
+with A still held.
 
-This graph contains a subgraph that demonstrates a loop like:
+This graph contains a subgraph that demonstrates a loop like::
 
                 -> E -
                /      \
@@ -67,6 +67,8 @@ mechanisms, lockdep doesn't work.
 
 Can lockdep detect the following deadlock?
 
+::
+
    context X	   context Y	   context Z
 
 		   mutex_lock A
@@ -80,6 +82,8 @@ Can lockdep detect the following deadlock?
 
 No. What about the following?
 
+::
+
    context X		   context Y
 
 			   mutex_lock A
@@ -101,7 +105,7 @@ What leads a deadlock
 ---------------------
 
 A deadlock occurs when one or multi contexts are waiting for events that
-will never happen. For example:
+will never happen. For example::
 
    context X	   context Y	   context Z
 
@@ -121,24 +125,24 @@ We call this *deadlock*.
 If an event occurrence is a prerequisite to reaching another event, we
 call it *dependency*. In this example:
 
-   Event A occurrence is a prerequisite to reaching event C.
-   Event C occurrence is a prerequisite to reaching event B.
-   Event B occurrence is a prerequisite to reaching event A.
+   * Event A occurrence is a prerequisite to reaching event C.
+   * Event C occurrence is a prerequisite to reaching event B.
+   * Event B occurrence is a prerequisite to reaching event A.
 
 In terms of dependency:
 
-   Event C depends on event A.
-   Event B depends on event C.
-   Event A depends on event B.
+   * Event C depends on event A.
+   * Event B depends on event C.
+   * Event A depends on event B.
 
-Dependency graph reflecting this example will look like:
+Dependency graph reflecting this example will look like::
 
     -> C -> A -> B -
    /                \
    \                /
     ----------------
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
 A circular dependency exists. Such a circular dependency leads a
 deadlock since no waiters can have desired events triggered.
@@ -152,7 +156,7 @@ Introduce DEPT
 --------------
 
 DEPT(DEPendency Tracker) tracks wait and event instead of lock
-acquisition order so as to recognize the following situation:
+acquisition order so as to recognize the following situation::
 
    context X	   context Y	   context Z
 
@@ -165,18 +169,18 @@ acquisition order so as to recognize the following situation:
 				   event A
 
 and builds up a dependency graph in runtime that is similar to lockdep.
-The graph might look like:
+The graph might look like::
 
     -> C -> A -> B -
    /                \
    \                /
     ----------------
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
 DEPT keeps adding each new dependency into the graph in runtime. For
 example, 'B -> D' will be added when event D occurrence is a
-prerequisite to reaching event B like:
+prerequisite to reaching event B like::
 
    |
    v
@@ -184,7 +188,7 @@ prerequisite to reaching event B like:
    .
    event B
 
-After the addition, the graph will look like:
+After the addition, the graph will look like::
 
                      -> D
                     /
@@ -209,6 +213,8 @@ How DEPT works
 Let's take a look how DEPT works with the 1st example in the section
 'Limitation of lockdep'.
 
+::
+
    context X	   context Y	   context Z
 
 		   mutex_lock A
@@ -220,7 +226,7 @@ Let's take a look how DEPT works with the 1st example in the section
 		   mutex_unlock A
 				   mutex_unlock A
 
-Adding comments to describe DEPT's view in terms of wait and event:
+Adding comments to describe DEPT's view in terms of wait and event::
 
    context X	   context Y	   context Z
 
@@ -248,7 +254,7 @@ Adding comments to describe DEPT's view in terms of wait and event:
 				   mutex_unlock A
 				   /* event A */
 
-Adding more supplementary comments to describe DEPT's view in detail:
+Adding more supplementary comments to describe DEPT's view in detail::
 
    context X	   context Y	   context Z
 
@@ -283,7 +289,7 @@ Adding more supplementary comments to describe DEPT's view in detail:
 				   mutex_unlock A
 				   /* event A that's been valid since 4 */
 
-Let's build up dependency graph with this example. Firstly, context X:
+Let's build up dependency graph with this example. Firstly, context X::
 
    context X
 
@@ -292,7 +298,7 @@ Let's build up dependency graph with this example. Firstly, context X:
    /* start to take into account event B's context */
    /* 2 */
 
-There are no events to create dependency. Next, context Y:
+There are no events to create dependency. Next, context Y::
 
    context Y
 
@@ -317,13 +323,13 @@ waits between 3 and the event, event B does not create dependency. For
 event A, there is a wait, folio_lock B, between 1 and the event. Which
 means event A cannot be triggered if event B does not wake up the wait.
 Therefore, we can say event A depends on event B, say, 'A -> B'. The
-graph will look like after adding the dependency:
+graph will look like after adding the dependency::
 
    A -> B
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
-Lastly, context Z:
+Lastly, context Z::
 
    context Z
 
@@ -343,7 +349,7 @@ wait, mutex_lock A, between 2 and the event - remind 2 is at a very
 start and before the wait in timeline. Which means event B cannot be
 triggered if event A does not wake up the wait. Therefore, we can say
 event B depends on event A, say, 'B -> A'. The graph will look like
-after adding the dependency:
+after adding the dependency::
 
     -> A -> B -
    /           \
@@ -367,6 +373,8 @@ Interpret DEPT report
 
 The following is the example in the section 'How DEPT works'.
 
+::
+
    context X	   context Y	   context Z
 
 		   mutex_lock A
@@ -402,7 +410,7 @@ The following is the example in the section 'How DEPT works'.
 
 We can Simplify this by replacing each waiting point with [W], each
 point where its event's context starts with [S] and each event with [E].
-This example will look like after the replacement:
+This example will look like after the replacement::
 
    context X	   context Y	   context Z
 
@@ -419,6 +427,8 @@ This example will look like after the replacement:
 DEPT uses the symbols [W], [S] and [E] in its report as described above.
 The following is an example reported by DEPT for a real problem.
 
+::
+
    Link: https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
    Link: https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/
 
@@ -620,6 +630,8 @@ The following is an example reported by DEPT for a real problem.
 
 Let's take a look at the summary that is the most important part.
 
+::
+
    ---------------------------------------------------
    summary
    ---------------------------------------------------
@@ -639,7 +651,7 @@ Let's take a look at the summary that is the most important part.
    [W]: the wait blocked
    [E]: the event not reachable
 
-The summary shows the following scenario:
+The summary shows the following scenario::
 
    context A	   context B	   context ?(unknown)
 
@@ -652,7 +664,7 @@ The summary shows the following scenario:
 
    [E] unlock(&ni->ni_lock:0)
 
-Adding supplementary comments to describe DEPT's view in detail:
+Adding supplementary comments to describe DEPT's view in detail::
 
    context A	   context B	   context ?(unknown)
 
@@ -677,7 +689,7 @@ Adding supplementary comments to describe DEPT's view in detail:
    [E] unlock(&ni->ni_lock:0)
    /* event that's been valid since 2 */
 
-Let's build up dependency graph with this report. Firstly, context A:
+Let's build up dependency graph with this report. Firstly, context A::
 
    context A
 
@@ -697,13 +709,13 @@ wait, folio_lock(&f1), between 2 and the event. Which means
 unlock(&ni->ni_lock:0) is not reachable if folio_unlock(&f1) does not
 wake up the wait. Therefore, we can say unlock(&ni->ni_lock:0) depends
 on folio_unlock(&f1), say, 'unlock(&ni->ni_lock:0) -> folio_unlock(&f1)'.
-The graph will look like after adding the dependency:
+The graph will look like after adding the dependency::
 
    unlock(&ni->ni_lock:0) -> folio_unlock(&f1)
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
-Secondly, context B:
+Secondly, context B::
 
    context B
 
@@ -719,14 +731,14 @@ very start and before the wait in timeline. Which means folio_unlock(&f1)
 is not reachable if unlock(&ni->ni_lock:0) does not wake up the wait.
 Therefore, we can say folio_unlock(&f1) depends on unlock(&ni->ni_lock:0),
 say, 'folio_unlock(&f1) -> unlock(&ni->ni_lock:0)'. The graph will look
-like after adding the dependency:
+like after adding the dependency::
 
     -> unlock(&ni->ni_lock:0) -> folio_unlock(&f1) -
    /                                                \
    \                                                /
     ------------------------------------------------
 
-   where 'A -> B' means that event A depends on event B.
+where 'A -> B' means that event A depends on event B.
 
 A new loop has been created. So DEPT can report it as a deadlock! Cool!
 
diff --git a/Documentation/dependency/dept_api.txt b/Documentation/locking/dept_api.rst
similarity index 97%
rename from Documentation/dependency/dept_api.txt
rename to Documentation/locking/dept_api.rst
index 8e0d5a118a460e..96c4d65f4a9a2d 100644
--- a/Documentation/dependency/dept_api.txt
+++ b/Documentation/locking/dept_api.rst
@@ -10,6 +10,8 @@ already applied into the existing synchronization primitives e.g.
 waitqueue, swait, wait_for_completion(), dma fence and so on.  The basic
 APIs of SDT are:
 
+.. code-block:: c
+
    /*
     * After defining 'struct dept_map map', initialize the instance.
     */
@@ -27,6 +29,8 @@ APIs of SDT are:
 
 The advanced APIs of SDT are:
 
+.. code-block:: c
+
    /*
     * After defining 'struct dept_map map', initialize the instance
     * using an external key.
@@ -83,6 +87,8 @@ Do not use these APIs directly.  These are the wrappers for typical
 locks, that have been already applied into major locks internally e.g.
 spin lock, mutex, rwlock and so on.  The APIs of LDT are:
 
+.. code-block:: c
+   
    ldt_init(map, key, sub, name);
    ldt_lock(map, sub_local, try, nest, ip);
    ldt_rlock(map, sub_local, try, nest, ip, queued);
@@ -96,6 +102,8 @@ Raw APIs
 --------
 Do not use these APIs directly.  The raw APIs of dept are:
 
+.. code-block:: c
+
    dept_free_range(start, size);
    dept_map_init(map, key, sub, name);
    dept_map_reinit(map, key, sub, name);
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index 6a9ea96c8bcb70..7ec3dce7fee425 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -24,6 +24,8 @@ Locking
     percpu-rw-semaphore
     robust-futexes
     robust-futex-ABI
+    dept
+    dept_api
 
 .. only::  subproject and html
 

> +Can lockdep detect the following deadlock?
> +
> +   context X	   context Y	   context Z
> +
> +		   mutex_lock A
> +   folio_lock B
> +		   folio_lock B <- DEADLOCK
> +				   mutex_lock A <- DEADLOCK
> +				   folio_unlock B
> +		   folio_unlock B
> +		   mutex_unlock A
> +				   mutex_unlock A
> +
> +No. What about the following?
> +
> +   context X		   context Y
> +
> +			   mutex_lock A
> +   mutex_lock A <- DEADLOCK
> +			   wait_for_complete B <- DEADLOCK
> +   complete B
> +			   mutex_unlock A
> +   mutex_unlock A

Can you explain how DEPT detects deadlock on the second example above (like
the first one being described in "How DEPT works" section)?

Confused...

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply related

* Re: [PATCH v17 09/47] arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
From: Byungchul Park @ 2025-10-03  1:46 UTC (permalink / raw)
  To: Mark Brown
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <a7f41101-d80a-4cee-ada5-9c591321b1d7@sirena.org.uk>

On Thu, Oct 02, 2025 at 12:39:31PM +0100, Mark Brown wrote:
> On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> > dept needs to notice every entrance from user to kernel mode to treat
> > every kernel context independently when tracking wait-event dependencies.
> > Roughly, system call and user oriented fault are the cases.
> > 
> > Make dept aware of the entrances of arm64 and add support
> > CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.
> 
> The description of what needs to be tracked probably needs some
> tightening up here, it's not clear to me for example why exceptions for
> mops or the vector extensions aren't included here, or what the
> distinction is with error faults like BTI or GCS not being tracked?

Thanks for the feedback but I'm afraid I don't get you.  Can you explain
in more detail with example?

JFYI, pairs of wait and its event need to be tracked to see if each
event can be prevented from being reachable by other waits like:

   context X				context Y

   lock L
   ...
   initiate event A context		start toward event A
   ...					...
   wait A // wait for event A and	lock L // wait for unlock L and
          // prevent unlock L		       // prevent event A
   ...					...
   unlock L				unlock L
					...
					event A

I meant things like this need to be tracked.

	Byungchul

^ permalink raw reply

* Re: [PATCH v17 30/47] fs/jbd2: use a weaker annotation in journal handling
From: Byungchul Park @ 2025-10-03  1:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jlayton, dan.j.williams, hch, djwong, dri-devel,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, harry.yoo,
	chris.p.wilson, gwan-gyeong.mun, max.byungchul.park, boqun.feng,
	longman, yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost,
	her0gyugyu, corbet, catalin.marinas, bp, dave.hansen, x86, hpa,
	luto, sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <bmthlv2tsd76mgzaoy5gspzdkved6le5xv23xjsc3yafkhrsgh@vvmjdwygm7gn>

On Thu, Oct 02, 2025 at 10:40:56AM +0200, Jan Kara wrote:
> On Thu 02-10-25 17:12:30, Byungchul Park wrote:
> > jbd2 journal handling code doesn't want jbd2_might_wait_for_commit()
> > to be placed between start_this_handle() and stop_this_handle().  So it
> > marks the region with rwsem_acquire_read() and rwsem_release().
> >
> > However, the annotation is too strong for that purpose.  We don't have
> > to use more than try lock annotation for that.
> >
> > rwsem_acquire_read() implies:
> >
> >    1. might be a waiter on contention of the lock.
> >    2. enter to the critical section of the lock.
> >
> > All we need in here is to act 2, not 1.  So trylock version of
> > annotation is sufficient for that purpose.  Now that dept partially
> > relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a
> > potential wait and might report a deadlock by the wait.
> >
> > Replace it with trylock version of annotation.
> >
> > Signed-off-by: Byungchul Park <byungchul@sk.com>
> 
> Indeed. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Thank you, Jan.

	Byungchul

>                                                                 Honza
> 
> > ---
> >  fs/jbd2/transaction.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index c7867139af69..b4e65f51bf5e 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -441,7 +441,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
> >       read_unlock(&journal->j_state_lock);
> >       current->journal_info = handle;
> >
> > -     rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);
> > +     rwsem_acquire_read(&journal->j_trans_commit_map, 0, 1, _THIS_IP_);
> >       jbd2_journal_free_transaction(new_transaction);
> >       /*
> >        * Ensure that no allocations done while the transaction is open are
> > --
> > 2.17.1
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH v17 08/47] x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64
From: Byungchul Park @ 2025-10-03  1:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <cd056d80-aadd-4f8a-8aad-c34b55686fac@intel.com>

On Thu, Oct 02, 2025 at 08:22:29AM -0700, Dave Hansen wrote:
> On 10/2/25 01:12, Byungchul Park wrote:
> > dept needs to notice every entrance from user to kernel mode to treat
> > every kernel context independently when tracking wait-event dependencies.
> > Roughly, system call and user oriented fault are the cases.
> 
> "Roughly"?

I will change it to a better one.

> >  #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
> >  #define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> > @@ -86,6 +87,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
> >  /* Returns true to return using SYSRET, or false to use IRET */
> >  __visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
> >  {
> > +     /*
> > +      * This is a system call from user mode.  Make dept work with a
> > +      * new kernel mode context.
> > +      */
> > +     dept_update_cxt();
> > +
> >       add_random_kstack_offset();
> >       nr = syscall_enter_from_user_mode(regs, nr);
> 
> Please take a look in syscall_enter_from_user_mode(). You'll see the
> quite nicely-named function: enter_from_user_mode(). That might be a
> nice place to put code that you want to run when the kernel is entered
> from user mode.

I wanted to put dept_update_cxt() to the very beginning of c code but..
yeah enter_from_user_mode() looks fine or even better.  Thanks a lot.

> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 998bd807fc7b..017edb75f0a0 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -19,6 +19,7 @@
> >  #include <linux/mm_types.h>
> >  #include <linux/mm.h>                        /* find_and_lock_vma() */
> >  #include <linux/vmalloc.h>
> > +#include <linux/dept.h>
> >
> >  #include <asm/cpufeature.h>          /* boot_cpu_has, ...            */
> >  #include <asm/traps.h>                       /* dotraplinkage, ...           */
> > @@ -1219,6 +1220,12 @@ void do_user_addr_fault(struct pt_regs *regs,
> >       tsk = current;
> >       mm = tsk->mm;
> >
> > +     /*
> > +      * This fault comes from user mode.  Make dept work with a new
> > +      * kernel mode context.
> > +      */
> > +     dept_update_cxt();
> No, this fault does not come from user mode. That's why we call it "user
> addr" fault, not "user mode" fault. You end up here if, for instance,
> the kernel faults doing a copy_from_user().

My bad.  Thank you.  I will fix it.  Thank you very much.

	Byungchul

^ permalink raw reply

* Re: [PATCH v17 01/47] llist: move llist_{head,node} definition to types.h
From: Arnd Bergmann @ 2025-10-02 23:19 UTC (permalink / raw)
  To: Mathieu Desnoyers, Greg Kroah-Hartman, Byungchul Park
  Cc: linux-kernel, kernel_team, Linus Torvalds, Damien Le Moal,
	linux-ide, Andreas Dilger, linux-ext4, Ingo Molnar,
	Peter Zijlstra, Will Deacon, Thomas Gleixner, Steven Rostedt,
	Joel Fernandes, Sasha Levin, Daniel Vetter, duyuyang,
	Johannes Berg, Tejun Heo, Theodore Ts'o, Matthew Wilcox,
	Dave Chinner, Amir Goldstein, kernel-team, linux-mm,
	Andrew Morton, Michal Hocko, Minchan Kim, Johannes Weiner,
	vdavydov.dev, SeongJae Park, jglisse, Dennis Zhou,
	Christoph Lameter, Pekka Enberg, David Rientjes, Vlastimil Babka,
	ngupta, linux-block, Josef Bacik, linux-fsdevel, Jan Kara,
	Jeff Layton, Dan Williams, Christoph Hellwig, Darrick J. Wong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, Gwan-gyeong Mun, max.byungchul.park,
	Boqun Feng, Waiman Long, yunseong.kim, ysk, Yeoreum Yun, Netdev,
	Matthew Brost, her0gyugyu, Jonathan Corbet, Catalin Marinas,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Andy Lutomirski, Sumit Semwal, gustavo, Christian König,
	Andi Shyti, Lorenzo Stoakes, Liam R. Howlett, Mike Rapoport,
	Suren Baghdasaryan, Luis Chamberlain, Petr Pavlu, da.gomez,
	Sami Tolvanen, Paul E. McKenney, Frederic Weisbecker,
	neeraj.upadhyay, joelagnelf, Josh Triplett,
	Uladzislau Rezki (Sony), Lai Jiangshan, qiang.zhang, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Benjamin Segall, Mel Gorman,
	Valentin Schneider, Chuck Lever, neil, okorniev, Dai Ngo,
	Tom Talpey, trondmy, Anna Schumaker, Kees Cook,
	Sebastian Andrzej Siewior, Clark Williams, Mark Rutland,
	ada.coupriediaz, kristina.martsenko, Kefeng Wang, Mark Brown,
	Kevin Brodsky, David Woodhouse, Shakeel Butt, Alexei Starovoitov,
	Zi Yan, Yu Zhao, Baolin Wang, usamaarif642, joel.granados,
	Wei Yang, Geert Uytterhoeven, tim.c.chen, linux,
	Alexander Shishkin, lillian, Huacai Chen, francesco,
	guoweikang.kernel, link, Josh Poimboeuf, Masahiro Yamada,
	Christian Brauner, Thomas Weißschuh, Oleg Nesterov,
	Mateusz Guzik, Andrii Nakryiko, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	Linux-Arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <63034035-03e4-4184-afce-7e1a897a90e9@efficios.com>

On Thu, Oct 2, 2025, at 15:53, Mathieu Desnoyers wrote:
> On 2025-10-02 04:24, Greg KH wrote:
>> On Thu, Oct 02, 2025 at 05:12:01PM +0900, Byungchul Park wrote:
>>> llist_head and llist_node can be used by some other header files.  For
>>> example, dept for tracking dependencies uses llist in its header.  To
>>> avoid header dependency, move them to types.h.
>> 
>> If you need llist in your code, then include llist.h.  Don't force all
>> types.h users to do so as there is not a dependency in types.h for
>> llist.h.
>> 
>> This patch shouldn't be needed as you are hiding "header dependency" for
>> other files.
>
> I agree that moving this into a catch-all types.h is not what we should
> aim for.
>
> However, it's a good practice to move the type declarations to a
> separate header file, so code that only cares about type and not
> implementation of static inline functions can include just that.
>
> Perhaps we can move struct llist_head and struct llist_node to a new
> include/linux/llist_types.h instead ?

We have around a dozen types of linked lists, and the most common
two of them are currently defined in linux/types.h, while the
rest of them are each defined in the same header as the inteface
definition.

Duplicating each of those headers by splitting out the trivial
type definition doesn't quite seem right either, as we'd end
up with even more headers that have to be included indirectly
in each compilation unit.

Maybe a shared linux/list_types.h would work, to specifically
contain all the list_head variants that are meant to be included
in larger structures?

    Arnd

^ permalink raw reply

* Re: [PATCH v7 8/8] kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o
From: ChaosEsque Team @ 2025-10-02 16:37 UTC (permalink / raw)
  To: Alexey Gladkov
  Cc: Nathan Chancellor, Nicolas Schier, Petr Pavlu, Luis Chamberlain,
	Sami Tolvanen, Daniel Gomez, linux-kernel, linux-modules,
	linux-kbuild
In-Reply-To: <b2d50fde039a0a7e2d1a9db6bae8b83cd4ec4856.1755535876.git.legion@kernel.org>

Alexey Gladkov.
RUSSSSKKIIEEEEEEE
(Rus)

On Mon, Aug 18, 2025 at 1:00 PM Alexey Gladkov <legion@kernel.org> wrote:
>
> Since .vmlinux.export.c is used to add generated by modpost modaliases
> for builtin modules the .vmlinux.export.o is no longer optional and
> should always be created. The generation of this file is not dependent
> on CONFIG_MODULES.
>
> Signed-off-by: Alexey Gladkov <legion@kernel.org>
> ---
>  scripts/Makefile.vmlinux | 9 ++-------
>  scripts/link-vmlinux.sh  | 5 +----
>  2 files changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
> index fcc188d26ead1..dbbe3bf0cf234 100644
> --- a/scripts/Makefile.vmlinux
> +++ b/scripts/Makefile.vmlinux
> @@ -53,11 +53,6 @@ endif
>  # vmlinux.unstripped
>  # ---------------------------------------------------------------------------
>
> -ifdef CONFIG_MODULES
> -targets += .vmlinux.export.o
> -vmlinux.unstripped: .vmlinux.export.o
> -endif
> -
>  ifdef CONFIG_ARCH_WANTS_PRE_LINK_VMLINUX
>  vmlinux.unstripped: arch/$(SRCARCH)/tools/vmlinux.arch.o
>
> @@ -72,8 +67,8 @@ cmd_link_vmlinux =                                                    \
>         $< "$(LD)" "$(KBUILD_LDFLAGS)" "$(LDFLAGS_vmlinux)" "$@";       \
>         $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
>
> -targets += vmlinux.unstripped
> -vmlinux.unstripped: scripts/link-vmlinux.sh vmlinux.o $(KBUILD_LDS) FORCE
> +targets += vmlinux.unstripped .vmlinux.export.o
> +vmlinux.unstripped: scripts/link-vmlinux.sh vmlinux.o .vmlinux.export.o $(KBUILD_LDS) FORCE
>         +$(call if_changed_dep,link_vmlinux)
>  ifdef CONFIG_DEBUG_INFO_BTF
>  vmlinux.unstripped: $(RESOLVE_BTFIDS)
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 51367c2bfc21e..433849ff7529e 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -73,10 +73,7 @@ vmlinux_link()
>                 objs="${objs} .builtin-dtbs.o"
>         fi
>
> -       if is_enabled CONFIG_MODULES; then
> -               objs="${objs} .vmlinux.export.o"
> -       fi
> -
> +       objs="${objs} .vmlinux.export.o"
>         objs="${objs} init/version-timestamp.o"
>
>         if [ "${SRCARCH}" = "um" ]; then
> --
> 2.50.1
>
>

^ permalink raw reply

* Re: [PATCH v17 08/47] x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64
From: Dave Hansen @ 2025-10-02 15:22 UTC (permalink / raw)
  To: Byungchul Park, linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-9-byungchul@sk.com>

On 10/2/25 01:12, Byungchul Park wrote:
> dept needs to notice every entrance from user to kernel mode to treat
> every kernel context independently when tracking wait-event dependencies.
> Roughly, system call and user oriented fault are the cases.

"Roughly"?

>  #define __SYSCALL(nr, sym) extern long __x64_##sym(const struct pt_regs *);
>  #define __SYSCALL_NORETURN(nr, sym) extern long __noreturn __x64_##sym(const struct pt_regs *);
> @@ -86,6 +87,12 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
>  /* Returns true to return using SYSRET, or false to use IRET */
>  __visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
>  {
> +	/*
> +	 * This is a system call from user mode.  Make dept work with a
> +	 * new kernel mode context.
> +	 */
> +	dept_update_cxt();
> +
>  	add_random_kstack_offset();
>  	nr = syscall_enter_from_user_mode(regs, nr);

Please take a look in syscall_enter_from_user_mode(). You'll see the
quite nicely-named function: enter_from_user_mode(). That might be a
nice place to put code that you want to run when the kernel is entered
from user mode.

> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 998bd807fc7b..017edb75f0a0 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -19,6 +19,7 @@
>  #include <linux/mm_types.h>
>  #include <linux/mm.h>			/* find_and_lock_vma() */
>  #include <linux/vmalloc.h>
> +#include <linux/dept.h>
>  
>  #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
>  #include <asm/traps.h>			/* dotraplinkage, ...		*/
> @@ -1219,6 +1220,12 @@ void do_user_addr_fault(struct pt_regs *regs,
>  	tsk = current;
>  	mm = tsk->mm;
>  
> +	/*
> +	 * This fault comes from user mode.  Make dept work with a new
> +	 * kernel mode context.
> +	 */
> +	dept_update_cxt();
No, this fault does not come from user mode. That's why we call it "user
addr" fault, not "user mode" fault. You end up here if, for instance,
the kernel faults doing a copy_from_user().

^ permalink raw reply

* Re: [PATCH v17 01/47] llist: move llist_{head,node} definition to types.h
From: Mathieu Desnoyers @ 2025-10-02 13:53 UTC (permalink / raw)
  To: Greg KH, Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, josef, linux-fsdevel, jack,
	jlayton, dan.j.williams, hch, djwong, dri-devel,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, harry.yoo,
	chris.p.wilson, gwan-gyeong.mun, max.byungchul.park, boqun.feng,
	longman, yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost,
	her0gyugyu, corbet, catalin.marinas, bp, dave.hansen, x86, hpa,
	luto, sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <2025100230-grafted-alias-22a2@gregkh>

On 2025-10-02 04:24, Greg KH wrote:
> On Thu, Oct 02, 2025 at 05:12:01PM +0900, Byungchul Park wrote:
>> llist_head and llist_node can be used by some other header files.  For
>> example, dept for tracking dependencies uses llist in its header.  To
>> avoid header dependency, move them to types.h.
> 
> If you need llist in your code, then include llist.h.  Don't force all
> types.h users to do so as there is not a dependency in types.h for
> llist.h.
> 
> This patch shouldn't be needed as you are hiding "header dependency" for
> other files.

I agree that moving this into a catch-all types.h is not what we should
aim for.

However, it's a good practice to move the type declarations to a
separate header file, so code that only cares about type and not
implementation of static inline functions can include just that.

Perhaps we can move struct llist_head and struct llist_node to a new
include/linux/llist_types.h instead ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

^ permalink raw reply

* Re: [PATCH v17 02/47] dept: implement DEPT(DEPendency Tracker)
From: Geert Uytterhoeven @ 2025-10-02 12:56 UTC (permalink / raw)
  To: Greg KH
  Cc: Byungchul Park, linux-kernel, kernel_team, torvalds,
	damien.lemoal, linux-ide, adilger.kernel, linux-ext4, mingo,
	peterz, will, tglx, rostedt, joel, sashal, daniel.vetter,
	duyuyang, johannes.berg, tj, tytso, willy, david, amir73il,
	kernel-team, linux-mm, akpm, mhocko, minchan, hannes,
	vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes, vbabka,
	ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <2025100255-tapestry-elite-31b0@gregkh>

Hi Greg,

On Thu, 2 Oct 2025 at 10:25, Greg KH <gregkh@linuxfoundation.org> wrote:
> > @@ -0,0 +1,446 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * DEPT(DEPendency Tracker) - runtime dependency tracker
> > + *
> > + * Started by Byungchul Park <max.byungchul.park@gmail.com>:
> > + *
> > + *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
> > + *  Copyright (c) 2024 SK hynix, Inc., Byungchul Park
>
> Nit, it's now 2025 :)

The last non-trivial change to this file was between the last version
posted in 2024 (v14) and the first version posted in 2025 (v15),
so 2024 doesn't sound that off to me.
You are not supposed to bump the copyright year when republishing
without any actual changes.  It is meant to be the work’s first year
of publication.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH v17 09/47] arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
From: Mark Brown @ 2025-10-02 11:39 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-10-byungchul@sk.com>

[-- Attachment #1: Type: text/plain, Size: 659 bytes --]

On Thu, Oct 02, 2025 at 05:12:09PM +0900, Byungchul Park wrote:
> dept needs to notice every entrance from user to kernel mode to treat
> every kernel context independently when tracking wait-event dependencies.
> Roughly, system call and user oriented fault are the cases.
> 
> Make dept aware of the entrances of arm64 and add support
> CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64.

The description of what needs to be tracked probably needs some
tightening up here, it's not clear to me for example why exceptions for
mops or the vector extensions aren't included here, or what the
distinction is with error faults like BTI or GCS not being tracked?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v17 30/47] fs/jbd2: use a weaker annotation in journal handling
From: Jan Kara @ 2025-10-02  8:40 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, gregkh, kernel-team, linux-mm, akpm,
	mhocko, minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl,
	penberg, rientjes, vbabka, ngupta, linux-block, josef,
	linux-fsdevel, jack, jlayton, dan.j.williams, hch, djwong,
	dri-devel, rodrigosiqueiramelo, melissa.srw, hamohammed.sa,
	harry.yoo, chris.p.wilson, gwan-gyeong.mun, max.byungchul.park,
	boqun.feng, longman, yunseong.kim, ysk, yeoreum.yun, netdev,
	matthew.brost, her0gyugyu, corbet, catalin.marinas, bp,
	dave.hansen, x86, hpa, luto, sumit.semwal, gustavo,
	christian.koenig, andi.shyti, arnd, lorenzo.stoakes, Liam.Howlett,
	rppt, surenb, mcgrof, petr.pavlu, da.gomez, samitolvanen, paulmck,
	frederic, neeraj.upadhyay, joelagnelf, josh, urezki,
	mathieu.desnoyers, jiangshanlai, qiang.zhang, juri.lelli,
	vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid,
	chuck.lever, neil, okorniev, Dai.Ngo, tom, trondmy, anna, kees,
	bigeasy, clrkwllms, mark.rutland, ada.coupriediaz,
	kristina.martsenko, wangkefeng.wang, broonie, kevin.brodsky, dwmw,
	shakeel.butt, ast, ziy, yuzhao, baolin.wang, usamaarif642,
	joel.granados, richard.weiyang, geert+renesas, tim.c.chen, linux,
	alexander.shishkin, lillian, chenhuacai, francesco,
	guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-31-byungchul@sk.com>

On Thu 02-10-25 17:12:30, Byungchul Park wrote:
> jbd2 journal handling code doesn't want jbd2_might_wait_for_commit()
> to be placed between start_this_handle() and stop_this_handle().  So it
> marks the region with rwsem_acquire_read() and rwsem_release().
> 
> However, the annotation is too strong for that purpose.  We don't have
> to use more than try lock annotation for that.
> 
> rwsem_acquire_read() implies:
> 
>    1. might be a waiter on contention of the lock.
>    2. enter to the critical section of the lock.
> 
> All we need in here is to act 2, not 1.  So trylock version of
> annotation is sufficient for that purpose.  Now that dept partially
> relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a
> potential wait and might report a deadlock by the wait.
> 
> Replace it with trylock version of annotation.
> 
> Signed-off-by: Byungchul Park <byungchul@sk.com>

Indeed. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/jbd2/transaction.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index c7867139af69..b4e65f51bf5e 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -441,7 +441,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
>  	read_unlock(&journal->j_state_lock);
>  	current->journal_info = handle;
>  
> -	rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);
> +	rwsem_acquire_read(&journal->j_trans_commit_map, 0, 1, _THIS_IP_);
>  	jbd2_journal_free_transaction(new_transaction);
>  	/*
>  	 * Ensure that no allocations done while the transaction is open are
> -- 
> 2.17.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH v17 02/47] dept: implement DEPT(DEPendency Tracker)
From: Greg KH @ 2025-10-02  8:25 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, josef, linux-fsdevel, jack,
	jlayton, dan.j.williams, hch, djwong, dri-devel,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, harry.yoo,
	chris.p.wilson, gwan-gyeong.mun, max.byungchul.park, boqun.feng,
	longman, yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost,
	her0gyugyu, corbet, catalin.marinas, bp, dave.hansen, x86, hpa,
	luto, sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-3-byungchul@sk.com>

> @@ -0,0 +1,446 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * DEPT(DEPendency Tracker) - runtime dependency tracker
> + *
> + * Started by Byungchul Park <max.byungchul.park@gmail.com>:
> + *
> + *  Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
> + *  Copyright (c) 2024 SK hynix, Inc., Byungchul Park

Nit, it's now 2025 :)

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v17 01/47] llist: move llist_{head,node} definition to types.h
From: Greg KH @ 2025-10-02  8:24 UTC (permalink / raw)
  To: Byungchul Park
  Cc: linux-kernel, kernel_team, torvalds, damien.lemoal, linux-ide,
	adilger.kernel, linux-ext4, mingo, peterz, will, tglx, rostedt,
	joel, sashal, daniel.vetter, duyuyang, johannes.berg, tj, tytso,
	willy, david, amir73il, kernel-team, linux-mm, akpm, mhocko,
	minchan, hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg,
	rientjes, vbabka, ngupta, linux-block, josef, linux-fsdevel, jack,
	jlayton, dan.j.williams, hch, djwong, dri-devel,
	rodrigosiqueiramelo, melissa.srw, hamohammed.sa, harry.yoo,
	chris.p.wilson, gwan-gyeong.mun, max.byungchul.park, boqun.feng,
	longman, yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost,
	her0gyugyu, corbet, catalin.marinas, bp, dave.hansen, x86, hpa,
	luto, sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-2-byungchul@sk.com>

On Thu, Oct 02, 2025 at 05:12:01PM +0900, Byungchul Park wrote:
> llist_head and llist_node can be used by some other header files.  For
> example, dept for tracking dependencies uses llist in its header.  To
> avoid header dependency, move them to types.h.

If you need llist in your code, then include llist.h.  Don't force all
types.h users to do so as there is not a dependency in types.h for
llist.h.

This patch shouldn't be needed as you are hiding "header dependency" for
other files.

thanks,

greg k-h

^ permalink raw reply

* [PATCH v17 46/47] SUNRPC: relocate struct rcu_head to the first field of struct rpc_xprt
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

While compiling Linux kernel with DEPT on, the following error was
observed:

   ./include/linux/rcupdate.h:1084:17: note: in expansion of macro
   ‘BUILD_BUG_ON’
   1084 | BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096);	\
        | ^~~~~~~~~~~~
   ./include/linux/rcupdate.h:1047:29: note: in expansion of macro
   'kvfree_rcu_arg_2'
   1047 | #define kfree_rcu(ptr, rhf) kvfree_rcu_arg_2(ptr, rhf)
        |                             ^~~~~~~~~~~~~~~~
   net/sunrpc/xprt.c:1856:9: note: in expansion of macro 'kfree_rcu'
   1856 | kfree_rcu(xprt, rcu);
        | ^~~~~~~~~
    CC net/kcm/kcmproc.o
   make[4]: *** [scripts/Makefile.build:203: net/sunrpc/xprt.o] Error 1

Since kfree_rcu() assumes 'offset of struct rcu_head in a rcu-managed
struct < 4096', the offest of struct rcu_head in struct rpc_xprt should
not exceed 4096 but does, due to the debug information added by DEPT.

Relocate struct rcu_head to the first field of struct rpc_xprt from an
arbitrary location to avoid the issue and meet the assumption.

Reported-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/sunrpc/xprt.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index f46d1fb8f71a..666e42a17a31 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -211,6 +211,14 @@ enum xprt_transports {
 
 struct rpc_sysfs_xprt;
 struct rpc_xprt {
+	/*
+	 * Place struct rcu_head within the first 4096 bytes of struct
+	 * rpc_xprt if sizeof(struct rpc_xprt) > 4096, so that
+	 * kfree_rcu() can simply work assuming that.  See the comment
+	 * in kfree_rcu().
+	 */
+	struct rcu_head		rcu;
+
 	struct kref		kref;		/* Reference count */
 	const struct rpc_xprt_ops *ops;		/* transport methods */
 	unsigned int		id;		/* transport id */
@@ -317,7 +325,6 @@ struct rpc_xprt {
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
 	struct dentry		*debugfs;		/* debugfs directory */
 #endif
-	struct rcu_head		rcu;
 	const struct xprt_class	*xprt_class;
 	struct rpc_sysfs_xprt	*xprt_sysfs;
 	bool			main; /*mark if this is the 1st transport */
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 47/47] mm: percpu: increase PERCPU_DYNAMIC_SIZE_SHIFT on DEPT and large PAGE_SIZE
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

Yunseong reported a build failure due to the BUILD_BUG_ON() statement in
alloc_kmem_cache_cpus().  In the following test:

  PERCPU_DYNAMIC_EARLY_SIZE < NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)

The following factors increase the right side of the equation:

  1. PAGE_SIZE > 4KiB increases KMALLOC_SHIFT_HIGH.
  2. DEPT increases the size of the local_lock_t in kmem_cache_cpu.

Increase PERCPU_DYNAMIC_SIZE_SHIFT to 11 on configs with PAGE_SIZE
larger than 4KiB and DEPT enabled.

Reported-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/percpu.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 85bf8dd9f087..dd74321d4bbd 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -43,7 +43,11 @@
 # define PERCPU_DYNAMIC_SIZE_SHIFT      12
 #endif /* LOCKDEP and PAGE_SIZE > 4KiB */
 #else
+#if defined(CONFIG_DEPT) && !defined(CONFIG_PAGE_SIZE_4KB)
+#define PERCPU_DYNAMIC_SIZE_SHIFT      11
+#else
 #define PERCPU_DYNAMIC_SIZE_SHIFT      10
+#endif /* DEPT and PAGE_SIZE > 4KiB */
 #endif
 
 /*
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 45/47] dept: track PG_writeback with dept
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

Makes dept able to track PG_writeback waits and events, which will be
useful in practice.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/mm_types.h   |  1 +
 include/linux/page-flags.h |  7 +++++++
 mm/filemap.c               | 11 +++++++++++
 mm/mm_init.c               |  1 +
 4 files changed, 20 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8ccbb030500c..bed1a3bc81e1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -226,6 +226,7 @@ struct page {
 #endif
 	struct dept_page_usage usage;
 	struct dept_ext_wgen pg_locked_wgen;
+	struct dept_ext_wgen pg_writeback_wgen;
 } _struct_page_alignment;
 
 /*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 3fd3660ddc6f..b965b16c8cee 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -203,6 +203,7 @@ enum pageflags {
 #include <linux/dept.h>
 
 extern struct dept_map pg_locked_map;
+extern struct dept_map pg_writeback_map;
 
 static inline int dept_set_page_usage(struct page *p,
 		unsigned int new_type)
@@ -279,6 +280,8 @@ static inline void dept_page_set_bit(struct page *p, int bit_nr)
 	dept_update_page_usage(p);
 	if (bit_nr == PG_locked)
 		dept_request_event(&pg_locked_map, &p->pg_locked_wgen);
+	else if (bit_nr == PG_writeback)
+		dept_request_event(&pg_writeback_map, &p->pg_writeback_wgen);
 }
 
 static inline void dept_page_clear_bit(struct page *p, int bit_nr)
@@ -288,6 +291,8 @@ static inline void dept_page_clear_bit(struct page *p, int bit_nr)
 	evt_f = dept_event_flags(p, false);
 	if (bit_nr == PG_locked)
 		dept_event(&pg_locked_map, evt_f, _RET_IP_, __func__, &p->pg_locked_wgen);
+	else if (bit_nr == PG_writeback)
+		dept_event(&pg_writeback_map, evt_f, _RET_IP_, __func__, &p->pg_writeback_wgen);
 }
 
 static inline void dept_page_wait_on_bit(struct page *p, int bit_nr)
@@ -298,6 +303,8 @@ static inline void dept_page_wait_on_bit(struct page *p, int bit_nr)
 	evt_f = dept_event_flags(p, true);
 	if (bit_nr == PG_locked)
 		dept_wait(&pg_locked_map, evt_f, _RET_IP_, __func__, 0, -1L);
+	else if (bit_nr == PG_writeback)
+		dept_wait(&pg_writeback_map, evt_f, _RET_IP_, __func__, 0, -1L);
 }
 
 static inline void dept_folio_set_bit(struct folio *f, int bit_nr)
diff --git a/mm/filemap.c b/mm/filemap.c
index edb0710ddb3f..d8f1816dc6c2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1187,6 +1187,13 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	key.bit_nr = bit_nr;
 	key.page_match = 0;
 
+	/*
+	 * dept_page_clear_bit() being called multiple times is harmless.
+	 * The worst case is to miss some dependencies but it's okay.
+	 */
+	if (bit_nr == PG_locked || bit_nr == PG_writeback)
+		dept_page_clear_bit(&folio->page, bit_nr);
+
 	spin_lock_irqsave(&q->lock, flags);
 	__wake_up_locked_key(q, TASK_NORMAL, &key);
 
@@ -1241,6 +1248,9 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
 struct dept_map __maybe_unused pg_locked_map = DEPT_MAP_INITIALIZER(pg_locked_map, NULL);
 EXPORT_SYMBOL(pg_locked_map);
 
+struct dept_map __maybe_unused pg_writeback_map = DEPT_MAP_INITIALIZER(pg_writeback_map, NULL);
+EXPORT_SYMBOL(pg_writeback_map);
+
 static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 		int state, enum behavior behavior)
 {
@@ -1683,6 +1693,7 @@ void folio_end_writeback(struct folio *folio)
 	 * reused before the folio_wake_bit().
 	 */
 	folio_get(folio);
+	dept_page_clear_bit(&folio->page, PG_writeback);
 	if (__folio_end_writeback(folio))
 		folio_wake_bit(folio, PG_writeback);
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 09e4ac6a73c7..fd2bf6689afa 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -589,6 +589,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
 	page_cpupid_reset_last(page);
 	page_kasan_tag_reset(page);
 	dept_ext_wgen_init(&page->pg_locked_wgen);
+	dept_ext_wgen_init(&page->pg_writeback_wgen);
 
 	INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 43/47] rcu/update: fix same dept key collision between various types of RCU
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

The current implementation shares the same dept key for multiple
synchronization points, which can lead to false positive reports in
dependency tracking and potential confusion in debugging.  For example,
both normal RCU and tasks trace RCU synchronization points use the same
dept key.  Specifically:

   1. synchronize_rcu() uses a dept key embedded in __wait_rcu_gp():

      synchronize_rcu()
         synchronize_rcu_normal()
            _wait_rcu_gp()
               __wait_rcu_gp() <- the key as static variable

   2. synchronize_rcu_tasks_trace() uses the dept key, too:

      synchronize_rcu_tasks_trace()
         synchronize_rcu_tasks_generic()
            _wait_rcu_gp()
               __wait_rcu_gp() <- the key as static variable

Since the both rely on the same dept key, dept may report false positive
circular dependency.  To resolve this, separate dept keys and maps
should be assigned to each struct rcu_synchronize.

   ===================================================
   DEPT: Circular dependency has been detected.
   6.15.0-rc6-00042-ged94bafc6405 #2 Not tainted
   ---------------------------------------------------
   summary
   ---------------------------------------------------
   *** DEADLOCK ***

   context A
      [S] lock(cpu_hotplug_lock:0)
      [W] __wait_rcu_gp(<sched>:0)
      [E] unlock(cpu_hotplug_lock:0)

   context B
      [S] (unknown)(<sched>:0)
      [W] lock(cpu_hotplug_lock:0)
      [E] try_to_wake_up(<sched>:0)

   [S]: start of the event context
   [W]: the wait blocked
   [E]: the event not reachable
   ---------------------------------------------------
   context A's detail
   ---------------------------------------------------
   context A
      [S] lock(cpu_hotplug_lock:0)
      [W] __wait_rcu_gp(<sched>:0)
      [E] unlock(cpu_hotplug_lock:0)

   [S] lock(cpu_hotplug_lock:0):
   [<ffff8000802ce964>] cpus_read_lock+0x14/0x20
   stacktrace:
         percpu_down_read.constprop.0+0x88/0x2ec
         cpus_read_lock+0x14/0x20
         cgroup_procs_write_start+0x164/0x634
         __cgroup_procs_write+0xdc/0x4d0
         cgroup_procs_write+0x34/0x74
         cgroup_file_write+0x25c/0x670
         kernfs_fop_write_iter+0x2ec/0x498
         vfs_write+0x574/0xc30
         ksys_write+0x124/0x244
         __arm64_sys_write+0x70/0xa4
         invoke_syscall+0x88/0x2e0
         el0_svc_common.constprop.0+0xe8/0x2e0
         do_el0_svc+0x44/0x60
         el0_svc+0x50/0x188
         el0t_64_sync_handler+0x10c/0x140
         el0t_64_sync+0x198/0x19c

   [W] __wait_rcu_gp(<sched>:0):
   [<ffff8000804ce88c>] __wait_rcu_gp+0x324/0x498
   stacktrace:
         schedule+0xcc/0x348
         schedule_timeout+0x1a4/0x268
         __wait_for_common+0x1c4/0x3f0
         __wait_for_completion_state+0x20/0x38
         __wait_rcu_gp+0x35c/0x498
         synchronize_rcu_normal+0x200/0x218
         synchronize_rcu+0x234/0x2a0
         rcu_sync_enter+0x11c/0x300
         percpu_down_write+0xb4/0x3e0
         cgroup_procs_write_start+0x174/0x634
         __cgroup_procs_write+0xdc/0x4d0
         cgroup_procs_write+0x34/0x74
         cgroup_file_write+0x25c/0x670
         kernfs_fop_write_iter+0x2ec/0x498
         vfs_write+0x574/0xc30
         ksys_write+0x124/0x244

   [E] unlock(cpu_hotplug_lock:0):
   (N/A)
   ---------------------------------------------------
   context B's detail
   ---------------------------------------------------
   context B
      [S] (unknown)(<sched>:0)
      [W] lock(cpu_hotplug_lock:0)
      [E] try_to_wake_up(<sched>:0)

   [S] (unknown)(<sched>:0):
   (N/A)

   [W] lock(cpu_hotplug_lock:0):
   [<ffff8000802ce964>] cpus_read_lock+0x14/0x20
   stacktrace:
         percpu_down_read.constprop.0+0x6c/0x2ec
         cpus_read_lock+0x14/0x20
         check_all_holdout_tasks_trace+0x90/0xa30
         rcu_tasks_wait_gp+0x47c/0x938
         rcu_tasks_one_gp+0x75c/0xef8
         rcu_tasks_kthread+0x180/0x1dc
         kthread+0x3ac/0x74c
         ret_from_fork+0x10/0x20

   [E] try_to_wake_up(<sched>:0):
   [<ffff8000804233b8>] complete+0xb8/0x1e8
   stacktrace:
         try_to_wake_up+0x374/0x1164
         complete+0xb8/0x1e8
         wakeme_after_rcu+0x14/0x20
         rcu_tasks_invoke_cbs+0x218/0xaa8
         rcu_tasks_one_gp+0x834/0xef8
         rcu_tasks_kthread+0x180/0x1dc
         kthread+0x3ac/0x74c
         ret_from_fork+0x10/0x20
   (wait to wake up)
   stacktrace:
         __schedule+0xf64/0x3614
         schedule+0xcc/0x348
         schedule_timeout+0x1a4/0x268
         __wait_for_common+0x1c4/0x3f0
         __wait_for_completion_state+0x20/0x38
         __wait_rcu_gp+0x35c/0x498
         synchronize_rcu_tasks_generic+0x14c/0x220
         synchronize_rcu_tasks_trace+0x24/0x8c
         rcu_init_tasks_generic+0x168/0x194
         do_one_initcall+0x174/0xa00
         kernel_init_freeable+0x744/0x7dc
         kernel_init+0x78/0x220
         ret_from_fork+0x10/0x20

Separating the dept key and map for each of struct rcu_synchronize,
ensuring proper tracking for each execution context.

Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
[ Rewrote the changelog. ]
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/rcupdate_wait.h | 13 ++++++++-----
 kernel/rcu/rcu.h              |  1 +
 kernel/rcu/update.c           |  5 +++--
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
index 4c92d4291cce..ee598e70b4bc 100644
--- a/include/linux/rcupdate_wait.h
+++ b/include/linux/rcupdate_wait.h
@@ -19,17 +19,20 @@ struct rcu_synchronize {
 
 	/* This is for debugging. */
 	struct rcu_gp_oldstate oldstate;
+	struct dept_map dmap;
+	struct dept_key dkey;
 };
 void wakeme_after_rcu(struct rcu_head *head);
 
 void __wait_rcu_gp(bool checktiny, unsigned int state, int n, call_rcu_func_t *crcu_array,
-		   struct rcu_synchronize *rs_array);
+		   struct rcu_synchronize *rs_array, struct dept_key *dkey);
 
 #define _wait_rcu_gp(checktiny, state, ...) \
-do {												\
-	call_rcu_func_t __crcu_array[] = { __VA_ARGS__ };					\
-	struct rcu_synchronize __rs_array[ARRAY_SIZE(__crcu_array)];				\
-	__wait_rcu_gp(checktiny, state, ARRAY_SIZE(__crcu_array), __crcu_array, __rs_array);	\
+do {													\
+	call_rcu_func_t __crcu_array[] = { __VA_ARGS__ };						\
+	static struct dept_key __key;									\
+	struct rcu_synchronize __rs_array[ARRAY_SIZE(__crcu_array)];					\
+	__wait_rcu_gp(checktiny, state, ARRAY_SIZE(__crcu_array), __crcu_array, __rs_array, &__key);	\
 } while (0)
 
 #define wait_rcu_gp(...) _wait_rcu_gp(false, TASK_UNINTERRUPTIBLE, __VA_ARGS__)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 9cf01832a6c3..c0d8ea139596 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -12,6 +12,7 @@
 
 #include <linux/slab.h>
 #include <trace/events/rcu.h>
+#include <linux/dept_sdt.h>
 
 /*
  * Grace-period counter management.
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index c912b594ba98..82292337d5b0 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -409,7 +409,7 @@ void wakeme_after_rcu(struct rcu_head *head)
 EXPORT_SYMBOL_GPL(wakeme_after_rcu);
 
 void __wait_rcu_gp(bool checktiny, unsigned int state, int n, call_rcu_func_t *crcu_array,
-		   struct rcu_synchronize *rs_array)
+		   struct rcu_synchronize *rs_array, struct dept_key *dkey)
 {
 	int i;
 	int j;
@@ -426,7 +426,8 @@ void __wait_rcu_gp(bool checktiny, unsigned int state, int n, call_rcu_func_t *c
 				break;
 		if (j == i) {
 			init_rcu_head_on_stack(&rs_array[i].head);
-			init_completion(&rs_array[i].completion);
+			sdt_map_init_key(&rs_array[i].dmap, dkey);
+			init_completion_dmap(&rs_array[i].completion, &rs_array[i].dmap);
 			(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
 		}
 	}
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 44/47] dept: introduce APIs to set page usage and use subclasses_evt for the usage
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

False positive reports have been observed since dept works with the
assumption that all the pages have the same dept class, but the class
should be split since the problematic call paths are different depending
on what the page is used for.

At least, ones in block device's address_space and ones in regular
file's address_space have exclusively different usages.

Thus, define usage candidates like:

   DEPT_PAGE_REGFILE_CACHE /* page in regular file's address_space */
   DEPT_PAGE_BDEV_CACHE    /* page in block device's address_space */
   DEPT_PAGE_DEFAULT       /* the others */

Introduce APIs to set each page's usage properly and make sure not to
interact between at least between DEPT_PAGE_REGFILE_CACHE and
DEPT_PAGE_BDEV_CACHE.  However, besides the exclusive usages, allow any
other combinations to interact to the other for example:

   PG_locked for DEPT_PAGE_DEFAULT page can wait for PG_locked for
   DEPT_PAGE_REGFILE_CACHE page and vice versa.

   PG_locked for DEPT_PAGE_DEFAULT page can wait for PG_locked for
   DEPT_PAGE_BDEV_CACHE page and vice versa.

   PG_locked for DEPT_PAGE_DEFAULT page can wait for PG_locked for
   DEPT_PAGE_DEFAULT page.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/dept.h       | 31 +++++++++++++++-
 include/linux/mm_types.h   |  1 +
 include/linux/page-flags.h | 76 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 104 insertions(+), 4 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 0ac13129f308..fbbc41048fac 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -21,8 +21,8 @@ struct task_struct;
 #define DEPT_MAX_WAIT_HIST		64
 #define DEPT_MAX_ECXT_HELD		48
 
-#define DEPT_MAX_SUBCLASSES		16
-#define DEPT_MAX_SUBCLASSES_EVT		2
+#define DEPT_MAX_SUBCLASSES		24
+#define DEPT_MAX_SUBCLASSES_EVT		3
 #define DEPT_MAX_SUBCLASSES_USR		(DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
 #define DEPT_MAX_SUBCLASSES_CACHE	2
 
@@ -390,6 +390,32 @@ struct dept_ext_wgen {
 	unsigned int wgen;
 };
 
+enum {
+	DEPT_PAGE_DEFAULT = 0,
+	DEPT_PAGE_REGFILE_CACHE,	/* regular file page cache */
+	DEPT_PAGE_BDEV_CACHE,		/* block device cache */
+	DEPT_PAGE_USAGE_NR,		/* nr of usages options */
+};
+
+#define DEPT_PAGE_USAGE_SHIFT 16
+#define DEPT_PAGE_USAGE_MASK ((1U << DEPT_PAGE_USAGE_SHIFT) - 1)
+#define DEPT_PAGE_USAGE_PENDING_MASK (DEPT_PAGE_USAGE_MASK << DEPT_PAGE_USAGE_SHIFT)
+
+/*
+ * Identify each page's usage type
+ */
+struct dept_page_usage {
+	/*
+	 * low 16 bits  : the current usage type
+	 * high 16 bits : usage type requested to be set
+	 *
+	 * Do not apply the type requested immediately but defer until
+	 * after clearing PG_locked bit of the folio or page e.g. by
+	 * folio_unlock().
+	 */
+	atomic_t type; /* Update and read atomically */
+};
+
 struct dept_event_site {
 	/*
 	 * event site name
@@ -562,6 +588,7 @@ extern void dept_hardirqs_off(void);
 struct dept_key { };
 struct dept_map { };
 struct dept_ext_wgen { };
+struct dept_page_usage { };
 struct dept_event_site { };
 
 #define DEPT_MAP_INITIALIZER(n, k) { }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5ebc565309af..8ccbb030500c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -224,6 +224,7 @@ struct page {
 	struct page *kmsan_shadow;
 	struct page *kmsan_origin;
 #endif
+	struct dept_page_usage usage;
 	struct dept_ext_wgen pg_locked_wgen;
 } _struct_page_alignment;
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d3c4954c4218..3fd3660ddc6f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -204,6 +204,68 @@ enum pageflags {
 
 extern struct dept_map pg_locked_map;
 
+static inline int dept_set_page_usage(struct page *p,
+		unsigned int new_type)
+{
+	unsigned int type = atomic_read(&p->usage.type);
+
+	if (WARN_ON_ONCE(new_type >= DEPT_PAGE_USAGE_NR))
+		return -1;
+
+	new_type <<= DEPT_PAGE_USAGE_SHIFT;
+retry:
+	new_type &= ~DEPT_PAGE_USAGE_MASK;
+	new_type |= type & DEPT_PAGE_USAGE_MASK;
+
+	if (!atomic_try_cmpxchg(&p->usage.type, &type, new_type))
+		goto retry;
+
+	return 0;
+}
+
+static inline int dept_reset_page_usage(struct page *p)
+{
+	return dept_set_page_usage(p, DEPT_PAGE_DEFAULT);
+}
+
+static inline void dept_update_page_usage(struct page *p)
+{
+	unsigned int type = atomic_read(&p->usage.type);
+	unsigned int new_type;
+
+retry:
+	new_type = type & DEPT_PAGE_USAGE_PENDING_MASK;
+	new_type >>= DEPT_PAGE_USAGE_SHIFT;
+	new_type |= type & DEPT_PAGE_USAGE_PENDING_MASK;
+
+	/*
+	 * Already updated by others.
+	 */
+	if (type == new_type)
+		return;
+
+	if (!atomic_try_cmpxchg(&p->usage.type, &type, new_type))
+		goto retry;
+}
+
+static inline unsigned long dept_event_flags(struct page *p, bool wait)
+{
+	unsigned int type;
+
+	type = atomic_read(&p->usage.type) & DEPT_PAGE_USAGE_MASK;
+
+	if (WARN_ON_ONCE(type >= DEPT_PAGE_USAGE_NR))
+		return 0;
+
+	/*
+	 * event
+	 */
+	if (!wait)
+		return 1UL << type;
+
+	return (1UL << DEPT_PAGE_DEFAULT) | (1UL << type);
+}
+
 /*
  * Place the following annotations in its suitable point in code:
  *
@@ -214,20 +276,28 @@ extern struct dept_map pg_locked_map;
 
 static inline void dept_page_set_bit(struct page *p, int bit_nr)
 {
+	dept_update_page_usage(p);
 	if (bit_nr == PG_locked)
 		dept_request_event(&pg_locked_map, &p->pg_locked_wgen);
 }
 
 static inline void dept_page_clear_bit(struct page *p, int bit_nr)
 {
+	unsigned long evt_f;
+
+	evt_f = dept_event_flags(p, false);
 	if (bit_nr == PG_locked)
-		dept_event(&pg_locked_map, 1UL, _RET_IP_, __func__, &p->pg_locked_wgen);
+		dept_event(&pg_locked_map, evt_f, _RET_IP_, __func__, &p->pg_locked_wgen);
 }
 
 static inline void dept_page_wait_on_bit(struct page *p, int bit_nr)
 {
+	unsigned long evt_f;
+
+	dept_update_page_usage(p);
+	evt_f = dept_event_flags(p, true);
 	if (bit_nr == PG_locked)
-		dept_wait(&pg_locked_map, 1UL, _RET_IP_, __func__, 0, -1L);
+		dept_wait(&pg_locked_map, evt_f, _RET_IP_, __func__, 0, -1L);
 }
 
 static inline void dept_folio_set_bit(struct folio *f, int bit_nr)
@@ -245,6 +315,8 @@ static inline void dept_folio_wait_on_bit(struct folio *f, int bit_nr)
 	dept_page_wait_on_bit(&f->page, bit_nr);
 }
 #else
+#define dept_set_page_usage(p, t)		do { } while (0)
+#define dept_reset_page_usage(p)		do { } while (0)
 #define dept_page_set_bit(p, bit_nr)		do { } while (0)
 #define dept_page_clear_bit(p, bit_nr)		do { } while (0)
 #define dept_page_wait_on_bit(p, bit_nr)	do { } while (0)
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 42/47] dept: call dept_hardirqs_off() in local_irq_*() regardless of irq state
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

For dept to function properly, dept_task()->hardirqs_enabled must be set
correctly.  If it fails to set this value to false, for example, dept
may mistakenly think irq is still enabled even when it's not.

Do dept_hardirqs_off() regardless of irq state not to miss any
unexpected cases by any chance e.g. changes of the state by asm code.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/irqflags.h | 14 ++++++++++++++
 kernel/dependency/dept.c |  1 +
 2 files changed, 15 insertions(+)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index d8b9cf093f83..586f5bad4da7 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -214,6 +214,13 @@ extern void warn_bogus_irq_restore(void);
 		raw_local_irq_disable();		\
 		if (!was_disabled)			\
 			trace_hardirqs_off();		\
+		/*					\
+		 * Just in case that C code has missed	\
+		 * trace_hardirqs_off() at the first	\
+		 * place e.g. disabling irq at asm code.\
+		 */					\
+		else					\
+			dept_hardirqs_off();		\
 	} while (0)
 
 #define local_irq_save(flags)				\
@@ -221,6 +228,13 @@ extern void warn_bogus_irq_restore(void);
 		raw_local_irq_save(flags);		\
 		if (!raw_irqs_disabled_flags(flags))	\
 			trace_hardirqs_off();		\
+		/*					\
+		 * Just in case that C code has missed	\
+		 * trace_hardirqs_off() at the first	\
+		 * place e.g. disabling irq at asm code.\
+		 */					\
+		else					\
+			dept_hardirqs_off();		\
 	} while (0)
 
 #define local_irq_restore(flags)			\
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 0f4464657288..a17b185d6a6a 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -2248,6 +2248,7 @@ void noinstr dept_hardirqs_off(void)
 	 */
 	dept_task()->hardirqs_enabled = false;
 }
+EXPORT_SYMBOL_GPL(dept_hardirqs_off);
 
 void noinstr dept_update_cxt(void)
 {
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 41/47] dept: implement a basic unit test for dept
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

Implement CONFIG_DEPT_UNIT_TEST introducing a kernel module that runs
basic unit test for dept.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/dept_unit_test.h     |  67 +++++++++++
 kernel/dependency/Makefile         |   1 +
 kernel/dependency/dept.c           |  12 ++
 kernel/dependency/dept_unit_test.c | 173 +++++++++++++++++++++++++++++
 lib/Kconfig.debug                  |  12 ++
 5 files changed, 265 insertions(+)
 create mode 100644 include/linux/dept_unit_test.h
 create mode 100644 kernel/dependency/dept_unit_test.c

diff --git a/include/linux/dept_unit_test.h b/include/linux/dept_unit_test.h
new file mode 100644
index 000000000000..7612b4e97e69
--- /dev/null
+++ b/include/linux/dept_unit_test.h
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * DEPT unit test
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2025 SK hynix, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_UNIT_TEST_H
+#define __LINUX_DEPT_UNIT_TEST_H
+
+#if defined(CONFIG_DEPT_UNIT_TEST) || defined(CONFIG_DEPT_UNIT_TEST_MODULE)
+struct dept_ut {
+	bool circle_detected;
+	bool recover_circle_detected;
+
+	int ecxt_stack_total_cnt;
+	int wait_stack_total_cnt;
+	int evnt_stack_total_cnt;
+	int ecxt_stack_valid_cnt;
+	int wait_stack_valid_cnt;
+	int evnt_stack_valid_cnt;
+};
+
+extern struct dept_ut dept_ut_results;
+
+static inline void dept_ut_circle_detect(void)
+{
+	dept_ut_results.circle_detected = true;
+}
+static inline void dept_ut_recover_circle_detect(void)
+{
+	dept_ut_results.recover_circle_detected = true;
+}
+static inline void dept_ut_ecxt_stack_account(bool valid)
+{
+	dept_ut_results.ecxt_stack_total_cnt++;
+
+	if (valid)
+		dept_ut_results.ecxt_stack_valid_cnt++;
+}
+static inline void dept_ut_wait_stack_account(bool valid)
+{
+	dept_ut_results.wait_stack_total_cnt++;
+
+	if (valid)
+		dept_ut_results.wait_stack_valid_cnt++;
+}
+static inline void dept_ut_evnt_stack_account(bool valid)
+{
+	dept_ut_results.evnt_stack_total_cnt++;
+
+	if (valid)
+		dept_ut_results.evnt_stack_valid_cnt++;
+}
+#else
+struct dept_ut {};
+
+#define dept_ut_circle_detect() do { } while (0)
+#define dept_ut_recover_circle_detect() do { } while (0)
+#define dept_ut_ecxt_stack_account(v) do { } while (0)
+#define dept_ut_wait_stack_account(v) do { } while (0)
+#define dept_ut_evnt_stack_account(v) do { } while (0)
+
+#endif
+#endif /* __LINUX_DEPT_UNIT_TEST_H */
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
index 92f165400187..fc584ca87124 100644
--- a/kernel/dependency/Makefile
+++ b/kernel/dependency/Makefile
@@ -2,3 +2,4 @@
 
 obj-$(CONFIG_DEPT) += dept.o
 obj-$(CONFIG_DEPT) += dept_proc.o
+obj-$(CONFIG_DEPT_UNIT_TEST) += dept_unit_test.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 3c3ec2701bd6..0f4464657288 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -78,8 +78,12 @@
 #include <linux/workqueue.h>
 #include <linux/irq_work.h>
 #include <linux/vmalloc.h>
+#include <linux/dept_unit_test.h>
 #include "dept_internal.h"
 
+struct dept_ut dept_ut_results;
+EXPORT_SYMBOL_GPL(dept_ut_results);
+
 static int dept_stop;
 static int dept_per_cpu_ready;
 
@@ -826,6 +830,10 @@ static void print_dep(struct dept_dep *d)
 			pr_warn("(wait to wake up)\n");
 			print_ip_stack(0, e->ewait_stack);
 		}
+
+		dept_ut_ecxt_stack_account(valid_stack(e->ecxt_stack));
+		dept_ut_wait_stack_account(valid_stack(w->wait_stack));
+		dept_ut_evnt_stack_account(valid_stack(e->event_stack));
 	}
 }
 
@@ -920,6 +928,8 @@ static void print_circle(struct dept_class *c)
 	dump_stack();
 
 	dept_outworld_exit();
+
+	dept_ut_circle_detect();
 }
 
 /*
@@ -1021,6 +1031,8 @@ static void print_recover_circle(struct dept_event_site *es)
 	dump_stack();
 
 	dept_outworld_exit();
+
+	dept_ut_recover_circle_detect();
 }
 
 static void bfs_init_recover(void *node, void *in, void **out)
diff --git a/kernel/dependency/dept_unit_test.c b/kernel/dependency/dept_unit_test.c
new file mode 100644
index 000000000000..88e846b9f876
--- /dev/null
+++ b/kernel/dependency/dept_unit_test.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * DEPT unit test
+ *
+ * Started by Byungchul Park <max.byungchul.park@gmail.com>:
+ *
+ *  Copyright (c) 2025 SK hynix, Inc., Byungchul Park
+ */
+
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/mutex.h>
+#include <linux/dept.h>
+#include <linux/dept_unit_test.h>
+
+MODULE_DESCRIPTION("DEPT unit test");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Byungchul Park <max.byungchul.park@sk.com>");
+
+struct unit {
+	const char *name;
+	bool (*func)(void);
+	bool result;
+};
+
+static DEFINE_SPINLOCK(s1);
+static DEFINE_SPINLOCK(s2);
+static bool test_spin_lock_deadlock(void)
+{
+	dept_ut_results.circle_detected = false;
+
+	spin_lock(&s1);
+	spin_lock(&s2);
+	spin_unlock(&s2);
+	spin_unlock(&s1);
+
+	spin_lock(&s2);
+	spin_lock(&s1);
+	spin_unlock(&s1);
+	spin_unlock(&s2);
+
+	return dept_ut_results.circle_detected;
+}
+
+static DEFINE_MUTEX(m1);
+static DEFINE_MUTEX(m2);
+static bool test_mutex_lock_deadlock(void)
+{
+	dept_ut_results.circle_detected = false;
+
+	mutex_lock(&m1);
+	mutex_lock(&m2);
+	mutex_unlock(&m2);
+	mutex_unlock(&m1);
+
+	mutex_lock(&m2);
+	mutex_lock(&m1);
+	mutex_unlock(&m1);
+	mutex_unlock(&m2);
+
+	return dept_ut_results.circle_detected;
+}
+
+static bool test_wait_event_deadlock(void)
+{
+	struct dept_map dmap1;
+	struct dept_map dmap2;
+
+	sdt_map_init(&dmap1);
+	sdt_map_init(&dmap2);
+
+	dept_ut_results.circle_detected = false;
+
+	sdt_request_event(&dmap1); /* [S] */
+	sdt_wait(&dmap2); /* [W] */
+	sdt_event(&dmap1); /* [E] */
+
+	sdt_request_event(&dmap2); /* [S] */
+	sdt_wait(&dmap1); /* [W] */
+	sdt_event(&dmap2); /* [E] */
+
+	return dept_ut_results.circle_detected;
+}
+
+static void dummy_event(void)
+{
+	/* Do nothing. */
+}
+
+static DEFINE_DEPT_EVENT_SITE(es1);
+static DEFINE_DEPT_EVENT_SITE(es2);
+static bool test_recover_deadlock(void)
+{
+	dept_ut_results.recover_circle_detected = false;
+
+	dept_recover_event(&es1, &es2);
+	dept_recover_event(&es2, &es1);
+
+	event_site(&es1, dummy_event);
+	event_site(&es2, dummy_event);
+
+	return dept_ut_results.recover_circle_detected;
+}
+
+static struct unit units[] = {
+	{
+		.name = "spin lock deadlock test",
+		.func = test_spin_lock_deadlock,
+	},
+	{
+		.name = "mutex lock deadlock test",
+		.func = test_mutex_lock_deadlock,
+	},
+	{
+		.name = "wait event deadlock test",
+		.func = test_wait_event_deadlock,
+	},
+	{
+		.name = "event recover deadlock test",
+		.func = test_recover_deadlock,
+	},
+};
+
+static int __init dept_ut_init(void)
+{
+	int i;
+
+	lockdep_off();
+
+	dept_ut_results.ecxt_stack_valid_cnt = 0;
+	dept_ut_results.ecxt_stack_total_cnt = 0;
+	dept_ut_results.wait_stack_valid_cnt = 0;
+	dept_ut_results.wait_stack_total_cnt = 0;
+	dept_ut_results.evnt_stack_valid_cnt = 0;
+	dept_ut_results.evnt_stack_total_cnt = 0;
+
+	for (i = 0; i < ARRAY_SIZE(units); i++)
+		units[i].result = units[i].func();
+
+	pr_info("\n");
+	pr_info("******************************************\n");
+	pr_info("DEPT unit test results\n");
+	pr_info("******************************************\n");
+	for (i = 0; i < ARRAY_SIZE(units); i++) {
+		pr_info("(%s) %s\n", units[i].result ? "pass" : "fail",
+				units[i].name);
+	}
+	pr_info("ecxt stack valid count = %d/%d\n",
+			dept_ut_results.ecxt_stack_valid_cnt,
+			dept_ut_results.ecxt_stack_total_cnt);
+	pr_info("wait stack valid count = %d/%d\n",
+			dept_ut_results.wait_stack_valid_cnt,
+			dept_ut_results.wait_stack_total_cnt);
+	pr_info("event stack valid count = %d/%d\n",
+			dept_ut_results.evnt_stack_valid_cnt,
+			dept_ut_results.evnt_stack_total_cnt);
+	pr_info("******************************************\n");
+	pr_info("\n");
+
+	lockdep_on();
+
+	return 0;
+}
+
+static void dept_ut_cleanup(void)
+{
+	/*
+	 * Do nothing for now.
+	 */
+}
+
+module_init(dept_ut_init);
+module_exit(dept_ut_cleanup);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 290563fa8b58..f0c58bee263a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1404,6 +1404,18 @@ config DEPT_AGGRESSIVE_TIMEOUT_WAIT
 	  that timeout is used to avoid a deadlock. Say N if you'd like
 	  to avoid verbose reports.
 
+config DEPT_UNIT_TEST
+	tristate "unit test for DEPT"
+	depends on DEBUG_KERNEL && DEPT
+	default n
+	help
+	  This option provides a kernel module that runs unit test for
+	  DEPT.
+
+	  Say Y if you want DEPT unit test to be built into the kernel.
+	  Say M if you want DEPT unit test to build as a module.
+	  Say N if you are unsure.
+
 config LOCK_DEBUGGING_SUPPORT
 	bool
 	depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 39/47] dept: add module support for struct dept_event_site and dept_event_site_dep
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

struct dept_event_site and struct dept_event_site_dep have been
introduced to track dependencies between multi event sites for a single
wait, that will be loaded to data segment.  Plus, a custom section,
'.dept.event_sites', also has been introduced to keep pointers to the
objects to make sure all the event sites defined exist in code.

dept should work with the section and segment of module.  Add the
support to handle the section and segment properly whenever modules are
loaded and unloaded.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/dept.h     | 14 +++++++
 include/linux/module.h   |  5 +++
 kernel/dependency/dept.c | 79 +++++++++++++++++++++++++++++++++++-----
 kernel/module/main.c     | 15 ++++++++
 4 files changed, 103 insertions(+), 10 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 988aceee36ad..25fdd324614a 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -414,6 +414,11 @@ struct dept_event_site {
 	struct dept_event_site		*bfs_parent;
 	struct list_head		bfs_node;
 
+	/*
+	 * for linking all dept_event_site's
+	 */
+	struct list_head		all_node;
+
 	/*
 	 * flag indicating the event is not only declared but also
 	 * actually used in code
@@ -430,6 +435,11 @@ struct dept_event_site_dep {
 	 */
 	struct list_head		dep_node;
 	struct list_head		dep_rev_node;
+
+	/*
+	 * for linking all dept_event_site_dep's
+	 */
+	struct list_head		all_node;
 };
 
 #define DEPT_EVENT_SITE_INITIALIZER(es)					\
@@ -441,6 +451,7 @@ struct dept_event_site_dep {
 	.bfs_gen = 0,							\
 	.bfs_parent = NULL,						\
 	.bfs_node = LIST_HEAD_INIT((es).bfs_node),			\
+	.all_node = LIST_HEAD_INIT((es).all_node),			\
 	.used = false,							\
 }
 
@@ -450,6 +461,7 @@ struct dept_event_site_dep {
 	.recover_site = NULL,						\
 	.dep_node = LIST_HEAD_INIT((esd).dep_node),			\
 	.dep_rev_node = LIST_HEAD_INIT((esd).dep_rev_node),		\
+	.all_node = LIST_HEAD_INIT((esd).all_node),			\
 }
 
 struct dept_event_site_init {
@@ -473,6 +485,7 @@ extern void dept_init(void);
 extern void dept_task_init(struct task_struct *t);
 extern void dept_task_exit(struct task_struct *t);
 extern void dept_free_range(void *start, unsigned int sz);
+extern void dept_mark_event_site_used(void *start, void *end);
 
 extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
 extern void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
@@ -536,6 +549,7 @@ struct dept_event_site { };
 #define dept_task_init(t)				do { } while (0)
 #define dept_task_exit(t)				do { } while (0)
 #define dept_free_range(s, sz)				do { } while (0)
+#define dept_mark_event_site_used(s, e)			do { } while (0)
 
 #define dept_map_init(m, k, su, n)			do { (void)(n); (void)(k); } while (0)
 #define dept_map_reinit(m, k, su, n)			do { (void)(n); (void)(k); } while (0)
diff --git a/include/linux/module.h b/include/linux/module.h
index 3319a5269d28..4f360c7c9e96 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -29,6 +29,7 @@
 #include <linux/srcu.h>
 #include <linux/static_call_types.h>
 #include <linux/dynamic_debug.h>
+#include <linux/dept.h>
 
 #include <linux/percpu.h>
 #include <asm/module.h>
@@ -579,6 +580,10 @@ struct module {
 #ifdef CONFIG_DYNAMIC_DEBUG_CORE
 	struct _ddebug_info dyndbg_info;
 #endif
+#ifdef CONFIG_DEPT
+	struct dept_event_site **dept_event_sites;
+	unsigned int num_dept_event_sites;
+#endif
 } ____cacheline_aligned __randomize_layout;
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index b14400c4f83b..07d883579269 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -984,6 +984,9 @@ static void bfs(void *root, struct bfs_ops *ops, void *in, void **out)
  * event sites.
  */
 
+static LIST_HEAD(dept_event_sites);
+static LIST_HEAD(dept_event_site_deps);
+
 /*
  * Print all events in the circle.
  */
@@ -2043,6 +2046,33 @@ static void del_dep_rcu(struct rcu_head *rh)
 	preempt_enable();
 }
 
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void disconnect_event_site_dep(struct dept_event_site_dep *esd)
+{
+	list_del_rcu(&esd->dep_node);
+	list_del_rcu(&esd->dep_rev_node);
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void disconnect_event_site(struct dept_event_site *es)
+{
+	struct dept_event_site_dep *esd, *next_esd;
+
+	list_for_each_entry_safe(esd, next_esd, &es->dep_head, dep_node) {
+		list_del_rcu(&esd->dep_node);
+		list_del_rcu(&esd->dep_rev_node);
+	}
+
+	list_for_each_entry_safe(esd, next_esd, &es->dep_rev_head, dep_rev_node) {
+		list_del_rcu(&esd->dep_node);
+		list_del_rcu(&esd->dep_rev_node);
+	}
+}
+
 /*
  * NOTE: Must be called with dept_lock held.
  */
@@ -2384,6 +2414,8 @@ void dept_free_range(void *start, unsigned int sz)
 {
 	struct dept_task *dt = dept_task();
 	struct dept_class *c, *n;
+	struct dept_event_site_dep *esd, *next_esd;
+	struct dept_event_site *es, *next_es;
 	unsigned long flags;
 
 	if (unlikely(!dept_working()))
@@ -2405,6 +2437,24 @@ void dept_free_range(void *start, unsigned int sz)
 	while (unlikely(!dept_lock()))
 		cpu_relax();
 
+	list_for_each_entry_safe(esd, next_esd, &dept_event_site_deps, all_node) {
+		if (!within((void *)esd, start, sz))
+			continue;
+
+		disconnect_event_site_dep(esd);
+		list_del(&esd->all_node);
+	}
+
+	list_for_each_entry_safe(es, next_es, &dept_event_sites, all_node) {
+		if (!within((void *)es, start, sz) &&
+		    !within(es->name, start, sz) &&
+		    !within(es->func_name, start, sz))
+			continue;
+
+		disconnect_event_site(es);
+		list_del(&es->all_node);
+	}
+
 	list_for_each_entry_safe(c, n, &dept_classes, all_node) {
 		if (!within((void *)c->key, start, sz) &&
 		    !within(c->name, start, sz))
@@ -3337,6 +3387,7 @@ void __dept_recover_event(struct dept_event_site_dep *esd,
 
 	list_add(&esd->dep_node, &es->dep_head);
 	list_add(&esd->dep_rev_node, &rs->dep_rev_head);
+	list_add(&esd->all_node, &dept_event_site_deps);
 	check_recover_dl_bfs(esd);
 unlock:
 	dept_unlock();
@@ -3347,6 +3398,23 @@ EXPORT_SYMBOL_GPL(__dept_recover_event);
 
 #define B2KB(B) ((B) / 1024)
 
+void dept_mark_event_site_used(void *start, void *end)
+{
+	struct dept_event_site_init **evtinitpp;
+
+	for (evtinitpp = (struct dept_event_site_init **)start;
+	     evtinitpp < (struct dept_event_site_init **)end;
+	     evtinitpp++) {
+		(*evtinitpp)->evt_site->used = true;
+		(*evtinitpp)->evt_site->func_name = (*evtinitpp)->func_name;
+		list_add(&(*evtinitpp)->evt_site->all_node, &dept_event_sites);
+
+		pr_info("dept_event_site %s@%s is initialized.\n",
+				(*evtinitpp)->evt_site->name,
+				(*evtinitpp)->evt_site->func_name);
+	}
+}
+
 extern char __dept_event_sites_start[], __dept_event_sites_end[];
 
 /*
@@ -3356,20 +3424,11 @@ extern char __dept_event_sites_start[], __dept_event_sites_end[];
 void __init dept_init(void)
 {
 	size_t mem_total = 0;
-	struct dept_event_site_init **evtinitpp;
 
 	/*
 	 * dept recover dependency tracking works from now on.
 	 */
-	for (evtinitpp = (struct dept_event_site_init **)__dept_event_sites_start;
-	     evtinitpp < (struct dept_event_site_init **)__dept_event_sites_end;
-	     evtinitpp++) {
-		(*evtinitpp)->evt_site->used = true;
-		(*evtinitpp)->evt_site->func_name = (*evtinitpp)->func_name;
-		pr_info("dept_event %s@%s is initialized.\n",
-				(*evtinitpp)->evt_site->name,
-				(*evtinitpp)->evt_site->func_name);
-	}
+	dept_mark_event_site_used(__dept_event_sites_start, __dept_event_sites_end);
 	dept_recover_ready = true;
 
 	local_irq_disable();
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 6ad78f0a58b6..fe0b62a45ed2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2720,6 +2720,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 						&mod->dyndbg_info.num_classes);
 #endif
 
+#ifdef CONFIG_DEPT
+	mod->dept_event_sites = section_objs(info, ".dept.event_sites",
+					sizeof(*mod->dept_event_sites),
+					&mod->num_dept_event_sites);
+#endif
 	return 0;
 }
 
@@ -3346,6 +3351,14 @@ static int early_mod_check(struct load_info *info, int flags)
 	return err;
 }
 
+static void dept_mark_event_site_used_module(struct module *mod)
+{
+#ifdef CONFIG_DEPT
+	dept_mark_event_site_used(mod->dept_event_sites,
+			     mod->dept_event_sites + mod->num_dept_event_sites);
+#endif
+}
+
 /*
  * Allocate and load the module: note that size of section 0 is always
  * zero, and we rely on this for optional sections.
@@ -3508,6 +3521,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	/* Done! */
 	trace_module_load(mod);
 
+	dept_mark_event_site_used_module(mod);
+
 	return do_init_module(mod);
 
  sysfs_cleanup:
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 38/47] dept: introduce a new type of dependency tracking between multi event sites
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

It's worth reporting wait-event circular dependency even if it doesn't
lead to an actual deadlock, because it's a good information about a
circular dependency anyway.  However, it should be suppressed once
turning out it doesn't lead an actual deadlock, for instance, there are
other wake-up(or event) paths.

The report needs to be suppressed by annotating that an event can be
recovered by other sites triggering the desired wake-up, using a newly
introduced API, dept_recover_event() specifying an event site and its
recover site.

By the introduction, need of a new type of dependency tracking arises
since a loop of recover dependency could trigger another type of
deadlock.  So implement a logic to track the new type of dependency
between multi event sites for a single wait.

Lastly, to make sure that recover sites must be used in code, introduce
a section '.dept.event_sites' to mark it as 'used' only if used in code,
and warn it if dept_recover_event()s are annotated with recover sites,
not used in code.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/asm-generic/vmlinux.lds.h |  13 +-
 include/linux/dept.h              |  91 ++++++++++++++
 kernel/dependency/dept.c          | 196 ++++++++++++++++++++++++++++++
 3 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ae2d2359b79e..704bb47ed843 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -700,6 +700,16 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define KERNEL_CTORS()
 #endif
 
+#ifdef CONFIG_DEPT
+#define DEPT_EVNET_SITES_USED()						\
+	. = ALIGN(8);							\
+	__dept_event_sites_start = .;					\
+	KEEP(*(.dept.event_sites))					\
+	__dept_event_sites_end = .;
+#else
+#define DEPT_EVNET_SITES_USED()
+#endif
+
 /* init and exit section handling */
 #define INIT_DATA							\
 	KEEP(*(SORT(___kentry+*)))					\
@@ -724,7 +734,8 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 	EARLYCON_TABLE()						\
 	LSM_TABLE()							\
 	EARLY_LSM_TABLE()						\
-	KUNIT_INIT_TABLE()
+	KUNIT_INIT_TABLE()						\
+	DEPT_EVNET_SITES_USED()
 
 #define INIT_TEXT							\
 	*(.init.text .init.text.*)					\
diff --git a/include/linux/dept.h b/include/linux/dept.h
index b164f74e86e5..988aceee36ad 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -390,6 +390,82 @@ struct dept_ext_wgen {
 	unsigned int wgen;
 };
 
+struct dept_event_site {
+	/*
+	 * event site name
+	 */
+	const char			*name;
+
+	/*
+	 * function name where the event is triggered in
+	 */
+	const char			*func_name;
+
+	/*
+	 * for associating its recover dependencies
+	 */
+	struct list_head		dep_head;
+	struct list_head		dep_rev_head;
+
+	/*
+	 * for BFS
+	 */
+	unsigned int			bfs_gen;
+	struct dept_event_site		*bfs_parent;
+	struct list_head		bfs_node;
+
+	/*
+	 * flag indicating the event is not only declared but also
+	 * actually used in code
+	 */
+	bool				used;
+};
+
+struct dept_event_site_dep {
+	struct dept_event_site		*evt_site;
+	struct dept_event_site		*recover_site;
+
+	/*
+	 * for linking to dept_event objects
+	 */
+	struct list_head		dep_node;
+	struct list_head		dep_rev_node;
+};
+
+#define DEPT_EVENT_SITE_INITIALIZER(es)					\
+{									\
+	.name = #es,							\
+	.func_name = NULL,						\
+	.dep_head = LIST_HEAD_INIT((es).dep_head),			\
+	.dep_rev_head = LIST_HEAD_INIT((es).dep_rev_head),		\
+	.bfs_gen = 0,							\
+	.bfs_parent = NULL,						\
+	.bfs_node = LIST_HEAD_INIT((es).bfs_node),			\
+	.used = false,							\
+}
+
+#define DEPT_EVENT_SITE_DEP_INITIALIZER(esd)				\
+{									\
+	.evt_site = NULL,						\
+	.recover_site = NULL,						\
+	.dep_node = LIST_HEAD_INIT((esd).dep_node),			\
+	.dep_rev_node = LIST_HEAD_INIT((esd).dep_rev_node),		\
+}
+
+struct dept_event_site_init {
+	struct dept_event_site *evt_site;
+	const char *func_name;
+};
+
+#define dept_event_site_used(es)					\
+do {									\
+	static struct dept_event_site_init _evtinit __initdata =	\
+		{ .evt_site = (es), .func_name = __func__ };		\
+	static struct dept_event_site_init *_evtinitp __used		\
+		__attribute__((__section__(".dept.event_sites"))) =	\
+		&_evtinit;						\
+} while (0)
+
 extern void dept_stop_emerg(void);
 extern void dept_on(void);
 extern void dept_off(void);
@@ -427,6 +503,14 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
 extern void dept_key_init(struct dept_key *k);
 extern void dept_key_destroy(struct dept_key *k);
 extern void dept_map_ecxt_modify(struct dept_map *m, unsigned long e_f, struct dept_key *new_k, unsigned long new_e_f, unsigned long new_ip, const char *new_c_fn, const char *new_e_fn, int new_sub_l);
+extern void __dept_recover_event(struct dept_event_site_dep *esd, struct dept_event_site *es, struct dept_event_site *rs);
+
+#define dept_recover_event(es, rs)					\
+do {									\
+	static struct dept_event_site_dep _esd = DEPT_EVENT_SITE_DEP_INITIALIZER(_esd);\
+									\
+	__dept_recover_event(&_esd, es, rs);				\
+} while (0)
 
 extern void dept_softirq_enter(void);
 extern void dept_hardirq_enter(void);
@@ -440,8 +524,10 @@ extern void dept_hardirqs_off(void);
 struct dept_key { };
 struct dept_map { };
 struct dept_ext_wgen { };
+struct dept_event_site { };
 
 #define DEPT_MAP_INITIALIZER(n, k) { }
+#define DEPT_EVENT_SITE_INITIALIZER(es) { }
 
 #define dept_stop_emerg()				do { } while (0)
 #define dept_on()					do { } while (0)
@@ -472,6 +558,7 @@ struct dept_ext_wgen { };
 #define dept_key_init(k)				do { (void)(k); } while (0)
 #define dept_key_destroy(k)				do { (void)(k); } while (0)
 #define dept_map_ecxt_modify(m, e_f, n_k, n_e_f, n_ip, n_c_fn, n_e_fn, n_sl) do { (void)(n_k); (void)(n_c_fn); (void)(n_e_fn); } while (0)
+#define dept_recover_event(es, rs)			do { } while (0)
 
 #define dept_softirq_enter()				do { } while (0)
 #define dept_hardirq_enter()				do { } while (0)
@@ -482,4 +569,8 @@ struct dept_ext_wgen { };
 
 #define dept_set_lockdep_map(m, lockdep_m)		do { } while (0)
 #endif
+
+#define DECLARE_DEPT_EVENT_SITE(es) extern struct dept_event_site (es)
+#define DEFINE_DEPT_EVENT_SITE(es) struct dept_event_site (es) = DEPT_EVENT_SITE_INITIALIZER(es)
+
 #endif /* __LINUX_DEPT_H */
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 1de61306418b..b14400c4f83b 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -973,6 +973,117 @@ static void bfs(void *root, struct bfs_ops *ops, void *in, void **out)
 	}
 }
 
+/*
+ * Recover dependency between event sites
+ * =====================================================================
+ * Even though an event is in a chain of wait-event circular dependency,
+ * the corresponding wait might be woken up by another site triggering
+ * the desired event.  To reflect that, dept allows to annotate the
+ * recover relationship between event sites using __dept_recover_event().
+ * However, that requires to track a new type of dependency between the
+ * event sites.
+ */
+
+/*
+ * Print all events in the circle.
+ */
+static void print_recover_circle(struct dept_event_site *es)
+{
+	struct dept_event_site *from = es->bfs_parent;
+	struct dept_event_site *to = es;
+
+	dept_outworld_enter();
+
+	pr_warn("===================================================\n");
+	pr_warn("DEPT: Circular recover dependency has been detected.\n");
+	pr_warn("%s %.*s %s\n", init_utsname()->release,
+		(int)strcspn(init_utsname()->version, " "),
+		init_utsname()->version,
+		print_tainted());
+	pr_warn("---------------------------------------------------\n");
+
+	do {
+		print_spc(1, "event site(%s@%s)\n", from->name, from->func_name);
+		print_spc(1, "-> event site(%s@%s)\n", to->name, to->func_name);
+		to = from;
+		from = from->bfs_parent;
+
+		if (to != es)
+			pr_warn("\n");
+	} while (to != es);
+
+	pr_warn("---------------------------------------------------\n");
+	pr_warn("information that might be helpful\n");
+	pr_warn("---------------------------------------------------\n");
+	dump_stack();
+
+	dept_outworld_exit();
+}
+
+static void bfs_init_recover(void *node, void *in, void **out)
+{
+	struct dept_event_site *root = (struct dept_event_site *)node;
+	struct dept_event_site_dep *new = (struct dept_event_site_dep *)in;
+
+	root->bfs_gen = bfs_gen;
+	new->recover_site->bfs_parent = new->evt_site;
+}
+
+static void bfs_extend_recover(struct list_head *h, void *node)
+{
+	struct dept_event_site *cur = (struct dept_event_site *)node;
+	struct dept_event_site_dep *esd;
+
+	list_for_each_entry(esd, &cur->dep_head, dep_node) {
+		struct dept_event_site *next = esd->recover_site;
+
+		if (bfs_gen == next->bfs_gen)
+			continue;
+		next->bfs_parent = cur;
+		next->bfs_gen = bfs_gen;
+		list_add_tail(&next->bfs_node, h);
+	}
+}
+
+static void *bfs_dequeue_recover(struct list_head *h)
+{
+	struct dept_event_site *es;
+
+	DEPT_WARN_ON(list_empty(h));
+
+	es = list_first_entry(h, struct dept_event_site, bfs_node);
+	list_del(&es->bfs_node);
+	return es;
+}
+
+static enum bfs_ret cb_check_recover_dl(void *node, void *in, void **out)
+{
+	struct dept_event_site *cur = (struct dept_event_site *)node;
+	struct dept_event_site_dep *new = (struct dept_event_site_dep *)in;
+
+	if (cur == new->evt_site) {
+		print_recover_circle(new->recover_site);
+		return BFS_DONE;
+	}
+
+	return BFS_CONTINUE;
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void check_recover_dl_bfs(struct dept_event_site_dep *esd)
+{
+	struct bfs_ops ops = {
+		.bfs_init = bfs_init_recover,
+		.extend = bfs_extend_recover,
+		.dequeue = bfs_dequeue_recover,
+		.callback = cb_check_recover_dl,
+	};
+
+	bfs((void *)esd->recover_site, &ops, (void *)esd, NULL);
+}
+
 /*
  * Main operations
  * =====================================================================
@@ -3166,8 +3277,78 @@ static void migrate_per_cpu_pool(void)
 	}
 }
 
+static bool dept_recover_ready;
+
+void __dept_recover_event(struct dept_event_site_dep *esd,
+		struct dept_event_site *es, struct dept_event_site *rs)
+{
+	struct dept_task *dt = dept_task();
+	unsigned long flags;
+
+	if (unlikely(!dept_working()))
+		return;
+
+	if (dt->recursive)
+		return;
+
+	if (!esd || !es || !rs) {
+		DEPT_WARN_ONCE("All the parameters should be !NULL.\n");
+		return;
+	}
+
+	/*
+	 * Check locklessly if another already has done it for us.
+	 */
+	if (READ_ONCE(esd->evt_site))
+		return;
+
+	if (!dept_recover_ready) {
+		DEPT_WARN("Should be called once dept_recover_ready.\n");
+		return;
+	}
+
+	flags = dept_enter();
+	if (unlikely(!dept_lock()))
+		goto exit;
+
+	/*
+	 * Check if another already has done it for us with lock held.
+	 */
+	if (esd->evt_site)
+		goto unlock;
+
+	/*
+	 * Can be used as an indicator of whether this
+	 * __dept_recover_event() has been processed or not as well as
+	 * for storing its associated events.
+	 */
+	WRITE_ONCE(esd->evt_site, es);
+	esd->recover_site = rs;
+
+	if (!es->used || !rs->used) {
+		if (!es->used)
+			DEPT_INFO("dept_event_site %s has never been used.\n", es->name);
+		if (!rs->used)
+			DEPT_INFO("dept_event_site %s has never been used.\n", rs->name);
+
+		DEPT_WARN("Cannot track recover dependency with events that never used.\n");
+		goto unlock;
+	}
+
+	list_add(&esd->dep_node, &es->dep_head);
+	list_add(&esd->dep_rev_node, &rs->dep_rev_head);
+	check_recover_dl_bfs(esd);
+unlock:
+	dept_unlock();
+exit:
+	dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(__dept_recover_event);
+
 #define B2KB(B) ((B) / 1024)
 
+extern char __dept_event_sites_start[], __dept_event_sites_end[];
+
 /*
  * Should be called after setup_per_cpu_areas() and before no non-boot
  * CPUs have been on.
@@ -3175,6 +3356,21 @@ static void migrate_per_cpu_pool(void)
 void __init dept_init(void)
 {
 	size_t mem_total = 0;
+	struct dept_event_site_init **evtinitpp;
+
+	/*
+	 * dept recover dependency tracking works from now on.
+	 */
+	for (evtinitpp = (struct dept_event_site_init **)__dept_event_sites_start;
+	     evtinitpp < (struct dept_event_site_init **)__dept_event_sites_end;
+	     evtinitpp++) {
+		(*evtinitpp)->evt_site->used = true;
+		(*evtinitpp)->evt_site->func_name = (*evtinitpp)->func_name;
+		pr_info("dept_event %s@%s is initialized.\n",
+				(*evtinitpp)->evt_site->name,
+				(*evtinitpp)->evt_site->func_name);
+	}
+	dept_recover_ready = true;
 
 	local_irq_disable();
 	dept_per_cpu_ready = 1;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 40/47] dept: introduce event_site() to disable event tracking if it's recoverable
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

With multi event sites for a single wait, dept allows to skip tracking
an event that is recoverable by other recover paths.

Introduce an API, event_site(), to skip tracking the event in the case.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/dept.h     | 30 ++++++++++++++++++++++++++++++
 include/linux/sched.h    |  6 ++++++
 kernel/dependency/dept.c | 20 ++++++++++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 25fdd324614a..0ac13129f308 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -487,6 +487,31 @@ extern void dept_task_exit(struct task_struct *t);
 extern void dept_free_range(void *start, unsigned int sz);
 extern void dept_mark_event_site_used(void *start, void *end);
 
+extern void disable_event_track(void);
+extern void enable_event_track(void);
+
+#define event_site(es, evt_func, ...)					\
+do {									\
+	unsigned long _flags;						\
+	bool _disable;							\
+									\
+	local_irq_save(_flags);						\
+	dept_event_site_used(es);					\
+	/*								\
+	 * If !list_empty(&(es)->dept_head), the event site can be	\
+	 * recovered by others.  Do not track event dependency if so.	\
+	 */								\
+	_disable = !list_empty(&(es)->dep_head);			\
+	if (_disable)							\
+		disable_event_track();					\
+									\
+	evt_func(__VA_ARGS__);						\
+									\
+	if (_disable)							\
+		enable_event_track();					\
+	local_irq_restore(_flags);					\
+} while (0)
+
 extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
 extern void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
 extern void dept_ext_wgen_init(struct dept_ext_wgen *ewg);
@@ -550,6 +575,11 @@ struct dept_event_site { };
 #define dept_task_exit(t)				do { } while (0)
 #define dept_free_range(s, sz)				do { } while (0)
 #define dept_mark_event_site_used(s, e)			do { } while (0)
+#define event_site(es, evt_func, ...)					\
+do {									\
+	(void)(es);							\
+	evt_func(__VA_ARGS__);						\
+} while (0)
 
 #define dept_map_init(m, k, su, n)			do { (void)(n); (void)(k); } while (0)
 #define dept_map_reinit(m, k, su, n)			do { (void)(n); (void)(k); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a01c10f28dfd..24a9dc9d6f5f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -876,6 +876,11 @@ struct dept_task {
 	 */
 	int				missing_ecxt;
 
+	/*
+	 * not to track events
+	 */
+	int				disable_event_track_cnt;
+
 	/*
 	 * for tracking IRQ-enable state
 	 */
@@ -913,6 +918,7 @@ struct dept_task {
 	.stage_wait_stack = NULL,				\
 	.stage_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED,\
 	.missing_ecxt = 0,					\
+	.disable_event_track_cnt = 0,				\
 	.hardirqs_enabled = false,				\
 	.softirqs_enabled = false,				\
 	.task_exit = false,					\
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 07d883579269..3c3ec2701bd6 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -2573,6 +2573,23 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
 	}
 }
 
+void disable_event_track(void)
+{
+	dept_task()->disable_event_track_cnt++;
+}
+EXPORT_SYMBOL_GPL(disable_event_track);
+
+void enable_event_track(void)
+{
+	dept_task()->disable_event_track_cnt--;
+}
+EXPORT_SYMBOL_GPL(enable_event_track);
+
+static bool event_track_disabled(void)
+{
+	return !!dept_task()->disable_event_track_cnt;
+}
+
 /*
  * Called between dept_enter() and dept_exit().
  */
@@ -2585,6 +2602,9 @@ static void __dept_event(struct dept_map *m, struct dept_map *real_m,
 	struct dept_key *k;
 	int e;
 
+	if (event_track_disabled())
+		return;
+
 	e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
 
 	if (DEPT_WARN_ON(e >= DEPT_MAX_SUBCLASSES_EVT))
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 37/47] completion, dept: introduce init_completion_dmap() API
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

Currently, dept uses dept's map embedded in task_struct to track
dependencies related to wait_for_completion() and its family.  So it
doesn't need an explicit map basically.

However, for those who want to set the maps with customized class or
key, introduce a new API to use external maps.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/completion.h | 40 +++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 4d8fb1d95c0a..e50f7d9b4b97 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -27,17 +27,15 @@
 struct completion {
 	unsigned int done;
 	struct swait_queue_head wait;
+	struct dept_map *dmap;
 };
 
-#define init_completion(x)				\
-do {							\
-	__init_completion(x);				\
-} while (0)
+#define init_completion(x) init_completion_dmap(x, NULL)
 
 /*
- * XXX: No use cases for now. Fill the body when needed.
+ * XXX: This usage using lockdep's map should be deprecated.
  */
-#define init_completion_map(x, m) init_completion(x)
+#define init_completion_map(x, m) init_completion_dmap(x, NULL)
 
 static inline void complete_acquire(struct completion *x, long timeout)
 {
@@ -48,8 +46,11 @@ static inline void complete_release(struct completion *x)
 }
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), .dmap = NULL, }
 
+/*
+ * XXX: This usage using lockdep's map should be deprecated.
+ */
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
 
@@ -90,15 +91,18 @@ static inline void complete_release(struct completion *x)
 #endif
 
 /**
- * __init_completion - Initialize a dynamically allocated completion
+ * init_completion_dmap - Initialize a dynamically allocated completion
  * @x:  pointer to completion structure that is to be initialized
+ * @dmap:  pointer to external dept's map to be used as a separated map
  *
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void __init_completion(struct completion *x)
+static inline void init_completion_dmap(struct completion *x,
+		struct dept_map *dmap)
 {
 	x->done = 0;
+	x->dmap = dmap;
 	init_swait_queue_head(&x->wait);
 }
 
@@ -136,13 +140,13 @@ extern void complete_all(struct completion *);
 
 #define wait_for_completion(x)						\
 ({									\
-	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	sdt_might_sleep_start_timeout((x)->dmap, -1L);			\
 	__wait_for_completion(x);					\
 	sdt_might_sleep_end();						\
 })
 #define wait_for_completion_io(x)					\
 ({									\
-	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	sdt_might_sleep_start_timeout((x)->dmap, -1L);			\
 	__wait_for_completion_io(x);					\
 	sdt_might_sleep_end();						\
 })
@@ -150,7 +154,7 @@ extern void complete_all(struct completion *);
 ({									\
 	int __ret;							\
 									\
-	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	sdt_might_sleep_start_timeout((x)->dmap, -1L);			\
 	__ret = __wait_for_completion_interruptible(x);			\
 	sdt_might_sleep_end();						\
 	__ret;								\
@@ -159,7 +163,7 @@ extern void complete_all(struct completion *);
 ({									\
 	int __ret;							\
 									\
-	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	sdt_might_sleep_start_timeout((x)->dmap, -1L);			\
 	__ret = __wait_for_completion_killable(x);			\
 	sdt_might_sleep_end();						\
 	__ret;								\
@@ -168,7 +172,7 @@ extern void complete_all(struct completion *);
 ({									\
 	int __ret;							\
 									\
-	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	sdt_might_sleep_start_timeout((x)->dmap, -1L);			\
 	__ret = __wait_for_completion_state(x, s);			\
 	sdt_might_sleep_end();						\
 	__ret;								\
@@ -177,7 +181,7 @@ extern void complete_all(struct completion *);
 ({									\
 	unsigned long __ret;						\
 									\
-	sdt_might_sleep_start_timeout(NULL, t);				\
+	sdt_might_sleep_start_timeout((x)->dmap, t);			\
 	__ret = __wait_for_completion_timeout(x, t);			\
 	sdt_might_sleep_end();						\
 	__ret;								\
@@ -186,7 +190,7 @@ extern void complete_all(struct completion *);
 ({									\
 	unsigned long __ret;						\
 									\
-	sdt_might_sleep_start_timeout(NULL, t);				\
+	sdt_might_sleep_start_timeout((x)->dmap, t);			\
 	__ret = __wait_for_completion_io_timeout(x, t);			\
 	sdt_might_sleep_end();						\
 	__ret;								\
@@ -195,7 +199,7 @@ extern void complete_all(struct completion *);
 ({									\
 	long __ret;							\
 									\
-	sdt_might_sleep_start_timeout(NULL, t);				\
+	sdt_might_sleep_start_timeout((x)->dmap, t);			\
 	__ret = __wait_for_completion_interruptible_timeout(x, t);	\
 	sdt_might_sleep_end();						\
 	__ret;								\
@@ -204,7 +208,7 @@ extern void complete_all(struct completion *);
 ({									\
 	long __ret;							\
 									\
-	sdt_might_sleep_start_timeout(NULL, t);				\
+	sdt_might_sleep_start_timeout((x)->dmap, t);			\
 	__ret = __wait_for_completion_killable_timeout(x, t);		\
 	sdt_might_sleep_end();						\
 	__ret;								\
-- 
2.17.1


^ permalink raw reply related

* [PATCH v17 36/47] dept: assign unique dept_key to each distinct wait_for_completion() caller
From: Byungchul Park @ 2025-10-02  8:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel_team, torvalds, damien.lemoal, linux-ide, adilger.kernel,
	linux-ext4, mingo, peterz, will, tglx, rostedt, joel, sashal,
	daniel.vetter, duyuyang, johannes.berg, tj, tytso, willy, david,
	amir73il, gregkh, kernel-team, linux-mm, akpm, mhocko, minchan,
	hannes, vdavydov.dev, sj, jglisse, dennis, cl, penberg, rientjes,
	vbabka, ngupta, linux-block, josef, linux-fsdevel, jack, jlayton,
	dan.j.williams, hch, djwong, dri-devel, rodrigosiqueiramelo,
	melissa.srw, hamohammed.sa, harry.yoo, chris.p.wilson,
	gwan-gyeong.mun, max.byungchul.park, boqun.feng, longman,
	yunseong.kim, ysk, yeoreum.yun, netdev, matthew.brost, her0gyugyu,
	corbet, catalin.marinas, bp, dave.hansen, x86, hpa, luto,
	sumit.semwal, gustavo, christian.koenig, andi.shyti, arnd,
	lorenzo.stoakes, Liam.Howlett, rppt, surenb, mcgrof, petr.pavlu,
	da.gomez, samitolvanen, paulmck, frederic, neeraj.upadhyay,
	joelagnelf, josh, urezki, mathieu.desnoyers, jiangshanlai,
	qiang.zhang, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, vschneid, chuck.lever, neil, okorniev, Dai.Ngo,
	tom, trondmy, anna, kees, bigeasy, clrkwllms, mark.rutland,
	ada.coupriediaz, kristina.martsenko, wangkefeng.wang, broonie,
	kevin.brodsky, dwmw, shakeel.butt, ast, ziy, yuzhao, baolin.wang,
	usamaarif642, joel.granados, richard.weiyang, geert+renesas,
	tim.c.chen, linux, alexander.shishkin, lillian, chenhuacai,
	francesco, guoweikang.kernel, link, jpoimboe, masahiroy, brauner,
	thomas.weissschuh, oleg, mjguzik, andrii, wangfushuai, linux-doc,
	linux-arm-kernel, linux-media, linaro-mm-sig, linux-i2c,
	linux-arch, linux-modules, rcu, linux-nfs, linux-rt-devel
In-Reply-To: <20251002081247.51255-1-byungchul@sk.com>

wait_for_completion() can be used at various points in the code and it's
very hard to distinguish wait_for_completion()s between different usages.
Using a single dept_key for all the wait_for_completion()s could trigger
false positive reports.

Assign unique dept_key to each distinct wait_for_completion() caller to
avoid false positive reports.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/completion.h | 100 +++++++++++++++++++++++++++++++------
 kernel/sched/completion.c  |  60 +++++++++++-----------
 2 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 3200b741de28..4d8fb1d95c0a 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -27,12 +27,10 @@
 struct completion {
 	unsigned int done;
 	struct swait_queue_head wait;
-	struct dept_map dmap;
 };
 
 #define init_completion(x)				\
 do {							\
-	sdt_map_init(&(x)->dmap);			\
 	__init_completion(x);				\
 } while (0)
 
@@ -43,17 +41,14 @@ do {							\
 
 static inline void complete_acquire(struct completion *x, long timeout)
 {
-	sdt_might_sleep_start_timeout(&x->dmap, timeout);
 }
 
 static inline void complete_release(struct completion *x)
 {
-	sdt_might_sleep_end();
 }
 
 #define COMPLETION_INITIALIZER(work) \
-	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
-	  .dmap = DEPT_MAP_INITIALIZER(work, NULL), }
+	{ 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), }
 
 #define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
 	(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -119,18 +114,18 @@ static inline void reinit_completion(struct completion *x)
 	x->done = 0;
 }
 
-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern int wait_for_completion_state(struct completion *x, unsigned int state);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void __wait_for_completion(struct completion *);
+extern void __wait_for_completion_io(struct completion *);
+extern int __wait_for_completion_interruptible(struct completion *x);
+extern int __wait_for_completion_killable(struct completion *x);
+extern int __wait_for_completion_state(struct completion *x, unsigned int state);
+extern unsigned long __wait_for_completion_timeout(struct completion *x,
 						   unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long __wait_for_completion_io_timeout(struct completion *x,
 						    unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long __wait_for_completion_interruptible_timeout(
 	struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long __wait_for_completion_killable_timeout(
 	struct completion *x, unsigned long timeout);
 extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
@@ -139,4 +134,79 @@ extern void complete(struct completion *);
 extern void complete_on_current_cpu(struct completion *x);
 extern void complete_all(struct completion *);
 
+#define wait_for_completion(x)						\
+({									\
+	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	__wait_for_completion(x);					\
+	sdt_might_sleep_end();						\
+})
+#define wait_for_completion_io(x)					\
+({									\
+	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	__wait_for_completion_io(x);					\
+	sdt_might_sleep_end();						\
+})
+#define wait_for_completion_interruptible(x)				\
+({									\
+	int __ret;							\
+									\
+	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	__ret = __wait_for_completion_interruptible(x);			\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
+#define wait_for_completion_killable(x)					\
+({									\
+	int __ret;							\
+									\
+	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	__ret = __wait_for_completion_killable(x);			\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
+#define wait_for_completion_state(x, s)					\
+({									\
+	int __ret;							\
+									\
+	sdt_might_sleep_start_timeout(NULL, -1L);			\
+	__ret = __wait_for_completion_state(x, s);			\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
+#define wait_for_completion_timeout(x, t)				\
+({									\
+	unsigned long __ret;						\
+									\
+	sdt_might_sleep_start_timeout(NULL, t);				\
+	__ret = __wait_for_completion_timeout(x, t);			\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
+#define wait_for_completion_io_timeout(x, t)				\
+({									\
+	unsigned long __ret;						\
+									\
+	sdt_might_sleep_start_timeout(NULL, t);				\
+	__ret = __wait_for_completion_io_timeout(x, t);			\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
+#define wait_for_completion_interruptible_timeout(x, t)			\
+({									\
+	long __ret;							\
+									\
+	sdt_might_sleep_start_timeout(NULL, t);				\
+	__ret = __wait_for_completion_interruptible_timeout(x, t);	\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
+#define wait_for_completion_killable_timeout(x, t)			\
+({									\
+	long __ret;							\
+									\
+	sdt_might_sleep_start_timeout(NULL, t);				\
+	__ret = __wait_for_completion_killable_timeout(x, t);		\
+	sdt_might_sleep_end();						\
+	__ret;								\
+})
 #endif
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 5e45a60ff7b3..7262000db114 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -4,7 +4,7 @@
  * Generic wait-for-completion handler;
  *
  * It differs from semaphores in that their default case is the opposite,
- * wait_for_completion default blocks whereas semaphore default non-block. The
+ * __wait_for_completion default blocks whereas semaphore default non-block. The
  * interface also makes it easy to 'complete' multiple waiting threads,
  * something which isn't entirely natural for semaphores.
  *
@@ -42,7 +42,7 @@ void complete_on_current_cpu(struct completion *x)
  * This will wake up a single thread waiting on this completion. Threads will be
  * awakened in the same order in which they were queued.
  *
- * See also complete_all(), wait_for_completion() and related routines.
+ * See also complete_all(), __wait_for_completion() and related routines.
  *
  * If this function wakes up a task, it executes a full memory barrier before
  * accessing the task state.
@@ -139,23 +139,23 @@ wait_for_common_io(struct completion *x, long timeout, int state)
 }
 
 /**
- * wait_for_completion: - waits for completion of a task
+ * __wait_for_completion: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
  * interruptible and there is no timeout.
  *
- * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
+ * See also similar routines (i.e. __wait_for_completion_timeout()) with timeout
  * and interrupt capability. Also see complete().
  */
-void __sched wait_for_completion(struct completion *x)
+void __sched __wait_for_completion(struct completion *x)
 {
 	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion);
+EXPORT_SYMBOL(__wait_for_completion);
 
 /**
- * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -167,28 +167,28 @@ EXPORT_SYMBOL(wait_for_completion);
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_timeout);
+EXPORT_SYMBOL(__wait_for_completion_timeout);
 
 /**
- * wait_for_completion_io: - waits for completion of a task
+ * __wait_for_completion_io: - waits for completion of a task
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It is NOT
  * interruptible and there is no timeout. The caller is accounted as waiting
  * for IO (which traditionally means blkio only).
  */
-void __sched wait_for_completion_io(struct completion *x)
+void __sched __wait_for_completion_io(struct completion *x)
 {
 	wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io);
+EXPORT_SYMBOL(__wait_for_completion_io);
 
 /**
- * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -201,14 +201,14 @@ EXPORT_SYMBOL(wait_for_completion_io);
  * till timeout) if completed.
  */
 unsigned long __sched
-wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
 {
 	return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_io_timeout);
+EXPORT_SYMBOL(__wait_for_completion_io_timeout);
 
 /**
- * wait_for_completion_interruptible: - waits for completion of a task (w/intr)
+ * __wait_for_completion_interruptible: - waits for completion of a task (w/intr)
  * @x:  holds the state of this particular completion
  *
  * This waits for completion of a specific task to be signaled. It is
@@ -216,7 +216,7 @@ EXPORT_SYMBOL(wait_for_completion_io_timeout);
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_interruptible(struct completion *x)
+int __sched __wait_for_completion_interruptible(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
 
@@ -224,10 +224,10 @@ int __sched wait_for_completion_interruptible(struct completion *x)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(__wait_for_completion_interruptible);
 
 /**
- * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
+ * __wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -238,15 +238,15 @@ EXPORT_SYMBOL(wait_for_completion_interruptible);
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
+__wait_for_completion_interruptible_timeout(struct completion *x,
 					  unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
 }
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(__wait_for_completion_interruptible_timeout);
 
 /**
- * wait_for_completion_killable: - waits for completion of a task (killable)
+ * __wait_for_completion_killable: - waits for completion of a task (killable)
  * @x:  holds the state of this particular completion
  *
  * This waits to be signaled for completion of a specific task. It can be
@@ -254,7 +254,7 @@ EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
  *
  * Return: -ERESTARTSYS if interrupted, 0 if completed.
  */
-int __sched wait_for_completion_killable(struct completion *x)
+int __sched __wait_for_completion_killable(struct completion *x)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
 
@@ -262,9 +262,9 @@ int __sched wait_for_completion_killable(struct completion *x)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_killable);
+EXPORT_SYMBOL(__wait_for_completion_killable);
 
-int __sched wait_for_completion_state(struct completion *x, unsigned int state)
+int __sched __wait_for_completion_state(struct completion *x, unsigned int state)
 {
 	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state);
 
@@ -272,10 +272,10 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state)
 		return t;
 	return 0;
 }
-EXPORT_SYMBOL(wait_for_completion_state);
+EXPORT_SYMBOL(__wait_for_completion_state);
 
 /**
- * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
+ * __wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
  * @x:  holds the state of this particular completion
  * @timeout:  timeout value in jiffies
  *
@@ -287,12 +287,12 @@ EXPORT_SYMBOL(wait_for_completion_state);
  * or number of jiffies left till timeout) if completed.
  */
 long __sched
-wait_for_completion_killable_timeout(struct completion *x,
+__wait_for_completion_killable_timeout(struct completion *x,
 				     unsigned long timeout)
 {
 	return wait_for_common(x, timeout, TASK_KILLABLE);
 }
-EXPORT_SYMBOL(wait_for_completion_killable_timeout);
+EXPORT_SYMBOL(__wait_for_completion_killable_timeout);
 
 /**
  *	try_wait_for_completion - try to decrement a completion without blocking
@@ -334,7 +334,7 @@ EXPORT_SYMBOL(try_wait_for_completion);
  *	completion_done - Test to see if a completion has any waiters
  *	@x:	completion structure
  *
- *	Return: 0 if there are waiters (wait_for_completion() in progress)
+ *	Return: 0 if there are waiters (__wait_for_completion() in progress)
  *		 1 if there are no waiters.
  *
  *	Note, this will always return true if complete_all() was called on @X.
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox