public inbox for netdev@vger.kernel.org
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org
Cc: kernel_team@skhynix.com, torvalds@linux-foundation.org,
	damien.lemoal@opensource.wdc.com, linux-ide@vger.kernel.org,
	adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
	mingo@redhat.com, peterz@infradead.org, will@kernel.org,
	tglx@linutronix.de, rostedt@goodmis.org, joel@joelfernandes.org,
	sashal@kernel.org, daniel.vetter@ffwll.ch, duyuyang@gmail.com,
	johannes.berg@intel.com, tj@kernel.org, tytso@mit.edu,
	willy@infradead.org, david@fromorbit.com, amir73il@gmail.com,
	gregkh@linuxfoundation.org, kernel-team@lge.com,
	linux-mm@kvack.org, akpm@linux-foundation.org, mhocko@kernel.org,
	minchan@kernel.org, hannes@cmpxchg.org, vdavydov.dev@gmail.com,
	sj@kernel.org, jglisse@redhat.com, dennis@kernel.org,
	cl@linux.com, penberg@kernel.org, rientjes@google.com,
	vbabka@suse.cz, ngupta@vflare.org, linux-block@vger.kernel.org,
	josef@toxicpanda.com, linux-fsdevel@vger.kernel.org,
	jack@suse.cz, jlayton@kernel.org, dan.j.williams@intel.com,
	hch@infradead.org, djwong@kernel.org,
	dri-devel@lists.freedesktop.org, rodrigosiqueiramelo@gmail.com,
	melissa.srw@gmail.com, hamohammed.sa@gmail.com,
	harry.yoo@oracle.com, chris.p.wilson@intel.com,
	gwan-gyeong.mun@intel.com, max.byungchul.park@gmail.com,
	boqun.feng@gmail.com, longman@redhat.com, yskelg@gmail.com,
	yunseong.kim@ericsson.com, yeoreum.yun@arm.com,
	netdev@vger.kernel.org, matthew.brost@intel.com,
	her0gyugyu@gmail.com
Subject: [PATCH v16 31/42] dept: assign dept map to mmu notifier invalidation synchronization
Date: Mon, 19 May 2025 18:18:15 +0900
Message-ID: <20250519091826.19752-32-byungchul@sk.com>
In-Reply-To: <20250519091826.19752-1-byungchul@sk.com>

Resolve the following false positive by introducing an explicit dept map
and annotations for this case:

   *** DEADLOCK ***
   context A
       [S] (unknown)(<sched>:0)
       [W] lock(&mm->mmap_lock:0)
       [E] try_to_wake_up(<sched>:0)

   context B
       [S] lock(&mm->mmap_lock:0)
       [W] mmu_interval_read_begin(<sched>:0)
       [E] unlock(&mm->mmap_lock:0)

   [S]: start of the event context
   [W]: the wait blocked
   [E]: the event not reachable

dept already tracks dependencies between scheduler sleep and ttwu based
on an internal timestamp called wgen.  However, when more than one
event context overlaps, dept may wrongly guess the start of the event
context, as in the following:

   <before this patch>

   context A: lock L
   context A: mmu_notifier_invalidate_range_start()

   context B: lock L'
   context B: mmu_interval_read_begin() : wait
   <- dept wrongly takes this as the start of the event context of C.
   context B: unlock L'

   context C: lock L''
   context C: mmu_notifier_invalidate_range_start()

   context A: mmu_notifier_invalidate_range_end()
   context A: unlock L

   context C: mmu_notifier_invalidate_range_end() : ttwu
   <- here is the end of the event context of C.  dept observes a wait
      and lock L'' within the event context of C, which causes a false
      positive dept report.

   context C: unlock L''

Explicitly annotate the relevant event context range so that dept
works with more precise information:

   <after this patch>

   context A: lock L
   context A: mmu_notifier_invalidate_range_start()

   context B: lock L'
   context B: mmu_interval_read_begin() : wait
   context B: unlock L'

   context C: lock L''
   context C: mmu_notifier_invalidate_range_start()
   <- here is the start of the event context of C.

   context A: mmu_notifier_invalidate_range_end()
   context A: unlock L

   context C: mmu_notifier_invalidate_range_end() : ttwu
   <- here is the end of the event context of C.  dept no longer
      observes the wait or lock L'' within the event context of C.
      Context C is responsible only for the range delimited by
      mmu_notifier_invalidate_range_{start,end}().

   context C: unlock L''
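The difference between the two timelines can be reproduced with a toy
model.  The sketch below is a hypothetical userspace illustration, not
dept's implementation: each step's array index stands in for a
generation number (loosely like wgen), and a lock taken by context C
counts as a dependency of C's event if it falls between the assumed
start of the event context and the ttwu.  Guessing the start from B's
wait picks up lock L''; starting at C's explicit range_start does not.
All names here (struct step, find_step(), locks_in_ecxt(), ...) are
invented for illustration.

```c
#include <string.h>

/* Toy timeline entry; the array index acts as the generation number. */
struct step {
	char ctx;       /* 'A', 'B' or 'C' */
	const char *op; /* what the context does at this step */
};

/* The timeline from the changelog above. */
static const struct step timeline[] = {
	{ 'A', "lock L" },          /* 0 */
	{ 'A', "range_start" },     /* 1 */
	{ 'B', "lock L'" },         /* 2 */
	{ 'B', "wait" },            /* 3 */
	{ 'B', "unlock L'" },       /* 4 */
	{ 'C', "lock L''" },        /* 5 */
	{ 'C', "range_start" },     /* 6 */
	{ 'A', "range_end" },       /* 7 */
	{ 'A', "unlock L" },        /* 8 */
	{ 'C', "range_end: ttwu" }, /* 9 */
};
#define NSTEPS ((int)(sizeof(timeline) / sizeof(timeline[0])))

/* Index of the first step by @ctx whose op equals @op, or -1. */
int find_step(char ctx, const char *op)
{
	for (int i = 0; i < NSTEPS; i++)
		if (timeline[i].ctx == ctx && strcmp(timeline[i].op, op) == 0)
			return i;
	return -1;
}

/*
 * Count locks acquired by @ctx in the window [start, end); in this
 * model those are attributed to the event context of @ctx.
 */
int locks_in_ecxt(char ctx, int start, int end)
{
	int n = 0;

	for (int i = start; i < end; i++)
		if (timeline[i].ctx == ctx &&
		    strncmp(timeline[i].op, "lock ", 5) == 0)
			n++;
	return n;
}

/* Before the patch: the event context of C is guessed to start at B's wait. */
int deps_guessed(void)
{
	return locks_in_ecxt('C', find_step('B', "wait"),
			     find_step('C', "range_end: ttwu"));
}

/* After the patch: the event context of C starts at its own range_start. */
int deps_annotated(void)
{
	return locks_in_ecxt('C', find_step('C', "range_start"),
			     find_step('C', "range_end: ttwu"));
}
```

In this model deps_guessed() finds one dependency (lock L''), the
false positive, while deps_annotated() finds none.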

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/mmu_notifier.h | 26 ++++++++++++++++++++++++++
 mm/mmu_notifier.c            | 31 +++++++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index bc2402a45741..1e256f5305b7 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -428,6 +428,14 @@ static inline int mmu_notifier_test_young(struct mm_struct *mm,
 	return 0;
 }
 
+#ifdef CONFIG_DEPT
+void mmu_notifier_invalidate_dept_ecxt_start(struct mmu_notifier_range *range);
+void mmu_notifier_invalidate_dept_ecxt_end(struct mmu_notifier_range *range);
+#else
+static inline void mmu_notifier_invalidate_dept_ecxt_start(struct mmu_notifier_range *range) {}
+static inline void mmu_notifier_invalidate_dept_ecxt_end(struct mmu_notifier_range *range) {}
+#endif
+
 static inline void
 mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 {
@@ -439,6 +447,12 @@ mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 		__mmu_notifier_invalidate_range_start(range);
 	}
 	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
+
+	/*
+	 * From here until mmu_notifier_invalidate_range_end(), waiters
+	 * may block waiting for this invalidation to complete.
+	 */
+	mmu_notifier_invalidate_dept_ecxt_start(range);
 }
 
 /*
@@ -459,6 +473,12 @@ mmu_notifier_invalidate_range_start_nonblock(struct mmu_notifier_range *range)
 		ret = __mmu_notifier_invalidate_range_start(range);
 	}
 	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
+
+	/*
+	 * From here until mmu_notifier_invalidate_range_end(), waiters
+	 * may block waiting for this invalidation to complete.
+	 */
+	mmu_notifier_invalidate_dept_ecxt_start(range);
 	return ret;
 }
 
@@ -470,6 +490,12 @@ mmu_notifier_invalidate_range_end(struct mmu_notifier_range *range)
 
 	if (mm_has_notifiers(range->mm))
 		__mmu_notifier_invalidate_range_end(range);
+
+	/*
+	 * End the event context that was started by
+	 * mmu_notifier_invalidate_range_start().
+	 */
+	mmu_notifier_invalidate_dept_ecxt_end(range);
 }
 
 static inline void mmu_notifier_arch_invalidate_secondary_tlbs(struct mm_struct *mm,
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index fc18fe274505..850d75952f98 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -46,6 +46,7 @@ struct mmu_notifier_subscriptions {
 	unsigned long active_invalidate_ranges;
 	struct rb_root_cached itree;
 	wait_queue_head_t wq;
+	struct dept_map dmap;
 	struct hlist_head deferred_list;
 };
 
@@ -165,6 +166,25 @@ static void mn_itree_inv_end(struct mmu_notifier_subscriptions *subscriptions)
 	wake_up_all(&subscriptions->wq);
 }
 
+#ifdef CONFIG_DEPT
+void mmu_notifier_invalidate_dept_ecxt_start(struct mmu_notifier_range *range)
+{
+	struct mmu_notifier_subscriptions *subscriptions =
+		range->mm->notifier_subscriptions;
+
+	if (subscriptions)
+		sdt_ecxt_enter(&subscriptions->dmap);
+}
+void mmu_notifier_invalidate_dept_ecxt_end(struct mmu_notifier_range *range)
+{
+	struct mmu_notifier_subscriptions *subscriptions =
+		range->mm->notifier_subscriptions;
+
+	if (subscriptions)
+		sdt_ecxt_exit(&subscriptions->dmap);
+}
+#endif
+
 /**
  * mmu_interval_read_begin - Begin a read side critical section against a VA
  *                           range
@@ -246,9 +266,12 @@ mmu_interval_read_begin(struct mmu_interval_notifier *interval_sub)
 	 */
 	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
-	if (is_invalidating)
+	if (is_invalidating) {
+		sdt_might_sleep_start(&subscriptions->dmap);
 		wait_event(subscriptions->wq,
 			   READ_ONCE(subscriptions->invalidate_seq) != seq);
+		sdt_might_sleep_end();
+	}
 
 	/*
 	 * Notice that mmu_interval_read_retry() can already be true at this
@@ -625,6 +648,7 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
 
 		INIT_HLIST_HEAD(&subscriptions->list);
 		spin_lock_init(&subscriptions->lock);
+		sdt_map_init(&subscriptions->dmap);
 		subscriptions->invalidate_seq = 2;
 		subscriptions->itree = RB_ROOT_CACHED;
 		init_waitqueue_head(&subscriptions->wq);
@@ -1070,9 +1094,12 @@ void mmu_interval_notifier_remove(struct mmu_interval_notifier *interval_sub)
 	 */
 	lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
 	lock_map_release(&__mmu_notifier_invalidate_range_start_map);
-	if (seq)
+	if (seq) {
+		sdt_might_sleep_start(&subscriptions->dmap);
 		wait_event(subscriptions->wq,
 			   mmu_interval_seq_released(subscriptions, seq));
+		sdt_might_sleep_end();
+	}
 
 	/* pairs with mmgrab in mmu_interval_notifier_insert() */
 	mmdrop(mm);
-- 
2.17.1
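The hooks added by this patch follow a common kernel pattern: real
functions under CONFIG_DEPT, empty static inline stubs otherwise, and a
NULL check because mm->notifier_subscriptions is only allocated once a
notifier is registered.  The following userspace sketch models that
shape so it can be compiled and exercised outside the kernel.
Assumptions: the struct definitions are trimmed stand-ins, struct
dept_map is reduced to a hypothetical depth counter, and
model_ecxt_enter()/model_ecxt_exit() are hypothetical substitutes for
sdt_ecxt_enter()/sdt_ecxt_exit(); CONFIG_DEPT is defined by hand where
the kernel would get it from Kconfig.

```c
#include <stddef.h> /* NULL, for callers modelling an mm without notifiers */

/* Set by hand here; in the kernel this comes from Kconfig. */
#define CONFIG_DEPT 1

/* Trimmed userspace stand-ins: only the fields the hooks touch. */
struct dept_map {
	int ecxt_depth; /* hypothetical: number of open event contexts */
};

struct mmu_notifier_subscriptions {
	struct dept_map dmap;
};

struct mm_struct {
	struct mmu_notifier_subscriptions *notifier_subscriptions;
};

struct mmu_notifier_range {
	struct mm_struct *mm;
};

#ifdef CONFIG_DEPT
/* Hypothetical substitutes for sdt_ecxt_enter()/sdt_ecxt_exit(). */
static void model_ecxt_enter(struct dept_map *m) { m->ecxt_depth++; }
static void model_ecxt_exit(struct dept_map *m) { m->ecxt_depth--; }

void mmu_notifier_invalidate_dept_ecxt_start(struct mmu_notifier_range *range)
{
	struct mmu_notifier_subscriptions *subscriptions =
		range->mm->notifier_subscriptions;

	/* No subscriptions: no notifier registered, nothing to track. */
	if (subscriptions)
		model_ecxt_enter(&subscriptions->dmap);
}

void mmu_notifier_invalidate_dept_ecxt_end(struct mmu_notifier_range *range)
{
	struct mmu_notifier_subscriptions *subscriptions =
		range->mm->notifier_subscriptions;

	if (subscriptions)
		model_ecxt_exit(&subscriptions->dmap);
}
#else
/* With the checker compiled out, the hooks compile to nothing. */
static inline void mmu_notifier_invalidate_dept_ecxt_start(struct mmu_notifier_range *range) {}
static inline void mmu_notifier_invalidate_dept_ecxt_end(struct mmu_notifier_range *range) {}
#endif
```

Calling start and then end on an mm with subscriptions returns the
modelled depth to zero; calling them on an mm without subscriptions is
a no-op.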


Thread overview: 44+ messages
2025-05-19  9:17 [PATCH v16 00/42] DEPT(DEPendency Tracker) Byungchul Park
2025-05-19  9:17 ` [PATCH v16 01/42] llist: move llist_{head,node} definition to types.h Byungchul Park
2025-05-19  9:17 ` [PATCH v16 02/42] dept: implement DEPT(DEPendency Tracker) Byungchul Park
2025-05-19  9:17 ` [PATCH v16 03/42] dept: add single event dependency tracker APIs Byungchul Park
2025-05-19  9:17 ` [PATCH v16 04/42] dept: add lock " Byungchul Park
2025-05-19  9:17 ` [PATCH v16 05/42] dept: tie to lockdep and IRQ tracing Byungchul Park
2025-05-19  9:17 ` [PATCH v16 06/42] dept: add proc knobs to show stats and dependency graph Byungchul Park
2025-05-19  9:17 ` [PATCH v16 07/42] dept: distinguish each kernel context from another Byungchul Park
2025-05-19  9:17 ` [PATCH v16 08/42] x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64 Byungchul Park
2025-05-19  9:17 ` [PATCH v16 09/42] arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64 Byungchul Park
2025-05-19  9:17 ` [PATCH v16 10/42] dept: distinguish each work from another Byungchul Park
2025-05-19  9:17 ` [PATCH v16 11/42] dept: add a mechanism to refill the internal memory pools on running out Byungchul Park
2025-05-19  9:17 ` [PATCH v16 12/42] dept: record the latest one out of consecutive waits of the same class Byungchul Park
2025-05-19  9:17 ` [PATCH v16 13/42] dept: apply sdt_might_sleep_{start,end}() to wait_for_completion()/complete() Byungchul Park
2025-05-19  9:17 ` [PATCH v16 14/42] dept: apply sdt_might_sleep_{start,end}() to swait Byungchul Park
2025-05-19  9:17 ` [PATCH v16 15/42] dept: apply sdt_might_sleep_{start,end}() to waitqueue wait Byungchul Park
2025-05-19  9:18 ` [PATCH v16 16/42] dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait Byungchul Park
2025-05-19  9:18 ` [PATCH v16 17/42] dept: apply sdt_might_sleep_{start,end}() to dma fence Byungchul Park
2025-05-19  9:18 ` [PATCH v16 18/42] dept: track timeout waits separately with a new Kconfig Byungchul Park
2025-05-19  9:18 ` [PATCH v16 19/42] dept: apply timeout consideration to wait_for_completion()/complete() Byungchul Park
2025-05-19  9:18 ` [PATCH v16 20/42] dept: apply timeout consideration to swait Byungchul Park
2025-05-19  9:18 ` [PATCH v16 21/42] dept: apply timeout consideration to waitqueue wait Byungchul Park
2025-05-19  9:18 ` [PATCH v16 22/42] dept: apply timeout consideration to hashed-waitqueue wait Byungchul Park
2025-05-19  9:18 ` [PATCH v16 23/42] dept: apply timeout consideration to dma fence wait Byungchul Park
2025-05-19  9:18 ` [PATCH v16 24/42] dept: make dept able to work with an external wgen Byungchul Park
2025-05-19  9:18 ` [PATCH v16 25/42] dept: track PG_locked with dept Byungchul Park
2025-05-19  9:18 ` [PATCH v16 26/42] dept: print staged wait's stacktrace on report Byungchul Park
2025-05-19  9:18 ` [PATCH v16 27/42] locking/lockdep: prevent various lockdep assertions when lockdep_off()'ed Byungchul Park
2025-05-19  9:18 ` [PATCH v16 28/42] dept: add documentation for dept Byungchul Park
2025-05-19  9:18 ` [PATCH v16 29/42] cpu/hotplug: use a weaker annotation in AP thread Byungchul Park
2025-05-19  9:18 ` [PATCH v16 30/42] fs/jbd2: use a weaker annotation in journal handling Byungchul Park
2025-05-19  9:18 ` Byungchul Park [this message]
2025-05-19  9:18 ` [PATCH v16 32/42] dept: assign unique dept_key to each distinct dma fence caller Byungchul Park
2025-05-19  9:18 ` [PATCH v16 33/42] dept: make dept aware of lockdep_set_lock_cmp_fn() annotation Byungchul Park
2025-05-19  9:18 ` [PATCH v16 34/42] dept: make dept stop from working on debug_locks_off() Byungchul Park
2025-05-19  9:18 ` [PATCH v16 35/42] i2c: rename wait_for_completion callback to wait_for_completion_cb Byungchul Park
2025-05-19  9:18 ` [PATCH v16 36/42] dept: assign unique dept_key to each distinct wait_for_completion() caller Byungchul Park
2025-05-19  9:18 ` [PATCH v16 37/42] completion, dept: introduce init_completion_dmap() API Byungchul Park
2025-05-19  9:18 ` [PATCH v16 38/42] dept: introduce a new type of dependency tracking between multi event sites Byungchul Park
2025-05-19  9:18 ` [PATCH v16 39/42] dept: add module support for struct dept_event_site and dept_event_site_dep Byungchul Park
2025-05-19  9:18 ` [PATCH v16 40/42] dept: introduce event_site() to disable event tracking if it's recoverable Byungchul Park
2025-05-19  9:18 ` [PATCH v16 41/42] dept: implement a basic unit test for dept Byungchul Park
2025-05-19  9:18 ` [PATCH v16 42/42] dept: call dept_hardirqs_off() in local_irq_*() regardless of irq state Byungchul Park
2025-07-25  2:09 ` [PATCH v16 00/42] DEPT(DEPendency Tracker) Byungchul Park
