From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68BBF2C9D
	for <linux-kernel@vger.kernel.org>; Wed,  1 Jul 2026 23:29:45 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1782948586; cv=none; b=PPTbkfi4fky3LlBCZ+AYrr8xQp6vmwl5mSsEdWiUjWJmcMXr3+Cyg1XO2PDFUgemBbACoWtrS8f4lESS1Kg0TsdOe7t6+pDPtZo8XbJW2xzyNaf2G2CFuc5OKTZkCegrA00tAopHOq350P6sNWrL/cLkMCYqv/wy4sWB/XbgTSo=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1782948586; c=relaxed/simple;
	bh=tg4tnZlrEiZpc5cLzV2nwjku53yX+qldZZ+KLpbwoXE=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version; b=NGRcOeNI1nrUbs33Kn98YGoUUw7ho4nu3flj/cqszNlrOtxsjjektN+dKjApFV/FHmyWItAIvoiIwMLQBU/H7DDcfaMfIrQVxK6xoBUl8vohtUQGHgr4uav1tniIVU2PdBCiZgkYxHINgHRAPJ1EapqwuksyHs0lqblhVJCRHPU=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BD/vWVtR; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BD/vWVtR"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id A465A1F000E9;
	Wed,  1 Jul 2026 23:29:44 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1782948585;
	bh=R4Q8EIyzKjGiuA36+iLs5iGLzb0r1XX60du/1F2NN18=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References;
	b=BD/vWVtRo0Rnc/TXGOBKm19zBg+LpdTzTodtmBuVC2mwFYMFL7FMUSeW6vvhiftzu
	 ZazfvZffYYkiEZS6ehbZz0gKn+LYfpkPZOZduAGTHT2EJBrm3cpqDjh6LBBl/ET983
	 2Gfl3ww3e6YarOaE1CpFj8CFdommg5njKv2PbSlm4b9G53esxjVZkldjGL3mzAZtHv
	 kKPCulsBL9G/05AL7NqB6mrhdur0SAcX1Vl+dZuzhDG9mOOyEJuSQQpy5bVJP7M5ew
	 B5MJRpuG5CoXQquPbfFTt7ZeDjsRScGrbD2jWfQwhinf6W+ZxMU52qh4F5VxAhuluX
	 0LHHYV8EUuCfw==
From: SJ Park <sj@kernel.org>
To: Usama Arif <usama.arif@linux.dev>
Cc: SJ Park <sj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	apopple@nvidia.com,
	byungchul@sk.com,
	david@kernel.org,
	gourry@gourry.net,
	joshua.hahnjy@gmail.com,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	matthew.brost@intel.com,
	rakie.kim@sk.com,
	ying.huang@linux.alibaba.com,
	ziy@nvidia.com,
	shakeel.butt@linux.dev,
	hannes@cmpxchg.org,
	kernel-team@meta.com,
	sashiko-bot <sashiko-bot@kernel.org>
Subject: Re: [PATCH] mm/migrate_device: pin large folios before splitting
Date: Wed,  1 Jul 2026 16:29:36 -0700
Message-ID: <20260701232936.85398-1-sj@kernel.org>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20260701140638.840773-1-usama.arif@linux.dev>
References: 
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

On Wed,  1 Jul 2026 07:06:38 -0700 Usama Arif <usama.arif@linux.dev> wrote:

> migrate_vma_collect_pmd() can detect a large folio while holding the PTE
> lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
> split helper took its own reference, but only after the lock had already
> been dropped.
> 
> One way to hit this is device migration over a range that contains a large
> folio. The walker reads the PTE while holding the PTE lock and derives the
> folio either from a present PTE via vm_normal_page(), or from a non-present
> PTE that encodes a device-private softleaf entry. It then has to drop the
> PTE lock because split_folio() can block. Before migrate_vma_split_folio()
> gets a folio reference, concurrent reclaim, migration, or truncation can
> replace or clear the entry and drop the last reference to the folio. The
> split helper would then take a reference and lock on a stale folio pointer.
> 
> Take a temporary reference before dropping the PTE lock and pass that
> reference into migrate_vma_split_folio(). The helper consumes the
> reference, so split_folio() still sees only the expected caller pin instead
> of an extra pin that could make the split fail.
> 
> Reported-by: sashiko-bot <sashiko-bot@kernel.org>

Just curious.  Has the bot sent the review directly to you?  Or, you read it
from sashiko.dev web site but kindly giving the credit to the bot?

> Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev

The link shows the comments for the entire series, so I was little bit confused
where the real finding is.  Seems like that is on the reply to the patch 5?
And today I learned Sashiko.dev provides a way to point a reply to a specific
patch, like this:

https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev?part=5

> Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")

Seems merged in 6.19.  Should we Cc stable@ ?

> Signed-off-by: Usama Arif <usama.arif@linux.dev>

Reviewed-by: SJ Park <sj@kernel.org>

> ---
>  mm/migrate_device.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 2f8b646302c2..f5a5f699e98e 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -77,6 +77,9 @@ static int migrate_vma_collect_hole(unsigned long start,
>   * @folio: the folio to split
>   * @fault_page: struct page associated with the fault if any
>   *
> + * If @folio is not the folio containing @fault_page, the caller must hold a
> + * reference on @folio. The helper consumes that reference.
> + *
>   * Returns 0 on success
>   */
>  static int migrate_vma_split_folio(struct folio *folio,
> @@ -86,10 +89,8 @@ static int migrate_vma_split_folio(struct folio *folio,
>  	struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
>  	struct folio *new_fault_folio = NULL;
>  
> -	if (folio != fault_folio) {
> -		folio_get(folio);
> +	if (folio != fault_folio)
>  		folio_lock(folio);
> -	}
>  
>  	ret = split_folio(folio);
>  	if (ret) {
> @@ -310,6 +311,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>  			if (folio_test_large(folio)) {
>  				int ret;
>  
> +				/*
> +				 * Keep the folio stable after dropping the PTE
> +				 * lock. migrate_vma_split_folio() consumes this
> +				 * reference.
> +				 */
> +				if (folio != fault_folio)
> +					folio_get(folio);
>  				lazy_mmu_mode_disable();
>  				pte_unmap_unlock(ptep, ptl);
>  				ret = migrate_vma_split_folio(folio,
> @@ -353,6 +361,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>  			if (folio && folio_test_large(folio)) {
>  				int ret;
>  
> +				/*
> +				 * Keep the folio stable after dropping the
> +				 * PTE lock. migrate_vma_split_folio() consumes
> +				 * this reference.
> +				 */

I don't mind having the above comments.  Nonetheless, having same content but
different wrapping look special to me.

If people really hate having two (incompletely) duplicated comments, what about
separating the recurring code out and adding the coment once on the function?
Maybe having a hotfix version without the comment and later refactoring with
the function can also be an option?  Just a pure loud thinking.

Anyway, thank you for making Linux safer! :)

> +				if (folio != fault_folio)
> +					folio_get(folio);
>  				lazy_mmu_mode_disable();
>  				pte_unmap_unlock(ptep, ptl);
>  				ret = migrate_vma_split_folio(folio,
> -- 
> 2.53.0-Meta


Thanks,
SJ