* [PATCH] mm/migrate_device: pin large folios before splitting
@ 2026-07-01 14:06 Usama Arif
2026-07-01 16:49 ` David Hildenbrand (Arm)
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Usama Arif @ 2026-07-01 14:06 UTC (permalink / raw)
To: Andrew Morton, apopple, byungchul, david, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang, ziy
Cc: shakeel.butt, hannes, kernel-team, Usama Arif, sashiko-bot
migrate_vma_collect_pmd() can detect a large folio while holding the PTE
lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
split helper took its own reference, but only after the lock had already
been dropped.
One way to hit this is device migration over a range that contains a large
folio. The walker reads the PTE while holding the PTE lock and derives the
folio either from a present PTE via vm_normal_page(), or from a non-present
PTE that encodes a device-private softleaf entry. It then has to drop the
PTE lock because split_folio() can block. Before migrate_vma_split_folio()
gets a folio reference, concurrent reclaim, migration, or truncation can
replace or clear the entry and drop the last reference to the folio. The
split helper would then take a reference and lock on a stale folio pointer.
Take a temporary reference before dropping the PTE lock and pass that
reference into migrate_vma_split_folio(). The helper consumes the
reference, so split_folio() still sees only the expected caller pin instead
of an extra pin that could make the split fail.
Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
mm/migrate_device.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 2f8b646302c2..f5a5f699e98e 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -77,6 +77,9 @@ static int migrate_vma_collect_hole(unsigned long start,
* @folio: the folio to split
* @fault_page: struct page associated with the fault if any
*
+ * If @folio is not the folio containing @fault_page, the caller must hold a
+ * reference on @folio. The helper consumes that reference.
+ *
* Returns 0 on success
*/
static int migrate_vma_split_folio(struct folio *folio,
@@ -86,10 +89,8 @@ static int migrate_vma_split_folio(struct folio *folio,
struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
struct folio *new_fault_folio = NULL;
- if (folio != fault_folio) {
- folio_get(folio);
+ if (folio != fault_folio)
folio_lock(folio);
- }
ret = split_folio(folio);
if (ret) {
@@ -310,6 +311,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (folio_test_large(folio)) {
int ret;
+ /*
+ * Keep the folio stable after dropping the PTE
+ * lock. migrate_vma_split_folio() consumes this
+ * reference.
+ */
+ if (folio != fault_folio)
+ folio_get(folio);
lazy_mmu_mode_disable();
pte_unmap_unlock(ptep, ptl);
ret = migrate_vma_split_folio(folio,
@@ -353,6 +361,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (folio && folio_test_large(folio)) {
int ret;
+ /*
+ * Keep the folio stable after dropping the
+ * PTE lock. migrate_vma_split_folio() consumes
+ * this reference.
+ */
+ if (folio != fault_folio)
+ folio_get(folio);
lazy_mmu_mode_disable();
pte_unmap_unlock(ptep, ptl);
ret = migrate_vma_split_folio(folio,
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 14:06 [PATCH] mm/migrate_device: pin large folios before splitting Usama Arif
@ 2026-07-01 16:49 ` David Hildenbrand (Arm)
2026-07-01 17:02 ` Zi Yan
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-07-01 16:49 UTC (permalink / raw)
To: Usama Arif, Andrew Morton, apopple, byungchul, gourry,
joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim,
ying.huang, ziy
Cc: shakeel.butt, hannes, kernel-team, sashiko-bot
On 7/1/26 16:06, Usama Arif wrote:
> migrate_vma_collect_pmd() can detect a large folio while holding the PTE
> lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
> split helper took its own reference, but only after the lock had already
> been dropped.
>
> One way to hit this is device migration over a range that contains a large
> folio. The walker reads the PTE while holding the PTE lock and derives the
> folio either from a present PTE via vm_normal_page(), or from a non-present
> PTE that encodes a device-private softleaf entry. It then has to drop the
> PTE lock because split_folio() can block. Before migrate_vma_split_folio()
> gets a folio reference, concurrent reclaim, migration, or truncation can
> replace or clear the entry and drop the last reference to the folio. The
> split helper would then take a reference and lock on a stale folio pointer.
>
> Take a temporary reference before dropping the PTE lock and pass that
> reference into migrate_vma_split_folio(). The helper consumes the
> reference, so split_folio() still sees only the expected caller pin instead
> of an extra pin that could make the split fail.
>
> Reported-by: sashiko-bot <sashiko-bot@kernel.org>
> Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
> Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> mm/migrate_device.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 2f8b646302c2..f5a5f699e98e 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -77,6 +77,9 @@ static int migrate_vma_collect_hole(unsigned long start,
> * @folio: the folio to split
> * @fault_page: struct page associated with the fault if any
> *
> + * If @folio is not the folio containing @fault_page, the caller must hold a
> + * reference on @folio. The helper consumes that reference.
> + *
> * Returns 0 on success
> */
> static int migrate_vma_split_folio(struct folio *folio,
> @@ -86,10 +89,8 @@ static int migrate_vma_split_folio(struct folio *folio,
> struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
> struct folio *new_fault_folio = NULL;
>
> - if (folio != fault_folio) {
> - folio_get(folio);
> + if (folio != fault_folio)
> folio_lock(folio);
> - }
>
> ret = split_folio(folio);
> if (ret) {
> @@ -310,6 +311,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> if (folio_test_large(folio)) {
> int ret;
>
> + /*
> + * Keep the folio stable after dropping the PTE
> + * lock. migrate_vma_split_folio() consumes this
> + * reference.
> + */
Do we really need that comment (same below?). It's a common mechanism when
keeping to refer to folios after dropping the PTL.
> + if (folio != fault_folio)
> + folio_get(folio);
> lazy_mmu_mode_disable();
> pte_unmap_unlock(ptep, ptl);
> ret = migrate_vma_split_folio(folio,
> @@ -353,6 +361,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> if (folio && folio_test_large(folio)) {
> int ret;
>
> + /*
> + * Keep the folio stable after dropping the
> + * PTE lock. migrate_vma_split_folio() consumes
> + * this reference.
> + */
> + if (folio != fault_folio)
> + folio_get(folio);
> lazy_mmu_mode_disable();
> pte_unmap_unlock(ptep, ptl);
> ret = migrate_vma_split_folio(folio,
LGTM. The code duplication in this function is concerning.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 14:06 [PATCH] mm/migrate_device: pin large folios before splitting Usama Arif
2026-07-01 16:49 ` David Hildenbrand (Arm)
@ 2026-07-01 17:02 ` Zi Yan
2026-07-01 19:27 ` Andrew Morton
2026-07-01 23:29 ` SJ Park
` (2 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2026-07-01 17:02 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, apopple, byungchul, david, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang,
shakeel.butt, hannes, kernel-team, sashiko-bot
On 1 Jul 2026, at 10:06, Usama Arif wrote:
> migrate_vma_collect_pmd() can detect a large folio while holding the PTE
> lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
> split helper took its own reference, but only after the lock had already
> been dropped.
>
> One way to hit this is device migration over a range that contains a large
> folio. The walker reads the PTE while holding the PTE lock and derives the
> folio either from a present PTE via vm_normal_page(), or from a non-present
> PTE that encodes a device-private softleaf entry. It then has to drop the
> PTE lock because split_folio() can block. Before migrate_vma_split_folio()
> gets a folio reference, concurrent reclaim, migration, or truncation can
> replace or clear the entry and drop the last reference to the folio. The
> split helper would then take a reference and lock on a stale folio pointer.
>
> Take a temporary reference before dropping the PTE lock and pass that
> reference into migrate_vma_split_folio(). The helper consumes the
> reference, so split_folio() still sees only the expected caller pin instead
> of an extra pin that could make the split fail.
>
> Reported-by: sashiko-bot <sashiko-bot@kernel.org>
> Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
> Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> mm/migrate_device.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
LGTM. Like David said, the comments might not be needed. Thanks.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 17:02 ` Zi Yan
@ 2026-07-01 19:27 ` Andrew Morton
2026-07-01 20:06 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2026-07-01 19:27 UTC (permalink / raw)
To: Zi Yan
Cc: Usama Arif, apopple, byungchul, david, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang,
shakeel.butt, hannes, kernel-team, sashiko-bot
On Wed, 01 Jul 2026 13:02:10 -0400 Zi Yan <ziy@nvidia.com> wrote:
> LGTM. Like David said, the comments might not be needed. Thanks.
I like the comments! They may be uninteresting to those who are
already familiar with these things, but they aren't the target audience.
How are others to become familiar, if not by this?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 19:27 ` Andrew Morton
@ 2026-07-01 20:06 ` David Hildenbrand (Arm)
2026-07-01 20:16 ` Zi Yan
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-07-01 20:06 UTC (permalink / raw)
To: Andrew Morton, Zi Yan
Cc: Usama Arif, apopple, byungchul, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang,
shakeel.butt, hannes, kernel-team, sashiko-bot
On 7/1/26 21:27, Andrew Morton wrote:
> On Wed, 01 Jul 2026 13:02:10 -0400 Zi Yan <ziy@nvidia.com> wrote:
>
>> LGTM. Like David said, the comments might not be needed. Thanks.
>
> I like the comments! They may be uninteresting to those who are
> already familiar with these things, but they aren't the target audience.
>
> How are others to become familiar, if not by this?
I mean, it's one of the basic rules: if you lookup a page in the page table, the
moment you drop the lock that might be invalid.
If we were to document that everywhere... this is not really the secret sauce we
want to document everywhere.
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 20:06 ` David Hildenbrand (Arm)
@ 2026-07-01 20:16 ` Zi Yan
2026-07-02 0:33 ` Alistair Popple
0 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2026-07-01 20:16 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand (Arm)
Cc: Usama Arif, apopple, byungchul, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang,
shakeel.butt, hannes, kernel-team, sashiko-bot
On 1 Jul 2026, at 16:06, David Hildenbrand (Arm) wrote:
> On 7/1/26 21:27, Andrew Morton wrote:
>> On Wed, 01 Jul 2026 13:02:10 -0400 Zi Yan <ziy@nvidia.com> wrote:
>>
>>> LGTM. Like David said, the comments might not be needed. Thanks.
>>
>> I like the comments! They may be uninteresting to those who are
>> already familiar with these things, but they aren't the target audience.
>>
>> How are others to become familiar, if not by this?
>
> I mean, it's one of the basic rules: if you lookup a page in the page table, the
> moment you drop the lock that might be invalid.
>
> If we were to document that everywhere... this is not really the secret sauce we
> want to document everywhere.
In this particular case, split_folio() is used in migrate_vma_split_folio()
and split_folio() wraps __split_huge_page_to_list_to_order(), which documents
the requirement of elevated folio reference. One who wants to get familiar
with kernel code can be expected to chase down to
__split_huge_page_to_list_to_order() and find the requirement.
With LLMs, it is probably as easy as asking “why is folio_get() needed here?”.
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 14:06 [PATCH] mm/migrate_device: pin large folios before splitting Usama Arif
2026-07-01 16:49 ` David Hildenbrand (Arm)
2026-07-01 17:02 ` Zi Yan
@ 2026-07-01 23:29 ` SJ Park
2026-07-02 10:47 ` Usama Arif
2026-07-02 4:45 ` Lance Yang
2026-07-02 10:56 ` Usama Arif
4 siblings, 1 reply; 12+ messages in thread
From: SJ Park @ 2026-07-01 23:29 UTC (permalink / raw)
To: Usama Arif
Cc: SJ Park, Andrew Morton, apopple, byungchul, david, gourry,
joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim,
ying.huang, ziy, shakeel.butt, hannes, kernel-team, sashiko-bot
On Wed, 1 Jul 2026 07:06:38 -0700 Usama Arif <usama.arif@linux.dev> wrote:
> migrate_vma_collect_pmd() can detect a large folio while holding the PTE
> lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
> split helper took its own reference, but only after the lock had already
> been dropped.
>
> One way to hit this is device migration over a range that contains a large
> folio. The walker reads the PTE while holding the PTE lock and derives the
> folio either from a present PTE via vm_normal_page(), or from a non-present
> PTE that encodes a device-private softleaf entry. It then has to drop the
> PTE lock because split_folio() can block. Before migrate_vma_split_folio()
> gets a folio reference, concurrent reclaim, migration, or truncation can
> replace or clear the entry and drop the last reference to the folio. The
> split helper would then take a reference and lock on a stale folio pointer.
>
> Take a temporary reference before dropping the PTE lock and pass that
> reference into migrate_vma_split_folio(). The helper consumes the
> reference, so split_folio() still sees only the expected caller pin instead
> of an extra pin that could make the split fail.
>
> Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Just curious. Has the bot sent the review directly to you? Or, you read it
from sashiko.dev web site but kindly giving the credit to the bot?
> Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
The link shows the comments for the entire series, so I was little bit confused
where the real finding is. Seems like that is on the reply to the patch 5?
And today I learned Sashiko.dev provides a way to point a reply to a specific
patch, like this:
https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev?part=5
> Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
Seems merged in 6.19. Should we Cc stable@ ?
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
Reviewed-by: SJ Park <sj@kernel.org>
> ---
> mm/migrate_device.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 2f8b646302c2..f5a5f699e98e 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -77,6 +77,9 @@ static int migrate_vma_collect_hole(unsigned long start,
> * @folio: the folio to split
> * @fault_page: struct page associated with the fault if any
> *
> + * If @folio is not the folio containing @fault_page, the caller must hold a
> + * reference on @folio. The helper consumes that reference.
> + *
> * Returns 0 on success
> */
> static int migrate_vma_split_folio(struct folio *folio,
> @@ -86,10 +89,8 @@ static int migrate_vma_split_folio(struct folio *folio,
> struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
> struct folio *new_fault_folio = NULL;
>
> - if (folio != fault_folio) {
> - folio_get(folio);
> + if (folio != fault_folio)
> folio_lock(folio);
> - }
>
> ret = split_folio(folio);
> if (ret) {
> @@ -310,6 +311,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> if (folio_test_large(folio)) {
> int ret;
>
> + /*
> + * Keep the folio stable after dropping the PTE
> + * lock. migrate_vma_split_folio() consumes this
> + * reference.
> + */
> + if (folio != fault_folio)
> + folio_get(folio);
> lazy_mmu_mode_disable();
> pte_unmap_unlock(ptep, ptl);
> ret = migrate_vma_split_folio(folio,
> @@ -353,6 +361,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> if (folio && folio_test_large(folio)) {
> int ret;
>
> + /*
> + * Keep the folio stable after dropping the
> + * PTE lock. migrate_vma_split_folio() consumes
> + * this reference.
> + */
I don't mind having the above comments. Nonetheless, having same content but
different wrapping look special to me.
If people really hate having two (incompletely) duplicated comments, what about
separating the recurring code out and adding the coment once on the function?
Maybe having a hotfix version without the comment and later refactoring with
the function can also be an option? Just a pure loud thinking.
Anyway, thank you for making Linux safer! :)
> + if (folio != fault_folio)
> + folio_get(folio);
> lazy_mmu_mode_disable();
> pte_unmap_unlock(ptep, ptl);
> ret = migrate_vma_split_folio(folio,
> --
> 2.53.0-Meta
Thanks,
SJ
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 20:16 ` Zi Yan
@ 2026-07-02 0:33 ` Alistair Popple
2026-07-02 7:59 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 12+ messages in thread
From: Alistair Popple @ 2026-07-02 0:33 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, David Hildenbrand (Arm), Usama Arif, byungchul,
gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost,
rakie.kim, ying.huang, shakeel.butt, hannes, kernel-team,
sashiko-bot
On 2026-07-02 at 06:16 +1000, Zi Yan <ziy@nvidia.com> wrote...
> On 1 Jul 2026, at 16:06, David Hildenbrand (Arm) wrote:
>
> > On 7/1/26 21:27, Andrew Morton wrote:
> >> On Wed, 01 Jul 2026 13:02:10 -0400 Zi Yan <ziy@nvidia.com> wrote:
> >>
> >>> LGTM. Like David said, the comments might not be needed. Thanks.
> >>
> >> I like the comments! They may be uninteresting to those who are
> >> already familiar with these things, but they aren't the target audience.
> >>
> >> How are others to become familiar, if not by this?
> >
> > I mean, it's one of the basic rules: if you lookup a page in the page table, the
> > moment you drop the lock that might be invalid.
> >
> > If we were to document that everywhere... this is not really the secret sauce we
> > want to document everywhere.
>
> In this particular case, split_folio() is used in migrate_vma_split_folio()
> and split_folio() wraps __split_huge_page_to_list_to_order(), which documents
> the requirement of elevated folio reference. One who wants to get familiar
> with kernel code can be expected to chase down to
> __split_huge_page_to_list_to_order() and find the requirement.
>
> With LLMs, it is probably as easy as asking “why is folio_get() needed here?”.
The interesting part of the comment isn't why the folio_get() is needed, it's
where the corresponding reference is dropped/returned. I've spent far too many
hours in this code trying to figure out why a page didn't migrate because of
a spurious refcount and it's not easy figuring out where the matching get/puts
occur.
So the comment could probably be condensed but I think it's valuable.
- Alistair
> Best Regards,
> Yan, Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 14:06 [PATCH] mm/migrate_device: pin large folios before splitting Usama Arif
` (2 preceding siblings ...)
2026-07-01 23:29 ` SJ Park
@ 2026-07-02 4:45 ` Lance Yang
2026-07-02 10:56 ` Usama Arif
4 siblings, 0 replies; 12+ messages in thread
From: Lance Yang @ 2026-07-02 4:45 UTC (permalink / raw)
To: usama.arif
Cc: akpm, apopple, byungchul, david, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang, ziy,
shakeel.butt, hannes, kernel-team, sashiko-bot, Lance Yang
On Wed, Jul 01, 2026 at 07:06:38AM -0700, Usama Arif wrote:
>migrate_vma_collect_pmd() can detect a large folio while holding the PTE
>lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
>split helper took its own reference, but only after the lock had already
>been dropped.
>
>One way to hit this is device migration over a range that contains a large
>folio. The walker reads the PTE while holding the PTE lock and derives the
>folio either from a present PTE via vm_normal_page(), or from a non-present
>PTE that encodes a device-private softleaf entry. It then has to drop the
>PTE lock because split_folio() can block. Before migrate_vma_split_folio()
>gets a folio reference, concurrent reclaim, migration, or truncation can
>replace or clear the entry and drop the last reference to the folio. The
>split helper would then take a reference and lock on a stale folio pointer.
>
>Take a temporary reference before dropping the PTE lock and pass that
>reference into migrate_vma_split_folio(). The helper consumes the
>reference, so split_folio() still sees only the expected caller pin instead
>of an extra pin that could make the split fail.
Yeah, that's how it should have been:
lookup under PTL -> get folio ref -> drop PTL.
Getting a ref after dropping PTL feels a bit too optimistic ...
>Reported-by: sashiko-bot <sashiko-bot@kernel.org>
>Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
>Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
>Signed-off-by: Usama Arif <usama.arif@linux.dev>
>---
Thanks for fixing it! Feel free to add:
Reviewed-by: Lance Yang <lance.yang@linux.dev>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-02 0:33 ` Alistair Popple
@ 2026-07-02 7:59 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-07-02 7:59 UTC (permalink / raw)
To: Alistair Popple, Zi Yan
Cc: Andrew Morton, Usama Arif, byungchul, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang,
shakeel.butt, hannes, kernel-team, sashiko-bot
On 7/2/26 02:33, Alistair Popple wrote:
> On 2026-07-02 at 06:16 +1000, Zi Yan <ziy@nvidia.com> wrote...
>> On 1 Jul 2026, at 16:06, David Hildenbrand (Arm) wrote:
>>
>>>
>>> I mean, it's one of the basic rules: if you lookup a page in the page table, the
>>> moment you drop the lock that might be invalid.
>>>
>>> If we were to document that everywhere... this is not really the secret sauce we
>>> want to document everywhere.
>>
>> In this particular case, split_folio() is used in migrate_vma_split_folio()
>> and split_folio() wraps __split_huge_page_to_list_to_order(), which documents
>> the requirement of elevated folio reference. One who wants to get familiar
>> with kernel code can be expected to chase down to
>> __split_huge_page_to_list_to_order() and find the requirement.
>>
>> With LLMs, it is probably as easy as asking “why is folio_get() needed here?”.
>
> The interesting part of the comment isn't why the folio_get() is needed, it's
> where the corresponding reference is dropped/returned. I've spent far too many
> hours in this code trying to figure out why a page didn't migrate because of
> a spurious refcount and it's not easy figuring out where the matching get/puts
> occur.
>
> So the comment could probably be condensed but I think it's valuable.
"migrate_vma_split_folio() consumes this reference" would be better indeed.
--
Cheers,
David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 23:29 ` SJ Park
@ 2026-07-02 10:47 ` Usama Arif
0 siblings, 0 replies; 12+ messages in thread
From: Usama Arif @ 2026-07-02 10:47 UTC (permalink / raw)
To: SJ Park
Cc: Andrew Morton, apopple, byungchul, david, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang, ziy,
shakeel.butt, hannes, kernel-team, sashiko-bot
On 02/07/2026 00:29, SJ Park wrote:
> On Wed, 1 Jul 2026 07:06:38 -0700 Usama Arif <usama.arif@linux.dev> wrote:
>
>> migrate_vma_collect_pmd() can detect a large folio while holding the PTE
>> lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
>> split helper took its own reference, but only after the lock had already
>> been dropped.
>>
>> One way to hit this is device migration over a range that contains a large
>> folio. The walker reads the PTE while holding the PTE lock and derives the
>> folio either from a present PTE via vm_normal_page(), or from a non-present
>> PTE that encodes a device-private softleaf entry. It then has to drop the
>> PTE lock because split_folio() can block. Before migrate_vma_split_folio()
>> gets a folio reference, concurrent reclaim, migration, or truncation can
>> replace or clear the entry and drop the last reference to the folio. The
>> split helper would then take a reference and lock on a stale folio pointer.
>>
>> Take a temporary reference before dropping the PTE lock and pass that
>> reference into migrate_vma_split_folio(). The helper consumes the
>> reference, so split_folio() still sees only the expected caller pin instead
>> of an extra pin that could make the split fail.
>>
>> Reported-by: sashiko-bot <sashiko-bot@kernel.org>
>
> Just curious. Has the bot sent the review directly to you? Or, you read it
> from sashiko.dev web site but kindly giving the credit to the bot?
I read it from sashiko.dev!
>
>> Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
>
> The link shows the comments for the entire series, so I was little bit confused
> where the real finding is. Seems like that is on the reply to the patch 5?
> And today I learned Sashiko.dev provides a way to point a reply to a specific
> patch, like this:
>
> https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev?part=5
>
Thanks for this!
>> Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
>
> Seems merged in 6.19. Should we Cc stable@ ?
>
>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>
> Reviewed-by: SJ Park <sj@kernel.org>
Thanks SJ!
>
>> ---
>> mm/migrate_device.c | 21 ++++++++++++++++++---
>> 1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 2f8b646302c2..f5a5f699e98e 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -77,6 +77,9 @@ static int migrate_vma_collect_hole(unsigned long start,
>> * @folio: the folio to split
>> * @fault_page: struct page associated with the fault if any
>> *
>> + * If @folio is not the folio containing @fault_page, the caller must hold a
>> + * reference on @folio. The helper consumes that reference.
>> + *
>> * Returns 0 on success
>> */
>> static int migrate_vma_split_folio(struct folio *folio,
>> @@ -86,10 +89,8 @@ static int migrate_vma_split_folio(struct folio *folio,
>> struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
>> struct folio *new_fault_folio = NULL;
>>
>> - if (folio != fault_folio) {
>> - folio_get(folio);
>> + if (folio != fault_folio)
>> folio_lock(folio);
>> - }
>>
>> ret = split_folio(folio);
>> if (ret) {
>> @@ -310,6 +311,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> if (folio_test_large(folio)) {
>> int ret;
>>
>> + /*
>> + * Keep the folio stable after dropping the PTE
>> + * lock. migrate_vma_split_folio() consumes this
>> + * reference.
>> + */
>> + if (folio != fault_folio)
>> + folio_get(folio);
>> lazy_mmu_mode_disable();
>> pte_unmap_unlock(ptep, ptl);
>> ret = migrate_vma_split_folio(folio,
>> @@ -353,6 +361,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> if (folio && folio_test_large(folio)) {
>> int ret;
>>
>> + /*
>> + * Keep the folio stable after dropping the
>> + * PTE lock. migrate_vma_split_folio() consumes
>> + * this reference.
>> + */
>
> I don't mind having the above comments. Nonetheless, having same content but
> different wrapping look special to me.
>
> If people really hate having two (incompletely) duplicated comments, what about
> separating the recurring code out and adding the coment once on the function?
> Maybe having a hotfix version without the comment and later refactoring with
> the function can also be an option? Just a pure loud thinking.
>
> Anyway, thank you for making Linux safer! :)
>
>> + if (folio != fault_folio)
>> + folio_get(folio);
>> lazy_mmu_mode_disable();
>> pte_unmap_unlock(ptep, ptl);
>> ret = migrate_vma_split_folio(folio,
>> --
>> 2.53.0-Meta
>
>
> Thanks,
> SJ
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm/migrate_device: pin large folios before splitting
2026-07-01 14:06 [PATCH] mm/migrate_device: pin large folios before splitting Usama Arif
` (3 preceding siblings ...)
2026-07-02 4:45 ` Lance Yang
@ 2026-07-02 10:56 ` Usama Arif
4 siblings, 0 replies; 12+ messages in thread
From: Usama Arif @ 2026-07-02 10:56 UTC (permalink / raw)
To: Andrew Morton, apopple, byungchul, david, gourry, joshua.hahnjy,
linux-kernel, linux-mm, matthew.brost, rakie.kim, ying.huang, ziy
Cc: shakeel.butt, hannes, kernel-team, sashiko-bot
On 01/07/2026 15:06, Usama Arif wrote:
> migrate_vma_collect_pmd() can detect a large folio while holding the PTE
> lock, then drop the PTE lock before calling migrate_vma_split_folio(). The
> split helper took its own reference, but only after the lock had already
> been dropped.
>
> One way to hit this is device migration over a range that contains a large
> folio. The walker reads the PTE while holding the PTE lock and derives the
> folio either from a present PTE via vm_normal_page(), or from a non-present
> PTE that encodes a device-private softleaf entry. It then has to drop the
> PTE lock because split_folio() can block. Before migrate_vma_split_folio()
> gets a folio reference, concurrent reclaim, migration, or truncation can
> replace or clear the entry and drop the last reference to the folio. The
> split helper would then take a reference and lock on a stale folio pointer.
>
> Take a temporary reference before dropping the PTE lock and pass that
> reference into migrate_vma_split_folio(). The helper consumes the
> reference, so split_folio() still sees only the expected caller pin instead
> of an extra pin that could make the split fail.
>
> Reported-by: sashiko-bot <sashiko-bot@kernel.org>
> Link: https://sashiko.dev/#/patchset/20260630164143.1595669-1-usama.arif%40linux.dev
> Fixes: 022a12deda53 ("mm/migrate_device: handle partially mapped folios during collection")
> Signed-off-by: Usama Arif <usama.arif@linux.dev>
> ---
> mm/migrate_device.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
Hi Andrew!
Could you apply the below fixlet to condense the comment as per Davids', Alistairs' and Zis'
suggestion? I can send a v2 if its easier as well. Thanks!
From 022bfeb9573f745cc1d93468d1cc123cf6507581 Mon Sep 17 00:00:00 2001
From: Usama Arif <usama.arif@linux.dev>
Date: Wed, 1 Jul 2026 11:12:54 -0700
Subject: [fixup] condense comment about folio reference
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
mm/migrate_device.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index f5a5f699e98e..052167f9ad54 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -311,11 +311,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (folio_test_large(folio)) {
int ret;
- /*
- * Keep the folio stable after dropping the PTE
- * lock. migrate_vma_split_folio() consumes this
- * reference.
- */
+ /* migrate_vma_split_folio() consumes this reference */
if (folio != fault_folio)
folio_get(folio);
lazy_mmu_mode_disable();
@@ -361,11 +357,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (folio && folio_test_large(folio)) {
int ret;
- /*
- * Keep the folio stable after dropping the
- * PTE lock. migrate_vma_split_folio() consumes
- * this reference.
- */
+ /* migrate_vma_split_folio() consumes this reference */
if (folio != fault_folio)
folio_get(folio);
lazy_mmu_mode_disable();
--
2.53.0-Meta
^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-07-02 10:56 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-07-01 14:06 [PATCH] mm/migrate_device: pin large folios before splitting Usama Arif
2026-07-01 16:49 ` David Hildenbrand (Arm)
2026-07-01 17:02 ` Zi Yan
2026-07-01 19:27 ` Andrew Morton
2026-07-01 20:06 ` David Hildenbrand (Arm)
2026-07-01 20:16 ` Zi Yan
2026-07-02 0:33 ` Alistair Popple
2026-07-02 7:59 ` David Hildenbrand (Arm)
2026-07-01 23:29 ` SJ Park
2026-07-02 10:47 ` Usama Arif
2026-07-02 4:45 ` Lance Yang
2026-07-02 10:56 ` Usama Arif
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox