From: Usama Arif <usama.arif@linux.dev>
To: Balbir Singh <balbirs@nvidia.com>, Zi Yan <ziy@nvidia.com>,
	Kiryl Shutsemau <kas@kernel.org>,
	matthew.brost@intel.com, npache@redhat.com, david@kernel.org
Cc: Usama Arif <usamaarif642@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, joshua.hahnjy@gmail.com, hannes@cmpxchg.org,
	rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
	ying.huang@linux.alibaba.com, apopple@nvidia.com,
	riel@surriel.com, shakeel.butt@linux.dev, kas@kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
Date: Thu, 5 Mar 2026 02:28:39 +0300	[thread overview]
Message-ID: <622eb392-8c04-473d-b42a-ecdc489799c4@linux.dev> (raw)
In-Reply-To: <5e59c077-9f06-4e45-86e1-ca696e6105b4@nvidia.com>



On 04/03/2026 22:09, Balbir Singh wrote:
> On 3/5/26 08:54, Zi Yan wrote:
>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>
>>> On 3/5/26 02:17, Zi Yan wrote:
>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>
>>>>> From: Usama Arif <usama.arif@linux.dev>
>>>>>
>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>> folio_get() before calling folio_split_unmapped().  On success, the
>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>
>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>> -EAGAIN), the function returns without calling folio_put().  The extra
>>>>> reference is never released.
>>>>>
>>>>> Add the missing folio_put() on the error path.
>>>>>
>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/
>>>>> Reported-by: Nico Pache <npache@redhat.com>
>>>>> Signed-off-by: Usama Arif <usama.arif@linux.dev>
>>>>> ---
>>>>>  mm/migrate_device.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>> --- a/mm/migrate_device.c
>>>>> +++ b/mm/migrate_device.c
>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>  	folio_get(folio);
>>>>>  	split_huge_pmd_address(migrate->vma, addr, true);
>>>>>  	ret = folio_split_unmapped(folio, 0);
>>>>> -	if (ret)
>>>>> +	if (ret) {
>>>>> +		folio_put(folio);
>>>>>  		return ret;
>>>>> +	}
>>>>>  	migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>  	flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>  	pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>> -- 
>>>>> 2.47.3
>>>>
>>>> Add Balbir, who wrote the code, to comment on this.
>>>>
>>>
>>> Thanks Zi!
>>>
>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>> I expect migrate_vma_finalize() to be called for folios even when the split failed, and
>>> to drop the lock.
>>
>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>> If so, how does it distinguish between split folios and failed-to-split folios?
>> By comparing source and destination folio orders?
>>
> 
> We reset the MIGRATE_PFN_MIGRATE flag for pfns that fail to migrate. We do a folio_put()
> on the src in finalize and, if it was split, on all the split folios as well.
> 
>> What we see from migrate_vma_split_unmapped_folio() is that
>> it adds a refcount for all input folios, but only drops a refcount
>> for the split folio. Doesn’t it cause failed-to-split folios to have
>> an additional refcount?
>>

Hello!

Thanks for reviewing, everyone. It's very difficult to create a reproducer. I think
the extra reference would need to appear after migrate_device_unmap() but before
folio_split_unmapped() in migrate_vma_pages(), which is hard to trigger reliably from
userspace.

The fix came about when Nico pointed out that there might be an issue if
split_huge_pmd_address() fails in my patch [1].

Below is my step-by-step understanding of how the refcounting works here. I
might very well be wrong: the refcounting is spread across several functions and I
may have missed a reference change somewhere, so I would really appreciate it if
someone could confirm this!


1. migrate_vma_collect_huge_pmd():
  a) folio_get(folio) -> +1 (collect reference)
2. migrate_device_unmap():
  a) folio_isolate_lru() -> +1 (isolation reference)
  b) folio_put() -> -1 (drops the collect reference)


Without this fix:

3. migrate_vma_split_unmapped_folio():
  a) folio_get(folio) -> +1 (split reference)
  b) folio_split_unmapped() -> fails
  c) Returns the error without calling folio_put(); adding that call is the fix
4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE) and restores the folio:
  a) remove_migration_ptes(src, src): re-establishes the user PTEs
  b) folio_unlock(src)
  c) folio_put(src) -> -1 (drops the isolation reference)

The split reference from 3.a is never released, so the folio is left with a permanently elevated refcount.
Unless I missed a folio_put() somewhere for the refcount increase in folio_isolate_lru() (2.a)?

Please let me know if this makes sense!

[1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@mail.gmail.com/

> 
> Thanks! Yes, the patch makes sense
> 
> Acked-by: Balbir Singh <balbirs@nvidia.com>
> 
> Balbir



Thread overview: 19+ messages
2026-03-04 12:01 [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure Usama Arif
2026-03-04 14:00 ` Kiryl Shutsemau
2026-03-04 15:17 ` Zi Yan
2026-03-04 21:48   ` Balbir Singh
2026-03-04 21:54     ` Zi Yan
2026-03-04 22:02       ` Matthew Brost
2026-03-04 22:09       ` Balbir Singh
2026-03-04 23:28         ` Usama Arif [this message]
2026-03-05  6:09           ` Mika Penttilä
2026-03-05 11:44             ` Usama Arif
2026-03-05 12:09               ` Mika Penttilä
2026-03-05 16:36                 ` Usama Arif
2026-03-05 16:39                   ` Zi Yan
2026-03-05 17:00                     ` Usama Arif
2026-03-05 17:32                       ` Zi Yan
2026-03-06 10:47                         ` Usama Arif
2026-03-05 22:04                       ` Balbir Singh
2026-03-06 10:51                         ` Usama Arif
2026-03-04 15:25 ` Joshua Hahn
