[PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
@ 2025-05-26 18:28 Shivank Garg
  2025-05-26 18:28 ` [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count() Shivank Garg
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Shivank Garg @ 2025-05-26 18:28 UTC (permalink / raw)
  To: akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, dev.jain, fengwei.yin, shivankg, bharata,
	syzbot+2b99589e33edbe9475ca

hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
calls folio_mapcount(). folio_mapcount() checks folio_test_large() before
proceeding to folio_large_mapcount(), but there is a race window where the
folio may get split/freed between these checks, triggering:

  VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)

Take a temporary reference to the folio in hpage_collapse_scan_file().
This stabilizes the folio during refcount check and prevents incorrect
large folio detection due to concurrent split/free. Use helper
folio_expected_ref_count() + 1 to compare with folio_ref_count()
instead of using is_refcount_suitable().

Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
Changes since V2:
- https://lore.kernel.org/linux-mm/20250523091432.17588-1-shivankg@amd.com
- Reorder the patches to bring the fix in first patch and clean-up in second.
---
 mm/khugepaged.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cc945c6ab3bd..fe1fe7eace54 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2295,6 +2295,17 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 			continue;
 		}
 
+		if (!folio_try_get(folio)) {
+			xas_reset(&xas);
+			continue;
+		}
+
+		if (unlikely(folio != xas_reload(&xas))) {
+			folio_put(folio);
+			xas_reset(&xas);
+			continue;
+		}
+
 		if (folio_order(folio) == HPAGE_PMD_ORDER &&
 		    folio->index == start) {
 			/* Maybe PMD-mapped */
@@ -2305,23 +2316,27 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 			 * it's safe to skip LRU and refcount checks before
 			 * returning.
 			 */
+			folio_put(folio);
 			break;
 		}
 
 		node = folio_nid(folio);
 		if (hpage_collapse_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
+			folio_put(folio);
 			break;
 		}
 		cc->node_load[node]++;
 
 		if (!folio_test_lru(folio)) {
 			result = SCAN_PAGE_LRU;
+			folio_put(folio);
 			break;
 		}
 
-		if (!is_refcount_suitable(folio)) {
+		if (folio_expected_ref_count(folio) + 1 != folio_ref_count(folio)) {
 			result = SCAN_PAGE_COUNT;
+			folio_put(folio);
 			break;
 		}
 
@@ -2333,6 +2348,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 		 */
 
 		present += folio_nr_pages(folio);
+		folio_put(folio);
 
 		if (need_resched()) {
 			xas_pause(&xas);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  2025-05-26 18:28 [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Shivank Garg
@ 2025-05-26 18:28 ` Shivank Garg
  2025-05-27  3:28   ` Dev Jain
  2025-05-27  6:26   ` Baolin Wang
  2025-05-27  3:20 ` [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Dev Jain
  2025-05-27  6:25 ` Baolin Wang
  2 siblings, 2 replies; 10+ messages in thread
From: Shivank Garg @ 2025-05-26 18:28 UTC (permalink / raw)
  To: akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, dev.jain, fengwei.yin, shivankg, bharata

Use folio_expected_ref_count() instead of open-coded logic in
is_refcount_suitable(). This avoids code duplication and improves
clarity.

Drop is_refcount_suitable() as it is no longer needed.

Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
 mm/khugepaged.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fe1fe7eace54..685eb949f4ce 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -548,19 +548,6 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
 	}
 }
 
-static bool is_refcount_suitable(struct folio *folio)
-{
-	int expected_refcount = folio_mapcount(folio);
-
-	if (!folio_test_anon(folio) || folio_test_swapcache(folio))
-		expected_refcount += folio_nr_pages(folio);
-
-	if (folio_test_private(folio))
-		expected_refcount++;
-
-	return folio_ref_count(folio) == expected_refcount;
-}
-
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
@@ -652,7 +639,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		 * but not from this process. The other process cannot write to
 		 * the page, only trigger CoW.
 		 */
-		if (!is_refcount_suitable(folio)) {
+		if (folio_expected_ref_count(folio) != folio_ref_count(folio)) {
 			folio_unlock(folio);
 			result = SCAN_PAGE_COUNT;
 			goto out;
@@ -1402,7 +1389,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		 * has excessive GUP pins (i.e. 512).  Anyway the same check
 		 * will be done again later the risk seems low.
 		 */
-		if (!is_refcount_suitable(folio)) {
+		if (folio_expected_ref_count(folio) != folio_ref_count(folio)) {
 			result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
  2025-05-26 18:28 [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Shivank Garg
  2025-05-26 18:28 ` [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count() Shivank Garg
@ 2025-05-27  3:20 ` Dev Jain
  2025-05-27  7:48   ` David Hildenbrand
  2025-05-27  6:25 ` Baolin Wang
  2 siblings, 1 reply; 10+ messages in thread
From: Dev Jain @ 2025-05-27  3:20 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca


On 26/05/25 11:58 pm, Shivank Garg wrote:
> hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
> calls folio_mapcount(). folio_mapcount() checks folio_test_large() before
> proceeding to folio_large_mapcount(), but there is a race window where the
> folio may get split/freed between these checks, triggering:
>
>    VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
>
> Take a temporary reference to the folio in hpage_collapse_scan_file().
> This stabilizes the folio during refcount check and prevents incorrect
> large folio detection due to concurrent split/free. Use helper
> folio_expected_ref_count() + 1 to compare with folio_ref_count()
> instead of using is_refcount_suitable().
>
> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
> Suggested-by: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---

The patch looks fine.

I was just wondering about the implications of this on migration. Earlier
we had a refcount race between migration and shmem page fault via filemap_get_entry()
taking a reference and not releasing it till we take the folio lock, which was held
by the migration path. I would like to *think* that real workloads, when migrating
pages, will *not* be faulting on those pages simultaneously, just guessing. But now
we have a kernel thread (khugepaged) racing against migration. I may just be over-speculating.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  2025-05-26 18:28 ` [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count() Shivank Garg
@ 2025-05-27  3:28   ` Dev Jain
  2025-05-27  6:26   ` Baolin Wang
  1 sibling, 0 replies; 10+ messages in thread
From: Dev Jain @ 2025-05-27  3:28 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata


On 26/05/25 11:58 pm, Shivank Garg wrote:
> Use folio_expected_ref_count() instead of open-coded logic in
> is_refcount_suitable(). This avoids code duplication and improves
> clarity.
>
> Drop is_refcount_suitable() as it is no longer needed.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> ---

Acked-by: Dev Jain <dev.jain@arm.com>



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
  2025-05-26 18:28 [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Shivank Garg
  2025-05-26 18:28 ` [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count() Shivank Garg
  2025-05-27  3:20 ` [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Dev Jain
@ 2025-05-27  6:25 ` Baolin Wang
  2 siblings, 0 replies; 10+ messages in thread
From: Baolin Wang @ 2025-05-27  6:25 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca



On 2025/5/27 02:28, Shivank Garg wrote:
> hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
> calls folio_mapcount(). folio_mapcount() checks folio_test_large() before
> proceeding to folio_large_mapcount(), but there is a race window where the
> folio may get split/freed between these checks, triggering:
> 
>    VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
> 
> Take a temporary reference to the folio in hpage_collapse_scan_file().
> This stabilizes the folio during refcount check and prevents incorrect
> large folio detection due to concurrent split/free. Use helper
> folio_expected_ref_count() + 1 to compare with folio_ref_count()
> instead of using is_refcount_suitable().
> 
> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
> Suggested-by: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>

LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  2025-05-26 18:28 ` [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count() Shivank Garg
  2025-05-27  3:28   ` Dev Jain
@ 2025-05-27  6:26   ` Baolin Wang
  1 sibling, 0 replies; 10+ messages in thread
From: Baolin Wang @ 2025-05-27  6:26 UTC (permalink / raw)
  To: Shivank Garg, akpm, david, linux-mm, linux-kernel
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, fengwei.yin, bharata



On 2025/5/27 02:28, Shivank Garg wrote:
> Use folio_expected_ref_count() instead of open-coded logic in
> is_refcount_suitable(). This avoids code duplication and improves
> clarity.
> 
> Drop is_refcount_suitable() as it is no longer needed.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Shivank Garg <shivankg@amd.com>

Nice cleanup.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
  2025-05-27  3:20 ` [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Dev Jain
@ 2025-05-27  7:48   ` David Hildenbrand
  2025-05-27  8:06     ` Dev Jain
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-05-27  7:48 UTC (permalink / raw)
  To: Dev Jain, Shivank Garg, akpm, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca

On 27.05.25 05:20, Dev Jain wrote:
> 
> On 26/05/25 11:58 pm, Shivank Garg wrote:
>> hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
>> calls folio_mapcount(). folio_mapcount() checks folio_test_large() before
>> proceeding to folio_large_mapcount(), but there is a race window where the
>> folio may get split/freed between these checks, triggering:
>>
>>     VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
>>
>> Take a temporary reference to the folio in hpage_collapse_scan_file().
>> This stabilizes the folio during refcount check and prevents incorrect
>> large folio detection due to concurrent split/free. Use helper
>> folio_expected_ref_count() + 1 to compare with folio_ref_count()
>> instead of using is_refcount_suitable().
>>
>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single value")
>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> ---
> 
> The patch looks fine.
> 
> I was just wondering about the implications of this on migration. Earlier
> we had a refcount race between migration and shmem page fault via filemap_get_entry()
> taking a reference and not releasing it till we take the folio lock, which was held
> by the migration path. I would like to *think* that real workloads, when migrating
> pages, will *not* be faulting on those pages simultaneously, just guessing. But now
> we have a kernel thread (khugepaged) racing against migration. I may just be over-speculating.

I'm not quite sure I understand the concern you have. Any temporary 
reference can temporarily block migration, however, the retry logic 
should be able to handle that just fine -- and this code is not really 
special (see filemap_get_entry()).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
  2025-05-27  7:48   ` David Hildenbrand
@ 2025-05-27  8:06     ` Dev Jain
  2025-05-27  8:40       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Dev Jain @ 2025-05-27  8:06 UTC (permalink / raw)
  To: David Hildenbrand, Shivank Garg, akpm, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca


On 27/05/25 1:18 pm, David Hildenbrand wrote:
> On 27.05.25 05:20, Dev Jain wrote:
>>
>> On 26/05/25 11:58 pm, Shivank Garg wrote:
>>> hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
>>> calls folio_mapcount(). folio_mapcount() checks folio_test_large() 
>>> before
>>> proceeding to folio_large_mapcount(), but there is a race window 
>>> where the
>>> folio may get split/freed between these checks, triggering:
>>>
>>>     VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
>>>
>>> Take a temporary reference to the folio in hpage_collapse_scan_file().
>>> This stabilizes the folio during refcount check and prevents incorrect
>>> large folio detection due to concurrent split/free. Use helper
>>> folio_expected_ref_count() + 1 to compare with folio_ref_count()
>>> instead of using is_refcount_suitable().
>>>
>>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single 
>>> value")
>>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>>> Closes: 
>>> https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com 
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Acked-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>>> ---
>>
>> The patch looks fine.
>>
>> I was just wondering about the implications of this on migration. 
>> Earlier
>> we had a refcount race between migration and shmem page fault via 
>> filemap_get_entry()
>> taking a reference and not releasing it till we take the folio lock, 
>> which was held
>> by the migration path. I would like to *think* that real workloads, 
>> when migrating
>> pages, will *not* be faulting on those pages simultaneously, just 
>> guessing. But now
>> we have a kernel thread (khugepaged) racing against migration. I may 
>> just be over-speculating.
>
> I'm not quite sure I understand the concern you have. Any temporary 
> reference can temporarily block migration, however, the retry logic 
> should be able to handle that just fine -- and this code is not really 
> special (see filemap_get_entry()).


You are correct that any temp ref can block migration, however, that 
reference has to come after the folios have been isolated in the 
migration path.

So the probability of someone taking a reference on the folio is quite 
low since it has been isolated. The problem with filemap_get_entry() is 
that it finds

the folio in the pagecache, so isolation is useless, then takes a 
reference and then shmem_get_folio_gfp() does a folio_lock() instead of 
folio_try_lock().

This was the race which I talked about an year back at [1]. My concern 
is that we are adding another candidate to that race; just wondering if 
there is

a better solution to fix the race mentioned in Shivank's patchset.


[1] https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
  2025-05-27  8:06     ` Dev Jain
@ 2025-05-27  8:40       ` David Hildenbrand
  2025-05-27  8:57         ` Dev Jain
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-05-27  8:40 UTC (permalink / raw)
  To: Dev Jain, Shivank Garg, akpm, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca

On 27.05.25 10:06, Dev Jain wrote:
> 
> On 27/05/25 1:18 pm, David Hildenbrand wrote:
>> On 27.05.25 05:20, Dev Jain wrote:
>>>
>>> On 26/05/25 11:58 pm, Shivank Garg wrote:
>>>> hpage_collapse_scan_file() calls is_refcount_suitable(), which in turn
>>>> calls folio_mapcount(). folio_mapcount() checks folio_test_large()
>>>> before
>>>> proceeding to folio_large_mapcount(), but there is a race window
>>>> where the
>>>> folio may get split/freed between these checks, triggering:
>>>>
>>>>      VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
>>>>
>>>> Take a temporary reference to the folio in hpage_collapse_scan_file().
>>>> This stabilizes the folio during refcount check and prevents incorrect
>>>> large folio detection due to concurrent split/free. Use helper
>>>> folio_expected_ref_count() + 1 to compare with folio_ref_count()
>>>> instead of using is_refcount_suitable().
>>>>
>>>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single
>>>> value")
>>>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>>>> Closes:
>>>> https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com
>>>>
>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>>>> ---
>>>
>>> The patch looks fine.
>>>
>>> I was just wondering about the implications of this on migration.
>>> Earlier
>>> we had a refcount race between migration and shmem page fault via
>>> filemap_get_entry()
>>> taking a reference and not releasing it till we take the folio lock,
>>> which was held
>>> by the migration path. I would like to *think* that real workloads,
>>> when migrating
>>> pages, will *not* be faulting on those pages simultaneously, just
>>> guessing. But now
>>> we have a kernel thread (khugepaged) racing against migration. I may
>>> just be over-speculating.
>>
>> I'm not quite sure I understand the concern you have. Any temporary
>> reference can temporarily block migration, however, the retry logic
>> should be able to handle that just fine -- and this code is not really
>> special (see filemap_get_entry()).
> 
> 
> You are correct that any temp ref can block migration, however, that
> reference has to come after the folios have been isolated in the
> migration path.
> 
> So the probability of someone taking a reference on the folio is quite
> low since it has been isolated. The problem with filemap_get_entry() is
> that it finds
> 
> the folio in the pagecache, so isolation is useless, then takes a
> reference and then shmem_get_folio_gfp() does a folio_lock() instead of
> folio_try_lock().
> 
> This was the race which I talked about an year back at [1]. My concern
> is that we are adding another candidate to that race; just wondering if
> there is
> 
> a better solution to fix the race mentioned in Shivank's patchset.

Note that in this code here, we're not locking the folio just yet, we're 
only grabbing a reference for a very short time.

We will isolate+lock the folios in collapse_file(), where we also lock 
all slots in the pagecache.

The alternative would be to also lock all slots here, which is arguably 
worse.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference
  2025-05-27  8:40       ` David Hildenbrand
@ 2025-05-27  8:57         ` Dev Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Dev Jain @ 2025-05-27  8:57 UTC (permalink / raw)
  To: David Hildenbrand, Shivank Garg, akpm, linux-mm, linux-kernel
  Cc: ziy, baolin.wang, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, fengwei.yin, bharata, syzbot+2b99589e33edbe9475ca


On 27/05/25 2:10 pm, David Hildenbrand wrote:
> On 27.05.25 10:06, Dev Jain wrote:
>>
>> On 27/05/25 1:18 pm, David Hildenbrand wrote:
>>> On 27.05.25 05:20, Dev Jain wrote:
>>>>
>>>> On 26/05/25 11:58 pm, Shivank Garg wrote:
>>>>> hpage_collapse_scan_file() calls is_refcount_suitable(), which in 
>>>>> turn
>>>>> calls folio_mapcount(). folio_mapcount() checks folio_test_large()
>>>>> before
>>>>> proceeding to folio_large_mapcount(), but there is a race window
>>>>> where the
>>>>> folio may get split/freed between these checks, triggering:
>>>>>
>>>>>      VM_WARN_ON_FOLIO(!folio_test_large(folio), folio)
>>>>>
>>>>> Take a temporary reference to the folio in 
>>>>> hpage_collapse_scan_file().
>>>>> This stabilizes the folio during refcount check and prevents 
>>>>> incorrect
>>>>> large folio detection due to concurrent split/free. Use helper
>>>>> folio_expected_ref_count() + 1 to compare with folio_ref_count()
>>>>> instead of using is_refcount_suitable().
>>>>>
>>>>> Fixes: 05c5323b2a34 ("mm: track mapcount of large folios in single
>>>>> value")
>>>>> Reported-by: syzbot+2b99589e33edbe9475ca@syzkaller.appspotmail.com
>>>>> Closes:
>>>>> https://lore.kernel.org/all/6828470d.a70a0220.38f255.000c.GAE@google.com 
>>>>>
>>>>>
>>>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>>>>> ---
>>>>
>>>> The patch looks fine.
>>>>
>>>> I was just wondering about the implications of this on migration.
>>>> Earlier
>>>> we had a refcount race between migration and shmem page fault via
>>>> filemap_get_entry()
>>>> taking a reference and not releasing it till we take the folio lock,
>>>> which was held
>>>> by the migration path. I would like to *think* that real workloads,
>>>> when migrating
>>>> pages, will *not* be faulting on those pages simultaneously, just
>>>> guessing. But now
>>>> we have a kernel thread (khugepaged) racing against migration. I may
>>>> just be over-speculating.
>>>
>>> I'm not quite sure I understand the concern you have. Any temporary
>>> reference can temporarily block migration, however, the retry logic
>>> should be able to handle that just fine -- and this code is not really
>>> special (see filemap_get_entry()).
>>
>>
>> You are correct that any temp ref can block migration, however, that
>> reference has to come after the folios have been isolated in the
>> migration path.
>>
>> So the probability of someone taking a reference on the folio is quite
>> low since it has been isolated. The problem with filemap_get_entry() is
>> that it finds
>>
>> the folio in the pagecache, so isolation is useless, then takes a
>> reference and then shmem_get_folio_gfp() does a folio_lock() instead of
>> folio_try_lock().
>>
>> This was the race which I talked about an year back at [1]. My concern
>> is that we are adding another candidate to that race; just wondering if
>> there is
>>
>> a better solution to fix the race mentioned in Shivank's patchset.
>
> Note that in this code here, we're not locking the folio just yet, 
> we're only grabbing a reference for a very short time.


Ah now I notice that the reference is taken for a very short time.


>
> We will isolate+lock the folios in collapse_file(), where we also lock 
> all slots in the pagecache.
>
> The alternative would be to also lock all slots here, which is 
> arguably worse.


Makes sense.

Acked-by: Dev Jain <dev.jain@arm.com>




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-05-27  8:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-26 18:28 [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Shivank Garg
2025-05-26 18:28 ` [PATCH V3 2/2] mm/khugepaged: clean up refcount check using folio_expected_ref_count() Shivank Garg
2025-05-27  3:28   ` Dev Jain
2025-05-27  6:26   ` Baolin Wang
2025-05-27  3:20 ` [PATCH V3 1/2] mm/khugepaged: fix race with folio split/free using temporary reference Dev Jain
2025-05-27  7:48   ` David Hildenbrand
2025-05-27  8:06     ` Dev Jain
2025-05-27  8:40       ` David Hildenbrand
2025-05-27  8:57         ` Dev Jain
2025-05-27  6:25 ` Baolin Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).