From: Harry Yoo <harry.yoo@oracle.com>
To: Lance Yang <ioworker0@gmail.com>
Cc: akpm@linux-foundation.org, david@redhat.com, 21cnbao@gmail.com,
	baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	huang.ying.caritas@gmail.com, zhengtangquan@oppo.com,
	riel@surriel.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	mingzhe.yang@ly.com, stable@vger.kernel.org,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH v4 1/1] mm/rmap: fix potential out-of-bounds page table access during batched unmap
Date: Mon, 7 Jul 2025 14:40:18 +0900
Message-ID: <aGtdwn0bLlO2FzZ6@harry>
In-Reply-To: <20250701143100.6970-1-lance.yang@linux.dev>

On Tue, Jul 01, 2025 at 10:31:00PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
> 
> As pointed out by David[1], the batched unmap logic in try_to_unmap_one()
> may read past the end of a PTE table when a large folio's PTE mappings
> are not fully contained within a single page table.
> 
> While this scenario might be rare, an issue triggerable from userspace must
> be fixed regardless of its likelihood. This patch fixes the out-of-bounds
> access by refactoring the logic into a new helper, folio_unmap_pte_batch().
> 
> The new helper correctly calculates the safe batch size by capping the scan
> at both the VMA and PMD boundaries. To simplify the code, it also supports
> partial batching (i.e., any number of pages from 1 up to the calculated
> safe maximum), as there is no strong reason to special-case for fully
> mapped folios.
> 
> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
> 
> Cc: <stable@vger.kernel.org>
> Reported-by: David Hildenbrand <david@redhat.com>
> Closes: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Suggested-by: Barry Song <baohua@kernel.org>
> Acked-by: Barry Song <baohua@kernel.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
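
For readers following along, the capping described in the commit message
above can be sketched in user space. This is only an illustration, not the
body of folio_unmap_pte_batch(): the sketch_* names and the fixed 4 KiB page /
2 MiB PMD geometry are assumptions made for the example. The boundary
arithmetic mirrors the style of the kernel's pmd_addr_end() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry (4 KiB pages, 2 MiB per PTE table); real values are per-arch. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_SIZE   (UINT64_C(1) << 21)
#define SKETCH_PMD_MASK   (~(SKETCH_PMD_SIZE - 1))

/*
 * Hypothetical helper: the largest number of PTEs, starting at addr, that
 * can be scanned without leaving the VMA or the PTE table that maps addr.
 */
static unsigned long sketch_max_batch(uint64_t addr, uint64_t vma_end,
				      unsigned long folio_pages)
{
	uint64_t boundary, end;
	unsigned long max_nr;

	assert(addr < vma_end);

	/* First address mapped by the next PTE table. */
	boundary = (addr + SKETCH_PMD_SIZE) & SKETCH_PMD_MASK;
	/* Clamp to the VMA end; the "- 1" keeps the compare safe on wrap. */
	end = (boundary - 1 < vma_end - 1) ? boundary : vma_end;

	max_nr = (unsigned long)((end - addr) >> SKETCH_PAGE_SHIFT);
	/* Never batch more pages than the folio has. */
	return max_nr < folio_pages ? max_nr : folio_pages;
}
```

With these assumptions, a 16-page folio whose first mapped page is the last
slot of a PTE table gets a batch of 1, so the scan never reads into the
neighbouring table.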

LGTM,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

With a minor comment below.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb63d9256f09..1320b88fab74 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			hugetlb_remove_rmap(folio);
>  		} else {
>  			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
> -			folio_ref_sub(folio, nr_pages - 1);
>  		}
>  		if (vma->vm_flags & VM_LOCKED)
>  			mlock_drain_local();
> -		folio_put(folio);
> -		/* We have already batched the entire folio */
> -		if (nr_pages > 1)
> +		folio_put_refs(folio, nr_pages);
> +
> +		/*
> +		 * If we are sure that we batched the entire folio and cleared
> +		 * all PTEs, we can just optimize and stop right here.
> +		 */
> +		if (nr_pages == folio_nr_pages(folio))
>  			goto walk_done;

Just a minor comment.

We should probably either teach page_vma_mapped_walk() to skip nr_pages
pages, or just rely on the
next_pte: do { ... } while (pte_none(ptep_get(pvmw->pte)))
loop in page_vma_mapped_walk() to skip those PTEs.

Taking different paths depending on (nr_pages == folio_nr_pages(folio))
doesn't seem sensible.
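
To make the second option concrete, here is a toy user-space model, not
kernel code: a PTE is a plain word, 0 stands in for pte_none(), and the
sketch_* names are made up for the example. It only shows that a walker
which skips empty entries ends up in the same place after a full batch as
the explicit early exit does, and also handles a partial batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a PTE is just a word, and 0 plays the role of pte_none(). */
typedef uint64_t sketch_pte_t;

/* Clear nr consecutive entries, as a batched unmap would. */
static void sketch_clear_batch(sketch_pte_t *table, size_t nr_entries,
			       size_t start, size_t nr)
{
	assert(start + nr <= nr_entries);
	for (size_t i = 0; i < nr; i++)
		table[start + i] = 0;
}

/*
 * Step past empty entries, in the spirit of the do/while loop quoted above.
 * Returns the index of the next populated entry after cur, or nr_entries
 * once the walk has run off the end of the range.
 */
static size_t sketch_next_mapped(const sketch_pte_t *table, size_t cur,
				 size_t nr_entries)
{
	do {
		cur++;
		if (cur >= nr_entries)
			return nr_entries;
	} while (table[cur] == 0);

	return cur;
}
```

In this model the caller needs no (nr_pages == folio_nr_pages(folio)) check:
after a full batch the walker simply reports the end of the range, at the
cost of touching each cleared entry once.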

>  		continue;

-- 
Cheers,
Harry / Hyeonggon



