public inbox for linux-mm@kvack.org
From: Kiryl Shutsemau <kas@kernel.org>
To: Usama Arif <usama.arif@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	david@kernel.org,  Lorenzo Stoakes <ljs@kernel.org>,
	willy@infradead.org, linux-mm@kvack.org, fvdl@google.com,
	 hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	baohua@kernel.org,  dev.jain@arm.com,
	baolin.wang@linux.alibaba.com, npache@redhat.com,
	 Liam.Howlett@oracle.com, ryan.roberts@arm.com,
	Vlastimil Babka <vbabka@kernel.org>,
	 lance.yang@linux.dev, linux-kernel@vger.kernel.org,
	kernel-team@meta.com,  maddy@linux.ibm.com, mpe@ellerman.id.au,
	linuxppc-dev@lists.ozlabs.org,  hca@linux.ibm.com,
	gor@linux.ibm.com, agordeev@linux.ibm.com,
	 borntraeger@linux.ibm.com, svens@linux.ibm.com,
	linux-s390@vger.kernel.org
Subject: Re: [v3 05/24] mm: thp: handle split failure in zap_pmd_range()
Date: Mon, 30 Mar 2026 14:13:31 +0000
Message-ID: <acqC010JLTfjHF0y@thinkstation>
In-Reply-To: <20260327021403.214713-6-usama.arif@linux.dev>

On Thu, Mar 26, 2026 at 07:08:47PM -0700, Usama Arif wrote:
> zap_pmd_range() splits a huge PMD when the zap range doesn't cover the
> full PMD (partial unmap).  If the split fails, the PMD stays huge.
> Falling through to zap_pte_range() would dereference the huge PMD entry
> as a PTE page table pointer.
> 
> Skip the range covered by the PMD on split failure instead.

Ughh... This is hacky as hell.

> The skip is safe across all call paths into zap_pmd_range():
> 
> - exit_mmap() and OOM reaper: the zap range covers entire VMAs, so
>   every PMD is fully covered (next - addr == HPAGE_PMD_SIZE).  The
>   zap_huge_pmd() branch handles these without splitting.  The split
>   failure path is unreachable.
> 
> - munmap / mmap overlay: vma_adjust_trans_huge() (called from
>   __split_vma) splits any PMD straddling the VMA boundary before the
>   VMA is split.  If that PMD split fails, __split_vma() returns
>   -ENOMEM and the munmap is aborted before reaching zap_pmd_range().
>   The split failure path is unreachable.
> 
> - MADV_DONTNEED: advisory hint, the kernel is allowed to ignore it.
>   The pages remain valid and accessible.  A subsequent access returns
>   existing data without faulting.

Em, no. MADV_DONTNEED users expect memory to be zeroed after the
"advise" is complete. At the very least you need to zero the skipped range.
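
For reference, a minimal user-space sketch of that expectation, assuming
Linux and a private anonymous mapping. The helper names
(range_is_zero(), dontneed_zeroes_range()) are made up for illustration.
It uses a small, base-page-sized mapping, so it does not try to hit the
huge-PMD split path itself; it only shows what callers rely on:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if every byte in [p, p + len) is zero, else 0. */
static int range_is_zero(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (p[i])
			return 0;
	return 1;
}

/*
 * Dirty a private anonymous mapping of npages pages, drop drop_pages
 * pages starting at page drop_off with MADV_DONTNEED, then read the
 * dropped range back.
 *
 * Returns 1 if the range reads as zeroes, 0 if stale data is still
 * visible, -1 if mmap() or madvise() failed.
 */
static int dontneed_zeroes_range(size_t npages, size_t drop_off,
				 size_t drop_pages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * page;
	unsigned char *buf;
	int ret;

	assert(drop_off + drop_pages <= npages);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Make every page resident and non-zero. */
	memset(buf, 0xaa, len);

	if (madvise(buf + drop_off * page, drop_pages * page,
		    MADV_DONTNEED)) {
		munmap(buf, len);
		return -1;
	}

	/* The next access is expected to fault in zero-filled pages. */
	ret = range_is_zero(buf + drop_off * page, drop_pages * page);

	munmap(buf, len);
	return ret;
}
```

Calling dontneed_zeroes_range(8, 2, 3) is expected to return 1. A zap
that silently skips the range would make the same call return 0.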

And are you sure that the list of users is complete?

I am also worried about a possible new user that is not aware of this
skip-on-split-failure semantics.

I think it has to be opt-in. Maybe a ZAP_FLAG_WHATEVER?
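
To make the opt-in idea concrete, here is a toy user-space model, not
kernel code. Everything in it is hypothetical: ZAP_FLAG_SKIP_ON_SPLIT_FAIL
stands in for the placeholder flag name above, and struct toy_pmd and
toy_zap_partial() are invented for the sketch. Zeroing the range for
callers that did not opt in is one possible fallback, assumed here only
to show the shape of the contract:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag: the caller tolerates the range being left mapped. */
#define ZAP_FLAG_SKIP_ON_SPLIT_FAIL	(1u << 0)

enum zap_result {
	ZAP_DONE,	/* split succeeded, range zapped normally */
	ZAP_SKIPPED,	/* split failed, caller opted in, data left intact */
	ZAP_ZEROED,	/* split failed, no opt-in, range zeroed in place */
};

/* Toy stand-in for memory mapped by one huge PMD. */
struct toy_pmd {
	unsigned char data[64];
	bool split_fails;	/* simulates a page table allocation failure */
};

/* Model of a partial zap of [off, off + len) within one huge PMD. */
static enum zap_result toy_zap_partial(struct toy_pmd *pmd, size_t off,
				       size_t len, unsigned int flags)
{
	assert(off + len <= sizeof(pmd->data));

	if (!pmd->split_fails) {
		/* Zapped pages read back as zeroes on the next access. */
		memset(pmd->data + off, 0, len);
		return ZAP_DONE;
	}

	/* Only callers that asked for it ever see a skipped range. */
	if (flags & ZAP_FLAG_SKIP_ON_SPLIT_FAIL)
		return ZAP_SKIPPED;

	/* Everyone else keeps the "reads as zero afterwards" guarantee. */
	memset(pmd->data + off, 0, len);
	return ZAP_ZEROED;
}
```

With this shape, a new caller that has never heard of the skip behaviour
passes no flag and cannot observe stale data.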

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

