From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 25 Feb 2026 08:43:44 -0500
From: Sasha Levin
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand, Hugh Dickins,
 Zi Yan, Gavin Guo
Subject: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range

Hi,

I've been playing around with improvements to syzkaller locally, and hit
the following crash on v7.0-rc1:

vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000
 mm ffff8881048e1780 prot 8000000000000025 anon_vma ffff88810b20f100
 vm_ops 0000000000000000
 pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
 refcnt 1
flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2999!
Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
FS:  0000000000000000(0000) GS:ffff88816f701000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
PKRU: 80000000
Call Trace:
 __split_huge_pmd+0x201/0x350
 unmap_page_range+0xa6a/0x3db0
 unmap_single_vma+0x14b/0x230
 unmap_vmas+0x28f/0x580
 exit_mmap+0x203/0xa80
 __mmput+0x11b/0x540
 mmput+0x81/0xa0
 do_exit+0x7b9/0x2c60
 do_group_exit+0xd5/0x2a0
 get_signal+0x1fdc/0x2340
 arch_do_signal_or_restart+0x93/0x790
 exit_to_user_mode_loop+0x84/0x480
 do_syscall_64+0x4df/0x700
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Kernel panic - not syncing: Fatal exception

The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
a 136KB region starting 768KB past the PMD base.
---

The following analysis was performed with the help of an LLM:

The crash path is:

  exit_mmap
    -> unmap_vmas
    -> unmap_page_range
    -> zap_pmd_range: sees pmd_is_huge(*pmd) is true, but the range
       doesn't cover the full HPAGE_PMD_SIZE
    -> __split_huge_pmd(): haddr = address & HPAGE_PMD_MASK = 0x555580c00000
    -> __split_huge_pmd_locked(): VM_BUG_ON_VMA(vma->vm_start > haddr)
       fires because 0x555580cc0000 > 0x555580c00000

The root cause appears to be remove_migration_pmd() (mm/huge_memory.c:4906).
This function reinstalls a huge PMD via set_pmd_at() after migration
completes, but it never checks whether the VMA still covers the full
PMD-aligned 2MB range. Every other code path that installs a huge PMD
validates VMA boundaries:

 - do_huge_pmd_anonymous_page(): thp_vma_suitable_order()
 - collapse_huge_page(): hugepage_vma_revalidate()
 - MADV_COLLAPSE: hugepage_vma_revalidate()
 - do_set_pmd() (shmem/tmpfs): thp_vma_suitable_order()

remove_migration_pmd() checks none of these.

The suspected race window is:

1. VMA [A, A+2MB) has a THP. Migration starts, and the PMD becomes a
   migration entry.

2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
   vma_adjust_trans_huge(), which acquires the PMD lock, splits the PMD
   migration entry into 512 PTE migration entries, and releases the PMD
   lock. Then the VMA boundaries are modified (e.g., vma->vm_start = A+X).

3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
   (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
   lock. If it wins the lock BEFORE step 2's split, it finds the PMD
   migration entry still intact and returns with pvmw->pte == NULL.

4. remove_migration_pmd() then reinstalls the huge PMD via set_pmd_at()
   without checking that the VMA (whose boundaries may have already been
   modified in step 2) still covers the full PMD range.

5. Later, exit_mmap -> unmap_page_range -> zap_pmd_range encounters the
   huge PMD, calls __split_huge_pmd(), and the VM_BUG_ON_VMA fires
   because vma->vm_start no longer aligns with the PMD base.

The fix should add a VMA boundary check in remove_migration_pmd(): if
haddr < vma->vm_start or haddr + HPAGE_PMD_SIZE > vma->vm_end, the
function should split the PMD migration entry into PTE-level migration
entries instead of reinstalling the huge PMD, allowing PTE-level removal
to handle each page individually.

-- 
Thanks,
Sasha