Date: Wed, 13 May 2026 12:02:31 +0100
From: Lorenzo Stoakes
To: "David Hildenbrand (Arm)"
Cc: Andrew Morton, Muchun Song, Oscar Salvador, Jann Horn, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH mm-hotfixes] mm/hugetlb: avoid false positive lockdep assertion
Message-ID:
References: <20260513085658.45264-1-ljs@kernel.org> <291cd4df-7c52-426e-a8cc-b0cf77654c52@kernel.org>
In-Reply-To: <291cd4df-7c52-426e-a8cc-b0cf77654c52@kernel.org>

On Wed, May 13, 2026 at 12:15:23PM +0200, David Hildenbrand (Arm) wrote:
> On 5/13/26 10:56, Lorenzo Stoakes wrote:
> > Commit 081056dc00a2 ("mm/hugetlb: unshare page tables during VMA split, not
> > before") changed the locking model around hugetlbfs PMD unsharing on VMA
> > split, but did not update the function which asserts the locks,
> > hugetlb_vma_assert_locked().
> >
> > This function asserts that either the hugetlb VMA lock is held (if a shared
> > mapping) or that the reservation map lock is held (if private).
> >
> > If you get an unfortunate race between something which results in one of
> > these locks being released and a hugetlb split and you have CONFIG_LOCKDEP
>
> "hugetlb split": I assume you used that terminology because of hugetlb_split().
> Which is all just rather nasty #justhugetlbthings
>
> "hugetlb VMA split" is probably easier to get.

Yeah, another one of those overloaded terms :>)

Andrew - do you mind doing s/hugetlb split/hugetlb VMA split/? Thanks!

> > enabled, you can therefore see a false positive assertion arise when there
> > is in fact no issue.
> >
> > Since this change introduced a new take_locks parameter to
> > hugetlb_unshare_pmds(), which, when set to false, indicates that the
> > existing locking is sufficient, simply pass this to the unsharing logic
> > and predicate the lock assertions on it.
> >
> > This is safe, as we already asserted the file rmap lock and the VMA write
> > lock prior to this (implying an exclusive mmap write lock), so we cannot
> > be raced by either rmap or page fault page table walkers, which the
> > asserted locks are intended to protect against (we don't mind GUP-fast).
> >
> > Separate out huge_pmd_unshare() into __huge_pmd_unshare() to add a
> > check_locks parameter, and update hugetlb_unshare_pmds() to pass this
> > parameter to it.
> >
> > This leaves all other callers of huge_pmd_unshare() still correctly
> > asserting the locks.
> >
> > The below reproducer will trigger the assert in a kernel with
> > CONFIG_LOCKDEP enabled by racing process teardown (which will release
> > the hugetlb lock) against a hugetlb VMA split.
> >
> > void execute_one(void)
> > {
> > 	void *ptr;
> > 	pid_t pid;
> >
> > 	/*
> > 	 * Create a hugetlb mapping spanning a PUD entry.
> > 	 *
> > 	 * We force the hugetlb page allocation with populate and
> > 	 * noreserve.
> > 	 *
> > 	 * |---------------------|
> > 	 * |                     |
> > 	 * |---------------------|
> > 	 * 0                     PUD boundary
> > 	 */
> > 	ptr = mmap(0, PUD_SIZE, PROT_READ | PROT_WRITE,
> > 		   MAP_FIXED | MAP_SHARED | MAP_ANON |
> > 		   MAP_NORESERVE | MAP_HUGETLB | MAP_POPULATE,
> > 		   -1, 0);
> > 	if (ptr == MAP_FAILED) {
> > 		perror("mmap");
> > 		exit(EXIT_FAILURE);
> > 	}
> >
> > 	/*
> > 	 * Fork but with a bogus stack pointer so we try to execute code in
> > 	 * a non-VM_EXEC VMA, causing segfault + teardown via exit_mmap().
> > 	 *
> > 	 * The clone will cause PMD page table sharing between the
> > 	 * processes first via:
> > 	 *   copy_process() -> ... -> huge_pte_alloc() -> huge_pmd_share()
> > 	 *
> > 	 * Then tear down and release the hugetlb 'VMA' lock via:
> > 	 *   exit_mmap() -> ...
> > 	 *     -> vma_close() -> hugetlb_vma_lock_free()
> > 	 */
> > 	pid = syscall(__NR_clone, 0, 2 * PMD_SIZE, 0, 0, 0);
> > 	if (pid < 0) {
> > 		perror("clone");
> > 		exit(EXIT_FAILURE);
> > 	}
> >
> > 	if (pid == 0) {
> > 		/* Pop stack... */
> > 		return;
> > 	}
> >
> > 	/*
> > 	 * We are the parent process.
> > 	 *
> > 	 * Race the child process's teardown with a PMD unshare.
> > 	 *
> > 	 * We do this by triggering:
> > 	 *
> > 	 *   __split_vma() -> hugetlb_split() -> hugetlb_unshare_pmds()
> > 	 *
> > 	 * Which, importantly, doesn't hold the hugetlb VMA lock (nor can
> > 	 * it), meaning we assert in hugetlb_vma_assert_locked().
> > 	 *
> > 	 *            .
> > 	 * |----------.----------|
> > 	 * |          .          |
> > 	 * |----------.----------|
> > 	 * 0          .          PUD boundary
> > 	 */
> > 	mmap(0, PUD_SIZE / 2, PROT_READ | PROT_WRITE,
> > 	     MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0);
> > }
> >
> > int main(void)
> > {
> > 	int i;
> >
> > 	/* Kick off fork children. */
> > 	for (i = 0; i < NUM_FORKS; i++) {
> > 		pid_t pid = fork();
> >
> > 		if (pid < 0) {
> > 			perror("fork");
> > 			exit(EXIT_FAILURE);
> > 		}
> >
> > 		/* Fork children do their work and exit. */
> > 		if (!pid) {
> > 			int j;
> >
> > 			for (j = 0; j < NUM_ITERS; j++)
> > 				execute_one();
> > 			return EXIT_SUCCESS;
> > 		}
> > 	}
> >
> > 	/* If we succeeded, wait on children. */
> > 	for (i = 0; i < NUM_FORKS; i++)
> > 		wait(NULL);
> >
> > 	return EXIT_SUCCESS;
> > }
> >
> > Fixes: 081056dc00a2 ("mm/hugetlb: unshare page tables during VMA split, not before")
> > Cc:
> > Signed-off-by: Lorenzo Stoakes
> > ---
>
> LGTM, all rather nasty with "take_locks" parameters ...

Yeah it is, but since that's already there, I guess this is the easiest way to do it! :)

> Acked-by: David Hildenbrand (Arm)

Cheers!

> --
> Cheers,
>
> David