Re: [RESEND PATCH v5] arm64: Enable vmalloc-huge with ptdump

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will@kernel.org>
To: Dev Jain <dev.jain@arm.com>
Cc: catalin.marinas@arm.com, anshuman.khandual@arm.com,
	quic_zhenhuah@quicinc.com, ryan.roberts@arm.com,
	kevin.brodsky@arm.com, yangyicong@hisilicon.com,
	joey.gouly@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, david@redhat.com,
	mark.rutland@arm.com, urezki@gmail.com, jthoughton@google.com
Subject: Re: [RESEND PATCH v5] arm64: Enable vmalloc-huge with ptdump
Date: Tue, 16 Sep 2025 11:30:26 +0100	[thread overview]
Message-ID: <aMk8QhkumtEoPVTh@willie-the-truck> (raw)
In-Reply-To: <20250723161827.15802-1-dev.jain@arm.com>

Hi Dev,

On Wed, Jul 23, 2025 at 09:48:27PM +0530, Dev Jain wrote:
> @@ -1301,16 +1319,76 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
>  	}
>  
>  	table = pmd_offset(pudp, addr);
> +
> +	/*
> +	 * Our objective is to prevent ptdump from reading a PMD table which has
> +	 * been freed.  Assume that ptdump_walk_pgd() (call this thread T1)
> +	 * executes completely on CPU1 and pud_free_pmd_page() (call this thread
> +	 * T2) executes completely on CPU2. Let the region sandwiched by the
> +	 * mmap_write_lock/unlock in T1 be called CS (the critical section).
> +	 *
> +	 * Claim: The CS of T1 will never operate on a freed PMD table.
> +	 *
> +	 * Proof:
> +	 *
> +	 * Case 1: The static branch is visible to T2.
> +	 *
> +	 * Case 1 (a): T1 acquires the lock before T2 can.
> +	 * T2 will block until T1 drops the lock, so pmd_free() will only be
> +	 * executed after T1 exits CS.
> +	 *
> +	 * Case 1 (b): T2 acquires the lock before T1 can.
> +	 * The sequence of barriers issued in __flush_tlb_kernel_pgtable()
> +	 * ensures that an empty PUD (via pud_clear()) is visible to T1 before
> +	 * T1 can enter CS, therefore it is impossible for the CS to get hold
> +	 * of the address of the isolated PMD table.
> +	 *
> +	 * Case 2: The static branch is not visible to T2.
> +	 *
> +	 * Since static_branch_enable() (via dmb(ish)) and mmap_write_lock()
> +	 * have acquire semantics, it is guaranteed that the static branch
> +	 * will be visible to all CPUs before T1 can enter CS. The static
> +	 * branch not being visible to T2 therefore guarantees that T1 has
> +	 * not yet entered CS .... (i)
> +	 * The sequence of barriers via __flush_tlb_kernel_pgtable() in T2
> +	 * implies that if the invisibility of the static branch has been
> +	 * observed by T2 (i.e static_branch_unlikely() is observed as false),
> +	 * then all CPUs will have observed an empty PUD ... (ii)
> +	 * Combining (i) and (ii), we conclude that T1 observes an empty PUD
> +	 * before entering CS => it is impossible for the CS to get hold of
> +	 * the address of the isolated PMD table. Q.E.D
> +	 *
> +	 * We have proven that the claim is true on the assumption that
> +	 * there is no context switch for T1 and T2. Note that the reasoning
> +	 * of the proof uses barriers operating on the inner shareable domain,
> +	 * which means that they will affect all CPUs, and also a context switch
> +	 * will insert extra barriers into the code paths => the claim will
> +	 * stand true even if we drop the assumption.
> +	 *
> +	 * It is also worth reasoning whether something can go wrong via
> +	 * pud_free_pmd_page() -> __pmd_free_pte_page(), since the latter
> +	 * will be called locklessly on this code path.
> +	 *
> +	 * For Case 1 (a), T2 will block until CS is finished, so we are safe.
> +	 * For Case 1 (b) and Case 2, the PMD table will be isolated before
> +	 * T1 can enter CS, therefore it is safe for T2 to operate on the
> +	 * PMD table locklessly.
> +	 */

Although I can see that you put a lot of effort into this comment, I
think we should just remove it. Instead, we should have a litmus test
in the commit message and probably just some small comments here to
explain e.g. why the mmap_read_lock() critical section is empty.

I'm currently trying to put together a litmus test with James (cc'd) so
maybe we can help you out with that part.

In the meantime...

> diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> index 421a5de806c6..65335c7ba482 100644
> --- a/arch/arm64/mm/ptdump.c
> +++ b/arch/arm64/mm/ptdump.c
> @@ -283,6 +283,13 @@ void note_page_flush(struct ptdump_state *pt_st)
>  	note_page(pt_st, 0, -1, pte_val(pte_zero));
>  }
>  
> +static void arm64_ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
> +{
> +	static_branch_enable(&arm64_ptdump_lock_key);
> +	ptdump_walk_pgd(st, mm, NULL);
> +	static_branch_disable(&arm64_ptdump_lock_key);
> +}

What serialises the toggling of the static key here? For example, I can't
see what prevents a kernel page-table dump running concurrently with a
check_wx() and the key ending up in the wrong state.

Either we need an additional lock or perhaps using
static_branch_{inc,dec}() would work instead? I haven't thought too hard
about that but it looks like we need _something_.

Will

next prev parent reply	other threads:[~2025-09-16 10:30 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-23 16:18 [RESEND PATCH v5] arm64: Enable vmalloc-huge with ptdump Dev Jain
2025-07-24  5:50 ` Anshuman Khandual
2025-07-24  7:20   ` Dev Jain
2025-07-30 17:00 ` Catalin Marinas
2025-07-30 18:29   ` Ryan Roberts
2025-07-31  4:30     ` Dev Jain
2025-07-31 11:38       ` Catalin Marinas
2025-07-31  7:12   ` Dev Jain
2025-07-31 17:06     ` Catalin Marinas
2025-08-01 12:15       ` Dev Jain
2025-08-01 15:48         ` Catalin Marinas
2025-09-16 10:30 ` Will Deacon [this message]
2025-09-17 15:43   ` Will Deacon
2025-09-19  7:27     ` Will Deacon
2025-09-19  7:46       ` Dev Jain
2025-09-19 10:28     ` Dev Jain
2025-09-19 10:59       ` Will Deacon
2025-09-19 12:14         ` Dev Jain
2025-09-19 12:26           ` Will Deacon
2025-09-19  7:53   ` Dev Jain
2025-09-19  8:11     ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMk8QhkumtEoPVTh@willie-the-truck \
    --to=will@kernel.org \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=jthoughton@google.com \
    --cc=kevin.brodsky@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=quic_zhenhuah@quicinc.com \
    --cc=ryan.roberts@arm.com \
    --cc=urezki@gmail.com \
    --cc=yangyicong@hisilicon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox