From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-170.mta1.migadu.com (out-170.mta1.migadu.com [95.215.58.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DDBA227B35B for ; Sun, 18 Jan 2026 08:39:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.170 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768725582; cv=none; b=rmPKhURFkRIDbWR4dvvP4DAlkYVEPKZ93Tt1596zUMVe6gwYXFJx5WfI8GMfpYEi8OAxB/PqDrtGHaHll5+hQhR5GzXbq7kF+SKdGWUMqVw3dU7iKaLq2PDySsb4tJ1ELfNMuAqED9KBq+UYCT/h6FR3s3Ex1wMOQNq3CDu+Ius= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768725582; c=relaxed/simple; bh=0HUioRRFEvhrz209yNLDcVtajU0SySNvFymGe3q5uaQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=izlj20tEVh6bsPA+b3dWhAdwJjzp5gkpItjOFKkQf2KVgTT5U7r8tezR6SoyNtcgeeo0o9O8h3Dtb8v0ut5JoacGx5U+ZdRQwpj/M6DjoiXVBnrPx25Va35NcVWJcolXzS24l16P0fuw1JoyXKb1IPIuNeZJkHu4b5VpL+nLC48= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=elxkmbIR; arc=none smtp.client-ip=95.215.58.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="elxkmbIR" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1768725574; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Kf5vvI0XpsxUrrKsjVxrAI30uxGfTH908lu1v83Vp98=; b=elxkmbIRMuIO52A3PVmj1hYbRcVOj9JHklVu1OiD+esMg2jsnOg4EkVoymcRpF/Kj4EBOH 8A0VpPlvwAVutjPo2aRt+dvRLvxvumTLM5a1U1F0dcUSa7poBs1NPecvubbRBRsOG9+wwC vDwuTxJHyNcsBgTe+7Ed8yD17SmwbDk= From: Lance Yang To: hughd@google.com Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org, baohua@kernel.org, baolin.wang@linux.alibaba.com, david@kernel.org, dev.jain@arm.com, ioworker0@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lorenzo.stoakes@oracle.com, mhocko@suse.com, npache@redhat.com, rppt@kernel.org, ryan.roberts@arm.com, surenb@google.com, vbabka@suse.cz, ziy@nvidia.com, Lance Yang Subject: Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out Date: Sun, 18 Jan 2026 16:39:11 +0800 Message-ID: <20260118083911.21523-1-lance.yang@linux.dev> In-Reply-To: <62e637cf-91e6-454d-a943-e5946bdf7784@linux.dev> References: <62e637cf-91e6-454d-a943-e5946bdf7784@linux.dev> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT Hi Hugh, Could you check if my understanding is correct? On PAE, pmdp_get_lockless() reads pmd_low first, then pmd_high. There's a risk of reading mismatched values if another CPU modifies the PMD between the two reads. Commit 146b42e07494[1] introduced local_irq_save() to protect the split-read, blocking TLB flush IPIs during the operation. After modifying the PMD, pmdp_get_lockless_sync() sends an IPI to ensure all ongoing split-reads complete before proceeding with pte_free_defer(). As commit 146b42e07494[1] says: ``` Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(), used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync() synonym for tlb_remove_table_sync_one(): to send the necessary interrupt at the right moment on those configs which do not already send it. ``` And commit 1043173eb5eb[2] says: ``` Follow the pattern in retract_page_tables(); and using pte_free_defer() removes most of the need for tlb_remove_table_sync_one() here; but call pmdp_get_lockless_sync() to use it in the PAE case. ``` Regarding moving pmdp_get_lockless_sync() out from under PTL: Since lockless readers (e.g., GUP-fast, __pte_offset_map()) are protected by local_irq_save() rather than PTL, pmdp_get_lockless_sync() can be called outside PTL as long as it's before pte_free_defer(). In contrast, for non-PAE, PMD reads are atomic, so pmdp_get_lockless_sync() is a no-op. [1] https://github.com/torvalds/linux/commit/146b42e07494e45f7c7bcf2cbf7afd1424afd78e [2] https://github.com/torvalds/linux/commit/1043173eb5eb351a1dba11cca12705075fe74a9e Thanks, Lance On Fri, 16 Jan 2026 09:25:54 +0800, Lance Yang wrote: > > > On 2026/1/16 09:03, Baolin Wang wrote: > > > > > > On 1/15/26 8:28 PM, Lance Yang wrote: > >> > >> > >> On 2026/1/15 18:00, Baolin Wang wrote: > >>> Hi Lance, > >>> > >>> On 1/15/26 3:16 PM, Lance Yang wrote: > >>>> From: Lance Yang > >>>> > >>>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them, > >>>> which we really don't want to do while holding PTL. > >>> > >>> Could you add more comments to explain why this is safe for the PAE > >>> case? > >> > >> Yep, IIUC, it is safe because we've already done pmdp_collapse_flush() > >> which ensures the PMD change is visible. > >> > >> pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE) > >> is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast) > >> complete > >> before we proceed. It sends IPIs to all CPUs and waits for responses - > >> a CPU > >> can only respond when it's not between local_irq_save() and > >> local_irq_restore(). > >> > >> Moving it out from under PTL doesn't change the synchronization > >> semantics, > >> since lockless readers don't depend on PTL anyway. > > > > Cc Hugh who introduced the pmdp_get_lockless_sync(), to double check. > > > > Sounds reasonable to me, please add these comments into the commit > > message. Thanks. > > Yes, will do. Thanks! > > > > >>> For the non-PAE case, you added a new tlb_remove_table_sync_one(), > >>> why we need this (to solve what problem)? Please also add more > >>> comments to explain. > >> > >> Oops, you're right, the original macro was a no-op for non-PAE. > >> > >> I should just move the macro call out from under PTL, rather than > >> replacing it with direct tlb_remove_table_sync_one() calls. > > > > OK. > > Cheers, > Lance >