Message-ID: <08a8a7c9-b60d-44f0-9028-f480e318d756@linux.dev>
Date: Thu, 18 Jun 2026 18:22:51 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: read remote memory without the mmap lock where possible
To: "David Hildenbrand (Arm)", Rik van Riel
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org,
 Thomas Gleixner, Ingo Molnar, Dmitry Ilvokhin, Borislav Petkov,
 Dave Hansen, Andrew Morton, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Suren Baghdasaryan
References: <20260618170157.1375279-1-usama.arif@linux.dev>
 <929d36a3-f08d-47e5-94c0-b06739dac74c@kernel.org>
From: Usama Arif
In-Reply-To: <929d36a3-f08d-47e5-94c0-b06739dac74c@kernel.org>

On 18/06/2026 18:07, David Hildenbrand (Arm) wrote:
> On 6/18/26 19:01, Usama Arif wrote:
>> On Tue, 16 Jun 2026 15:03:00 -0400 Rik van Riel wrote:
>>
>>> __access_remote_vm() takes mmap_read_lock() for the entire transfer and
>>> uses get_user_pages_remote(), which faults pages in. For the common
>>> case of reading memory that is already resident -- /proc/PID/cmdline,
>>> /proc/PID/environ, ptrace PEEK of resident pages -- the mmap lock is
>>> unnecessary and is badly contended on large machines.
>>>
>>> Add an opportunistic, read-only fast path that transfers what it can
>>> without the mmap lock. For each address it takes the per-VMA lock with
>>> lock_vma_under_rcu(), re-checks the read-side VMA permissions, and uses
>>> folio_walk_start(..., FW_VMA_LOCKED) to grab a short-lived reference to
>>> a present page before copying it out. Anything non-trivial -- a not-
>>> present page (needs faulting), a hugetlb or VM_IO/VM_PFNMAP mapping, or
>>> a race with a VMA writer -- falls back to the existing mmap_lock path
>>> for the remainder.
>>>
>>> untagged_addr_remote() asserts the mmap lock, so add an unlocked variant
>>> for the fast path; the untag mask is a stable per-mm value.
>>>
>>> Only reads are handled here; writes keep using the slow path.
>>>
>>> Assisted-by: Claude:claude-opus-4-8
>>> Signed-off-by: Rik van Riel
>>> ---
>>>  arch/x86/include/asm/uaccess_64.h |  12 +++
>>>  include/linux/uaccess.h           |  11 ++
>>>  mm/memory.c                       | 166 +++++++++++++++++++++++++++++-
>>>  3 files changed, 188 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
>>> index 4a52497ba6a1..c6fac900a747 100644
>>> --- a/arch/x86/include/asm/uaccess_64.h
>>> +++ b/arch/x86/include/asm/uaccess_64.h
>>> @@ -51,6 +51,18 @@ static inline unsigned long __untagged_addr_remote(struct mm_struct *mm,
>>>  	(__force __typeof__(addr))__untagged_addr_remote(mm, __addr); \
>>>  })
>>>
>>> +/* Same as __untagged_addr_remote(), but usable without the mmap lock held. */
>>> +static inline unsigned long __untagged_addr_remote_unlocked(struct mm_struct *mm,
>>> +							     unsigned long addr)
>>> +{
>>> +	return addr & READ_ONCE((mm)->context.untag_mask);
>>> +}
>>> +
>>> +#define untagged_addr_remote_unlocked(mm, addr) ({ \
>>> +	unsigned long __addr = (__force unsigned long)(addr); \
>>> +	(__force __typeof__(addr))__untagged_addr_remote_unlocked(mm, __addr); \
>>> +})
>>> +
>>>  #endif
>>>
>>>  #define valid_user_address(x) \
>>> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
>>> index 8a264662b242..c8c83372c9d8 100644
>>> --- a/include/linux/uaccess.h
>>> +++ b/include/linux/uaccess.h
>>> @@ -34,6 +34,17 @@
>>>  })
>>>  #endif
>>>
>>> +/*
>>> + * Like untagged_addr_remote(), but for callers that stabilize @mm by other
>>> + * means (e.g. a per-VMA lock) and must not assert the mmap lock.
>>> + */
>>> +#ifndef untagged_addr_remote_unlocked
>>> +#define untagged_addr_remote_unlocked(mm, addr) ({ \
>>> +	(void)(mm); \
>>> +	untagged_addr(addr); \
>>> +})
>>> +#endif
>>> +
>>>  #ifdef masked_user_access_begin
>>>  #define can_do_masked_user_access() 1
>>>  # ifndef masked_user_write_access_begin
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 86a973119bd4..0b23b82eaa18 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -42,6 +42,8 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>> +#include
>>>  #include
>>>  #include
>>>  #include
>>> @@ -7062,6 +7064,153 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
>>>  EXPORT_SYMBOL_GPL(generic_access_phys);
>>>  #endif
>>>
>>> +/*
>>> + * The fast path uses folio_walk_start(FW_VMA_LOCKED), which needs the per-VMA
>>> + * lock and RCU-freed page tables to walk page tables without the mmap lock.
>>> + */
>>> +#if defined(CONFIG_PER_VMA_LOCK) && defined(CONFIG_MMU_GATHER_RCU_TABLE_FREE)
>>> +/*
>>> + * Opportunistic lockless fast path for __access_remote_vm() reads.
>>> + *
>>> + * Memory already resident in @mm can be read without taking the heavily
>>> + * contended mmap_lock: a per-VMA lock stabilizes the VMA, and folio_walk_start()
>>> + * with FW_VMA_LOCKED grabs a short-lived reference to a present page via an
>>> + * RCU/PTL protected page table walk (relying on MMU_GATHER_RCU_TABLE_FREE).
>>> + *
>>> + * Anything that would require faulting a page in, touching a hugetlb or
>>> + * VM_IO/VM_PFNMAP mapping, or that races a VMA writer is left to the mmap_lock
>>> + * path in __access_remote_vm(). Only reads are handled here.
>>> + *
>>> + * Returns the number of bytes transferred via the fast path.
>>> + */
>>> +static int access_remote_vm_fast(struct mm_struct *mm, unsigned long addr,
>>> +				 void *buf, int len, unsigned int gup_flags)
>>> +{
>>> +	void *old_buf = buf;
>>> +
>>> +	addr = untagged_addr_remote_unlocked(mm, addr);
>>> +
>>> +	while (len) {
>>> +		struct vm_area_struct *vma;
>>> +		vm_flags_t vm_flags;
>>> +
>>> +		vma = lock_vma_under_rcu(mm, addr);
>>> +		if (!vma)
>>> +			break;
>>> +
>>> +		/*
>>> +		 * Mirror the read-side permission checks of check_vma_flags(),
>>> +		 * and exclude what FW_VMA_LOCKED cannot handle (hugetlb) or what
>>> +		 * needs the ->access() handler (VM_IO/VM_PFNMAP). Checked once
>>> +		 * per VMA; anything not positively allowed falls back to the
>>> +		 * slow path, which re-validates everything.
>>> +		 */
>>> +		vm_flags = vma->vm_flags;
>>> +		if ((vm_flags & (VM_IO | VM_PFNMAP)) ||
>>> +		    is_vm_hugetlb_page(vma) || vma_is_secretmem(vma) ||
>>> +		    (!(vm_flags & VM_READ) &&
>>> +		     (!(gup_flags & FOLL_FORCE) || !(vm_flags & VM_MAYREAD)))) {
>>> +			vma_end_read(vma);
>>> +			break;
>>> +		}
>>
>> This should also do the FOLL_ANON check from check_vma_flags().
>>
>> check_vma_flags() rejects non-anonymous VMAs when FOLL_ANON is set:
>>
>> 	if ((gup_flags & FOLL_ANON) && !vma_anon)
>> 		return -EFAULT;
>
> Duplicating GUP logic in a non-GUP file. Splendid. :)
>

Haha probably just need a common helper.
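
To make the idea concrete, here is a rough userspace sketch of what such a helper could look like. It is only an illustration, not kernel code: the name vma_check_remote_read() and struct vma_stub are hypothetical, the flag values are placeholders rather than the kernel's definitions, and it covers only the checks quoted in this thread (the VM_READ / FOLL_FORCE / VM_MAYREAD test from the patch and the FOLL_ANON test from check_vma_flags()).

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit values for illustration; not the kernel's definitions. */
#define VM_READ     0x1UL
#define VM_MAYREAD  0x2UL
#define FOLL_FORCE  0x1U
#define FOLL_ANON   0x2U

/* Stand-in for the parts of struct vm_area_struct that the check reads. */
struct vma_stub {
	unsigned long vm_flags;
	bool anon;	/* stand-in for vma_is_anonymous(vma) */
};

/*
 * Hypothetical shared helper: the read-side permission checks quoted in
 * this thread, kept in one place so that both GUP and the fast path could
 * call it. Returns 0 if the read may proceed, -EFAULT otherwise.
 */
static int vma_check_remote_read(const struct vma_stub *vma,
				 unsigned int gup_flags)
{
	/* FOLL_ANON callers must not read from non-anonymous VMAs. */
	if ((gup_flags & FOLL_ANON) && !vma->anon)
		return -EFAULT;

	/* Readable mappings pass. */
	if (vma->vm_flags & VM_READ)
		return 0;

	/* Otherwise only FOLL_FORCE on a VM_MAYREAD mapping is allowed. */
	if ((gup_flags & FOLL_FORCE) && (vma->vm_flags & VM_MAYREAD))
		return 0;

	return -EFAULT;
}
```

With something along these lines, the fast path would treat any non-zero return as "fall back to the slow path", while the hugetlb, secretmem and VM_IO/VM_PFNMAP exclusions would stay local to the fast path since they are about what FW_VMA_LOCKED can handle rather than about permissions.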