Date: Fri, 19 Jun 2026 13:24:03 +0100
From: Lorenzo Stoakes
To: Suren Baghdasaryan
Cc: Rik van Riel, linux-kernel@vger.kernel.org, x86@kernel.org,
    linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Dmitry Ilvokhin,
    Borislav Petkov, Dave Hansen, Andrew Morton, David Hildenbrand,
    "Liam R. Howlett", Vlastimil Babka
Subject: Re: [PATCH 3/3] mm: read remote memory without the mmap lock where possible
References: <20260616190300.1509639-1-riel@surriel.com>
    <20260616190300.1509639-4-riel@surriel.com>

On Tue, Jun 16, 2026 at 11:19:12PM -0700, Suren Baghdasaryan wrote:
> On Tue, Jun 16, 2026 at 12:04 PM Rik van Riel wrote:
> >
> > __access_remote_vm() takes mmap_read_lock() for the entire transfer and
> > uses get_user_pages_remote(), which faults pages in. For the common
> > case of reading memory that is already resident -- /proc/PID/cmdline,
> > /proc/PID/environ, ptrace PEEK of resident pages -- the mmap lock is
> > unnecessary and is badly contended on large machines.
> >
> > Add an opportunistic, read-only fast path that transfers what it can
> > without the mmap lock. For each address it takes the per-VMA lock with
> > lock_vma_under_rcu(), re-checks the read-side VMA permissions, and uses
> > folio_walk_start(..., FW_VMA_LOCKED) to grab a short-lived reference to
> > a present page before copying it out. Anything non-trivial -- a not-
> > present page (needs faulting), a hugetlb or VM_IO/VM_PFNMAP mapping, or
> > a race with a VMA writer -- falls back to the existing mmap_lock path
> > for the remainder.
>
> I don't think we should be using per-VMA locks if the read spans
> multiple VMAs. Doing that would risk a possibility of reading
> inconsistent data since we are locking one VMA at a time. While we

Yeah, very true. Suren has expounded on the possible cases that can occur
elsewhere but you can observe strange states like that.

You can see tools/testing/selftests/proc/proc-maps-race.c for a sense of it and
https://lore.kernel.org/all/20260426062718.1238437-1-surenb@google.com/

Note that for e.g. madvise() this is exactly what we do.

> load and read VMA, its neighboring VMA can be unmapped and another one
> can be mapped in its place. So, our read spanning both VMAs will
> return inconsistent data. access_remote_vm_fast() can check if the
> entire read is contained within one VMA and if not, fall back to
> mmap_lock. This would also vastly simplify the code.

I expect most real-world cases are like this anyway?

Cheers, Lorenzo
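
For illustration only, a minimal userspace sketch of the containment check Suren suggests. This is not kernel code and not taken from the patch: struct vma_range and read_within_one_vma() are hypothetical names standing in for a VMA's [vm_start, vm_end) range and for whatever test access_remote_vm_fast() might make before taking the fast path. The assumption is that a read crossing the end of the VMA goes to the mmap_lock path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the [vm_start, vm_end) range of one VMA. */
struct vma_range {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

/*
 * Return true if the read [addr, addr + len) lies entirely inside the
 * given range, so a single per-VMA lock would cover the whole transfer.
 * Return false otherwise; a caller would then fall back to mmap_lock.
 */
static bool read_within_one_vma(const struct vma_range *vma,
				uintptr_t addr, size_t len)
{
	/* Sanity check: the range must be well formed. */
	assert(vma->start <= vma->end);

	if (addr < vma->start || addr >= vma->end)
		return false;

	/* Compare against the space left so addr + len cannot overflow. */
	return len <= vma->end - addr;
}
```

With a range of 0x1000..0x3000, a read of 0x2000 bytes at 0x1000 passes the check, while a read of 0x20 bytes at 0x2ff0 crosses the end and would take the slow path.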