Date: Fri, 24 Apr 2026 12:37:35 +0100
From: Kiryl Shutsemau
To: Peter Xu
Cc: "David Hildenbrand (Arm)", Andrew Morton, Lorenzo Stoakes, Mike Rapoport,
    Suren Baghdasaryan, Vlastimil Babka, "Liam R . Howlett", Zi Yan,
    Jonathan Corbet, Shuah Khan, Sean Christopherson, Paolo Bonzini,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory

On Thu, Apr 23, 2026 at 04:10:30PM -0400, Peter Xu wrote:
> On Thu, Apr 23, 2026 at 09:25:30PM +0200, David Hildenbrand (Arm) wrote:
> > >
> > > The other thing is, as I mentioned in the other email, I still don't know
> > > how the current RW protection would work for anonymous. I don't yet think
> > > the user swapper can read the anon page with RW-protected pgtables. So far
> > > my understanding is maybe you only care about shmem so it's fine, but it'll
> > > always be great to confirm with you.

That's true. We use vhost, and therefore shmem, in our setup.

One idea for making eviction of anonymous memory atomic is to extend
process_vm_readv() and process_madvise():

- Add a flag to process_vm_readv() that bypasses the protnone check on
  accessible VMAs (or only on RWP ones?).

- Allow process_madvise(MADV_DONTNEED) when the caller already has
  ptrace write access to the target.

The standing objection to remote DONTNEED has been that it is
destructive, but process_vm_writev() already lets a ptrace-capable
caller overwrite arbitrary anonymous memory with attacker-chosen
content. DONTNEED is strictly weaker (it zeroes, it does not inject),
so the trust model is already established.

> > I wonder if uffdio_move could be used for a swapper implementation instead?

I considered it.
UFFDIO_MOVE can in principle relocate the cold folio into a staging VMA
inside the VMM, which then reads it and drops it. The downside is that
the VMM has to maintain a second address range and serialise eviction
through it. A purpose-built primitive seems cleaner: something like
UFFDIO_EVICT, which zaps the PTE and returns the folio contents
(optionally to an fd for io_uring).

> If RW is justified to be useful first, maybe.
>
> I had a gut feeling Kirill's use case doesn't use anon at all, then if
> nobody needs it we can still decide to not support anon.
>
> > If we ever have to read from a protnone page, maybe we could teach ptrace access
> > to do it, or have something that can read from prot_none areas -- like
> > uffdio_copy, which can write to prot-none areas.
>
> Something like swap_access() in my proposal can also partly achieve that.
>
> https://lore.kernel.org/all/aYuad2k75iD9bnBE@x1.local/

A maccess()-style primitive that reads through PROT_NONE is a reasonable
building block, and it overlaps with part of what UFFDIO_EVICT would need.

> There, it was only about reading from swap so far, though. But that one
> might be easier to be extended to read PROT_NONE and directly put data into
> buffer user specified (ps: in my local tree impl I named it maccess() to
> pair with mincore(), but it doesn't really matter; it doesn't even need to
> be a syscall..).
>
> To me, the interfacing is not a major issue. The major question I have is
> why RW protection can help in swap system impl when we already have uffd-wp.
>
> So I want to make sure the use case can't be implemented by uffd-wp already.
> Because that's really what we might do for QEMU.

Race-free eviction can certainly be implemented with uffd-wp already,
but proper working set discovery cannot.

-- 
Kiryl Shutsemau / Kirill A. Shutemov
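Peter asks why RW protection helps when uffd-wp already exists. One way to see the gap in the last reply is that write protection only traps writes. A page the guest merely reads therefore never faults during a scan interval, so it looks cold. The C sketch below is a toy model built on that assumption. `struct tracker`, `tracker_touch()` and `working_set_size()` are hypothetical names, not the userfaultfd API or the RFC's implementation. The model also assumes the fault handler unprotects a page after its first fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these hypothetical types and functions do not correspond
 * to the userfaultfd API or to the RFC's implementation. */

#define NPAGES 4

enum access_kind { ACCESS_READ, ACCESS_WRITE };

struct tracker {
    bool trap_reads;    /* true: RW protection; false: write protection only */
    bool armed[NPAGES]; /* protection is currently installed on this page */
    bool hot[NPAGES];   /* a tracking fault was seen since the last scan */
};

/* Start a scan interval: protect every page and clear the hot bits. */
static void tracker_arm(struct tracker *t, bool trap_reads)
{
    t->trap_reads = trap_reads;
    for (size_t i = 0; i < NPAGES; i++) {
        t->armed[i] = true;
        t->hot[i] = false;
    }
}

/* The guest touches a page. A fault is delivered only if the installed
 * protection traps this kind of access. The handler then marks the page
 * hot and removes the protection so the guest can continue. */
static void tracker_touch(struct tracker *t, size_t page, enum access_kind kind)
{
    assert(page < NPAGES);
    bool faults = t->armed[page] && (kind == ACCESS_WRITE || t->trap_reads);
    if (faults) {
        t->hot[page] = true;
        t->armed[page] = false;
    }
}

/* Count the pages the tracker would report as part of the working set. */
static size_t tracker_hot_count(const struct tracker *t)
{
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += t->hot[i];
    return n;
}

/* Replay one interval: page 0 is written, pages 1 and 2 are only read
 * (page 1 twice), and page 3 is idle. Returns the observed working set size. */
size_t working_set_size(bool trap_reads)
{
    struct tracker t;

    tracker_arm(&t, trap_reads);
    tracker_touch(&t, 0, ACCESS_WRITE);
    tracker_touch(&t, 1, ACCESS_READ);
    tracker_touch(&t, 2, ACCESS_READ);
    tracker_touch(&t, 1, ACCESS_READ);
    return tracker_hot_count(&t);
}
```

With write protection only, `working_set_size(false)` reports 1 page, which is the one that was written. With RW protection, `working_set_size(true)` reports 3, because the two read-only pages also fault. An evictor driven by the first number would treat pages 1 and 2 as cold, even though the guest is actively reading them.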