Date: Fri, 24 Apr 2026 08:59:57 -0400
From: Peter Xu
To: Kiryl Shutsemau
Cc: "David Hildenbrand (Arm)", Andrew Morton, Lorenzo Stoakes,
 Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka, "Liam R .
 Howlett", Zi Yan, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Paolo Bonzini, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
References: <34f75083-29a3-4860-8a6e-94551d37ac6a@kernel.org>
 <17b0dc02-eee3-46d6-9afb-5f81a3a20216@kernel.org>
X-Mailing-List: linux-kselftest@vger.kernel.org

On Fri, Apr 24, 2026 at 12:37:35PM +0100, Kiryl Shutsemau wrote:
> On Thu, Apr 23, 2026 at 04:10:30PM -0400, Peter Xu wrote:
> > On Thu, Apr 23, 2026 at 09:25:30PM +0200, David Hildenbrand (Arm) wrote:
> > > >
> > > > The other thing is, as I mentioned in the other email, I still don't know
> > > > how the current RW protection would work for anonymous.  I don't yet think
> > > > the user swapper can read the anon page with RW-protected pgtables.  So far
> > > > my understanding is maybe you only care about shmem so it's fine, but it'll
> > > > always be great to confirm with you.
> >
> That's true. We use vhost and therefore shmem in our setup.

I see, thanks for confirming.  Side note: I believe vhost works for anon too
since GUP works for anon, but it doesn't matter as long as we know anon
isn't a must.

>
> One idea I had about how to make atomic eviction for anon is extending
> process_vm_readv() and process_madvise():
>
>  - Add a flag to process_vm_readv() to bypass the protnone check on
>    accessible (or only RWP?) VMAs.
>
>  - Allow process_madvise(MADV_DONTNEED) when the caller already has
>    ptrace write access to the target.
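For reference, the MADV_DONTNEED behaviour that the second item relies on
can be observed locally.  The following is a minimal Python sketch, not part
of the proposal: it uses plain madvise() on the caller's own private
anonymous mapping (through Python's mmap module, assuming Linux and
Python 3.8+) rather than process_madvise() on a remote task, and the helper
name dontneed_zeroes_private_anon is made up for illustration.  It only
shows that the old contents are dropped and later reads see zero-filled
pages.

```python
import mmap

PAGE = mmap.PAGESIZE


def dontneed_zeroes_private_anon(fill: int = 0x5A) -> bool:
    """Return True if a private anonymous page reads back as all zeros
    after MADV_DONTNEED (hypothetical helper, Linux semantics assumed)."""
    # Private anonymous mapping: the "anon" case discussed in the thread.
    m = mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # Populate the page with a recognisable non-zero pattern.
        m[:] = bytes([fill]) * PAGE
        assert m[0] == fill

        # Local madvise(); process_madvise() would target another process.
        m.madvise(mmap.MADV_DONTNEED)

        # The next access faults in a fresh zero-filled page.
        return m[:] == bytes(PAGE)
    finally:
        m.close()


if __name__ == "__main__":
    print("zero-filled after MADV_DONTNEED:", dontneed_zeroes_private_anon())
```

Shared mappings behave differently: there the data stays in the backing
object and is simply faulted back in on the next access.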
>
> The standing objection to remote DONTNEED has been "destructive", but
> process_vm_writev() already lets a ptrace-capable caller overwrite
> arbitrary anon with attacker-chosen content. DONTNEED is strictly
> weaker -- it zeroes, it does not inject -- so the trust model is already
> established.
>
> > > I wonder if uffdio_move could be used for a swapper implementation instead?
>
> I considered it. UFFDIO_MOVE can in principle relocate the cold folio
> into a staging VMA inside the VMM, which then reads it and drops it.
> The downside is the VMM has to maintain a second address range and
> serialise eviction through it. A purpose-built primitive -- something
> like UFFDIO_EVICT that zaps the PTE and returns the folio contents
> (optionally to an fd for io_uring) -- seems cleaner.

Right, the other thing is unnecessary overhead on the extra pgtable
operations when moving to the staging VMA (e.g. tlb flush).

> >
> > If RW is justified to be useful first, maybe.
> >
> > I had a gut feeling Kiryl's use case doesn't use anon at all, then if
> > nobody needs it we can still decide to not support anon.
> >
> > >
> > > If we ever have to read from a protnone page, maybe we could teach ptrace access
> > > to do it, or have something that can read from prot_none areas -- like
> > > uffdio_copy, which can write to prot-none areas.
> >
> > Something like swap_access() in my proposal can also partly achieve that.
> >
> > https://lore.kernel.org/all/aYuad2k75iD9bnBE@x1.local/
>
> A maccess()-style primitive that reads through PROT_NONE is a reasonable
> building block and overlaps with part of what UFFDIO_EVICT would need.
>
> > There, it was only about reading from swap so far, though.  But that one
> > might be easier to extend to read PROT_NONE and directly put data into
> > a user-specified buffer (ps: in my local tree impl I named it maccess() to
> > pair with mincore(), but it doesn't really matter; it doesn't even need to
> > be a syscall..).
> >
> > To me, the interfacing is not a major issue.  The major question I have is
> > why RW protection can help in swap system impl when we already have uffd-wp.
> >
> > So I want to make sure the use case can't be implemented by uffd-wp already.
> > Because that's really what we might do for QEMU.
>
> Race-free eviction can definitely be implemented with uffd-wp already.
> But not proper working set discovery.

Good.  Then we can focus the discussion on hotness tracking with RWP and
its benefits, and compare it with a pure access bit focused tracking system
(as I mentioned in the other reply).

Thanks,

--
Peter Xu