Date: Thu, 9 Apr 2026 10:18:07 +0100
From: Lorenzo Stoakes
To: "David Hildenbrand (Arm)"
Cc: xu.xin16@zte.com.cn, hughd@google.com, akpm@linux-foundation.org,
    chengming.zhou@linux.dev, wang.yaxin@zte.com.cn, yang.yang29@zte.com.cn,
    michel@lespinasse.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/2] ksm: Optimize rmap_walk_ksm by passing a suitable address range
References: <9950c6c1-f960-58c0-4312-e4f5ac122043@google.com>
 <20260407142141059pWDasxUAknP5rqvAMl28K@zte.com.cn>
 <8332aedb-e499-4789-8f46-832df8d60224@kernel.org>
In-Reply-To: <8332aedb-e499-4789-8f46-832df8d60224@kernel.org>

On Wed, Apr 08, 2026 at 02:57:10PM +0200, David Hildenbrand (Arm) wrote:
> On 4/7/26 11:36, Lorenzo Stoakes (Oracle) wrote:
> > On Tue, Apr 07, 2026 at 02:21:41PM +0800, xu.xin16@zte.com.cn wrote:
> >>>
> >>> I'd completely forgotten that patch by now! But it's dealing with a
> >>> different issue; and note how it's intentionally leaving MADV_MERGEABLE
> >>> on the vma itself, just using MADV_UNMERGEABLE (with &dummy) as an
> >>> interface to CoW the KSM pages at that time, letting them be remerged
> >>> after.
> >
> > Hmm yeah, we mark them unmergeable but don't update the VMA flags (since
> > using &dummy), so they can just be merged later right?
> >
> > And then the:
> >
> > void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)
> > {
> > 	...
> > 	const pgoff_t pgoff = rmap_item->address >> PAGE_SHIFT;
> > 	...
> > 	anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
> > 				       pgoff, pgoff) {
> > 		...
> > 	}
> > 	...
> > }
> >
> > Would _assume_ that folio->pgoff == addr >> PAGE_SHIFT, which will no
> > longer be the case here?
>
> I'm wondering whether we could figure the pgoff out, somehow, so we
> wouldn't have to store it elsewhere.
>
> What we need is essentially what __folio_set_anon() would have done for
> the original folio we replaced.
>
> 	folio->index = linear_page_index(vma, address);
>
> Could we obtain that from the anon_vma assigned to our rmap_item?
>
> 	pgoff_t pgoff;
>
> 	pgoff = (rmap_item->address - anon_vma->vma->vm_start) >> PAGE_SHIFT;
> 	pgoff += anon_vma->vma->vm_pgoff;

anon_vma doesn't have a vma field :) it has anon_vma->rb_root, which maps
to all 'related' VMAs.

And we're already looking at what might be covered by the anon_vma by
invoking anon_vma_interval_tree_foreach() on anon_vma->rb_root over
[0, ULONG_MAX).

> It would be the same adjustment everywhere we look in child processes,
> because the moment they would mremap() would be where we would have
> unshared.
>
> Just a thought after reading avc_start_pgoff ...

One interesting thing here is that, in the anon_vma_interval_tree_foreach()
loop, we check:

	if (addr < vma->vm_start || addr >= vma->vm_end)
		continue;

Which is the same as saying 'hey, we are ignoring remaps'.

But... if _we_ got remapped previously (the unsharing is only temporary),
then we'd _still_ have an anon_vma with an old index != addr >> PAGE_SHIFT,
and would still not be able to figure out the correct pgoff after sharing.

I wonder if we could just store the pgoff in the rmap_item, though?
Because we unshare on remap, we'd expect a new share after remapping, at
which point we could account for the remapping by just setting

	rmap_item->pgoff = vma->vm_pgoff

I think? Then we're back in business.
Another way around this issue is to do the rmap_walk_ksm() loop for
(addr >> PAGE_SHIFT) _first_, but that'd only be useful for walkers that
can exit early once they find the mapping they care about, and I worry
about 'somehow' missing remapped cases, so it's probably not actually all
that useful.

> --
> Cheers,
>
> David

Cheers, Lorenzo