Date: Tue, 2 Dec 2025 13:37:59 +0800
Subject: Re: [PATCH 0/2] support batched checks of the references for large folios
To: "David Hildenbrand (Red Hat)", akpm@linux-foundation.org,
 catalin.marinas@arm.com, will@kernel.org
Cc: lorenzo.stoakes@oracle.com, ryan.roberts@arm.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
 surenb@google.com, mhocko@suse.com, riel@surriel.com,
 harry.yoo@oracle.com, jannh@google.com, willy@infradead.org,
 baohua@kernel.org, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
References: <341d1aed-13ad-41ee-ad30-487c5baec399@kernel.org>
From: Baolin Wang
In-Reply-To: <341d1aed-13ad-41ee-ad30-487c5baec399@kernel.org>

On 2025/12/2 00:23, David Hildenbrand (Red Hat) wrote:
> On 11/25/25 01:56, Baolin Wang wrote:
>> Currently, folio_referenced_one() always checks the young flag for each
>> PTE sequentially, which is inefficient for large folios. This
>> inefficiency is especially noticeable when reclaiming clean file-backed
>> large folios, where folio_referenced() is observed as a significant
>> performance hotspot.
>>
>> Moreover, on Arm architecture, which supports contiguous PTEs, there is
>> already an optimization to clear the young flags for PTEs within a
>> contiguous range. However, this is not sufficient.
>> We can extend this to perform batched operations for the entire large
>> folio (which might exceed the contiguous range: CONT_PTE_SIZE).
>>
>> By supporting batched checking of the young flags and flushing TLB
>> entries, I observed a 33% performance improvement in my file-backed
>> folios reclaim tests.
>
> Can you point at the benchmark or briefly explain what it does? What
> exactly are we measuring that improves by 33%?

Sorry for not being clear. I've described the performance test in patch 2,
and I should have copied it to the cover letter:

"
Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try
to reclaim 8G file-backed folios via the memory.reclaim interface. I can
observe 33% performance improvement on my Arm64 32-core server (and 10%+
improvement on my X86 machine). Meanwhile, the hotspot
folio_check_references() dropped from approximately 35% to around 5%.

W/o patchset:
real    0m1.518s
user    0m0.000s
sys     0m1.518s

W/ patchset:
real    0m1.018s
user    0m0.000s
sys     0m1.018s
"
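
For anyone who wants to try something similar, the test described above could be approximated roughly as below. This is only a sketch under stated assumptions, not the script behind the numbers quoted here: the function names, `data_path` and `cgroup_dir` are hypothetical, the calling process is assumed to have been moved into a cgroup v2 memory cgroup beforehand, and the file is assumed to live on a filesystem that backs its page cache with large folios.

```python
import mmap
import os
import time


def touch_pages(mapping: mmap.mmap) -> int:
    """Read one byte from every page of the mapping so that the kernel
    populates clean, mapped, file-backed pages. Returns the page count."""
    touched = 0
    for offset in range(0, len(mapping), mmap.PAGESIZE):
        _ = mapping[offset]  # a read fault maps the page without dirtying it
        touched += 1
    return touched


def timed_reclaim(cgroup_dir: str, amount: str) -> float:
    """Write `amount` (e.g. "8G") to <cgroup_dir>/memory.reclaim and
    return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with open(os.path.join(cgroup_dir, "memory.reclaim"), "w") as f:
        f.write(amount)
    return time.perf_counter() - start


def run_reclaim_test(data_path: str, cgroup_dir: str, amount: str):
    """Map the file read-only, touch all pages, then request reclaim
    while the mapping is still alive, so reclaim has PTEs to inspect."""
    with open(data_path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            pages = touch_pages(m)
            elapsed = timed_reclaim(cgroup_dir, amount)
    return pages, elapsed
```

A run matching the description would call `run_reclaim_test()` on a 10G file with `amount="8G"`, once on a kernel without the patchset and once with it, and compare the elapsed times. The mapping is kept open across the reclaim request on purpose: the reference checks discussed in this thread only look at PTEs of folios that are still mapped.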