From: Barry Song
Date: Thu, 19 Sep 2024 20:20:51 +1200
Subject: Re: [RFC PATCH v1 0/4] Control folio sizes used for page cache memory
To: Ryan Roberts
Cc: Andrew Morton, Hugh Dickins, Jonathan Corbet, "Matthew Wilcox (Oracle)",
 David Hildenbrand, Lance Yang, Baolin Wang, Gavin Shan, Pankaj Raghav,
 Daniel Gomez, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20240717071257.4141363-1-ryan.roberts@arm.com>
 <480f34d0-a943-40da-9c69-2353fe311cf7@arm.com>
In-Reply-To: <480f34d0-a943-40da-9c69-2353fe311cf7@arm.com>

On Thu, Aug 8, 2024 at 10:27 PM Ryan Roberts wrote:
>
> On 17/07/2024 08:12, Ryan Roberts wrote:
> > Hi All,
> >
> > This series is an RFC that adds sysfs and kernel cmdline controls to configure
> > the set of allowed large folio sizes that can be used when allocating
> > file-memory for the page cache.
> > As part of the control mechanism, it provides
> > for a special-case "preferred folio size for executable mappings" marker.
> >
> > I'm trying to solve 2 separate problems with this series:
> >
> > 1. Reduce pressure in iTLB and improve performance on arm64: This is a modified
> > approach for the change at [1]. Instead of hardcoding the preferred executable
> > folio size into the arch, user space can now select it. This decouples the arch
> > code and also makes the mechanism more generic; it can be bypassed (the default)
> > or any folio size can be set. For my use case, 64K is preferred, but I've also
> > heard from Willy of a use case where putting all text into 2M PMD-sized folios
> > is preferred. This approach avoids the need for synchronous MADV_COLLAPSE (and
> > therefore faulting in all text ahead of time) to achieve that.
>
> Just a polite bump on this; I'd really like to get something like this merged to
> help reduce iTLB pressure. We had a discussion at the THP Cabal meeting a few
> weeks back without solid conclusion. I haven't heard any concrete objections
> yet, but also only a lukewarm reception. How can I move this forwards?

Hi Ryan,

These requirements seem to apply to anon, swap, pagecache, and shmem to
some extent. While the swapin_enabled knob was rejected, the shmem_enabled
option is already in place.

I wonder if it's possible to use the existing 'enabled' setting across
all cases, as from an architectural perspective with cont-pte, pagecache
may not differ from anon. The demand for reducing page faults, LRU
overhead, etc., also seems quite similar.

I imagine that once Android's file systems support mTHP, we’ll uniformly enable
64KB for anon, swap, shmem, and page cache. It should then be sufficient to
enable all of them using a single knob:
'/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/enabled'.

Is there anything that makes pagecache and shmem significantly different
from anon?
In my Android case, they all seem the same. However, I assume
there might be other use cases where differentiating them is necessary?

>
> Thanks,
> Ryan
>
>
> >
> > 2. Reduce memory fragmentation in systems under high memory pressure (e.g.
> > Android): The theory goes that if all folios are 64K, then failure to allocate a
> > 64K folio should become unlikely. But if the page cache is allocating lots of
> > different orders, with most allocations having an order below 64K (as is the
> > case today) then ability to allocate 64K folios diminishes. By providing control
> > over the allowed set of folio sizes, we can tune to avoid crucial 64K folio
> > allocation failure. Additionally I've heard (second hand) of the need to disable
> > large folios in the page cache entirely due to latency concerns in some
> > settings. These controls allow all of this without kernel changes.
> >
> > The value of (1) is clear and the performance improvements are documented in
> > patch 2. I don't yet have any data demonstrating the theory for (2) since I
> > can't reproduce the setup that Barry had at [2]. But my view is that by adding
> > these controls we will enable the community to explore further, in the same way
> > that the anon mTHP controls helped harden the understanding for anonymous
> > memory.
> >
> > ---
> > This series depends on the "mTHP allocation stats for file-backed memory" series
> > at [3], which itself applies on top of yesterday's mm-unstable (650b6752c8a3). All
> > mm selftests have been run; no regressions were observed.
> >
> > [1] https://lore.kernel.org/linux-mm/20240215154059.2863126-1-ryan.roberts@arm.com/
> > [2] https://www.youtube.com/watch?v=ht7eGWqwmNs&list=PLbzoR-pLrL6oj1rVTXLnV7cOuetvjKn9q&index=4
> > [3] https://lore.kernel.org/linux-mm/20240716135907.4047689-1-ryan.roberts@arm.com/
> >
> > Thanks,
> > Ryan
> >
> > Ryan Roberts (4):
> >   mm: mTHP user controls to configure pagecache large folio sizes
> >   mm: Introduce "always+exec" for mTHP file_enabled control
> >   mm: Override mTHP "enabled" defaults at kernel cmdline
> >   mm: Override mTHP "file_enabled" defaults at kernel cmdline
> >
> >  .../admin-guide/kernel-parameters.txt         |  16 ++
> >  Documentation/admin-guide/mm/transhuge.rst    |  66 +++++++-
> >  include/linux/huge_mm.h                       |  61 ++++---
> >  mm/filemap.c                                  |  26 ++-
> >  mm/huge_memory.c                              | 158 +++++++++++++++++-
> >  mm/readahead.c                                |  43 ++++-
> >  6 files changed, 329 insertions(+), 41 deletions(-)
> >
> > --
> > 2.43.0
> >
>

Thanks
Barry
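
For illustration only, here is a minimal sketch of what "a single knob" means
from user space: one write to the per-size 'enabled' file named in the reply.
The sysfs path and the four mode strings are the existing anon mTHP interface.
The helper name set_mthp_enabled and its root parameter are hypothetical and
exist only so the sketch can be run against a scratch directory. Whether this
file should also govern page cache and shmem is the open question in this
thread, not existing behaviour.

```python
import tempfile
from pathlib import Path

# Real sysfs location of the per-size mTHP controls; each supported size
# has its own hugepages-<size>kB/enabled file.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")

# Modes the per-size 'enabled' file accepts for anonymous memory.
VALID_MODES = {"always", "inherit", "madvise", "never"}


def set_mthp_enabled(size_kb: int, mode: str, root: Path = THP_ROOT) -> Path:
    """Write `mode` to hugepages-<size_kb>kB/enabled under `root`.

    Hypothetical helper for this sketch. `root` defaults to the real sysfs
    tree (writing there needs root privileges) and can be redirected to a
    scratch directory for experimentation.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    knob = root / f"hugepages-{size_kb}kB" / "enabled"
    if not knob.exists():
        # The kernel only exposes directories for sizes it supports.
        raise FileNotFoundError(f"no such mTHP size here: {knob}")
    knob.write_text(mode)
    return knob


if __name__ == "__main__":
    # Dry run against a fake tree so nothing on the host is changed.
    with tempfile.TemporaryDirectory() as tmp:
        fake_root = Path(tmp)
        (fake_root / "hugepages-64kB").mkdir()
        (fake_root / "hugepages-64kB" / "enabled").write_text("never")
        knob = set_mthp_enabled(64, "always", root=fake_root)
        print(knob.name, "<-", knob.read_text())
```

On a real system the call would be set_mthp_enabled(64, "always") with the
default root; under the proposal above, that one write would be expected to
cover anon, swap, shmem, and page cache alike.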