From: Baolin Wang
Date: Wed, 25 Jun 2025 14:26:25 +0800
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
To: Hugh Dickins
Cc: akpm@linux-foundation.org, david@redhat.com, ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, baohua@kernel.org, zokeefe@google.com, shy828301@gmail.com, usamaarif642@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <75c02dbf-4189-958d-515e-fa80bb2187fc@google.com>

On 2025/6/25 13:53,
Hugh Dickins wrote:
> On Wed, 25 Jun 2025, Baolin Wang wrote:
>
>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
>> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
>> callers who do not specify this flag, it creates an odd and surprising situation
>> where a sysadmin specifying 'never' for all THP sizes still observes THP pages
>> being allocated and used on the system. MADV_COLLAPSE is an example of
>> such a case: it does not set TVA_ENFORCE_SYSFS when calling
>> thp_vma_allowable_orders().
>>
>> As we discussed in the previous thread [1], MADV_COLLAPSE will ignore
>> the system-wide anon/shmem THP sysfs settings, which means that even though
>> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
>> attempt to collapse into an anon/shmem THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> For example, system administrators who disabled THP everywhere must indeed very
>> much not want THP to be used for whatever reason - having individual programs
>> being able to quietly override this is very surprising and likely to cause headaches
>> for those who desire this not to happen on their systems.
>>
>> This patch set will address the MADV_COLLAPSE issue.
>>
>> Test
>> ====
>> 1. Tested the mm selftests and found no regressions.
>> 2. With toggling different anon mTHP settings, the allocation and madvise collapse for
>> anonymous pages work well.
>> 3. With toggling different shmem mTHP settings, the allocation and madvise collapse for
>> shmem work well.
>> 4. Tested the large order allocation for tmpfs, and it works as expected.
>>
>> [1] https://lore.kernel.org/all/1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com/
>>
>> Changes from v3:
>> - Collect reviewed tags. Thanks.
>> - Update the commit message, per David.
>>
>> Changes from v2:
>> - Update the commit message and cover letter, per Lorenzo. Thanks.
>> - Simplify the logic in thp_vma_allowable_orders(), per Lorenzo and David. Thanks.
>>
>> Changes from v1:
>> - Update the commit message, per Zi.
>> - Add Zi's reviewed tag. Thanks.
>> - Update the shmem logic.
>>
>> Baolin Wang (2):
>>   mm: huge_memory: disallow hugepages if the system-wide THP sysfs
>>     settings are disabled
>>   mm: shmem: disallow hugepages if the system-wide shmem THP sysfs
>>     settings are disabled
>>
>>  include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
>>  mm/shmem.c                              |  6 +--
>>  tools/testing/selftests/mm/khugepaged.c |  8 +---
>>  3 files changed, 43 insertions(+), 22 deletions(-)
>>
>> --
>> 2.43.5
>
> Sorry for chiming in so late, after so much effort: but I beg you,
> please drop these.

Thanks Hugh for your input. (Yes, we put a lot of effort into discussion
and testing :( ).

> I did not want to get into a fight, and had been hoping a voice of
> reason would come from others, before I got around to responding.
>
> And indeed Ryan understood correctly at the start; and he, Usama
> and Barry, perhaps others I've missed, have raised appropriate
> concerns but not prevailed.
>
> If we're sloganeering, I much prefer "never break userspace" to
> "never means never", attractive though that over-simplification is.

Yes, agreed: we should not break userspace. However, I doubt that this
would really break userspace. Users can set
'/sys/kernel/mm/transparent_hugepage/enabled' to 'madvise' to allow
MADV_COLLAPSE. Additionally, I really doubt that userspace would still
expect MADV_COLLAPSE to collapse into THP when the system-wide THP
settings are set to 'never'.

Moreover, what makes this issue particularly frustrating is that once we
introduce mTHP collapse [1], MADV_COLLAPSE complicates matters further.
That is, when the system only enables 64K mTHP, MADV_COLLAPSE still
allows collapsing into PMD-sized THP. This really breaks the user's
settings.
[1] https://lore.kernel.org/all/20250515032226.128900-1-npache@redhat.com/

> Seldom has a feature been so thoroughly documented as MADV_COLLAPSE,
> in its 6.1 commits and in the "man 2 madvise" page: which are
> explicit about MADV_COLLAPSE providing a way to get THPs where the
> sysfs setting governing automatic behaviour does not insert them.
>
> We would all prefer a less messy world of THP tunables. I certainly
> find plenty to dislike there too; and wish that a less assertive name
> than "never" had been chosen originally for the default off position.
>
> But please don't break the accepted and documented behaviour of
> MADV_COLLAPSE now.
>
> If you want to exclude all possibility of THPs, then please use the
> prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
> who insisted that be respected by MADV_COLLAPSE back then).

Yes, that will prevent MADV_COLLAPSE.

> Add a "deny" option to /sys/kernel/mm/transparent_hugepage/enabled
> if you like. (But in these days of filesystem large folios, adding
> new protections against them seems a few years late.)
>
> If Andrew decides that these patches should go in, then I'll have to
> scrutinize them more carefully than I've done so far: but currently
> I'm hoping to avoid that.
>
> Hugh
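
For anyone reproducing the scenario discussed above, the following is a
minimal sketch (not kernel code) of how the active mode could be read out
of the contents of '/sys/kernel/mm/transparent_hugepage/enabled', where the
selected value is shown in square brackets. The function names
active_thp_mode() and collapse_allowed_under_proposal() are hypothetical,
and the second function is only a toy model of the semantics proposed in
this series for the system-wide setting: it assumes 'always' and 'madvise'
both permit MADV_COLLAPSE and ignores the per-size mTHP and shmem settings.

```python
import re


def active_thp_mode(text: str) -> str:
    """Return the bracketed (currently selected) value, e.g. 'never'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no bracketed mode found in {text!r}")
    return match.group(1)


def collapse_allowed_under_proposal(mode: str) -> bool:
    """Toy model (assumption): only 'never' blocks MADV_COLLAPSE."""
    return mode != "never"
```

Passing the text read from the sysfs file to active_thp_mode() gives the
mode to compare against. Under the behaviour documented today, the result
of collapse_allowed_under_proposal() would not apply, since MADV_COLLAPSE
does not consult this setting.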