Date: Tue, 24 Jun 2025 22:53:28 -0700 (PDT)
From: Hugh Dickins
To: Baolin Wang
cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com,
    ziy@nvidia.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
    npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com,
    baohua@kernel.org, zokeefe@google.com, shy828301@gmail.com,
    usamaarif642@gmail.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
Message-ID: <75c02dbf-4189-958d-515e-fa80bb2187fc@google.com>

On Wed, 25 Jun 2025, Baolin Wang wrote:

> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
> callers who do not specify this flag, it creates a odd and surprising situation
> where a sysadmin specifying 'never' for all THP sizes still observing THP pages
> being allocated and used on the system. And the MADV_COLLAPSE is an example of
> such a case, that means it will not set TVA_ENFORCE_SYSFS when calling
> thp_vma_allowable_orders().
>
> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
> the system-wide anon/shmem THP sysfs settings, which means that even though
> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
> attempt to collapse into a anon/shmem THP. This violates the rule we have
> agreed upon: never means never.
>
> For example, system administrators who disabled THP everywhere must indeed very
> much not want THP to be used for whatever reason - having individual programs
> being able to quietly override this is very surprising and likely to cause headaches
> for those who desire this not to happen on their systems.
>
> This patch set will address the MADV_COLLAPSE issue.
>
> Test
> ====
> 1. Tested the mm selftests and found no regressions.
> 2. With toggling different Anon mTHP settings, the allocation and madvise collapse for
> anonymous pages work well.
> 3. With toggling different shmem mTHP settings, the allocation and madvise collapse for
> shmem work well.
> 4. Tested the large order allocation for tmpfs, and works as expected.
>
> [1] https://lore.kernel.org/all/1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com/
>
> Changes from v3:
> - Collect reviewed tags. Thanks.
> - Update the commit message, per David.
>
> Changes from v2:
> - Update the commit message and cover letter, per Lorenzo. Thanks.
> - Simplify the logic in thp_vma_allowable_orders(), per Lorenzo and David. Thanks.
>
> Changes from v1:
> - Update the commit message, per Zi.
> - Add Zi's reviewed tag. Thanks.
> - Update the shmem logic.
>
> Baolin Wang (2):
>   mm: huge_memory: disallow hugepages if the system-wide THP sysfs
>     settings are disabled
>   mm: shmem: disallow hugepages if the system-wide shmem THP sysfs
>     settings are disabled
>
>  include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
>  mm/shmem.c                              |  6 +--
>  tools/testing/selftests/mm/khugepaged.c |  8 +---
>  3 files changed, 43 insertions(+), 22 deletions(-)
>
> --
> 2.43.5

Sorry for chiming in so late, after so much effort: but I beg you,
please drop these.

I did not want to get into a fight, and had been hoping a voice of
reason would come from others, before I got around to responding.

And indeed Ryan understood correctly at the start; and he, Usama
and Barry, perhaps others I've missed, have raised appropriate
concerns but not prevailed.

If we're sloganeering, I much prefer "never break userspace" to
"never means never", attractive though that over-simplification is.

Seldom has a feature been so thoroughly documented as MADV_COLLAPSE,
in its 6.1 commits and in the "man 2 madvise" page: which are
explicit about MADV_COLLAPSE providing a way to get THPs where the
sysfs setting governing automatic behaviour does not insert them.

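For readers unfamiliar with the interface, a minimal sketch of such a
caller follows (Python for brevity; it is not taken from the patches or
from the man page). It assumes MADV_COLLAPSE has the value 25, as in the
generic Linux uapi headers, and that the PMD huge page size is 2 MiB, as
on x86-64; aligned_offset() and try_collapse() are hypothetical helper
names. Whether the request succeeds depends on the kernel version and
configuration, so no particular result is claimed.

```python
import ctypes
import mmap

MADV_COLLAPSE = 25              # assumed: value from the generic Linux uapi headers
HPAGE_SIZE = 2 * 1024 * 1024    # assumed: PMD huge page size (x86-64 default)


def aligned_offset(addr: int, align: int) -> int:
    """Bytes to skip from addr to reach the next multiple of align."""
    return (-addr) % align


def try_collapse() -> bool:
    """Map private anonymous memory, touch it, and request a synchronous collapse.

    Returns True if madvise() succeeded, False if the kernel refused
    (unknown advice, THP unavailable, or collapse not possible or permitted).
    """
    # Over-allocate so that one fully aligned huge-page-sized window fits inside.
    region = mmap.mmap(-1, 2 * HPAGE_SIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        # Learn the mapping's address to locate the aligned window.
        view = ctypes.c_char.from_buffer(region)
        start = aligned_offset(ctypes.addressof(view), HPAGE_SIZE)
        del view  # drop the buffer export so the mapping can be closed later

        # Fault the pages in; there must be something to collapse.
        region[start:start + HPAGE_SIZE] = b"\x01" * HPAGE_SIZE
        try:
            region.madvise(MADV_COLLAPSE, start, HPAGE_SIZE)
        except OSError:
            return False
        return True
    finally:
        region.close()


if __name__ == "__main__":
    print("MADV_COLLAPSE accepted:", try_collapse())
```

On a kernel without MADV_COLLAPSE (before 6.1), or wherever the kernel
declines the request, try_collapse() simply returns False.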

We would all prefer a less messy world of THP tunables. I certainly
find plenty to dislike there too; and wish that a less assertive name
than "never" had been chosen originally for the default off position.

But please don't break the accepted and documented behaviour of
MADV_COLLAPSE now.

If you want to exclude all possibility of THPs, then please use the
prctl(PR_SET_THP_DISABLE); or shmem_enabled=deny (I think it was me
who insisted that be respected by MADV_COLLAPSE back then).

Add a "deny" option to /sys/kernel/mm/transparent_hugepage/enabled
if you like. (But in these days of filesystem large folios, adding
new protections against them seems a few years late.)

If Andrew decides that these patches should go in, then I'll have to
scrutinize them more carefully than I've done so far: but currently
I'm hoping to avoid that.

Hugh