Message-ID: <31b4bc9e-06fc-4879-be2c-aedea3173f54@linux.alibaba.com>
Date: Fri, 30 May 2025 10:21:52 +0800
Subject: Re: [PATCH 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
To: Zi Yan
Cc: akpm@linux-foundation.org, hughd@google.com, david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
References: <33577DDE-D88E-44F9-9B91-7AA46EACCCE8@nvidia.com> <5acbfc5f-81b6-40e2-b87b-ac50423172f0@linux.alibaba.com>
From: Baolin Wang

On 2025/5/30 10:04, Zi Yan wrote:
> On 29 May 2025, at 21:51, Baolin Wang wrote:
>
>> On 2025/5/29 23:10, Zi Yan wrote:
>>> On 29 May 2025, at 4:23, Baolin Wang wrote:
>>>
>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>>>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>>>> will still attempt to collapse into an Anon THP.
>>>> This violates the rule we have
>>>> agreed upon: never means never.
>>>>
>>>> To address this issue, we should check whether the Anon THP configuration is disabled
>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>>>>
>>>> Signed-off-by: Baolin Wang
>>>> ---
>>>>  include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>  1 file changed, 19 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>>  				       unsigned long orders)
>>>>  {
>>>>  	/* Optimization to check if required orders are enabled early. */
>>>> -	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> -		unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> +	if (vma_is_anonymous(vma)) {
>>>> +		unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> +		unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> +		unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> +		unsigned long mask = always | madvise;
>>>> +
>>>> +		/*
>>>> +		 * If the system-wide THP/mTHP sysfs settings are disabled,
>>>> +		 * then we should never allow hugepages.
>>>> +		 */
>>>> +		if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
>>>
>>> Can you explain the logic here? Is it equivalent to:
>>> 1. if THP is set to always, always_mask & orders == 0, or
>>> 2. if THP is set to madvise, madvise_mask & order == 0, or
>>> 3. if THP is set to inherit, inherit_mask & order == 0?
>>>
>>> I cannot figure out why (always | madvise) & orders does not check
>>> THP enablement case, but inherit & orders checks hugepage_global_enabled().
>>
>> Sorry for not being clear.
>> Let me try again:
>>
>> Now we can control per-sized mTHP through 'huge_anon_orders_always', so
>> always does not need to rely on the check of hugepage_global_enabled().
>>
>> For madvise, referring to David's suggestion: "allowing for collapsing in a
>> VM without VM_HUGEPAGE in the "madvise" mode would be fine", so we can just
>> check 'huge_anon_orders_madvise' without relying on hugepage_global_enabled().
>
> Got it. Setting always or madvise knob in per-size mTHP means user wants to
> enable that size, so their orders are not limited by the global config.
> Setting inherit means user wants to follow the global config.
> Now it makes sense to me. I wonder if renaming inherit to inherit_global
> and huge_anon_orders_inherit to huge_anon_orders_inherit_global
> could make code more clear (We cannot change sysfs names, but changing
> kernel variable names should be fine?).

'huge_anon_orders_inherit' is also a per-size mTHP interface. See the doc:

"
Alternatively it is possible to specify that a given hugepage size will inherit the top-level "enabled" value::

	echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-kB/enabled
"

'inherit' already implies inheriting the top-level 'enabled' value, so I
personally still prefer the variable name 'inherit', as we use it elsewhere.

>> In the case where hugepage_global_enabled() is enabled, we need to check
>> whether the 'inherit' has enabled the corresponding orders.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == false (global THP settings are not enabled),
>> it means mTHP of the orders are prohibited from being used, then
>> madvise_collapse() is forbidden.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == true (global THP settings are enabled), and
>> inherit & orders == 0, it means mTHP of the orders are still prohibited
>> from being used, and thus madvise_collapse() is not allowed.
>
> Putting the summary in the comment might be very helpful. WDYT?

Sure, will do.

> Otherwise, the patch looks good to me. Thanks.
>
> Reviewed-by: Zi Yan

Thanks.
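
As a rough illustration of the two cases in the summary above, the condition
from the patch can be modelled in userspace as below. This is only a sketch
under stated assumptions, not kernel code: struct anon_thp_settings,
anon_orders_disabled(), check_summary_cases() and ORDER_BIT() are hypothetical
names, the three masks stand in for huge_anon_orders_always/madvise/inherit,
and the global_enabled field stands in for whatever hugepage_global_enabled()
returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the bit representing one page order in an orders mask. */
#define ORDER_BIT(order) (1UL << (order))

/*
 * Hypothetical userspace model of the sysfs state. The masks stand in for
 * huge_anon_orders_always/madvise/inherit; global_enabled stands in for the
 * result of hugepage_global_enabled().
 */
struct anon_thp_settings {
	unsigned long always;
	unsigned long madvise;
	unsigned long inherit;
	bool global_enabled;
};

/* True when sysfs enables none of the requested orders (same condition as the patch). */
bool anon_orders_disabled(const struct anon_thp_settings *s, unsigned long orders)
{
	unsigned long mask = s->always | s->madvise;

	return !(mask & orders) && !(s->global_enabled && (s->inherit & orders));
}

/* Walks through the cases from the summary. */
void check_summary_cases(void)
{
	unsigned long order9 = ORDER_BIT(9); /* assumption: order 9 chosen only as an example */
	struct anon_thp_settings s = { 0 };

	/* Case 1: no per-size always/madvise, global disabled: prohibited, even with inherit set. */
	s.inherit = order9;
	assert(anon_orders_disabled(&s, order9));

	/* Case 2: global enabled, but inherit does not cover the order: still prohibited. */
	s.global_enabled = true;
	s.inherit = 0;
	assert(anon_orders_disabled(&s, order9));

	/* Global enabled and inherit covers the order: not prohibited. */
	s.inherit = order9;
	assert(!anon_orders_disabled(&s, order9));

	/* Per-size madvise enables the order regardless of the global setting. */
	s = (struct anon_thp_settings){ .madvise = order9 };
	assert(!anon_orders_disabled(&s, order9));
}
```

The last case mirrors the point made in the thread: a per-size always or
madvise setting does not depend on the global setting, while inherit only
counts when the global setting is enabled.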