From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE76EC71136 for ; Thu, 12 Jun 2025 12:45:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 515826B008C; Thu, 12 Jun 2025 08:45:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4C6006B0092; Thu, 12 Jun 2025 08:45:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 403236B0093; Thu, 12 Jun 2025 08:45:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 21ABB6B008C for ; Thu, 12 Jun 2025 08:45:31 -0400 (EDT) Received: from smtpin24.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id C024B1A0274 for ; Thu, 12 Jun 2025 12:45:30 +0000 (UTC) X-FDA: 83546719620.24.6A3B44E Received: from out199-14.us.a.mail.aliyun.com (out199-14.us.a.mail.aliyun.com [47.90.199.14]) by imf19.hostedemail.com (Postfix) with ESMTP id E315D1A000B for ; Thu, 12 Jun 2025 12:45:26 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=cHwD4QHI; spf=pass (imf19.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 47.90.199.14 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1749732328; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=67UlPPGc4i2QzwzleZBj0/M5eJF7H8hsVKz2BEG30Ps=; b=JTH8aSJ7GeLrpEohlpfBSsuDNXncitmSg6SXwSvyImYVHNdeeeSziDs+3p3EQu4irmlTYd pNyj/d/LcFLfo5W0xJug0je19EoyqxnlfK4lWWjEmF31Uk3jGUmqgOy8Xc2cXF8dj26dIR 86E85wHU7yGVlUfzTLRhPUxbpC44M14= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1749732328; a=rsa-sha256; cv=none; b=ddgbX0gkAfgAvl54tZ2ZzQdC/xrGuwxOCZhO6nnA/3lgKfA2KYzkfAwvIAS17d9edkVk3v 5344tZV7oghaj3wNPQ/Iz6PAPxsO2GDC0wU5eR6qYQiqetrygNAkcq1I99a0sJBxNQLF12 82rWL+St5si+cBY5+Q3tyomvTkuTaX0= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=linux.alibaba.com header.s=default header.b=cHwD4QHI; spf=pass (imf19.hostedemail.com: domain of baolin.wang@linux.alibaba.com designates 47.90.199.14 as permitted sender) smtp.mailfrom=baolin.wang@linux.alibaba.com; dmarc=pass (policy=none) header.from=linux.alibaba.com DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1749732313; h=Message-ID:Date:MIME-Version:Subject:To:From:Content-Type; bh=67UlPPGc4i2QzwzleZBj0/M5eJF7H8hsVKz2BEG30Ps=; b=cHwD4QHIFOBCMg4gsludLveLLXE6RuJL5m06knYNBegLktrMN9v5aCL3fBaXL4SgKdjMmiqInu+a5lIeRVsLFNejIOuDT6S60JFewsUyKyQ2uCVL6t/Ni+kyOSqSBF6gxYwNKD3peBFVqX9OiPiOhEydtwFkq0Sawv1keJApego= Received: from 30.39.247.88(mailfrom:baolin.wang@linux.alibaba.com fp:SMTPD_---0Wdh3ogd_1749732311 cluster:ay36) by smtp.aliyun-inc.com; Thu, 12 Jun 2025 20:45:12 +0800 Message-ID: <97a67b74-d473-455e-a05e-c85fe45da008@linux.alibaba.com> Date: Thu, 12 Jun 2025 20:45:11 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled To: David Hildenbrand , akpm@linux-foundation.org, hughd@google.com Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com> <1ec368c4-c4d8-41ea-b8a3-7d1fdb3ec358@redhat.com> <2ff65f37-efa9-4e96-9cdf-534d63ff154e@linux.alibaba.com> <953596b2-8749-493d-97eb-a5d8995d9ef8@redhat.com> From: Baolin Wang In-Reply-To: <953596b2-8749-493d-97eb-a5d8995d9ef8@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Queue-Id: E315D1A000B X-Stat-Signature: 7fzbzb58qsbxxtmpt93tjwhxos9mnaze X-Rspamd-Server: rspam04 X-HE-Tag: 1749732326-761523 X-HE-Meta: U2FsdGVkX188j+A3RzGXLEXWEjvP1Wjfg0uNN57CujBKLkJT2nIfOZ5t8C6Fag03BXCyI0RGcx523mzilBYWpIkAt5E0ohuekQ0WzNdcsvGl2+VU8aGu8ariq2ZVEszLgT6ICGGuSxu0MS3GZAIEscmgFnfFhUDOo/CMFe9FkU/0w/O9JMxOQZ/xu6niFydC74PuQw5FFhERG0T0VzY+pN04BqVenv0z3QFvsyJMjw+uITTG+o7GsJY9Jtqxzz5dJgpbWprguN6gXgOzq3ZMo+x/1APkJBFVFZjPPljMfAQujKBh3UkkW5tRMP8En8fh1Ni+OMwpjkUOCoVu9vsfA22Dx16gnPVJ+oM0kVHpr+1x/Eauz0oj3g1uEAv76qjmPyI7NZTdrsIkI6Bj/lf+emuU/EtonZ4VA4FUBKFm99uOaNdrewO4w9cJTXXP6lrB4jBxG1/9Dl5F5uD09e+R0TFhux1vFR3kO74OBPl6HXz9n1EvYpltRm+oUjPODWYJ9YFy/V0Xl2QIgyP+dRY/fdDxzjhG4FWCEHscoyCwfHmu7oZbevl3LQljosmGQ6DidIr9NWPij5kVLl438T5rGwhxYtw4tc9h8+9cq7BE2BfdK5LOT6sonI86/fl1qzfsUMu3GnmW9fT5IM+gqap6u2o5nhxrgbd28diIzAPKg84+I9jTJ7GjuBk8S1NP+Tk+z0KZtuQ2POT8rt9LthVvuL59GOKt6BqnQ2xxh0SsOo4yrEuN91YWpPfFgPHQyC9hUOeh1ZQy27pxjbQYvRcaA9EV839p8WPqUn+SR83fJ/fQycx/vK7SO8rGpUOZfsTw4zmj6LCJnehRzxcNECIED4HV3aKNT+pIkZ8q0Nx8Ew2MN5itVH0iYxCqi63D7CF1OKr47Rxi68/3+q9GGJWUJktOS5oUeOjaaEtSOJ9PE9hCDmdtBXwAv0RQgNS4rYmqomwT3ZLqMFTXkbgkNqY jyfjko0y Bc1FINkZfHfmEtCWxKj5OiCKclsfEbz4MaNiJ1Nh1dtVqzkI9OQhnx6WJsd2BEkN/IVhcaEinojSBpT3+34niz9BAr0BYaHXV4PtXlWN9Ud1eoqXgdo+VZMiEVK/gR8NDWWDN5rh0DRnTDveKqrbIAYvUX08NWqxpa9Bwj7HWPvbGHrivOzWaqd9Zox9uSDGzSg+hFVQLXHRH95iF0tX+YSCtVbVIA5kZ1w6VApFOzgpsqlxYOs/smZ9Urro/BrFFy4MxjK2rRZy8uj/84tVom3FPIhKMIpuUdjK5lcAxefTsbT0GkmT4MdwYHrcq1khg4QgHNnk6VvrKRq9bYqnGSYOC4igquM4Gilou5g59pemY4jzl9NfMHGlDx/MWaH0TKsYSAV1II/SbeXEeB2CfIEbZUxnZ8z3RAKnQFVzmmsteEyJs1kvjt7LLrb9UFEdv+rhGofQFeXYmRjEJ7Su5T6AFfPT7P81Rg4D5clGLCC/zFzvIcH+ZZyFJA9xidhdWATa5rcaAB4kfZiE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2025/6/12 16:51, David Hildenbrand wrote: > On 12.06.25 09:51, Baolin Wang wrote: >> >> >> On 2025/6/11 20:34, David Hildenbrand wrote: >>> On 05.06.25 10:00, Baolin Wang wrote: >>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, >>>> which >>>> means that even though we have disabled the Anon THP configuration, >>>> MADV_COLLAPSE >>>> will still attempt to collapse into a Anon THP. This violates the rule >>>> we have >>>> agreed upon: never means never. >>>> >>>> Another rule for madvise, referring to David's suggestion: “allowing >>>> for collapsing >>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine". >>>> >>>> To address this issue, should check whether the Anon THP configuration >>>> is disabled >>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is >>>> set. >>>> >>>> In summary, the current strategy is: >>>> >>>> 1. If always & orders == 0, and madvise & orders == 0, and >>>> hugepage_global_enabled() == false >>>> (global THP settings are not enabled), it means mTHP of that orders >>>> are prohibited >>>> from being used, then madvise_collapse() is forbidden for that orders. >>>> >>>> 2. If always & orders == 0, and madvise & orders == 0, and >>>> hugepage_global_enabled() == true >>>> (global THP settings are enabled), and inherit & orders == 0, it means >>>> mTHP of that >>>> orders are still prohibited from being used, thus madvise_collapse() >>>> is not allowed >>>> for that orders. >>>> >>>> Reviewed-by: Zi Yan >>>> Signed-off-by: Baolin Wang >>>> --- >>>>    include/linux/huge_mm.h | 23 +++++++++++++++++++---- >>>>    1 file changed, 19 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >>>> index 2f190c90192d..199ddc9f04a1 100644 >>>> --- a/include/linux/huge_mm.h >>>> +++ b/include/linux/huge_mm.h >>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct >>>> vm_area_struct *vma, >>>>                           unsigned long orders) >>>>    { >>>>        /* Optimization to check if required orders are enabled >>>> early. */ >>>> -    if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) { >>>> -        unsigned long mask = READ_ONCE(huge_anon_orders_always); >>>> +    if (vma_is_anonymous(vma)) { >>>> +        unsigned long always = READ_ONCE(huge_anon_orders_always); >>>> +        unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); >>>> +        unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); >>>> +        unsigned long mask = always | madvise; >>>> + >>>> +        /* >>>> +         * If the system-wide THP/mTHP sysfs settings are disabled, >>>> +         * then we should never allow hugepages. >>>   > +         */> +        if (!(mask & orders) && >>> !(hugepage_global_enabled() && (inherit & orders))) >>>> +            return 0; >>> >>> I'm still trying to digest that. Isn't there a way for us to work with >>> the orders, >>> essentially masking off all orders that are forbidden globally. Similar >>> to below, if !orders, then return 0? >>> /* Orders disabled directly. */ >>> orders &= ~TODO; >>> /* Orders disabled by inheriting from the global toggle. */ >>> if (!hugepage_global_enabled()) >>>       orders &= ~READ_ONCE(huge_anon_orders_inherit); >>> >>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could >>> add a simple helper for that >>> >>> huge_anon_orders_never >> >> I followed Lorenzo's suggestion to simplify the logic. Does that look >> more readable? >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 2f190c90192d..3087ac7631e0 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct >> vm_area_struct *vma, >>                                            unsigned long tva_flags, >>                                            unsigned long orders); >> >> +/* Strictly mask requested anonymous orders according to sysfs >> settings. */ >> +static inline unsigned long __thp_mask_anon_orders(unsigned long >> vm_flags, >> +                               unsigned long tva_flags, unsigned long >> orders) >> +{ >> +       unsigned long always = READ_ONCE(huge_anon_orders_always); >> +       unsigned long madvise = READ_ONCE(huge_anon_orders_madvise); >> +       unsigned long inherit = READ_ONCE(huge_anon_orders_inherit); >> +       bool inherit_enabled = hugepage_global_enabled(); >> +       bool has_madvise =  vm_flags & VM_HUGEPAGE; >> +       unsigned long mask = always | madvise; >> + >> +       mask = always | madvise; >> +       if (inherit_enabled) >> +               mask |= inherit; >> + >> +       /* All set to/inherit NEVER - never means never globally, >> abort. */ >> +       if (!(mask & orders)) >> +               return 0; > > Still confusing. I am not sure if we would properly catch when someone > specifies e.g., 2M and 1M, while we only have 2M disabled. IIUC, Yes. In your case, we will only allow order 8 (1M mTHP). > I would rewrite the function to only ever substract from "orders". > > ... > > /* Disallow orders that are set to NEVER directly ... */ > order &= (always | madvise | inherit); > > /* ... or through inheritance. */ > if (inherit_enabled) >     orders &= ~inherit; Sorry, I didn't get you here. If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and madvise are 0, and inherit_enabled = true. Then orders will be 0 with your logic. But we should allow order 9, right? > > /* >  * Otherwise, we only enforce sysfs settings if asked. In addition, >  * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS >  * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE >  * set. >  */ > if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS)) >     return orders; > >> + >> +       /* >> +        * Otherwise, we only enforce sysfs settings if asked. In >> addition, >> +        * if the user sets a sysfs mode of madvise and if >> TVA_ENFORCE_SYSFS >> +        * is not set, we don't bother checking whether the VMA has >> VM_HUGEPAGE >> +        * set. >> +        */ >> +       if (!(tva_flags & TVA_ENFORCE_SYSFS)) >> +               return orders; >> + >> +       mask = always; >> +       if (has_madvise) >> +               mask |= madvise; >> +       if (hugepage_global_always() || (has_madvise && inherit_enabled)) >> +               mask |= inherit; > > Similarly, this can maybe become (not 100% sure if I got it right, the > condition above is confusing) IMO, this is the original logic. > if (!has_madvise) { >     /* >      * Without VM_HUGEPAGE, only allow orders that are set to >      * ALWAYS directly ... >       */ >     order &= (always | inherit); >     /* ... or through inheritance. */ >     if (!hugepage_global_always()) >         orders &= ~inherit; Ditto. I can not understand this too. > } >