From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 15 May 2025 16:54:40 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 0/6] prctl: introduce PR_SET/GET_THP_POLICY
To: Lorenzo Stoakes
Cc: Andrew Morton, david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com,
 laoar.shao@gmail.com, baolin.wang@linux.alibaba.com,
 Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 kernel-team@meta.com
References: <20250515133519.2779639-1-usamaarif642@gmail.com>
 <6502bbb7-e8b3-4520-9547-823207119061@lucifer.local>
 <5e4c107f-9db8-4212-99b6-a490406fec77@gmail.com>
Content-Language: en-US
From: Usama Arif
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 15/05/2025 16:15, Lorenzo Stoakes wrote:
> Thanks for coming back to me so quickly, appreciated :)
>
> I am reacting in a 'WTF' way here, but it's in proportion to the (at least
> perceived) magnitude of this change.
> We really need to be sure this is right.
>

Lol I had to rewrite my replies a few times to tone them down.
Hopefully I don't come across as aggressive :)

> On Thu, May 15, 2025 at 03:50:47PM +0100, Usama Arif wrote:
>>
>>
>> On 15/05/2025 14:55, Lorenzo Stoakes wrote:
>>> On Thu, May 15, 2025 at 02:33:29PM +0100, Usama Arif wrote:
>>>> This allows to change the THP policy of a process, according to the value
>>>> set in arg2, all of which will be inherited during fork+exec:
>>>
>>> This is pretty confusing.
>>>
>>> It should be something like 'add a new prctl() option that allows...' etc.
>>>
>>>> - PR_THP_POLICY_DEFAULT_HUGE: This will set the MMF2_THP_VMA_DEFAULT_HUGE
>>>> process flag which changes the default of new VMAs to be VM_HUGEPAGE. The
>>>> call also modifies all existing VMAs that are not VM_NOHUGEPAGE
>>>> to be VM_HUGEPAGE.
>>>
>>> This is referring to implementation detail that doesn't matter for an overview,
>>> just add a summary here e.g.
>>>
>>> PR_THP_POLICY_DEFAULT_HUGE - set VM_HUGEPAGE flag in all VMAs by default,
>>> including after fork/exec, ignoring global policy.
>>>
>>> PR_THP_POLICY_DEFAULT_NOHUGE - clear VM_HUGEPAGE flag in all VMAs by default,
>>> including after fork/exec, ignoring global policy.
>>>
>>> PR_THP_POLICY_DEFAULT_SYSTEM - Eliminate any policy set above.
>>
>> Hi Lorenzo,
>>
>> Thanks for the review. I will make the cover letter clearer in the next revision.
>
> The next version should emphatically be an RFC also, please. Your cover letter
> should mention you're fundamentally changing mm_struct and VMA logic, and
> explain why your use case is so important that that is justified.
>

Thanks, will make it RFC and add that I am making changes to mm_struct and VMA logic.

>>
>>>
>>>> This allows systems where the global policy is set to "madvise"
>>>> to effectively have THPs always for the process.
>>>> In an environment
>>>> where different types of workloads are stacked on the same machine
>>>> whose global policy is set to "madvise", this will allow workloads
>>>> that benefit from always having hugepages to do so, without regressing
>>>> those that don't.
>>>
>>> So does this just ignore and override the global policy? I'm not sure I'm
>>> comfortable with that.
>>
>> No. The decision making of when and what order THPs are allowed is not
>> changed, i.e. there are no changes in __thp_vma_allowable_orders and
>> thp_vma_allowable_orders. David has the same concern as you and this
>> current series is implementing what David suggested in
>> https://lore.kernel.org/all/3f7ba97d-04d5-4ea4-9f08-6ec3584e0d4c@redhat.com/
>>
>> It will change the existing VMA (NO)HUGE flags according to
>> the prctl. For e.g. doing PR_THP_POLICY_DEFAULT_HUGE will not give
>> a THP when global policy is never.
>
> Umm...
>
> +	case PR_SET_THP_POLICY:
> +		if (arg3 || arg4 || arg5)
> +			return -EINVAL;
> +		if (mmap_write_lock_killable(me->mm))
> +			return -EINTR;
> +		switch (arg2) {
> +		case PR_THP_POLICY_DEFAULT_HUGE:
> +			set_bit(MMF2_THP_VMA_DEFAULT_HUGE, &me->mm->flags2);
> +			process_vmas_thp_default_huge(me->mm);
> +			break;
> +		default:
>
>
> Where's the check against never? You're unconditionally setting VM_HUGEPAGE?

So this was from the discussion with David. My initial implementation in v1
messed with the policy evaluation in thp_vma_allowable_orders and
__thp_vma_allowable_orders. The whole point of doing it this way is that you
don't mess with the policy evaluation. hugepage_global_always and
hugepage_global_enabled will still evaluate to false when never is set and
you will not get a hugepage. But more on it below.

>
> You're relying on VM_HUGEPAGE being ignored in this instance? But you're still:
>
> 1. Setting VM_HUGEPAGE everywhere (and breaking VMA merging everywhere).
>
> 2. Setting MMF2_THP_VMA_DEFAULT_HUGE and making it so PR_GET_THP_POLICY says it
>    has a policy of default huge even if policy is set to never?
>
> I'm not ok with that. I'd much rather we do the never check here...
>

I am ok with that. I can add a check over here that wraps this in:

	if (hugepage_global_enabled())
		...

> Also see hugepage_madvise(). There's arch-specific code that overrides
> that, and you're now bypassing that (yes it's for one arch of course but
> it's still a thing)
>

Thanks, I will put

	if (mm_has_pgste(vma->vm_mm))
		return 0;

at the start.

>>
>>>
>>> What about if the policy is 'never'? Does this override that? That seems
>>> completely wrong.
>>
>> No, it won't override it. hugepage_global_always and hugepage_global_enabled
>> will still evaluate to false and you won't get a hugepage no matter what prctl
>> is set.
>
> Ack ok I see as above, you're relying on VM_HUGEPAGE enforcing this.
>
> You really need to put stuff like this in the cover letter though!!
>

Sure will do in the next revision, Thanks.

>>
>>>
>>>> - PR_THP_POLICY_DEFAULT_NOHUGE: This will set the MMF2_THP_VMA_DEFAULT_NOHUGE
>>>> process flag which changes the default of new VMAs to be VM_NOHUGEPAGE.
>>>> The call also modifies all existing VMAs that are not VM_HUGEPAGE
>>>> to be VM_NOHUGEPAGE.
>>>> This allows systems where the global policy is set to "always"
>>>> to effectively have THPs on madvise only for the process. In an
>>>> environment where different types of workloads are stacked on the
>>>> same machine whose global policy is set to "always", this will allow
>>>> workloads that benefit from having hugepages on an madvise basis only
>>>> to do so, without regressing those that benefit from having hugepages
>>>> always.
>>>
>>> Wait, so 'no huge' means 'madvise'? What? This is confusing.
>>
>>
>> I probably made the cover letter confusing :) or maybe need to rename the flags.
>>
>> This flag works as follows:
>>
>> a) Changes the default flag of new VMAs to be VM_NOHUGEPAGE
>>
>> b) Modifies all existing VMAs that are not VM_HUGEPAGE to be VM_NOHUGEPAGE
>>
>> c) Is inherited during fork+exec
>>
>> I think maybe I should add VMA to the flag names and rename the flags to
>> PR_THP_POLICY_DEFAULT_VMA_(NO)HUGE ??
>
> Please no :) 'VMA' is implicit re: mappings. If you're touching memory
> mappings you're necessarily touching VMAs.
>
> I know some prctl() (a pathway to many abilities some consider to be
> unnatural) uses 'VMA' in some of the endpoints but generally when referring
> to specific VMAs no?
>
> These names are already kinda horrible (yes naming is hard, for everyone,
> ask me about MADV_POISON/REMEDY) but I think something like:
>
> PR_DEFAULT_MADV_HUGEPAGE
> PR_DEFAULT_MADV_NOHUGEPAGE
>
> -ish :)
>

Sure, happy with that, Thanks.

>>
>>>
>>>> - PR_THP_POLICY_DEFAULT_SYSTEM: This will clear the MMF2_THP_VMA_DEFAULT_HUGE
>>>> and MMF2_THP_VMA_DEFAULT_NOHUGE process flags.
>>>>
>>>> These patches are required in rolling out hugepages in hyperscaler
>>>> configurations for workloads that benefit from them, where workloads are
>>>> stacked and a single THP global policy is likely to be used across the entire
>>>> fleet, and prctl will help override it.
>>>
>>> I don't understand this justification whatsoever. What does 'stacked' mean? And
>>> you're not justifying why you'd override the policy?
>>
>> By stacked I just meant different types of workloads running on the same machine.
>> Let's say we have a single server whose global policy is set to madvise.
>> You can have a container on that server running some database workload that best
>> works with madvise.
>> You can have another container on that same server running some AI workload that would
>> benefit from having VM_HUGEPAGE set on all new VMAs. We can use prctl
>> PR_THP_POLICY_DEFAULT_HUGE to get VM_HUGEPAGE set by default on all new VMAs for that
>> container.
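
The flag semantics described in points a) to c) above can be sketched as a small userspace model. This is only an illustration, not code from the series: every `toy_` name, the flag values and the fixed-size VMA array are assumptions made up for this sketch, and treating fork+exec as "same policy, fresh address space" is a simplification. The SYSTEM case only resets the recorded policy because the cover letter says it clears the process flags and does not mention touching existing VMAs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for per-VMA flags; values are illustrative, not the kernel's. */
#define TOY_VM_HUGEPAGE   0x1u
#define TOY_VM_NOHUGEPAGE 0x2u
#define TOY_MAX_VMAS      8

/* Hypothetical policy values mirroring the three options named in the thread. */
enum toy_thp_policy {
	TOY_POLICY_SYSTEM,
	TOY_POLICY_DEFAULT_HUGE,
	TOY_POLICY_DEFAULT_NOHUGE,
};

/* Hypothetical, heavily reduced stand-in for an address space. */
struct toy_mm {
	enum toy_thp_policy policy;
	unsigned int vma_flags[TOY_MAX_VMAS];
	size_t nr_vmas;
};

/* Record the policy and rewrite existing VMAs that carry no opposite hint (b). */
void toy_set_policy(struct toy_mm *mm, enum toy_thp_policy policy)
{
	mm->policy = policy;
	for (size_t i = 0; i < mm->nr_vmas; i++) {
		unsigned int *flags = &mm->vma_flags[i];

		if (policy == TOY_POLICY_DEFAULT_HUGE && !(*flags & TOY_VM_NOHUGEPAGE))
			*flags |= TOY_VM_HUGEPAGE;
		else if (policy == TOY_POLICY_DEFAULT_NOHUGE && !(*flags & TOY_VM_HUGEPAGE))
			*flags |= TOY_VM_NOHUGEPAGE;
	}
}

/* Create a VMA whose default flag follows the current policy (a). */
size_t toy_new_vma(struct toy_mm *mm)
{
	unsigned int flags = 0;

	assert(mm->nr_vmas < TOY_MAX_VMAS);
	if (mm->policy == TOY_POLICY_DEFAULT_HUGE)
		flags = TOY_VM_HUGEPAGE;
	else if (mm->policy == TOY_POLICY_DEFAULT_NOHUGE)
		flags = TOY_VM_NOHUGEPAGE;
	mm->vma_flags[mm->nr_vmas] = flags;
	return mm->nr_vmas++;
}

/* Model fork+exec (c): the policy carries over, the mappings start empty. */
struct toy_mm toy_inherit(const struct toy_mm *parent)
{
	struct toy_mm child = { .policy = parent->policy };

	return child;
}
```

In this model a mapping that was explicitly marked with `TOY_VM_HUGEPAGE` keeps that hint when the NOHUGE policy is applied, which is what makes the NOHUGE option behave like "madvise only" rather than "never" for the process.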
>>
>>>
>>> This series has no actual justification here at all? You really need to provide one.
>>>
>>
>> There was a discussion on the usecases in
>> https://lore.kernel.org/all/13b68fa0-8755-43d8-8504-d181c2d46134@gmail.com/
>>
>> I tried (and I guess failed :)) to summarize the justification from that thread.
>
> It's fine, I have most definitely not been as clear as I could be in series
> too :>) just need to add a bigger summary.
>
> Don't be afraid to waffle on... (I know I am not... ;)
>
>>
>> I will try and rephrase it here.
>>
>> In hyperscalers, we have a single THP policy for the entire fleet.
>> We have different types of workloads (e.g. AI/compute/databases/etc)
>> running on a single server (this is what I meant by 'stacked').
>> Some of these workloads will benefit from always getting THP at fault (or collapsed
>> by khugepaged), some of them will benefit by only getting them at madvise.
>>
>> This series is useful for 2 usecases:
>>
>> 1) global system policy = madvise, while we want some workloads to get THPs
>> at fault and by khugepaged :- some processes (e.g. AI workloads) benefit from getting
>> THPs at fault (and collapsed by khugepaged). Other workloads like databases will incur
>> regression (either a performance regression or they are completely memory bound and
>> even a very slight increase in memory will cause them to OOM). So what these patches
>> will do is allow setting prctl(PR_THP_POLICY_DEFAULT_HUGE) on the AI workloads,
>> (This is how workloads are deployed in our (Meta's/Facebook) fleet at this moment).
>>
>> 2) global system policy = always, while we want some workloads to get THPs
>> only on madvise basis :- Same reason as 1). What these patches
>> will do is allow setting prctl(PR_THP_POLICY_DEFAULT_NOHUGE) on the database
>> workloads.
>> (We hope this is us (Meta) in the near future, if a majority of workloads show that they
>> benefit from always, we flip the default host setting to "always" across the fleet and
>> workloads that regress can opt-out and be "madvise".
>> New services developed will then be tested with always by default. "always" is also the
>> default defconfig option upstream, so I would imagine this is faced by others as well.)
>
> Right, but I'm not sure you're explaining why prctl(), one of the most cursed,
> neglected and frankly evil (maybe exaggerating :P) APIs in the kernel is the way
> to do this?
>
> You do need to summarise why the suggested idea re: BPF, or cgroups, or whatnot
> is _totally unworkable_.
>
> And why not process_madvise() with MADV_HUGEPAGE?
>
> I'm also not sure fork/exec is a great situation to have, because are you sure
> the workloads stay the same across all fork/execs that you're now propagating?
>
> It feels like this should be a cgroup thing, really.
>

So I actually don't mind the cgroup implementation (that was actually my first
prototype and after that I saw there was someone who had posted it earlier).
It was shot down because it won't be hierarchical and doesn't solve it when
it's not being done in a cgroup.

A large proportion of the thread in v1 was discussion with David, Johannes, Zi
and Yafang (the bpf THP policy author) on different ways of doing this.

>>
>> Hope this makes the justification for the patches clearer :)
>
> Sure, please add this kind of thing to the cover letter to get fewer 'wtf'
> reactions :)
>
> You're doing something really _big_ and _opinionated_ here though, that's
> basically fundamentally changing core stuff, so an extended discussion of why
> you feel it's so important, why other approaches are not workable, why the
> Sauron-spawned Mordor dwelling prctl() API is the way to go, etc.
>
>>
>>>>
>>>> v1->v2:
>>>
>>> Where was the v1? Is it [0]?
>>>
>>> This seems like a massive change compared to that series?
>>>
>>> You've renamed it and not referenced the old series, please make sure you link
>>> it or somehow let somebody see what this is against, because it makes review
>>> difficult.
>>>
>>
>> Yes it's the patch you linked below. Sorry, should have linked it in this series.
>> It's a big change, but it was basically incorporating all feedback from David,
>> while trying to achieve a similar goal. Will link it in future series.
>
> Yeah, again, this should have been an RFC on that basis :)
>
>>
>>> [0]: https://lore.kernel.org/linux-mm/20250507141132.2773275-1-usamaarif642@gmail.com/
>>>
>>>> - change from modifying the THP decision making for the process, to modifying
>>>> VMA flags only. This prevents further complicating the logic used to
>>>> determine THP order (Thanks David!)
>>>> - change from using a prctl per policy change to just using PR_SET_THP_POLICY
>>>> and arg2 to set the policy. (Zi Yan)
>>>> - Introduce PR_THP_POLICY_DEFAULT_NOHUGE and PR_THP_POLICY_DEFAULT_SYSTEM
>>>> - Add selftests and documentation.
>>>>
>>>> Usama Arif (6):
>>>>   prctl: introduce PR_THP_POLICY_DEFAULT_HUGE for the process
>>>>   prctl: introduce PR_THP_POLICY_DEFAULT_NOHUGE for the process
>>>>   prctl: introduce PR_THP_POLICY_SYSTEM for the process
>>>>   selftests: prctl: introduce tests for PR_THP_POLICY_DEFAULT_NOHUGE
>>>>   selftests: prctl: introduce tests for PR_THP_POLICY_DEFAULT_HUGE
>>>>   docs: transhuge: document process level THP controls
>>>>
>>>>  Documentation/admin-guide/mm/transhuge.rst    |  40 +++
>>>>  include/linux/huge_mm.h                       |   4 +
>>>>  include/linux/mm_types.h                      |  14 +
>>>>  include/uapi/linux/prctl.h                    |   6 +
>>>>  kernel/fork.c                                 |   1 +
>>>>  kernel/sys.c                                  |  35 +++
>>>>  mm/huge_memory.c                              |  56 ++++
>>>>  mm/vma.c                                      |   2 +
>>>>  tools/include/uapi/linux/prctl.h              |   6 +
>>>>  .../trace/beauty/include/uapi/linux/prctl.h   |   6 +
>>>>  tools/testing/selftests/prctl/Makefile        |   2 +-
>>>>  tools/testing/selftests/prctl/thp_policy.c    | 286 ++++++++++++++++++
>>>>  12 files changed, 457 insertions(+), 1 deletion(-)
>>>>  create mode 100644 tools/testing/selftests/prctl/thp_policy.c
>>>>
>>>> --
>>>> 2.47.1
>>>>
>>
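
The point made several times in this thread, that setting VM_HUGEPAGE on a VMA does not by itself yield a THP when the global policy is "never", can be sketched with a deliberately simplified eligibility check. This is an assumption-laden toy, not the kernel's logic: the real decision in `__thp_vma_allowable_orders` also depends on per-size settings, the VMA type and more, and all `toy_`/`TOY_` names and values below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for per-VMA flags; values are illustrative, not the kernel's. */
#define TOY_VM_HUGEPAGE   0x1u
#define TOY_VM_NOHUGEPAGE 0x2u

/* The three global settings discussed in the thread. */
enum toy_global_policy {
	TOY_GLOBAL_NEVER,
	TOY_GLOBAL_MADVISE,
	TOY_GLOBAL_ALWAYS,
};

/*
 * Simplified model: the global policy is still consulted, and the per-VMA
 * flags only select among the outcomes the global policy allows.
 */
bool toy_thp_eligible(enum toy_global_policy global, unsigned int vma_flags)
{
	if (vma_flags & TOY_VM_NOHUGEPAGE)
		return false;

	switch (global) {
	case TOY_GLOBAL_NEVER:
		return false;	/* VM_HUGEPAGE alone is not enough */
	case TOY_GLOBAL_MADVISE:
		return (vma_flags & TOY_VM_HUGEPAGE) != 0;
	case TOY_GLOBAL_ALWAYS:
		return true;
	}
	return false;
}
```

Under this model, a process-wide default of VM_HUGEPAGE turns a "madvise" host into an effectively "always" one for that process, a default of VM_NOHUGEPAGE does the reverse on an "always" host, and neither changes the outcome on a "never" host.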