From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9b1bac6c-fd9f-4dc1-8c94-c4da0cbb9e7f@arm.com>
Date: Fri, 30 May 2025 09:59:39 +0100
Subject: Re: [PATCH 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
From: Ryan Roberts
To: David Hildenbrand, Baolin Wang, akpm@linux-foundation.org, hughd@google.com
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, npache@redhat.com,
 dev.jain@arm.com, ziy@nvidia.com, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
References: <05d60e72-3113-41f0-b81f-225397f06c81@arm.com>
Content-Type: text/plain; charset=UTF-8

On 30/05/2025 09:44, David Hildenbrand wrote:
> On 30.05.25 10:04, Ryan Roberts wrote:
>> On 29/05/2025 09:23, Baolin Wang wrote:
>>> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
>>> the system-wide anon/shmem THP sysfs settings, which means that even though
>>> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
>>> attempt to collapse into an anon/shmem THP. This violates the rule we have
>>> agreed upon: never means never. This patch set will address this issue.
>>
>> This is a drive-by comment from me without having the previous context, but...
>>
>> Surely MADV_COLLAPSE *should* ignore the THP sysfs settings? It's a deliberate
>> user-initiated, synchronous request to use huge pages for a range of memory.
>> There is nothing *transparent* about it, it just happens to be implemented using
>> the same logic that THP uses.
>>
>> I always thought this was a deliberate design decision.
>
> If the admin said "never", then why should a user be able to overwrite that?

Well my interpretation would be that the admin is saying never *transparently*
give anyone any hugepages; on balance it does more harm than good for my
workloads. The toggle is called transparent_hugepage/enabled, after all.

Whereas MADV_COLLAPSE is deliberately applied to a specific region at an
opportune moment in time, presumably because the user knows that the region
*will* benefit and because that point in the execution is not sensitive to
latency.

I see them as logically separate.

>
> The design decision I recall is that if VM_NOHUGEPAGE is set, we'll ignore that.
> Because that was set by the app itself (MADV_NOHUGEPAGE).

Hmm, ok. My instinct would have been the opposite; MADV_NOHUGEPAGE means "I
don't want the risk of latency spikes and memory bloat that THP can cause". Not
"ignore my explicit requests to MADV_COLLAPSE". But if that decision was already
taken and that's the current behavior then I agree we have an inconsistency with
respect to the sysfs control.

Perhaps we should be guided by real world usage - AIUI there is a cloud that
disables THP at system level today (Google?). Is there any concern that there
are workloads in such environments that are using MADV_COLLAPSE today that would
then see a performance drop?
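
For reference, the system-wide toggle named above, transparent_hugepage/enabled,
reports its options on a single line with the active one in square brackets,
for example "always madvise [never]". Below is a minimal Python sketch of how a
userspace tool might read it before deciding whether to issue MADV_COLLAPSE. It
assumes the usual /sys/kernel/mm location; the helper names are hypothetical and
none of this is code from the patch set.

```python
from pathlib import Path

# Assumed location of the system-wide anon THP toggle discussed in the thread.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (currently selected) value, e.g. 'never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed value in {text!r}")


def read_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read the sysfs file and return the active mode (hypothetical helper)."""
    return active_thp_mode(path.read_text())
```

Whether a result of "never" should also stop an explicit MADV_COLLAPSE request
is exactly the question being debated here; the sketch only reports the
setting and takes no position on it.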