From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0C8F92EB85E for ; Tue, 19 Aug 2025 13:49:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611372; cv=none; b=fEltio4hnONmmtn6wJ+31JVvuzEEfy1eA22DMO7Ie1fKjWzo3O8eaDEKpD4imdnvE1i6VP+ImaHs4t0U5jP2Fuf1VZ8GUKJgchJETDnubnqF7jL850EWSw6EwL/S9PAzH6yvESfw85Gzri3LYlU5yOCK4BhwIklCkJYF2yLgWhU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755611372; c=relaxed/simple; bh=4MB8zoNyUP9PNYSZUbYxzZb9F2jIAlPMVEyNAZh5fE4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KeN/nDhGJ8fz6PD22dKrMt8A8NINnN/W+4/a4eoA+vafb2j5TAHSfPIOC/56lRlAHRb/qM7rv+8z9GpAxaSUqjhwxM4u7oSeTVahTIaee038/fxj8pYRfcGCXAfRa0UM9Xo+uSdXmRkC0bb1W22YSrjynBISye/D2ueJU3rjhoY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=a8l5iq8d; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="a8l5iq8d" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1755611369; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=KQ5SyoE/MYPNZDWioj0kryq2voWpDfBzmzGzBWhm6as=; b=a8l5iq8dtXkcte7z8TRpioe6/rcAej88SK4liXz9y7DNKocSkClFvbLziWQZNWO42jAMOp A5Lgm+LrUYImslJjVFzFOHmafNDdsvA+5UauW91VxIVHMzfS3sT3fCORcEnFkyny/jYP/s 0Qh9edWmgzoUDWMJfbJriec+t/wCNrI= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-322-uSjnhtN9NuKn8IW582EbqA-1; Tue, 19 Aug 2025 09:49:25 -0400 X-MC-Unique: uSjnhtN9NuKn8IW582EbqA-1 X-Mimecast-MFC-AGG-ID: uSjnhtN9NuKn8IW582EbqA_1755611360 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1CB27180029C; Tue, 19 Aug 2025 13:49:20 +0000 (UTC) Received: from h1.redhat.com (unknown [10.22.64.137]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DE3E0195419F; Tue, 19 Aug 2025 13:49:01 +0000 (UTC) From: Nico Pache To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com, Bagas Sanjaya Subject: [PATCH v10 13/13] Documentation: mm: update the admin guide for mTHP collapse Date: Tue, 19 Aug 2025 07:48:24 -0600 Message-ID: <20250819134824.623535-2-npache@redhat.com> In-Reply-To: <20250819134824.623535-1-npache@redhat.com> References: <20250819134824.623535-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-doc@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Now that we can collapse to mTHPs lets update the admin guide to reflect these changes and provide proper guidence on how to utilize it. Reviewed-by: Bagas Sanjaya Signed-off-by: Nico Pache --- Documentation/admin-guide/mm/transhuge.rst | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index b85547ac4fe9..1f9e6a32052c 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -63,7 +63,7 @@ often. THP can be enabled system wide or restricted to certain tasks or even memory ranges inside task's address space. Unless THP is completely disabled, there is ``khugepaged`` daemon that scans memory and -collapses sequences of basic pages into PMD-sized huge pages. +collapses sequences of basic pages into huge pages. The THP behaviour is controlled via :ref:`sysfs ` interface and using madvise(2) and prctl(2) system calls. @@ -149,6 +149,18 @@ hugepage sizes have enabled="never". If enabling multiple hugepage sizes, the kernel will select the most appropriate enabled size for a given allocation. +khugepaged uses max_ptes_none scaled to the order of the enabled mTHP size +to determine collapses. When using mTHPs it's recommended to set +max_ptes_none low-- ideally less than HPAGE_PMD_NR / 2 (255 on 4k page +size). This will prevent undesired "creep" behavior that leads to +continuously collapsing to the largest mTHP size; when we collapse, we are +bringing in new non-zero pages that will, on a subsequent scan, cause the +max_ptes_none check of the +1 order to always be satisfied. By limiting +this to less than half the current order, we make sure we don't cause this +feedback loop. max_ptes_shared and max_ptes_swap have no effect when +collapsing to a mTHP, and mTHP collapse will fail on shared or swapped out +pages. + It's also possible to limit defrag efforts in the VM to generate anonymous hugepages in case they're not immediately free to madvise regions or to never try to defrag memory and simply fallback to regular @@ -264,11 +276,6 @@ support the following arguments:: Khugepaged controls ------------------- -.. note:: - khugepaged currently only searches for opportunities to collapse to - PMD-sized THP and no attempt is made to collapse to other THP - sizes. - khugepaged runs usually at low frequency so while one may not want to invoke defrag algorithms synchronously during the page faults, it should be worth invoking defrag at least in khugepaged. However it's -- 2.50.1