From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A429372691 for ; Fri, 22 May 2026 15:02:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779462158; cv=none; b=R6fCzN7mK+6SkBqx8m1QYK9laiRcGg8QVUR8py/8/r+SVJfz58mHW3CZXLUXH00axPtphHqsomyGn//hh8R4jioJCTt6pLjzNZNHLK8KsbY5BEp3USkJDes77n0Rx/FJtP+50R4VzjEhokRu5XYBeIobPstm9ctSFbI+t7/culw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779462158; c=relaxed/simple; bh=VBiU6XXkA7/GlZjNBMyJ3BogLBEUIYMg5v4iaL9/5FU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=k5oYpOKsQ8yWowoOHWfNMy2jNCdoEc7fF7UtraWRhJ7/1tmm1/WaWZcT7K97WTZAu7Jsbt8VgqxxtBxMqnv7cvnXGTTjnCKD2Uhowbjbw/KqPXOpCOK/tGRvFB/oS+nRQNMF41Tf6nSo02AUqCJJ6vqIgjtC/ruFofi/t7M+z10= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=eCUkL1Yr; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eCUkL1Yr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1779462152; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/C8PY94GSpJ6xmfMZMGQDwcUgEw3YhE561RWk3Turdw=; b=eCUkL1YrS5OL+1zX7iEzlqzcXCg8oH9ZesOn9zVNkZtFEscLUtd6Xn8o4Vq6ujp2qzk2+2 bAgS4L2W9Hb5U0YgG2xBj3vZ42ItBv8ZV2RJhRG7pKSwGYL93CSZQEOB7x+aYXPqWts72i Fo8YOQsgwLOzUCNkTavChkVLEAGgwcQ= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-465-n9qmjoDdOyKBPg5T41Tu3w-1; Fri, 22 May 2026 11:02:29 -0400 X-MC-Unique: n9qmjoDdOyKBPg5T41Tu3w-1 X-Mimecast-MFC-AGG-ID: n9qmjoDdOyKBPg5T41Tu3w_1779462148 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E600F1800343; Fri, 22 May 2026 15:02:27 +0000 (UTC) Received: from p1.redhat.com (unknown [10.44.24.8]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 11C75180034E; Fri, 22 May 2026 15:02:08 +0000 (UTC) From: Nico Pache To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org Cc: aarcange@redhat.com, akpm@linux-foundation.org, anshuman.khandual@arm.com, apopple@nvidia.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, byungchul@sk.com, catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net, dave.hansen@linux.intel.com, david@kernel.org, dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org, hughd@google.com, jack@suse.cz, jackmanb@google.com, jannh@google.com, jglisse@google.com, joshua.hahnjy@gmail.com, kas@kernel.org, lance.yang@linux.dev, liam@infradead.org, ljs@kernel.org, mathieu.desnoyers@efficios.com, matthew.brost@intel.com, mhiramat@kernel.org, mhocko@suse.com, npache@redhat.com, peterx@redhat.com, pfalcato@suse.de, rakie.kim@sk.com, raquini@redhat.com, rdunlap@infradead.org, richard.weiyang@gmail.com, rientjes@google.com, rostedt@goodmis.org, rppt@kernel.org, ryan.roberts@arm.com, shivankg@amd.com, sunnanyong@huawei.com, surenb@google.com, thomas.hellstrom@linux.intel.com, tiwai@suse.de, usamaarif642@gmail.com, vbabka@suse.cz, vishal.moola@gmail.com, wangkefeng.wang@huawei.com, will@kernel.org, willy@infradead.org, yang@os.amperecomputing.com, ying.huang@linux.alibaba.com, ziy@nvidia.com, zokeefe@google.com Subject: [PATCH mm-unstable v18 08/14] mm/khugepaged: add per-order mTHP collapse failure statistics Date: Fri, 22 May 2026 09:00:03 -0600 Message-ID: <20260522150009.121603-9-npache@redhat.com> In-Reply-To: <20260522150009.121603-1-npache@redhat.com> References: <20260522150009.121603-1-npache@redhat.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 X-Mimecast-MFC-PROC-ID: WZ-kcOieezlsLxjReI65c_T1B7Kz-g7vKj4nsTIavsE_1779462148 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 8bit content-type: text/plain; charset="US-ASCII"; x-default=true Add three new mTHP statistics to track collapse failures for different orders when encountering swap PTEs, excessive none PTEs, and shared PTEs: - collapse_exceed_swap_pte: Increment when mTHP collapse fails due to encountering a swap PTE. - collapse_exceed_none_pte: Counts when mTHP collapse fails due to exceeding the none PTE threshold for the given order - collapse_exceed_shared_pte: Counts when mTHP collapse fails due to encountering a shared PTE. These statistics complement the existing THP_SCAN_EXCEED_* events by providing per-order granularity for mTHP collapse attempts. The stats are exposed via sysfs under `/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each supported hugepage size. As we currently do not support collapsing mTHPs that contain a swap or shared entry, those statistics keep track of how often we are encountering failed mTHP collapses due to these restrictions. We will add support for mTHP collapse for anonymous pages next; lets also track when this happens at the PMD level within the per-mTHP stats. Signed-off-by: Nico Pache --- Documentation/admin-guide/mm/transhuge.rst | 14 ++++++++++++++ include/linux/huge_mm.h | 3 +++ mm/huge_memory.c | 7 +++++++ mm/khugepaged.c | 21 +++++++++++++++++++-- 4 files changed, 43 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index c51932e6275d..80a4d0bed70b 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -714,6 +714,20 @@ nr_anon_partially_mapped an anonymous THP as "partially mapped" and count it here, even though it is not actually partially mapped anymore. +collapse_exceed_none_pte + The number of collapse attempts that failed due to exceeding the + max_ptes_none threshold. + +collapse_exceed_swap_pte + The number of collapse attempts that failed due to exceeding the + max_ptes_swap threshold. For non-PMD orders this occurs if a mTHP range + contains at least one swap PTE. + +collapse_exceed_shared_pte + The number of collapse attempts that failed due to exceeding the + max_ptes_shared threshold. For non-PMD orders this occurs if a mTHP range + contains at least one shared PTE. + As the system ages, allocating huge pages may be expensive as the system uses memory compaction to copy data around memory to free a huge page for use. There are some counters in ``/proc/vmstat`` to help diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index ba7ae6808544..48496f09909b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -144,6 +144,9 @@ enum mthp_stat_item { MTHP_STAT_SPLIT_DEFERRED, MTHP_STAT_NR_ANON, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, + MTHP_STAT_COLLAPSE_EXCEED_SWAP, + MTHP_STAT_COLLAPSE_EXCEED_NONE, + MTHP_STAT_COLLAPSE_EXCEED_SHARED, __MTHP_STAT_COUNT }; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 345c54133c83..5c128cdec810 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -703,6 +703,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED); DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED); DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON); DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE); +DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED); + static struct attribute *anon_stats_attrs[] = { &anon_fault_alloc_attr.attr, @@ -719,6 +723,9 @@ static struct attribute *anon_stats_attrs[] = { &split_deferred_attr.attr, &nr_anon_attr.attr, &nr_anon_partially_mapped_attr.attr, + &collapse_exceed_swap_pte_attr.attr, + &collapse_exceed_none_pte_attr.attr, + &collapse_exceed_shared_pte_attr.attr, NULL, }; diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 928e32a0d4d7..fff6a8fbf1d4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -649,7 +649,9 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma, if (pte_none_or_zero(pteval)) { if (++none_or_zero > max_ptes_none) { result = SCAN_EXCEED_NONE_PTE; - count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + if (is_pmd_order(order)) + count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE); goto out; } continue; @@ -683,9 +685,17 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma, /* See collapse_scan_pmd(). */ if (folio_maybe_mapped_shared(folio)) { + /* + * TODO: Support shared pages without leading to further + * mTHP collapses. Currently bringing in new pages via + * shared may cause a future higher order collapse on a + * rescan of the same range. + */ if (++shared > max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; - count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); + if (is_pmd_order(order)) + count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED); goto out; } } @@ -1138,6 +1148,7 @@ static enum scan_result __collapse_huge_page_swapin(struct mm_struct *mm, * range. */ if (!is_pmd_order(order)) { + count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP); pte_unmap(pte); mmap_read_unlock(mm); result = SCAN_EXCEED_SWAP_PTE; @@ -1433,6 +1444,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm, if (++none_or_zero > max_ptes_none) { result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); + count_mthp_stat(HPAGE_PMD_ORDER, + MTHP_STAT_COLLAPSE_EXCEED_NONE); goto out_unmap; } continue; @@ -1441,6 +1454,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm, if (++unmapped > max_ptes_swap) { result = SCAN_EXCEED_SWAP_PTE; count_vm_event(THP_SCAN_EXCEED_SWAP_PTE); + count_mthp_stat(HPAGE_PMD_ORDER, + MTHP_STAT_COLLAPSE_EXCEED_SWAP); goto out_unmap; } /* @@ -1498,6 +1513,8 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm, if (++shared > max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); + count_mthp_stat(HPAGE_PMD_ORDER, + MTHP_STAT_COLLAPSE_EXCEED_SHARED); goto out_unmap; } } -- 2.54.0