From: Casey Chen
To: akpm@linux-foundation.org, surenb@google.com
Cc: kent.overstreet@linux.dev, corbet@lwn.net, dennis@kernel.org,
	tj@kernel.org, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
	jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
	rientjes@google.com, roman.gushchin@linux.dev, harry.yoo@oracle.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, yzhong@purestorage.com,
	souravpanda@google.com, 00107082@163.com, Casey Chen
Subject: [PATCH v3] alloc_tag: add per-NUMA node stats
Date: Thu, 10 Jul 2025 18:23:22 -0600
Message-Id: <20250711002322.1303421-1-cachen@purestorage.com>

This patch adds a per-NUMA-node breakdown of memory allocation, enabling
more precise visibility into memory usage patterns across nodes. It is
particularly valuable in cloud environments, where tracking asymmetric
memory usage and identifying NUMA imbalances down to the allocation
caller helps optimize memory efficiency, avoid CPU stranding, and
improve system responsiveness under memory pressure.

The implementation adds per-NUMA node statistics to /proc/allocinfo.
Previously, each alloc_tag had a single set of counters (bytes and
calls), aggregated across all CPUs. With this change, each CPU can
maintain separate counters for each NUMA node, allowing finer-grained
memory allocation profiling.

This feature is controlled by the new
CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS option:

* When enabled (=y), the output includes per-node statistics following
  the total bytes/calls:

  ...
  315456       9858     mm/dmapool.c:338 func:pool_alloc_page
      nid0     94912        2966
      nid1     220544       6892
  7680         60       mm/dmapool.c:254 func:dma_pool_create
      nid0     4224         33
      nid1     3456         27

* When disabled (=n), the output remains unchanged:

  ...
  315456       9858     mm/dmapool.c:338 func:pool_alloc_page
  7680         60       mm/dmapool.c:254 func:dma_pool_create

To minimize memory overhead, per-NUMA stats counters are dynamically
allocated using the percpu allocator. PERCPU_DYNAMIC_RESERVE has been
increased (PERCPU_DYNAMIC_SIZE_SHIFT goes from 10 to 13 in
include/linux/percpu.h) to ensure sufficient space for in-kernel
alloc_tag counters.

For in-kernel alloc_tag instances, pcpu_alloc_noprof() is used to
allocate counters. These allocations are excluded from the profiling
statistics themselves.
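The counter layout described above can be sketched in plain userspace C.
This is only a model under stated assumptions, not the kernel code: the
model_* names and the fixed MODEL_NR_CPUS/MODEL_NR_NODES sizes are
hypothetical, and a two-dimensional array stands in for the percpu
allocation that the patch sizes with ALLOC_TAG_NUM_NODES.

```c
#include <stdint.h>

/* Hypothetical fixed sizes for this model; the patch iterates with
 * for_each_possible_cpu() and ALLOC_TAG_NUM_NODES instead. */
#define MODEL_NR_CPUS  4
#define MODEL_NR_NODES 2

struct model_counters {
	uint64_t bytes;
	uint64_t calls;
};

/* One tag: every CPU owns an array with one slot per NUMA node. */
struct model_tag {
	struct model_counters counters[MODEL_NR_CPUS][MODEL_NR_NODES];
};

/* Account an allocation made on `cpu` whose memory lives on node `nid`. */
static void model_tag_add(struct model_tag *tag, int cpu, int nid,
			  uint64_t bytes)
{
	tag->counters[cpu][nid].bytes += bytes;
	tag->counters[cpu][nid].calls += 1;
}

/*
 * Account a free. The freeing CPU may differ from the allocating CPU, so
 * a single slot can wrap around; only the sum over all CPUs is meaningful.
 */
static void model_tag_sub(struct model_tag *tag, int cpu, int nid,
			  uint64_t bytes)
{
	tag->counters[cpu][nid].bytes -= bytes;
	tag->counters[cpu][nid].calls -= 1;
}

/* Per-node view: sum one node's slot across every CPU. */
static struct model_counters model_tag_read_nid(const struct model_tag *tag,
						int nid)
{
	struct model_counters v = { 0, 0 };

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		v.bytes += tag->counters[cpu][nid].bytes;
		v.calls += tag->counters[cpu][nid].calls;
	}
	return v;
}

/* Total view: sum every node's slot across every CPU. */
static struct model_counters model_tag_read(const struct model_tag *tag)
{
	struct model_counters v = { 0, 0 };

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		for (int nid = 0; nid < MODEL_NR_NODES; nid++) {
			v.bytes += tag->counters[cpu][nid].bytes;
			v.calls += tag->counters[cpu][nid].calls;
		}
	}
	return v;
}
```

In this model, adding 94912 bytes on node 0 and 220544 bytes on node 1
makes model_tag_read() report 315456 bytes, which is the same
relationship the nidN rows have to the total row in the example output.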
Link: https://lore.kernel.org/all/20250610233053.973796-1-cachen@purestorage.com
Link: https://lore.kernel.org/all/20250530003944.2929392-1-cachen@purestorage.com
Signed-off-by: Casey Chen
Reviewed-by: Yuanyuan Zhong
Cc: David Rientjes
Cc: Sourav Panda
---
 Documentation/mm/allocation-profiling.rst |  3 ++
 include/linux/alloc_tag.h                 | 52 ++++++++++++++++------
 include/linux/codetag.h                   |  4 ++
 include/linux/percpu.h                    |  2 +-
 lib/Kconfig.debug                         |  7 +++
 lib/alloc_tag.c                           | 54 ++++++++++++++++++++---
 mm/page_alloc.c                           | 35 ++++++++-------
 mm/percpu.c                               |  8 +++-
 mm/show_mem.c                             | 25 ++++++++---
 mm/slub.c                                 | 11 +++--
 10 files changed, 150 insertions(+), 51 deletions(-)

diff --git a/Documentation/mm/allocation-profiling.rst b/Documentation/mm/allocation-profiling.rst
index 316311240e6a..13d1d0cb91bf 100644
--- a/Documentation/mm/allocation-profiling.rst
+++ b/Documentation/mm/allocation-profiling.rst
@@ -17,6 +17,9 @@ kconfig options:
   adds warnings for allocations that weren't accounted because of a
   missing annotation
 
+- CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS
+  adds memory allocation profiling stats for each numa node, off by default.
+
 Boot parameter:
   sysctl.vm.mem_profiling={0|1|never}[,compressed]
 
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 9ef2633e2c08..f714f1a436ec 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -15,6 +15,12 @@
 #include
 #include
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS
+#define ALLOC_TAG_NUM_NODES num_possible_nodes()
+#else
+#define ALLOC_TAG_NUM_NODES 1
+#endif
+
 struct alloc_tag_counters {
 	u64 bytes;
 	u64 calls;
@@ -134,16 +140,33 @@ static inline bool mem_alloc_profiling_enabled(void)
 				   &mem_alloc_profiling_key);
 }
 
+static inline struct alloc_tag_counters alloc_tag_read_nid(struct alloc_tag *tag, int nid)
+{
+	struct alloc_tag_counters v = { 0, 0 };
+	struct alloc_tag_counters *counters;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		counters = per_cpu_ptr(tag->counters, cpu);
+		v.bytes += counters[nid].bytes;
+		v.calls += counters[nid].calls;
+	}
+
+	return v;
+}
+
 static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
 {
 	struct alloc_tag_counters v = { 0, 0 };
-	struct alloc_tag_counters *counter;
+	struct alloc_tag_counters *counters;
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		counter = per_cpu_ptr(tag->counters, cpu);
-		v.bytes += counter->bytes;
-		v.calls += counter->calls;
+		counters = per_cpu_ptr(tag->counters, cpu);
+		for (int nid = 0; nid < ALLOC_TAG_NUM_NODES; nid++) {
+			v.bytes += counters[nid].bytes;
+			v.calls += counters[nid].calls;
+		}
 	}
 
 	return v;
@@ -179,7 +202,7 @@ static inline bool __alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag
 	return true;
 }
 
-static inline bool alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
+static inline bool alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag, int nid)
 {
 	if (unlikely(!__alloc_tag_ref_set(ref, tag)))
 		return false;
@@ -190,17 +213,18 @@ static inline bool alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *t
 	 * Each new reference for every sub-allocation needs to increment call
 	 * counter because when we free each part the counter will be decremented.
 	 */
-	this_cpu_inc(tag->counters->calls);
+	this_cpu_inc(tag->counters[nid].calls);
 	return true;
 }
 
-static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
+				 int nid, size_t bytes)
 {
-	if (likely(alloc_tag_ref_set(ref, tag)))
-		this_cpu_add(tag->counters->bytes, bytes);
+	if (likely(alloc_tag_ref_set(ref, tag, nid)))
+		this_cpu_add(tag->counters[nid].bytes, bytes);
 }
 
-static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
+static inline void alloc_tag_sub(union codetag_ref *ref, int nid, size_t bytes)
 {
 	struct alloc_tag *tag;
 
@@ -215,8 +239,8 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
 
 	tag = ct_to_alloc_tag(ref->ct);
 
-	this_cpu_sub(tag->counters->bytes, bytes);
-	this_cpu_dec(tag->counters->calls);
+	this_cpu_sub(tag->counters[nid].bytes, bytes);
+	this_cpu_dec(tag->counters[nid].calls);
 
 	ref->ct = NULL;
 }
@@ -228,8 +252,8 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
 #define DEFINE_ALLOC_TAG(_alloc_tag)
 static inline bool mem_alloc_profiling_enabled(void) { return false; }
 static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
-				 size_t bytes) {}
-static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
+				 int nid, size_t bytes) {}
+static inline void alloc_tag_sub(union codetag_ref *ref, int nid, size_t bytes) {}
 #define alloc_tag_record(p)	do {} while (0)
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index 457ed8fd3214..35b314b36633 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -16,6 +16,10 @@ struct module;
 #define CODETAG_SECTION_START_PREFIX	"__start_"
 #define CODETAG_SECTION_STOP_PREFIX	"__stop_"
 
+enum codetag_flags {
+	CODETAG_PERCPU_ALLOC	= (1 << 0), /* codetag tracking percpu allocation */
+};
+
 /*
  * An instance of this structure is created in a special ELF section at every
  * code location being tagged. At runtime, the special section is treated as
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 85bf8dd9f087..d92c27fbcd0d 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -43,7 +43,7 @@
 # define PERCPU_DYNAMIC_SIZE_SHIFT 12
 #endif /* LOCKDEP and PAGE_SIZE > 4KiB */
 #else
-#define PERCPU_DYNAMIC_SIZE_SHIFT 10
+#define PERCPU_DYNAMIC_SIZE_SHIFT 13
 #endif
 
 /*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ebe33181b6e6..b2a35cc78635 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1037,6 +1037,13 @@ config MEM_ALLOC_PROFILING_DEBUG
 	  Adds warnings with helpful error messages for memory allocation
 	  profiling.
 
+config MEM_ALLOC_PROFILING_PER_NUMA_STATS
+	bool "Memory allocation profiling per-NUMA stats"
+	default n
+	depends on MEM_ALLOC_PROFILING
+	help
+	  Display allocation stats on every NUMA node.
+
 source "lib/Kconfig.kasan"
 source "lib/Kconfig.kfence"
 source "lib/Kconfig.kmsan"
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index e9b33848700a..3b170847f547 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -40,6 +40,9 @@ struct alloc_tag_kernel_section kernel_tags = { NULL, 0 };
 unsigned long alloc_tag_ref_mask;
 int alloc_tag_ref_offs;
 
+/* Total size of all alloc_tag_counters of each CPU */
+static unsigned long pcpu_counters_size;
+
 struct allocinfo_private {
 	struct codetag_iterator iter;
 	bool print_header;
@@ -81,7 +84,7 @@ static void print_allocinfo_header(struct seq_buf *buf)
 {
 	/* Output format version, so we can change it. */
 	seq_buf_printf(buf, "allocinfo - version: 1.0\n");
-	seq_buf_printf(buf, "# \n");
+	seq_buf_printf(buf, " \n");
 }
 
 static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
@@ -90,12 +93,32 @@ static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
 	struct alloc_tag_counters counter = alloc_tag_read(tag);
 	s64 bytes = counter.bytes;
 
-	seq_buf_printf(out, "%12lli %8llu ", bytes, counter.calls);
+	seq_buf_printf(out, "%-12lli %-8llu ", bytes, counter.calls);
 	codetag_to_text(out, ct);
 	seq_buf_putc(out, ' ');
 	seq_buf_putc(out, '\n');
 }
 
+#ifdef CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS
+static void alloc_tag_to_text_all_nids(struct seq_buf *out, struct codetag *ct)
+{
+	struct alloc_tag *tag = ct_to_alloc_tag(ct);
+	struct alloc_tag_counters counter;
+	s64 bytes;
+
+	for (int nid = 0; nid < ALLOC_TAG_NUM_NODES; nid++) {
+		counter = alloc_tag_read_nid(tag, nid);
+		bytes = counter.bytes;
+		seq_buf_printf(out, " nid%-5u %-12lli %-8llu\n",
+			       nid, bytes, counter.calls);
+	}
+}
+#else
+static void alloc_tag_to_text_all_nids(struct seq_buf *out, struct codetag *ct)
+{
+}
+#endif
+
 static int allocinfo_show(struct seq_file *m, void *arg)
 {
 	struct allocinfo_private *priv = (struct allocinfo_private *)arg;
@@ -109,6 +132,7 @@ static int allocinfo_show(struct seq_file *m, void *arg)
 		priv->print_header = false;
 	}
 	alloc_tag_to_text(&buf, priv->iter.ct);
+	alloc_tag_to_text_all_nids(&buf, priv->iter.ct);
 	seq_commit(m, seq_buf_used(&buf));
 	return 0;
 }
@@ -180,7 +204,7 @@ void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
 
 		if (get_page_tag_ref(folio_page(folio, i), &ref, &handle)) {
 			/* Set new reference to point to the original tag */
-			alloc_tag_ref_set(&ref, tag);
+			alloc_tag_ref_set(&ref, tag, folio_nid(folio));
 			update_page_tag_ref(handle, &ref);
 			put_page_tag_ref(handle);
 		}
@@ -247,15 +271,29 @@ void __init alloc_tag_sec_init(void)
 	if (!mem_profiling_support)
 		return;
 
-	if (!static_key_enabled(&mem_profiling_compressed))
-		return;
-
 	kernel_tags.first_tag = (struct alloc_tag *)kallsyms_lookup_name(
 					SECTION_START(ALLOC_TAG_SECTION_NAME));
 	last_codetag = (struct alloc_tag *)kallsyms_lookup_name(
 					SECTION_STOP(ALLOC_TAG_SECTION_NAME));
 	kernel_tags.count = last_codetag - kernel_tags.first_tag;
 
+	pcpu_counters_size = ALLOC_TAG_NUM_NODES * sizeof(struct alloc_tag_counters);
+	for (int i = 0; i < kernel_tags.count; i++) {
+		/* Each CPU has one alloc_tag_counters per numa node */
+		kernel_tags.first_tag[i].counters =
+			pcpu_alloc_noprof(pcpu_counters_size,
+					  sizeof(struct alloc_tag_counters),
+					  false, GFP_KERNEL | __GFP_ZERO);
+		if (!kernel_tags.first_tag[i].counters) {
+			while (--i >= 0)
+				free_percpu(kernel_tags.first_tag[i].counters);
+			panic("Failed to allocate per-cpu alloc_tag counters\n");
+		}
+	}
+
+	if (!static_key_enabled(&mem_profiling_compressed))
+		return;
+
 	/* Check if kernel tags fit into page flags */
 	if (kernel_tags.count > (1UL << NR_UNUSED_PAGEFLAG_BITS)) {
 		shutdown_mem_profiling(false); /* allocinfo file does not exist yet */
@@ -618,7 +656,9 @@ static int load_module(struct module *mod, struct codetag *start, struct codetag
 	stop_tag = ct_to_alloc_tag(stop);
 	for (tag = start_tag; tag < stop_tag; tag++) {
 		WARN_ON(tag->counters);
-		tag->counters = alloc_percpu(struct alloc_tag_counters);
+		tag->counters = __alloc_percpu_gfp(pcpu_counters_size,
+						   sizeof(struct alloc_tag_counters),
+						   GFP_KERNEL | __GFP_ZERO);
 		if (!tag->counters) {
 			while (--tag >= start_tag) {
 				free_percpu(tag->counters);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 78ddf1d43c6c..7c4d10f6873c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1247,58 +1247,59 @@ void __clear_page_tag_ref(struct page *page)
 /* Should be called only if mem_alloc_profiling_enabled() */
 static noinline
 void __pgalloc_tag_add(struct page *page, struct task_struct *task,
-		       unsigned int nr)
+		       int nid, unsigned int nr)
 {
 	union pgtag_ref_handle handle;
 	union codetag_ref ref;
 
 	if (get_page_tag_ref(page, &ref, &handle)) {
-		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
+		alloc_tag_add(&ref, task->alloc_tag, nid, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
 	}
 }
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
-				   unsigned int nr)
+				   int nid, unsigned int nr)
 {
 	if (mem_alloc_profiling_enabled())
-		__pgalloc_tag_add(page, task, nr);
+		__pgalloc_tag_add(page, task, nid, nr);
 }
 
 /* Should be called only if mem_alloc_profiling_enabled() */
 static noinline
-void __pgalloc_tag_sub(struct page *page, unsigned int nr)
+void __pgalloc_tag_sub(struct page *page, int nid, unsigned int nr)
 {
 	union pgtag_ref_handle handle;
 	union codetag_ref ref;
 
 	if (get_page_tag_ref(page, &ref, &handle)) {
-		alloc_tag_sub(&ref, PAGE_SIZE * nr);
+		alloc_tag_sub(&ref, nid, PAGE_SIZE * nr);
 		update_page_tag_ref(handle, &ref);
 		put_page_tag_ref(handle);
 	}
 }
 
-static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
+static inline void pgalloc_tag_sub(struct page *page, int nid, unsigned int nr)
 {
 	if (mem_alloc_profiling_enabled())
-		__pgalloc_tag_sub(page, nr);
+		__pgalloc_tag_sub(page, nid, nr);
 }
 
 /* When tag is not NULL, assuming mem_alloc_profiling_enabled */
-static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag,
+					 int nid, unsigned int nr)
 {
 	if (tag)
-		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
+		this_cpu_sub(tag->counters[nid].bytes, PAGE_SIZE * nr);
 }
 
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
-				   unsigned int nr) {}
-static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
-static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
+				   int nid, unsigned int nr) {}
+static inline void pgalloc_tag_sub(struct page *page, int nid, unsigned int nr) {}
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, int nid, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
@@ -1337,7 +1338,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 		/* Do not let hwpoison pages hit pcplists/buddy */
 		reset_page_owner(page, order);
 		page_table_check_free(page, order);
-		pgalloc_tag_sub(page, 1 << order);
+		pgalloc_tag_sub(page, page_to_nid(page), 1 << order);
 
 		/*
 		 * The page is isolated and accounted for.
@@ -1394,7 +1395,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 	page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
 	reset_page_owner(page, order);
 	page_table_check_free(page, order);
-	pgalloc_tag_sub(page, 1 << order);
+	pgalloc_tag_sub(page, page_to_nid(page), 1 << order);
 
 	if (!PageHighMem(page)) {
 		debug_check_no_locks_freed(page_address(page),
@@ -1850,7 +1851,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 
 	set_page_owner(page, order, gfp_flags);
 	page_table_check_alloc(page, order);
-	pgalloc_tag_add(page, current, 1 << order);
+	pgalloc_tag_add(page, current, page_to_nid(page), 1 << order);
 }
 
 static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
@@ -5228,7 +5229,7 @@ static void ___free_pages(struct page *page, unsigned int order,
 	if (put_page_testzero(page))
 		__free_frozen_pages(page, order, fpi_flags);
 	else if (!head) {
-		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
+		pgalloc_tag_sub_pages(tag, page_to_nid(page), (1 << order) - 1);
 		while (order-- > 0)
 			__free_frozen_pages(page + (1 << order), order,
 					    fpi_flags);
diff --git a/mm/percpu.c b/mm/percpu.c
index 782cc148b39c..4c5369a40323 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1691,15 +1691,19 @@ static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
 				      size_t size)
 {
 	if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts)) {
+		/* For percpu allocation, store all alloc_tag stats on numa node 0 */
 		alloc_tag_add(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag,
-			      current->alloc_tag, size);
+			      current->alloc_tag, 0, size);
+		if (current->alloc_tag)
+			current->alloc_tag->ct.flags |= CODETAG_PERCPU_ALLOC;
 	}
 }
 
 static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
 {
+	/* percpu alloc_tag stats is stored on numa node 0 so subtract from node 0 */
 	if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts))
-		alloc_tag_sub(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag, size);
+		alloc_tag_sub(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag, 0, size);
 }
 #else
 static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 41999e94a56d..3939c58e55c4 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Johannes Weiner
  */
 
+#include
 #include
 #include
 #include
@@ -426,6 +427,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 		nr = alloc_tag_top_users(tags, ARRAY_SIZE(tags), false);
 		if (nr) {
 			pr_notice("Memory allocations:\n");
+			pr_notice(" \n");
 			for (i = 0; i < nr; i++) {
 				struct codetag *ct = tags[i].ct;
 				struct alloc_tag *tag = ct_to_alloc_tag(ct);
@@ -433,16 +435,25 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 				char bytes[10];
 
 				string_get_size(counter.bytes, 1, STRING_UNITS_2, bytes, sizeof(bytes));
-
 				/* Same as alloc_tag_to_text() but w/o intermediate buffer */
 				if (ct->modname)
-					pr_notice("%12s %8llu %s:%u [%s] func:%s\n",
-						  bytes, counter.calls, ct->filename,
-						  ct->lineno, ct->modname, ct->function);
+					pr_notice("%-12s %-8llu %s:%u [%s] func:%s\n",
+						bytes, counter.calls, ct->filename,
+						ct->lineno, ct->modname, ct->function);
 				else
-					pr_notice("%12s %8llu %s:%u func:%s\n",
-						  bytes, counter.calls, ct->filename,
-						  ct->lineno, ct->function);
+					pr_notice("%-12s %-8llu %s:%u func:%s\n",
+						bytes, counter.calls,
+						ct->filename, ct->lineno, ct->function);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS
+				for (int nid = 0; nid < ALLOC_TAG_NUM_NODES; nid++) {
+					counter = alloc_tag_read_nid(tag, nid);
+					string_get_size(counter.bytes, 1, STRING_UNITS_2,
+							bytes, sizeof(bytes));
+					pr_notice(" nid%-5u %-12s %-8llu\n",
+						nid, bytes, counter.calls);
+				}
+#endif
 			}
 		}
 	}
diff --git a/mm/slub.c b/mm/slub.c
index c4b64821e680..1c7b10befa7c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2106,8 +2106,12 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
 	 * If other users appear then mem_alloc_profiling_enabled()
 	 * check should be added before alloc_tag_add().
 	 */
-	if (likely(obj_exts))
-		alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+	if (likely(obj_exts)) {
+		struct page *page = virt_to_page(object);
+
+		alloc_tag_add(&obj_exts->ref, current->alloc_tag,
+				page_to_nid(page), s->size);
+	}
 }
 
 static inline void
@@ -2135,8 +2139,9 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
 
 	for (i = 0; i < objects; i++) {
 		unsigned int off = obj_to_index(s, slab, p[i]);
+		struct page *page = virt_to_page(p[i]);
 
-		alloc_tag_sub(&obj_exts[off].ref, page_to_nid(page), s->size);
+		alloc_tag_sub(&obj_exts[off].ref, page_to_nid(page), s->size);
 	}
 }
 
-- 
2.34.1
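The patch switches the allocinfo columns from right-aligned ("%12lli
%8llu") to left-aligned ("%-12lli %-8llu") and adds a "nid%-5u" row per
node. The following userspace sketch shows what those width specifiers
produce. It is an illustration only: the model_format_* helpers are
hypothetical, snprintf() stands in for seq_buf_printf(), the location
string is passed in rather than built by codetag_to_text(), and the
leading indentation of the nid row may differ from the real patch because
whitespace inside the archived format string appears collapsed.

```c
#include <stdio.h>

/*
 * Total row: left-aligned bytes (width 12) and calls (width 8), then a
 * caller-supplied location string (hypothetical parameter).
 */
static int model_format_total(char *buf, size_t len, long long bytes,
			      unsigned long long calls, const char *where)
{
	return snprintf(buf, len, "%-12lld %-8llu %s\n", bytes, calls, where);
}

/*
 * Per-node row: "nid" plus the node id padded to width 5, followed by the
 * same left-aligned bytes and calls columns as the total row.
 */
static int model_format_nid(char *buf, size_t len, unsigned int nid,
			    long long bytes, unsigned long long calls)
{
	return snprintf(buf, len, " nid%-5u %-12lld %-8llu\n",
			nid, bytes, calls);
}
```

With these widths, the bytes value of a nid row starts at a fixed column
regardless of how many digits the node id or the counters have, which
keeps the per-node rows easy to split on whitespace in scripts.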