* [PATCH v4 0/6] page allocation tag compression
@ 2024-10-23 17:07 Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb
This patchset implements two improvements:
1. Gracefully handles module unloading while there are still live
allocations made from that module;
2. Provides an option to store page allocation tag references in the
page flags, removing the dependency on page extensions and eliminating the
memory overhead of storing page allocation tag references (~0.2% of total
system memory). This also improves page allocation performance when
CONFIG_MEM_ALLOC_PROFILING is enabled by eliminating the page extension
lookup. Page allocation performance overhead is reduced from 41% to 5.5%.
Patch #1 introduces mas_for_each_rev() helper function.
Patch #2 introduces shutdown_mem_profiling() helper function to be used
when disabling memory allocation profiling.
Patch #3 copies module tags into virtually contiguous memory, which
serves two purposes:
- Lets us deal with the situation when a module is unloaded while there
are still live allocations from that module. Since we are working with a
copy of the tags, we can safely unload the module. Space and gaps in
this contiguous memory are managed using a maple tree.
- Enables simple indexing of the tags in the later patches.
Patch #4 changes the way we allocate virtually contiguous memory for
module tags: only the virtual area is reserved, and physical pages are
populated as needed at module load time.
Patch #5 abstracts the page allocation tag reference to simplify later
changes.
Patch #6 adds a compression option to the sysctl.vm.mem_profiling boot
parameter for storing page allocation tag references inside page flags
if they fit. If the number of available page flag bits is insufficient
to address all kernel allocations, memory allocation profiling gets
disabled with an appropriate warning.
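For a rough intuition of what the compressed mode in patch #6 does, the page
allocation tag reference becomes a small index packed into otherwise unused
page flag bits (names below are taken from patch #6; this is a sketch, not
the complete implementation):

	pgalloc_tag_idx idx;

	/* extract the compressed tag index from the page flags */
	idx = (page->flags >> alloc_tag_ref_offs) & alloc_tag_ref_mask;
	/*
	 * idx 0 means "no tag", 1 means "empty tag", and values >= 2 index
	 * into the kernel or module tag arrays (see idx_to_ref() in patch #6).
	 */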
Patchset applies to mm-unstable.
Changes since v3 [1]:
- rebased over Mike's patchset in mm-unstable
- added Reviewed-by, per Liam Howlett
- limited execmem_vmap to work with EXECMEM_MODULE_DATA only,
per Mike Rapoport
- moved __get_vm_area_node() declaration into mm/internal.h,
per Mike Rapoport
- split parts of reserve_module_tags() into helper functions to make it
more readable, per Mike Rapoport
- introduced shutdown_mem_profiling() to be used when disabling memory
allocation profiling
- replaced CONFIG_PGALLOC_TAG_USE_PAGEFLAGS with a new boot parameter
option, per Michal Hocko
- minor code cleanups and refactoring to make the code more readable
- added VMALLOC and MODULE SUPPORT reviewers I missed before
[1] https://lore.kernel.org/all/20241014203646.1952505-1-surenb@google.com/
Suren Baghdasaryan (6):
maple_tree: add mas_for_each_rev() helper
alloc_tag: introduce shutdown_mem_profiling helper function
alloc_tag: load module tags into separate contiguous memory
alloc_tag: populate memory for module tags as needed
alloc_tag: introduce pgtag_ref_handle to abstract page tag references
alloc_tag: support for page allocation tag compression
Documentation/mm/allocation-profiling.rst | 7 +-
include/asm-generic/codetag.lds.h | 19 +
include/linux/alloc_tag.h | 21 +-
include/linux/codetag.h | 40 +-
include/linux/execmem.h | 10 +
include/linux/maple_tree.h | 14 +
include/linux/mm.h | 25 +-
include/linux/page-flags-layout.h | 7 +
include/linux/pgalloc_tag.h | 197 +++++++--
include/linux/vmalloc.h | 3 +
kernel/module/main.c | 80 ++--
lib/alloc_tag.c | 467 ++++++++++++++++++++--
lib/codetag.c | 104 ++++-
mm/execmem.c | 16 +
mm/internal.h | 6 +
mm/mm_init.c | 5 +-
mm/vmalloc.c | 4 +-
scripts/module.lds.S | 5 +-
18 files changed, 903 insertions(+), 127 deletions(-)
base-commit: b5d43fad926a3f542cd06f3c9d286f6f489f7129
--
2.47.0.105.g07ac214952-goog
* [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
@ 2024-10-23 17:07 ` Suren Baghdasaryan
2024-10-23 17:24 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function Suren Baghdasaryan
` (4 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb, Liam R. Howlett
Add a mas_for_each_rev() helper to iterate over maple tree entries in
reverse order.
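A minimal usage sketch, modeled on how patch #3 of this series uses the
helper (mod_area_mt and mod come from that patch; last_index and the loop
body are illustrative only):

	MA_STATE(mas, &mod_area_mt, last_index, last_index);
	struct module *val;

	mas_lock(&mas);
	mas_for_each_rev(&mas, val, 0) {
		/*
		 * Entries are visited from last_index down towards 0;
		 * mas.index and mas.last hold each entry's range.
		 */
		if (val == mod)
			break;
	}
	mas_unlock(&mas);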
Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
---
include/linux/maple_tree.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index 61c236850ca8..cbbcd18d4186 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
#define mas_for_each(__mas, __entry, __max) \
while (((__entry) = mas_find((__mas), (__max))) != NULL)
+/**
+ * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
+ * @__mas: Maple Tree operation state (maple_state)
+ * @__entry: Entry retrieved from the tree
+ * @__min: minimum index to retrieve from the tree
+ *
+ * When returned, mas->index and mas->last will hold the entire range for the
+ * entry.
+ *
+ * Note: may return the zero entry.
+ */
+#define mas_for_each_rev(__mas, __entry, __min) \
+ while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
+
#ifdef CONFIG_DEBUG_MAPLE_TREE
enum mt_dump_format {
mt_dump_dec,
--
2.47.0.105.g07ac214952-goog
* [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
@ 2024-10-23 17:07 ` Suren Baghdasaryan
2024-10-23 17:26 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
` (3 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb
Implement a helper function to disable memory allocation profiling and
use it when creation of /proc/allocinfo fails.
Ensure /proc/allocinfo does not get created when memory allocation
profiling is disabled.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
lib/alloc_tag.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 81e5f9a70f22..435aa837e550 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -8,6 +8,14 @@
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
+#define ALLOCINFO_FILE_NAME "allocinfo"
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+static bool mem_profiling_support __meminitdata = true;
+#else
+static bool mem_profiling_support __meminitdata;
+#endif
+
static struct codetag_type *alloc_tag_cttype;
DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
@@ -144,9 +152,26 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
return nr;
}
+static void __init shutdown_mem_profiling(void)
+{
+ if (mem_alloc_profiling_enabled())
+ static_branch_disable(&mem_alloc_profiling_key);
+
+ if (!mem_profiling_support)
+ return;
+
+ mem_profiling_support = false;
+}
+
static void __init procfs_init(void)
{
- proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
+ if (!mem_profiling_support)
+ return;
+
+ if (!proc_create_seq(ALLOCINFO_FILE_NAME, 0400, NULL, &allocinfo_seq_op)) {
+ pr_err("Failed to create %s file\n", ALLOCINFO_FILE_NAME);
+ shutdown_mem_profiling();
+ }
}
static bool alloc_tag_module_unload(struct codetag_type *cttype,
@@ -174,12 +199,6 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
return module_unused;
}
-#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
-static bool mem_profiling_support __meminitdata = true;
-#else
-static bool mem_profiling_support __meminitdata;
-#endif
-
static int __init setup_early_mem_profiling(char *str)
{
bool enable;
--
2.47.0.105.g07ac214952-goog
* [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function Suren Baghdasaryan
@ 2024-10-23 17:07 ` Suren Baghdasaryan
2024-10-23 18:05 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed Suren Baghdasaryan
` (2 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb
When a module gets unloaded there is a possibility that some of the
allocations it made are still in use and therefore the allocation tags
corresponding to these allocations are still referenced. As such, the
memory for these tags can't be freed. This is currently handled as an
abnormal situation and the module's data section is not unloaded.
To handle this situation without keeping the module's data in memory,
allow codetags with a longer lifespan than the module to be loaded into
their own separate memory. The in-use areas of this separate memory, and
the gaps left after module unloading, are tracked using a maple tree.
Allocation tags keep their separate memory virtually contiguous, which
allows simple allocation tag indexing later in this patchset. The size of
this virtually contiguous memory is set to store up to 100,000 allocation
tags.
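As an aside on the indexing point above: once the tags live in one virtually
contiguous array, a tag can later be identified by a small index rather than
a full pointer. A hedged sketch of the idea (the helper below is made up for
illustration; the real kernel/module index mapping is added in patch #6):

	/* sketch: position of a tag within the contiguous module tag area */
	static inline unsigned long tag_index_sketch(struct alloc_tag *tag)
	{
		return tag - (struct alloc_tag *)module_tags.start_addr;
	}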
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/asm-generic/codetag.lds.h | 19 +++
include/linux/alloc_tag.h | 13 +-
include/linux/codetag.h | 37 ++++-
kernel/module/main.c | 80 ++++++----
lib/alloc_tag.c | 249 +++++++++++++++++++++++++++---
lib/codetag.c | 100 +++++++++++-
scripts/module.lds.S | 5 +-
7 files changed, 441 insertions(+), 62 deletions(-)
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
index 64f536b80380..372c320c5043 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -11,4 +11,23 @@
#define CODETAG_SECTIONS() \
SECTION_WITH_BOUNDARIES(alloc_tags)
+/*
+ * Module codetags which aren't used after module unload, therefore have the
+ * same lifespan as the module and can be safely unloaded with the module.
+ */
+#define MOD_CODETAG_SECTIONS()
+
+#define MOD_SEPARATE_CODETAG_SECTION(_name) \
+ .codetag.##_name : { \
+ SECTION_WITH_BOUNDARIES(_name) \
+ }
+
+/*
+ * For codetags which might be used after module unload, therefore might stay
+ * longer in memory. Each such codetag type has its own section so that we can
+ * unload them individually once unused.
+ */
+#define MOD_SEPARATE_CODETAG_SECTIONS() \
+ MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
+
#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 1f0a9ff23a2c..7431757999c5 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_module_section {
+ unsigned long start_addr;
+ unsigned long end_addr;
+ /* used size */
+ unsigned long size;
+};
+
#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
#define CODETAG_EMPTY ((void *)1)
@@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
#ifdef CONFIG_MEM_ALLOC_PROFILING
+#define ALLOC_TAG_SECTION_NAME "alloc_tags"
+
struct codetag_bytes {
struct codetag *ct;
s64 bytes;
@@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_shared_alloc_tag };
@@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_alloc_tag_cntr };
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..d10bd9810d32 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,8 +35,15 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- bool (*module_unload)(struct codetag_type *cttype,
+ void (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
+#ifdef CONFIG_MODULES
+ void (*module_replaced)(struct module *mod, struct module *new_mod);
+ bool (*needs_section_mem)(struct module *mod, unsigned long size);
+ void *(*alloc_section_mem)(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align);
+ void (*free_section_mem)(struct module *mod, bool used);
+#endif
};
struct codetag_iterator {
@@ -71,11 +78,31 @@ struct codetag_type *
codetag_register_type(const struct codetag_type_desc *desc);
#if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
+
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size);
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align);
+void codetag_free_module_sections(struct module *mod);
+void codetag_module_replaced(struct module *mod, struct module *new_mod);
void codetag_load_module(struct module *mod);
-bool codetag_unload_module(struct module *mod);
-#else
+void codetag_unload_module(struct module *mod);
+
+#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
+
+static inline bool
+codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size) { return false; }
+static inline void *
+codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align) { return NULL; }
+static inline void codetag_free_module_sections(struct module *mod) {}
+static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
static inline void codetag_load_module(struct module *mod) {}
-static inline bool codetag_unload_module(struct module *mod) { return true; }
-#endif
+static inline void codetag_unload_module(struct module *mod) {}
+
+#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index ef54733bd7d2..1787686e5cae 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
return 0;
}
-static void module_memory_free(struct module *mod, enum mod_mem_type type,
- bool unload_codetags)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
{
struct module_memory *mem = &mod->mem[type];
- void *ptr = mem->base;
if (mem->is_rox)
vfree(mem->rw_copy);
- if (!unload_codetags && mod_mem_type_is_core_data(type))
- return;
-
- execmem_free(ptr);
+ execmem_free(mem->base);
}
-static void free_mod_mem(struct module *mod, bool unload_codetags)
+static void free_mod_mem(struct module *mod)
{
for_each_mod_mem_type(type) {
struct module_memory *mod_mem = &mod->mem[type];
@@ -1280,25 +1275,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
- module_memory_free(mod, type, unload_codetags);
+ module_memory_free(mod, type);
}
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- module_memory_free(mod, MOD_DATA, unload_codetags);
+ module_memory_free(mod, MOD_DATA);
}
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
- bool unload_codetags;
-
trace_module_free(mod);
- unload_codetags = codetag_unload_module(mod);
- if (!unload_codetags)
- pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
- mod->name);
+ codetag_unload_module(mod);
mod_sysfs_teardown(mod);
@@ -1341,7 +1331,7 @@ static void free_module(struct module *mod)
kfree(mod->args);
percpu_modfree(mod);
- free_mod_mem(mod, unload_codetags);
+ free_mod_mem(mod);
}
void *__symbol_get(const char *symbol)
@@ -1606,6 +1596,20 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
if (WARN_ON_ONCE(type == MOD_INVALID))
continue;
+ /*
+ * Do not allocate codetag memory as we load it into
+ * preallocated contiguous memory.
+ */
+ if (codetag_needs_module_section(mod, sname, s->sh_size)) {
+ /*
+ * s->sh_entsize won't be used but populate the
+ * type field to avoid confusion.
+ */
+ s->sh_entsize = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK)
+ << SH_ENTSIZE_TYPE_SHIFT;
+ continue;
+ }
+
s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
pr_debug("\t%s\n", sname);
}
@@ -2280,6 +2284,7 @@ static int move_module(struct module *mod, struct load_info *info)
int i;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
+ bool codetag_section_found = false;
for_each_mod_mem_type(type) {
if (!mod->mem[type].size) {
@@ -2291,7 +2296,7 @@ static int move_module(struct module *mod, struct load_info *info)
ret = module_memory_alloc(mod, type);
if (ret) {
t = type;
- goto out_enomem;
+ goto out_err;
}
}
@@ -2300,15 +2305,33 @@ static int move_module(struct module *mod, struct load_info *info)
for (i = 0; i < info->hdr->e_shnum; i++) {
void *dest;
Elf_Shdr *shdr = &info->sechdrs[i];
- enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
- unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
+ const char *sname;
unsigned long addr;
if (!(shdr->sh_flags & SHF_ALLOC))
continue;
- addr = (unsigned long)mod->mem[type].base + offset;
- dest = mod->mem[type].rw_copy + offset;
+ sname = info->secstrings + shdr->sh_name;
+ /*
+ * Load codetag sections separately as they might still be used
+ * after module unload.
+ */
+ if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
+ dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
+ arch_mod_section_prepend(mod, i), shdr->sh_addralign);
+ if (IS_ERR(dest)) {
+ ret = PTR_ERR(dest);
+ goto out_err;
+ }
+ addr = (unsigned long)dest;
+ codetag_section_found = true;
+ } else {
+ enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
+ unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
+
+ addr = (unsigned long)mod->mem[type].base + offset;
+ dest = mod->mem[type].rw_copy + offset;
+ }
if (shdr->sh_type != SHT_NOBITS) {
/*
@@ -2320,7 +2343,7 @@ static int move_module(struct module *mod, struct load_info *info)
if (i == info->index.mod &&
(WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
ret = -ENOEXEC;
- goto out_enomem;
+ goto out_err;
}
memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
}
@@ -2336,9 +2359,12 @@ static int move_module(struct module *mod, struct load_info *info)
}
return 0;
-out_enomem:
+out_err:
for (t--; t >= 0; t--)
- module_memory_free(mod, t, true);
+ module_memory_free(mod, t);
+ if (codetag_section_found)
+ codetag_free_module_sections(mod);
+
return ret;
}
@@ -2459,6 +2485,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
kmemleak_load_module(mod, info);
+ codetag_module_replaced(info->mod, mod);
+
return mod;
}
@@ -2468,7 +2496,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
percpu_modfree(mod);
module_arch_freeing_init(mod);
- free_mod_mem(mod, true);
+ free_mod_mem(mod);
}
int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 435aa837e550..d9f51169ffeb 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/alloc_tag.h>
+#include <linux/execmem.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/module.h>
@@ -9,6 +10,7 @@
#include <linux/seq_file.h>
#define ALLOCINFO_FILE_NAME "allocinfo"
+#define MODULE_ALLOC_TAG_VMAP_SIZE (100000UL * sizeof(struct alloc_tag))
#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
static bool mem_profiling_support __meminitdata = true;
@@ -174,31 +176,226 @@ static void __init procfs_init(void)
}
}
-static bool alloc_tag_module_unload(struct codetag_type *cttype,
- struct codetag_module *cmod)
+#ifdef CONFIG_MODULES
+
+static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
+/* A dummy object used to indicate an unloaded module */
+static struct module unloaded_mod;
+/* A dummy object used to indicate a module prepended area */
+static struct module prepend_mod;
+
+static struct alloc_tag_module_section module_tags;
+
+static bool needs_section_mem(struct module *mod, unsigned long size)
{
- struct codetag_iterator iter = codetag_get_ct_iter(cttype);
- struct alloc_tag_counters counter;
- bool module_unused = true;
- struct alloc_tag *tag;
- struct codetag *ct;
+ return size >= sizeof(struct alloc_tag);
+}
+
+static struct alloc_tag *find_used_tag(struct alloc_tag *from, struct alloc_tag *to)
+{
+ while (from <= to) {
+ struct alloc_tag_counters counter;
- for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
- if (iter.cmod != cmod)
+ counter = alloc_tag_read(from);
+ if (counter.bytes)
+ return from;
+ from++;
+ }
+
+ return NULL;
+}
+
+/* Called with mod_area_mt locked */
+static void clean_unused_module_areas_locked(void)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ struct module *val;
+
+ mas_for_each(&mas, val, module_tags.size) {
+ if (val != &unloaded_mod)
continue;
- tag = ct_to_alloc_tag(ct);
- counter = alloc_tag_read(tag);
+ /* Release area if all tags are unused */
+ if (!find_used_tag((struct alloc_tag *)(module_tags.start_addr + mas.index),
+ (struct alloc_tag *)(module_tags.start_addr + mas.last)))
+ mas_erase(&mas);
+ }
+}
+
+/* Called with mod_area_mt locked */
+static bool find_aligned_area(struct ma_state *mas, unsigned long section_size,
+ unsigned long size, unsigned int prepend, unsigned long align)
+{
+ bool cleanup_done = false;
+
+repeat:
+ /* Try finding exact size and hope the start is aligned */
+ if (!mas_empty_area(mas, 0, section_size - 1, prepend + size)) {
+ if (IS_ALIGNED(mas->index + prepend, align))
+ return true;
+
+ /* Try finding larger area to align later */
+ mas_reset(mas);
+ if (!mas_empty_area(mas, 0, section_size - 1,
+ size + prepend + align - 1))
+ return true;
+ }
- if (WARN(counter.bytes,
- "%s:%u module %s func:%s has %llu allocated at module unload",
- ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
- module_unused = false;
+ /* No free area, try cleanup stale data and repeat the search once */
+ if (!cleanup_done) {
+ clean_unused_module_areas_locked();
+ cleanup_done = true;
+ mas_reset(mas);
+ goto repeat;
}
- return module_unused;
+ return false;
+}
+
+static void *reserve_module_tags(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align)
+{
+ unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
+ MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
+ unsigned long offset;
+ void *ret = NULL;
+
+ /* If no tags return NULL */
+ if (size < sizeof(struct alloc_tag))
+ return NULL;
+
+ /*
+ * align is always power of 2, so we can use IS_ALIGNED and ALIGN.
+ * align 0 or 1 means no alignment, to simplify set to 1.
+ */
+ if (!align)
+ align = 1;
+
+ mas_lock(&mas);
+ if (!find_aligned_area(&mas, section_size, size, prepend, align)) {
+ ret = ERR_PTR(-ENOMEM);
+ goto unlock;
+ }
+
+ /* Mark found area as reserved */
+ offset = mas.index;
+ offset += prepend;
+ offset = ALIGN(offset, align);
+ if (offset != mas.index) {
+ unsigned long pad_start = mas.index;
+
+ mas.last = offset - 1;
+ mas_store(&mas, &prepend_mod);
+ if (mas_is_err(&mas)) {
+ ret = ERR_PTR(xa_err(mas.node));
+ goto unlock;
+ }
+ mas.index = offset;
+ mas.last = offset + size - 1;
+ mas_store(&mas, mod);
+ if (mas_is_err(&mas)) {
+ mas.index = pad_start;
+ mas_erase(&mas);
+ ret = ERR_PTR(xa_err(mas.node));
+ }
+ } else {
+ mas.last = offset + size - 1;
+ mas_store(&mas, mod);
+ if (mas_is_err(&mas))
+ ret = ERR_PTR(xa_err(mas.node));
+ }
+unlock:
+ mas_unlock(&mas);
+
+ if (IS_ERR(ret))
+ return ret;
+
+ if (module_tags.size < offset + size)
+ module_tags.size = offset + size;
+
+ return (struct alloc_tag *)(module_tags.start_addr + offset);
}
+static void release_module_tags(struct module *mod, bool used)
+{
+ MA_STATE(mas, &mod_area_mt, module_tags.size, module_tags.size);
+ struct alloc_tag *tag;
+ struct module *val;
+
+ mas_lock(&mas);
+ mas_for_each_rev(&mas, val, 0)
+ if (val == mod)
+ break;
+
+ if (!val) /* module not found */
+ goto out;
+
+ if (!used)
+ goto release_area;
+
+ /* Find out if the area is used */
+ tag = find_used_tag((struct alloc_tag *)(module_tags.start_addr + mas.index),
+ (struct alloc_tag *)(module_tags.start_addr + mas.last));
+ if (tag) {
+ struct alloc_tag_counters counter = alloc_tag_read(tag);
+
+ pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
+ tag->ct.filename, tag->ct.lineno, tag->ct.modname,
+ tag->ct.function, counter.bytes);
+ } else {
+ used = false;
+ }
+release_area:
+ mas_store(&mas, used ? &unloaded_mod : NULL);
+ val = mas_prev_range(&mas, 0);
+ if (val == &prepend_mod)
+ mas_store(&mas, NULL);
+out:
+ mas_unlock(&mas);
+}
+
+static void replace_module(struct module *mod, struct module *new_mod)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ struct module *val;
+
+ mas_lock(&mas);
+ mas_for_each(&mas, val, module_tags.size) {
+ if (val != mod)
+ continue;
+
+ mas_store_gfp(&mas, new_mod, GFP_KERNEL);
+ break;
+ }
+ mas_unlock(&mas);
+}
+
+static int __init alloc_mod_tags_mem(void)
+{
+ /* Allocate space to copy allocation tags */
+ module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
+ MODULE_ALLOC_TAG_VMAP_SIZE);
+ if (!module_tags.start_addr)
+ return -ENOMEM;
+
+ module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
+
+ return 0;
+}
+
+static void __init free_mod_tags_mem(void)
+{
+ execmem_free((void *)module_tags.start_addr);
+ module_tags.start_addr = 0;
+}
+
+#else /* CONFIG_MODULES */
+
+static inline int alloc_mod_tags_mem(void) { return 0; }
+static inline void free_mod_tags_mem(void) {}
+
+#endif /* CONFIG_MODULES */
+
static int __init setup_early_mem_profiling(char *str)
{
bool enable;
@@ -274,14 +471,26 @@ static inline void sysctl_init(void) {}
static int __init alloc_tag_init(void)
{
const struct codetag_type_desc desc = {
- .section = "alloc_tags",
- .tag_size = sizeof(struct alloc_tag),
- .module_unload = alloc_tag_module_unload,
+ .section = ALLOC_TAG_SECTION_NAME,
+ .tag_size = sizeof(struct alloc_tag),
+#ifdef CONFIG_MODULES
+ .needs_section_mem = needs_section_mem,
+ .alloc_section_mem = reserve_module_tags,
+ .free_section_mem = release_module_tags,
+ .module_replaced = replace_module,
+#endif
};
+ int res;
+
+ res = alloc_mod_tags_mem();
+ if (res)
+ return res;
alloc_tag_cttype = codetag_register_type(&desc);
- if (IS_ERR(alloc_tag_cttype))
+ if (IS_ERR(alloc_tag_cttype)) {
+ free_mod_tags_mem();
return PTR_ERR(alloc_tag_cttype);
+ }
sysctl_init();
procfs_init();
diff --git a/lib/codetag.c b/lib/codetag.c
index d1fbbb7c2ec3..654496952f86 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -207,6 +207,94 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
}
#ifdef CONFIG_MODULES
+#define CODETAG_SECTION_PREFIX ".codetag."
+
+/* Some codetag types need a separate module section */
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size)
+{
+ const char *type_name;
+ struct codetag_type *cttype;
+ bool ret = false;
+
+ if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
+ return false;
+
+ type_name = name + strlen(CODETAG_SECTION_PREFIX);
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (strcmp(type_name, cttype->desc.section) == 0) {
+ if (!cttype->desc.needs_section_mem)
+ break;
+
+ down_write(&cttype->mod_lock);
+ ret = cttype->desc.needs_section_mem(mod, size);
+ up_write(&cttype->mod_lock);
+ break;
+ }
+ }
+ mutex_unlock(&codetag_lock);
+
+ return ret;
+}
+
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align)
+{
+ const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
+ struct codetag_type *cttype;
+ void *ret = NULL;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (strcmp(type_name, cttype->desc.section) == 0) {
+ if (WARN_ON(!cttype->desc.alloc_section_mem))
+ break;
+
+ down_write(&cttype->mod_lock);
+ ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
+ up_write(&cttype->mod_lock);
+ break;
+ }
+ }
+ mutex_unlock(&codetag_lock);
+
+ return ret;
+}
+
+void codetag_free_module_sections(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (!cttype->desc.free_section_mem)
+ continue;
+
+ down_write(&cttype->mod_lock);
+ cttype->desc.free_section_mem(mod, false);
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
+
+void codetag_module_replaced(struct module *mod, struct module *new_mod)
+{
+ struct codetag_type *cttype;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (!cttype->desc.module_replaced)
+ continue;
+
+ down_write(&cttype->mod_lock);
+ cttype->desc.module_replaced(mod, new_mod);
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
+
void codetag_load_module(struct module *mod)
{
struct codetag_type *cttype;
@@ -220,13 +308,12 @@ void codetag_load_module(struct module *mod)
mutex_unlock(&codetag_lock);
}
-bool codetag_unload_module(struct module *mod)
+void codetag_unload_module(struct module *mod)
{
struct codetag_type *cttype;
- bool unload_ok = true;
if (!mod)
- return true;
+ return;
/* await any module's kfree_rcu() operations to complete */
kvfree_rcu_barrier();
@@ -246,18 +333,17 @@ bool codetag_unload_module(struct module *mod)
}
if (found) {
if (cttype->desc.module_unload)
- if (!cttype->desc.module_unload(cttype, cmod))
- unload_ok = false;
+ cttype->desc.module_unload(cttype, cmod);
cttype->count -= range_size(cttype, &cmod->range);
idr_remove(&cttype->mod_idr, mod_id);
kfree(cmod);
}
up_write(&cttype->mod_lock);
+ if (found && cttype->desc.free_section_mem)
+ cttype->desc.free_section_mem(mod, true);
}
mutex_unlock(&codetag_lock);
-
- return unload_ok;
}
#endif /* CONFIG_MODULES */
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 3f43edef813c..711c6e029936 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -50,7 +50,7 @@ SECTIONS {
.data : {
*(.data .data.[0-9a-zA-Z_]*)
*(.data..L*)
- CODETAG_SECTIONS()
+ MOD_CODETAG_SECTIONS()
}
.rodata : {
@@ -59,9 +59,10 @@ SECTIONS {
}
#else
.data : {
- CODETAG_SECTIONS()
+ MOD_CODETAG_SECTIONS()
}
#endif
+ MOD_SEPARATE_CODETAG_SECTIONS()
}
/* bring in arch-specific sections */
--
2.47.0.105.g07ac214952-goog
* [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
` (2 preceding siblings ...)
2024-10-23 17:07 ` [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
@ 2024-10-23 17:07 ` Suren Baghdasaryan
2024-10-23 18:28 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 6/6] alloc_tag: support for page allocation tag compression Suren Baghdasaryan
5 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb
The memory reserved for module tags does not need to be backed by
physical pages until there are tags to store there. Change the way
we reserve this memory: allocate only the virtual area for the tags
and populate it with physical pages as needed when a module is loaded.
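Condensed, the flow implemented below is (a sketch using the names introduced
by this patch and by vm_module_tags_populate(); error handling and cleanup of
partially allocated pages are omitted):

	/* at init: reserve the virtual area only, no physical backing yet */
	vm_module_tags = execmem_vmap(MODULE_ALLOC_TAG_VMAP_SIZE);

	/* at module load: back just the part of the area now in use */
	nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN,
					 NUMA_NO_NODE, more_pages, next_page);
	vmap_pages_range(addr, addr + (nr << PAGE_SHIFT), PAGE_KERNEL,
			 next_page, PAGE_SHIFT);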
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/execmem.h | 10 ++++++
include/linux/vmalloc.h | 3 ++
lib/alloc_tag.c | 73 ++++++++++++++++++++++++++++++++++++-----
mm/execmem.c | 16 +++++++++
mm/internal.h | 6 ++++
mm/vmalloc.c | 4 +--
6 files changed, 101 insertions(+), 11 deletions(-)
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 1517fa196bf7..5a5e2917f870 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -139,6 +139,16 @@ void *execmem_alloc(enum execmem_type type, size_t size);
*/
void execmem_free(void *ptr);
+/**
+ * execmem_vmap - create virtual mapping for EXECMEM_MODULE_DATA memory
+ * @size: size of the virtual mapping in bytes
+ *
+ * Maps virtually contiguous area in the range suitable for EXECMEM_MODULE_DATA.
+ *
+ * Return: the area descriptor on success or %NULL on failure.
+ */
+struct vm_struct *execmem_vmap(size_t size);
+
/**
* execmem_update_copy - copy an update to executable memory
* @dst: destination address to update
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 27408f21e501..31e9ffd936e3 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -202,6 +202,9 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
+int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
+ struct page **pages, unsigned int page_shift);
+
/*
* Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
* and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d9f51169ffeb..061e43196247 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -8,14 +8,15 @@
#include <linux/proc_fs.h>
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
+#include <linux/vmalloc.h>
#define ALLOCINFO_FILE_NAME "allocinfo"
#define MODULE_ALLOC_TAG_VMAP_SIZE (100000UL * sizeof(struct alloc_tag))
#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
-static bool mem_profiling_support __meminitdata = true;
+static bool mem_profiling_support = true;
#else
-static bool mem_profiling_support __meminitdata;
+static bool mem_profiling_support;
#endif
static struct codetag_type *alloc_tag_cttype;
@@ -154,7 +155,7 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
return nr;
}
-static void __init shutdown_mem_profiling(void)
+static void shutdown_mem_profiling(void)
{
if (mem_alloc_profiling_enabled())
static_branch_disable(&mem_alloc_profiling_key);
@@ -179,6 +180,7 @@ static void __init procfs_init(void)
#ifdef CONFIG_MODULES
static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
+static struct vm_struct *vm_module_tags;
/* A dummy object used to indicate an unloaded module */
static struct module unloaded_mod;
/* A dummy object used to indicate a module prepended area */
@@ -252,6 +254,33 @@ static bool find_aligned_area(struct ma_state *mas, unsigned long section_size,
return false;
}
+static int vm_module_tags_populate(void)
+{
+ unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;
+
+ if (phys_size < module_tags.size) {
+ struct page **next_page = vm_module_tags->pages + vm_module_tags->nr_pages;
+ unsigned long addr = module_tags.start_addr + phys_size;
+ unsigned long more_pages;
+ unsigned long nr;
+
+ more_pages = ALIGN(module_tags.size - phys_size, PAGE_SIZE) >> PAGE_SHIFT;
+ nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN,
+ NUMA_NO_NODE, more_pages, next_page);
+ if (nr < more_pages ||
+ vmap_pages_range(addr, addr + (nr << PAGE_SHIFT), PAGE_KERNEL,
+ next_page, PAGE_SHIFT) < 0) {
+ /* Clean up and error out */
+ for (int i = 0; i < nr; i++)
+ __free_page(next_page[i]);
+ return -ENOMEM;
+ }
+ vm_module_tags->nr_pages += nr;
+ }
+
+ return 0;
+}
+
static void *reserve_module_tags(struct module *mod, unsigned long size,
unsigned int prepend, unsigned long align)
{
@@ -310,8 +339,18 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
if (IS_ERR(ret))
return ret;
- if (module_tags.size < offset + size)
+ if (module_tags.size < offset + size) {
+ int grow_res;
+
module_tags.size = offset + size;
+ grow_res = vm_module_tags_populate();
+ if (grow_res) {
+ shutdown_mem_profiling();
+ pr_err("Failed to allocate memory for allocation tags in the module %s. Memory allocation profiling is disabled!\n",
+ mod->name);
+ return ERR_PTR(grow_res);
+ }
+ }
return (struct alloc_tag *)(module_tags.start_addr + offset);
}
@@ -372,12 +411,23 @@ static void replace_module(struct module *mod, struct module *new_mod)
static int __init alloc_mod_tags_mem(void)
{
- /* Allocate space to copy allocation tags */
- module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
- MODULE_ALLOC_TAG_VMAP_SIZE);
- if (!module_tags.start_addr)
+ /* Map space to copy allocation tags */
+ vm_module_tags = execmem_vmap(MODULE_ALLOC_TAG_VMAP_SIZE);
+ if (!vm_module_tags) {
+ pr_err("Failed to map %lu bytes for module allocation tags\n",
+ MODULE_ALLOC_TAG_VMAP_SIZE);
+ module_tags.start_addr = 0;
return -ENOMEM;
+ }
+ vm_module_tags->pages = kmalloc_array(get_vm_area_size(vm_module_tags) >> PAGE_SHIFT,
+ sizeof(struct page *), GFP_KERNEL | __GFP_ZERO);
+ if (!vm_module_tags->pages) {
+ free_vm_area(vm_module_tags);
+ return -ENOMEM;
+ }
+
+ module_tags.start_addr = (unsigned long)vm_module_tags->addr;
module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
return 0;
@@ -385,8 +435,13 @@ static int __init alloc_mod_tags_mem(void)
static void __init free_mod_tags_mem(void)
{
- execmem_free((void *)module_tags.start_addr);
+ int i;
+
module_tags.start_addr = 0;
+ for (i = 0; i < vm_module_tags->nr_pages; i++)
+ __free_page(vm_module_tags->pages[i]);
+ kfree(vm_module_tags->pages);
+ free_vm_area(vm_module_tags);
}
#else /* CONFIG_MODULES */
diff --git a/mm/execmem.c b/mm/execmem.c
index 576a57e2161f..5c0f9f2d6f83 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -368,6 +368,22 @@ void execmem_free(void *ptr)
vfree(ptr);
}
+struct vm_struct *execmem_vmap(size_t size)
+{
+ struct execmem_range *range = &execmem_info->ranges[EXECMEM_MODULE_DATA];
+ struct vm_struct *area;
+
+ area = __get_vm_area_node(size, range->alignment, PAGE_SHIFT, VM_ALLOC,
+ range->start, range->end, NUMA_NO_NODE,
+ GFP_KERNEL, __builtin_return_address(0));
+ if (!area && range->fallback_start)
+ area = __get_vm_area_node(size, range->alignment, PAGE_SHIFT, VM_ALLOC,
+ range->fallback_start, range->fallback_end,
+ NUMA_NO_NODE, GFP_KERNEL, __builtin_return_address(0));
+
+ return area;
+}
+
void *execmem_update_copy(void *dst, const void *src, size_t size)
{
return text_poke_copy(dst, src, size);
diff --git a/mm/internal.h b/mm/internal.h
index 508f7802dd2b..f1ce0e10bed8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1219,6 +1219,12 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
void free_zone_device_folio(struct folio *folio);
int migrate_device_coherent_folio(struct folio *folio);
+struct vm_struct *__get_vm_area_node(unsigned long size,
+ unsigned long align, unsigned long shift,
+ unsigned long flags, unsigned long start,
+ unsigned long end, int node, gfp_t gfp_mask,
+ const void *caller);
+
/*
* mm/gup.c
*/
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 74c0a5eae210..7ed39d104201 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -653,7 +653,7 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
* RETURNS:
* 0 on success, -errno on failure.
*/
-static int vmap_pages_range(unsigned long addr, unsigned long end,
+int vmap_pages_range(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
int err;
@@ -3106,7 +3106,7 @@ static void clear_vm_uninitialized_flag(struct vm_struct *vm)
vm->flags &= ~VM_UNINITIALIZED;
}
-static struct vm_struct *__get_vm_area_node(unsigned long size,
+struct vm_struct *__get_vm_area_node(unsigned long size,
unsigned long align, unsigned long shift, unsigned long flags,
unsigned long start, unsigned long end, int node,
gfp_t gfp_mask, const void *caller)
--
2.47.0.105.g07ac214952-goog
* [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
` (3 preceding siblings ...)
2024-10-23 17:07 ` [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed Suren Baghdasaryan
@ 2024-10-23 17:07 ` Suren Baghdasaryan
2024-10-23 17:35 ` Pasha Tatashin
2024-10-23 21:00 ` Andrew Morton
2024-10-23 17:07 ` [PATCH v4 6/6] alloc_tag: support for page allocation tag compression Suren Baghdasaryan
5 siblings, 2 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb
To simplify later changes to page tag references, introduce a new
pgtag_ref_handle type. This allows page_ext to be easily replaced
as the storage for page allocation tags.
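The resulting access pattern for callers (this is the shape the conversions
in the diff below follow, shown here outside the diff for readability):

	union pgtag_ref_handle handle;
	union codetag_ref ref;

	if (get_page_tag_ref(page, &ref, &handle)) {
		/* operate on the local copy of the reference... */
		alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
		/* ...then write it back and release the handle */
		update_page_tag_ref(handle, &ref);
		put_page_tag_ref(handle);
	}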
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/mm.h | 25 +++++-----
include/linux/pgalloc_tag.h | 92 ++++++++++++++++++++++---------------
2 files changed, 67 insertions(+), 50 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5cd22303fbc0..8efb4a6a1a70 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4180,37 +4180,38 @@ static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new
return;
for (i = nr_pages; i < (1 << old_order); i += nr_pages) {
- union codetag_ref *ref = get_page_tag_ref(folio_page(folio, i));
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
+ if (get_page_tag_ref(folio_page(folio, i), &ref, &handle)) {
/* Set new reference to point to the original tag */
- alloc_tag_ref_set(ref, tag);
- put_page_tag_ref(ref);
+ alloc_tag_ref_set(&ref, tag);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
{
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
struct alloc_tag *tag;
- union codetag_ref *ref;
tag = pgalloc_tag_get(&old->page);
if (!tag)
return;
- ref = get_page_tag_ref(&new->page);
- if (!ref)
+ if (!get_page_tag_ref(&new->page, &ref, &handle))
return;
/* Clear the old ref to the original allocation tag. */
clear_page_tag_ref(&old->page);
/* Decrement the counters of the tag on get_new_folio. */
- alloc_tag_sub(ref, folio_nr_pages(new));
-
- __alloc_tag_ref_set(ref, tag);
-
- put_page_tag_ref(ref);
+ alloc_tag_sub(&ref, folio_nr_pages(new));
+ __alloc_tag_ref_set(&ref, tag);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
#else /* !CONFIG_MEM_ALLOC_PROFILING */
static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 59a3deb792a8..b13cd3313a88 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -11,46 +11,59 @@
#include <linux/page_ext.h>
+union pgtag_ref_handle {
+ union codetag_ref *ref; /* reference in page extension */
+};
+
extern struct page_ext_operations page_alloc_tagging_ops;
-static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
+ union pgtag_ref_handle *handle)
{
- return (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
-}
+ struct page_ext *page_ext;
+ union codetag_ref *tmp;
-static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
-{
- return (void *)ref - page_alloc_tagging_ops.offset;
+ if (!page)
+ return false;
+
+ page_ext = page_ext_get(page);
+ if (!page_ext)
+ return false;
+
+ tmp = (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+ ref->ct = tmp->ct;
+ handle->ref = tmp;
+ return true;
}
-/* Should be called only if mem_alloc_profiling_enabled() */
-static inline union codetag_ref *get_page_tag_ref(struct page *page)
+static inline void put_page_tag_ref(union pgtag_ref_handle handle)
{
- if (page) {
- struct page_ext *page_ext = page_ext_get(page);
+ if (WARN_ON(!handle.ref))
+ return;
- if (page_ext)
- return codetag_ref_from_page_ext(page_ext);
- }
- return NULL;
+ page_ext_put((void *)handle.ref - page_alloc_tagging_ops.offset);
}
-static inline void put_page_tag_ref(union codetag_ref *ref)
+static inline void update_page_tag_ref(union pgtag_ref_handle handle,
+ union codetag_ref *ref)
{
- if (WARN_ON(!ref))
+ if (WARN_ON(!handle.ref || !ref))
return;
- page_ext_put(page_ext_from_codetag_ref(ref));
+ handle.ref->ct = ref->ct;
}
static inline void clear_page_tag_ref(struct page *page)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ set_codetag_empty(&ref);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -59,11 +72,13 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -71,11 +86,13 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- alloc_tag_sub(ref, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_sub(&ref, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -85,13 +102,14 @@ static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
struct alloc_tag *tag = NULL;
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- alloc_tag_sub_check(ref);
- if (ref) {
- if (ref->ct)
- tag = ct_to_alloc_tag(ref->ct);
- put_page_tag_ref(ref);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_sub_check(&ref);
+ if (ref.ct)
+ tag = ct_to_alloc_tag(ref.ct);
+ put_page_tag_ref(handle);
}
}
@@ -106,8 +124,6 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
#else /* CONFIG_MEM_ALLOC_PROFILING */
-static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
-static inline void put_page_tag_ref(union codetag_ref *ref) {}
static inline void clear_page_tag_ref(struct page *page) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr) {}
--
2.47.0.105.g07ac214952-goog
* [PATCH v4 6/6] alloc_tag: support for page allocation tag compression
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
` (4 preceding siblings ...)
2024-10-23 17:07 ` [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references Suren Baghdasaryan
@ 2024-10-23 17:07 ` Suren Baghdasaryan
2024-10-23 18:29 ` Pasha Tatashin
5 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 17:07 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team,
surenb
Implement support for storing page allocation tag references directly
in the page flags instead of page extensions. The sysctl.vm.mem_profiling
boot parameter is extended to provide a way for a user to request this
mode. Enabling compression eliminates the memory overhead caused by
page_ext and results in better performance for page allocations. However,
this mode will not work if the number of available page flag bits is
insufficient to address all kernel allocations. Such a condition can
happen during boot or when loading a module. If this condition is
detected, memory allocation profiling gets disabled with an appropriate
warning. Compression mode is disabled by default.
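For example, per the documentation update in this patch, compressed mode is
requested by appending ",compressed" to the boot parameter:

	sysctl.vm.mem_profiling=1,compressed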
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
Documentation/mm/allocation-profiling.rst | 7 +-
include/linux/alloc_tag.h | 10 +-
include/linux/codetag.h | 3 +
include/linux/page-flags-layout.h | 7 ++
include/linux/pgalloc_tag.h | 145 +++++++++++++++++++---
lib/alloc_tag.c | 142 +++++++++++++++++++--
lib/codetag.c | 4 +-
mm/mm_init.c | 5 +-
8 files changed, 290 insertions(+), 33 deletions(-)
diff --git a/Documentation/mm/allocation-profiling.rst b/Documentation/mm/allocation-profiling.rst
index ffd6655b7be2..316311240e6a 100644
--- a/Documentation/mm/allocation-profiling.rst
+++ b/Documentation/mm/allocation-profiling.rst
@@ -18,12 +18,17 @@ kconfig options:
missing annotation
Boot parameter:
- sysctl.vm.mem_profiling=0|1|never
+ sysctl.vm.mem_profiling={0|1|never}[,compressed]
When set to "never", memory allocation profiling overhead is minimized and it
cannot be enabled at runtime (sysctl becomes read-only).
When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y, default value is "1".
When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n, default value is "never".
+ "compressed" optional parameter will try to store page tag references in a
+ compact format, avoiding page extensions. This results in improved performance
+ and memory consumption, however it might fail depending on system configuration.
+ If compression fails, a warning is issued and memory allocation profiling gets
+ disabled.
sysctl:
/proc/sys/vm/mem_profiling
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 7431757999c5..4f811ec0ffe0 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,8 +30,16 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_kernel_section {
+ struct alloc_tag *first_tag;
+ unsigned long count;
+};
+
struct alloc_tag_module_section {
- unsigned long start_addr;
+ union {
+ unsigned long start_addr;
+ struct alloc_tag *first_tag;
+ };
unsigned long end_addr;
/* used size */
unsigned long size;
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index d10bd9810d32..d14dbd26b370 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -13,6 +13,9 @@ struct codetag_module;
struct seq_buf;
struct module;
+#define CODETAG_SECTION_START_PREFIX "__start_"
+#define CODETAG_SECTION_STOP_PREFIX "__stop_"
+
/*
* An instance of this structure is created in a special ELF section at every
* code location being tagged. At runtime, the special section is treated as
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 7d79818dc065..4f5c9e979bb9 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -111,5 +111,12 @@
ZONES_WIDTH - LRU_GEN_WIDTH - SECTIONS_WIDTH - \
NODES_WIDTH - KASAN_TAG_WIDTH - LAST_CPUPID_WIDTH)
+#define NR_NON_PAGEFLAG_BITS (SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH + \
+ LAST_CPUPID_SHIFT + KASAN_TAG_WIDTH + \
+ LRU_GEN_WIDTH + LRU_REFS_WIDTH)
+
+#define NR_UNUSED_PAGEFLAG_BITS (BITS_PER_LONG - \
+ (NR_NON_PAGEFLAG_BITS + NR_PAGEFLAGS))
+
#endif
#endif /* _LINUX_PAGE_FLAGS_LAYOUT */
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index b13cd3313a88..1fe63b52e5e5 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -11,29 +11,118 @@
#include <linux/page_ext.h>
+extern struct page_ext_operations page_alloc_tagging_ops;
+extern unsigned long alloc_tag_ref_mask;
+extern int alloc_tag_ref_offs;
+extern struct alloc_tag_kernel_section kernel_tags;
+
+DECLARE_STATIC_KEY_FALSE(mem_profiling_compressed);
+
+typedef u16 pgalloc_tag_idx;
+
union pgtag_ref_handle {
union codetag_ref *ref; /* reference in page extension */
+ struct page *page; /* reference in page flags */
};
-extern struct page_ext_operations page_alloc_tagging_ops;
+/* Reserved indexes */
+#define CODETAG_ID_NULL 0
+#define CODETAG_ID_EMPTY 1
+#define CODETAG_ID_FIRST 2
+
+#ifdef CONFIG_MODULES
+
+extern struct alloc_tag_module_section module_tags;
+
+static inline struct alloc_tag *module_idx_to_tag(pgalloc_tag_idx idx)
+{
+ return &module_tags.first_tag[idx - kernel_tags.count];
+}
+
+static inline pgalloc_tag_idx module_tag_to_idx(struct alloc_tag *tag)
+{
+ return CODETAG_ID_FIRST + kernel_tags.count + (tag - module_tags.first_tag);
+}
+
+#else /* CONFIG_MODULES */
+
+static inline struct alloc_tag *module_idx_to_tag(pgalloc_tag_idx idx)
+{
+ pr_warn("invalid page tag reference %lu\n", (unsigned long)idx);
+ return NULL;
+}
+
+static inline pgalloc_tag_idx module_tag_to_idx(struct alloc_tag *tag)
+{
+ pr_warn("invalid page tag 0x%lx\n", (unsigned long)tag);
+ return CODETAG_ID_NULL;
+}
+
+#endif /* CONFIG_MODULES */
+
+static inline void idx_to_ref(pgalloc_tag_idx idx, union codetag_ref *ref)
+{
+ switch (idx) {
+ case (CODETAG_ID_NULL):
+ ref->ct = NULL;
+ break;
+ case (CODETAG_ID_EMPTY):
+ set_codetag_empty(ref);
+ break;
+ default:
+ idx -= CODETAG_ID_FIRST;
+ ref->ct = idx < kernel_tags.count ?
+ &kernel_tags.first_tag[idx].ct :
+ &module_idx_to_tag(idx)->ct;
+ break;
+ }
+}
+
+static inline pgalloc_tag_idx ref_to_idx(union codetag_ref *ref)
+{
+ struct alloc_tag *tag;
+
+ if (!ref->ct)
+ return CODETAG_ID_NULL;
+
+ if (is_codetag_empty(ref))
+ return CODETAG_ID_EMPTY;
+
+ tag = ct_to_alloc_tag(ref->ct);
+ if (tag >= kernel_tags.first_tag && tag < kernel_tags.first_tag + kernel_tags.count)
+ return CODETAG_ID_FIRST + (tag - kernel_tags.first_tag);
+
+ return module_tag_to_idx(tag);
+}
+
+
/* Should be called only if mem_alloc_profiling_enabled() */
static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
union pgtag_ref_handle *handle)
{
- struct page_ext *page_ext;
- union codetag_ref *tmp;
-
if (!page)
return false;
- page_ext = page_ext_get(page);
- if (!page_ext)
- return false;
+ if (static_key_enabled(&mem_profiling_compressed)) {
+ pgalloc_tag_idx idx;
+
+ idx = (page->flags >> alloc_tag_ref_offs) & alloc_tag_ref_mask;
+ idx_to_ref(idx, ref);
+ handle->page = page;
+ } else {
+ struct page_ext *page_ext;
+ union codetag_ref *tmp;
+
+ page_ext = page_ext_get(page);
+ if (!page_ext)
+ return false;
+
+ tmp = (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+ ref->ct = tmp->ct;
+ handle->ref = tmp;
+ }
- tmp = (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
- ref->ct = tmp->ct;
- handle->ref = tmp;
return true;
}
@@ -42,16 +131,35 @@ static inline void put_page_tag_ref(union pgtag_ref_handle handle)
if (WARN_ON(!handle.ref))
return;
- page_ext_put((void *)handle.ref - page_alloc_tagging_ops.offset);
+ if (!static_key_enabled(&mem_profiling_compressed))
+ page_ext_put((void *)handle.ref - page_alloc_tagging_ops.offset);
}
-static inline void update_page_tag_ref(union pgtag_ref_handle handle,
- union codetag_ref *ref)
+static inline void update_page_tag_ref(union pgtag_ref_handle handle, union codetag_ref *ref)
{
- if (WARN_ON(!handle.ref || !ref))
- return;
-
- handle.ref->ct = ref->ct;
+ if (static_key_enabled(&mem_profiling_compressed)) {
+ struct page *page = handle.page;
+ unsigned long old_flags;
+ unsigned long flags;
+ unsigned long idx;
+
+ if (WARN_ON(!page || !ref))
+ return;
+
+ idx = (unsigned long)ref_to_idx(ref);
+ idx = (idx & alloc_tag_ref_mask) << alloc_tag_ref_offs;
+ do {
+ old_flags = READ_ONCE(page->flags);
+ flags = old_flags;
+ flags &= ~(alloc_tag_ref_mask << alloc_tag_ref_offs);
+ flags |= idx;
+ } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));
+ } else {
+ if (WARN_ON(!handle.ref || !ref))
+ return;
+
+ handle.ref->ct = ref->ct;
+ }
}
static inline void clear_page_tag_ref(struct page *page)
@@ -122,6 +230,8 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
}
+void __init alloc_tag_sec_init(void);
+
#else /* CONFIG_MEM_ALLOC_PROFILING */
static inline void clear_page_tag_ref(struct page *page) {}
@@ -130,6 +240,7 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
+static inline void alloc_tag_sec_init(void) {}
#endif /* CONFIG_MEM_ALLOC_PROFILING */
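The compressed encoding amounts to keeping a small tag index in the otherwise
unused upper bits of page->flags. A stripped-down userspace sketch of that
packing (plain C with a made-up offset and width; the kernel computes
alloc_tag_ref_offs and alloc_tag_ref_mask at boot in alloc_tag_sec_init()):

    /* Illustrative only: pack/unpack a tag index into spare flag bits,
     * assuming a 64-bit unsigned long. */
    #include <stdio.h>

    #define TAG_OFFS 48UL                 /* assumed offset of the spare bits */
    #define TAG_MASK ((1UL << 10) - 1)    /* assumed 10 spare bits */

    static unsigned long set_tag_idx(unsigned long flags, unsigned long idx)
    {
            flags &= ~(TAG_MASK << TAG_OFFS);       /* clear old index */
            flags |= (idx & TAG_MASK) << TAG_OFFS;  /* store new index */
            return flags;
    }

    static unsigned long get_tag_idx(unsigned long flags)
    {
            return (flags >> TAG_OFFS) & TAG_MASK;
    }

    int main(void)
    {
            unsigned long flags = 0x1dUL;   /* arbitrary existing flag bits */

            flags = set_tag_idx(flags, 42);
            printf("idx=%lu\n", get_tag_idx(flags));  /* prints idx=42 */
            return 0;
    }

In the kernel the store additionally has to be atomic with respect to
concurrent page flag updates, which is why update_page_tag_ref() above loops
with try_cmpxchg() instead of doing a plain read-modify-write.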
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 061e43196247..a6f6f014461e 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -3,6 +3,7 @@
#include <linux/execmem.h>
#include <linux/fs.h>
#include <linux/gfp.h>
+#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/page_ext.h>
#include <linux/proc_fs.h>
@@ -12,6 +13,8 @@
#define ALLOCINFO_FILE_NAME "allocinfo"
#define MODULE_ALLOC_TAG_VMAP_SIZE (100000UL * sizeof(struct alloc_tag))
+#define SECTION_START(NAME) (CODETAG_SECTION_START_PREFIX NAME)
+#define SECTION_STOP(NAME) (CODETAG_SECTION_STOP_PREFIX NAME)
#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
static bool mem_profiling_support = true;
@@ -26,6 +29,11 @@ EXPORT_SYMBOL(_shared_alloc_tag);
DEFINE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
mem_alloc_profiling_key);
+DEFINE_STATIC_KEY_FALSE(mem_profiling_compressed);
+
+struct alloc_tag_kernel_section kernel_tags = { NULL, 0 };
+unsigned long alloc_tag_ref_mask;
+int alloc_tag_ref_offs;
struct allocinfo_private {
struct codetag_iterator iter;
@@ -155,7 +163,7 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
return nr;
}
-static void shutdown_mem_profiling(void)
+static void shutdown_mem_profiling(bool remove_file)
{
if (mem_alloc_profiling_enabled())
static_branch_disable(&mem_alloc_profiling_key);
@@ -163,6 +171,8 @@ static void shutdown_mem_profiling(void)
if (!mem_profiling_support)
return;
+ if (remove_file)
+ remove_proc_entry(ALLOCINFO_FILE_NAME, NULL);
mem_profiling_support = false;
}
@@ -173,10 +183,40 @@ static void __init procfs_init(void)
if (!proc_create_seq(ALLOCINFO_FILE_NAME, 0400, NULL, &allocinfo_seq_op)) {
pr_err("Failed to create %s file\n", ALLOCINFO_FILE_NAME);
- shutdown_mem_profiling();
+ shutdown_mem_profiling(false);
}
}
+void __init alloc_tag_sec_init(void)
+{
+ struct alloc_tag *last_codetag;
+
+ if (!mem_profiling_support)
+ return;
+
+ if (!static_key_enabled(&mem_profiling_compressed))
+ return;
+
+ kernel_tags.first_tag = (struct alloc_tag *)kallsyms_lookup_name(
+ SECTION_START(ALLOC_TAG_SECTION_NAME));
+ last_codetag = (struct alloc_tag *)kallsyms_lookup_name(
+ SECTION_STOP(ALLOC_TAG_SECTION_NAME));
+ kernel_tags.count = last_codetag - kernel_tags.first_tag;
+
+ /* Check if kernel tags fit into page flags */
+ if (kernel_tags.count > (1UL << NR_UNUSED_PAGEFLAG_BITS)) {
+ shutdown_mem_profiling(false); /* allocinfo file does not exist yet */
+ pr_err("%lu allocation tags cannot be referenced using %d available page flag bits. Memory allocation profiling is disabled!\n",
+ kernel_tags.count, NR_UNUSED_PAGEFLAG_BITS);
+ return;
+ }
+
+ alloc_tag_ref_offs = (LRU_REFS_PGOFF - NR_UNUSED_PAGEFLAG_BITS);
+ alloc_tag_ref_mask = ((1UL << NR_UNUSED_PAGEFLAG_BITS) - 1);
+ pr_debug("Memory allocation profiling compression is using %d page flag bits!\n",
+ NR_UNUSED_PAGEFLAG_BITS);
+}
+
#ifdef CONFIG_MODULES
static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
@@ -186,10 +226,59 @@ static struct module unloaded_mod;
/* A dummy object used to indicate a module prepended area */
static struct module prepend_mod;
-static struct alloc_tag_module_section module_tags;
+struct alloc_tag_module_section module_tags;
+
+static inline unsigned long alloc_tag_align(unsigned long val)
+{
+ if (!static_key_enabled(&mem_profiling_compressed)) {
+ /* No alignment requirements when we are not indexing the tags */
+ return val;
+ }
+
+ if (val % sizeof(struct alloc_tag) == 0)
+ return val;
+ return ((val / sizeof(struct alloc_tag)) + 1) * sizeof(struct alloc_tag);
+}
+
+static bool ensure_alignment(unsigned long align, unsigned int *prepend)
+{
+ if (!static_key_enabled(&mem_profiling_compressed)) {
+ /* No alignment requirements when we are not indexing the tags */
+ return true;
+ }
+
+ /*
+ * If alloc_tag size is not a multiple of required alignment, tag
+ * indexing does not work.
+ */
+ if (!IS_ALIGNED(sizeof(struct alloc_tag), align))
+ return false;
+
+ /* Ensure prepend consumes multiple of alloc_tag-sized blocks */
+ if (*prepend)
+ *prepend = alloc_tag_align(*prepend);
+
+ return true;
+}
+
+static inline bool tags_addressable(void)
+{
+ unsigned long tag_idx_count;
+
+ if (!static_key_enabled(&mem_profiling_compressed))
+ return true; /* with page_ext tags are always addressable */
+
+ tag_idx_count = CODETAG_ID_FIRST + kernel_tags.count +
+ module_tags.size / sizeof(struct alloc_tag);
+
+ return tag_idx_count < (1UL << NR_UNUSED_PAGEFLAG_BITS);
+}
static bool needs_section_mem(struct module *mod, unsigned long size)
{
+ if (!mem_profiling_support)
+ return false;
+
return size >= sizeof(struct alloc_tag);
}
@@ -300,6 +389,13 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
if (!align)
align = 1;
+ if (!ensure_alignment(align, &prepend)) {
+ shutdown_mem_profiling(true);
+ pr_err("%s: alignment %lu is incompatible with allocation tag indexing. Memory allocation profiling is disabled!\n",
+ mod->name, align);
+ return ERR_PTR(-EINVAL);
+ }
+
mas_lock(&mas);
if (!find_aligned_area(&mas, section_size, size, prepend, align)) {
ret = ERR_PTR(-ENOMEM);
@@ -343,9 +439,15 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
int grow_res;
module_tags.size = offset + size;
+ if (mem_alloc_profiling_enabled() && !tags_addressable()) {
+ shutdown_mem_profiling(true);
+ pr_warn("With module %s there are too many tags to fit in %d page flag bits. Memory allocation profiling is disabled!\n",
+ mod->name, NR_UNUSED_PAGEFLAG_BITS);
+ }
+
grow_res = vm_module_tags_populate();
if (grow_res) {
- shutdown_mem_profiling();
+ shutdown_mem_profiling(true);
pr_err("Failed to allocate memory for allocation tags in the module %s. Memory allocation profiling is disabled!\n",
mod->name);
return ERR_PTR(grow_res);
@@ -429,6 +531,8 @@ static int __init alloc_mod_tags_mem(void)
module_tags.start_addr = (unsigned long)vm_module_tags->addr;
module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
+ /* Ensure the base is alloc_tag aligned when required for indexing */
+ module_tags.start_addr = alloc_tag_align(module_tags.start_addr);
return 0;
}
@@ -451,8 +555,10 @@ static inline void free_mod_tags_mem(void) {}
#endif /* CONFIG_MODULES */
+/* See: Documentation/mm/allocation-profiling.rst */
static int __init setup_early_mem_profiling(char *str)
{
+ bool compressed = false;
bool enable;
if (!str || !str[0])
@@ -461,22 +567,37 @@ static int __init setup_early_mem_profiling(char *str)
if (!strncmp(str, "never", 5)) {
enable = false;
mem_profiling_support = false;
+ pr_info("Memory allocation profiling is disabled!\n");
} else {
- int res;
+ char *token = strsep(&str, ",");
+
+ if (kstrtobool(token, &enable))
+ return -EINVAL;
- res = kstrtobool(str, &enable);
- if (res)
- return res;
+ if (str) {
+ if (strcmp(str, "compressed"))
+ return -EINVAL;
+
+ compressed = true;
+ }
mem_profiling_support = true;
+ pr_info("Memory allocation profiling is enabled %s compression and is turned %s!\n",
+ compressed ? "with" : "without", enable ? "on" : "off");
}
- if (enable != static_key_enabled(&mem_alloc_profiling_key)) {
+ if (enable != mem_alloc_profiling_enabled()) {
if (enable)
static_branch_enable(&mem_alloc_profiling_key);
else
static_branch_disable(&mem_alloc_profiling_key);
}
+ if (compressed != static_key_enabled(&mem_profiling_compressed)) {
+ if (compressed)
+ static_branch_enable(&mem_profiling_compressed);
+ else
+ static_branch_disable(&mem_profiling_compressed);
+ }
return 0;
}
@@ -484,6 +605,9 @@ early_param("sysctl.vm.mem_profiling", setup_early_mem_profiling);
static __init bool need_page_alloc_tagging(void)
{
+ if (static_key_enabled(&mem_profiling_compressed))
+ return false;
+
return mem_profiling_support;
}
diff --git a/lib/codetag.c b/lib/codetag.c
index 654496952f86..4949511b4933 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -149,8 +149,8 @@ static struct codetag_range get_section_range(struct module *mod,
const char *section)
{
return (struct codetag_range) {
- get_symbol(mod, "__start_", section),
- get_symbol(mod, "__stop_", section),
+ get_symbol(mod, CODETAG_SECTION_START_PREFIX, section),
+ get_symbol(mod, CODETAG_SECTION_STOP_PREFIX, section),
};
}
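For the alloc_tags section these prefixes simply reconstruct the usual
linker-generated boundary symbols, e.g.:

    /* SECTION_START(ALLOC_TAG_SECTION_NAME) expands to "__start_alloc_tags" */
    /* SECTION_STOP(ALLOC_TAG_SECTION_NAME)  expands to "__stop_alloc_tags"  */

For the built-in kernel, alloc_tag_sec_init() looks these up via
kallsyms_lookup_name() to locate the first and last kernel tags; for modules,
get_section_range() above builds the same names to find the section bounds.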
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba5607aaf19..1c205b0a86ed 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -83,8 +83,7 @@ void __init mminit_verify_pageflags_layout(void)
unsigned long or_mask, add_mask;
shift = BITS_PER_LONG;
- width = shift - SECTIONS_WIDTH - NODES_WIDTH - ZONES_WIDTH
- - LAST_CPUPID_SHIFT - KASAN_TAG_WIDTH - LRU_GEN_WIDTH - LRU_REFS_WIDTH;
+ width = shift - NR_NON_PAGEFLAG_BITS;
mminit_dprintk(MMINIT_TRACE, "pageflags_layout_widths",
"Section %d Node %d Zone %d Lastcpupid %d Kasantag %d Gen %d Tier %d Flags %d\n",
SECTIONS_WIDTH,
@@ -2639,7 +2638,7 @@ void __init mm_core_init(void)
BUILD_BUG_ON(MAX_ZONELISTS > 2);
build_all_zonelists(NULL);
page_alloc_init_cpuhp();
-
+ alloc_tag_sec_init();
/*
* page_ext requires contiguous pages,
* bigger than MAX_PAGE_ORDER unless SPARSEMEM.
--
2.47.0.105.g07ac214952-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper
2024-10-23 17:07 ` [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
@ 2024-10-23 17:24 ` Pasha Tatashin
0 siblings, 0 replies; 17+ messages in thread
From: Pasha Tatashin @ 2024-10-23 17:24 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Add mas_for_each_rev() function to iterate maple tree nodes in reverse
> order.
>
> Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/linux/maple_tree.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index 61c236850ca8..cbbcd18d4186 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
> #define mas_for_each(__mas, __entry, __max) \
> while (((__entry) = mas_find((__mas), (__max))) != NULL)
>
> +/**
> + * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
> + * @__mas: Maple Tree operation state (maple_state)
> + * @__entry: Entry retrieved from the tree
> + * @__min: minimum index to retrieve from the tree
> + *
> + * When returned, mas->index and mas->last will hold the entire range for the
> + * entry.
> + *
> + * Note: may return the zero entry.
> + */
> +#define mas_for_each_rev(__mas, __entry, __min) \
> + while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
> +
> #ifdef CONFIG_DEBUG_MAPLE_TREE
> enum mt_dump_format {
> mt_dump_dec,
> --
> 2.47.0.105.g07ac214952-goog
>
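For illustration, the new helper is used like mas_for_each(), only walking
entries from the current position down to __min; a minimal sketch (assuming an
existing maple tree @mt), in the same spirit as release_module_tags() later in
this series:

    /* Sketch only: visit every entry of @mt from the top down to index 0. */
    void *entry;
    MA_STATE(mas, &mt, ULONG_MAX, ULONG_MAX);

    mas_lock(&mas);
    mas_for_each_rev(&mas, entry, 0) {
            /* mas.index and mas.last hold the full range of @entry here */
            pr_debug("entry %p spans [%lx, %lx]\n", entry, mas.index, mas.last);
    }
    mas_unlock(&mas);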
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function
2024-10-23 17:07 ` [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function Suren Baghdasaryan
@ 2024-10-23 17:26 ` Pasha Tatashin
0 siblings, 0 replies; 17+ messages in thread
From: Pasha Tatashin @ 2024-10-23 17:26 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Implement a helper function to disable memory allocation profiling and
> use it when creation of /proc/allocinfo fails.
> Ensure /proc/allocinfo does not get created when memory allocation
> profiling is disabled.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references
2024-10-23 17:07 ` [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references Suren Baghdasaryan
@ 2024-10-23 17:35 ` Pasha Tatashin
2024-10-23 21:00 ` Andrew Morton
1 sibling, 0 replies; 17+ messages in thread
From: Pasha Tatashin @ 2024-10-23 17:35 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> To simplify later changes to page tag references, introduce new
> pgtag_ref_handle type. This allows easy replacement of page_ext
> as a storage of page allocation tags.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory
2024-10-23 17:07 ` [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
@ 2024-10-23 18:05 ` Pasha Tatashin
0 siblings, 0 replies; 17+ messages in thread
From: Pasha Tatashin @ 2024-10-23 18:05 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> When a module gets unloaded there is a possibility that some of the
> allocations it made are still used and therefore the allocation tags
> corresponding to these allocations are still referenced. As such, the
> memory for these tags can't be freed. This is currently handled as an
> abnormal situation and module's data section is not being unloaded.
> To handle this situation without keeping module's data in memory,
> allow codetags with longer lifespan than the module to be loaded into
> their own separate memory. The in-use memory areas and gaps after
> module unloading in this separate memory are tracked using maple trees.
> Allocation tags arrange their separate memory so that it is virtually
> contiguous and that will allow simple allocation tag indexing later on
> in this patchset. The size of this virtually contiguous memory is set
> to store up to 100000 allocation tags.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> include/asm-generic/codetag.lds.h | 19 +++
> include/linux/alloc_tag.h | 13 +-
> include/linux/codetag.h | 37 ++++-
> kernel/module/main.c | 80 ++++++----
> lib/alloc_tag.c | 249 +++++++++++++++++++++++++++---
> lib/codetag.c | 100 +++++++++++-
> scripts/module.lds.S | 5 +-
> 7 files changed, 441 insertions(+), 62 deletions(-)
>
> diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
> index 64f536b80380..372c320c5043 100644
> --- a/include/asm-generic/codetag.lds.h
> +++ b/include/asm-generic/codetag.lds.h
> @@ -11,4 +11,23 @@
> #define CODETAG_SECTIONS() \
> SECTION_WITH_BOUNDARIES(alloc_tags)
>
> +/*
> + * Module codetags which aren't used after module unload, therefore have the
> + * same lifespan as the module and can be safely unloaded with the module.
> + */
> +#define MOD_CODETAG_SECTIONS()
> +
> +#define MOD_SEPARATE_CODETAG_SECTION(_name) \
> + .codetag.##_name : { \
> + SECTION_WITH_BOUNDARIES(_name) \
> + }
> +
> +/*
> + * For codetags which might be used after module unload, therefore might stay
> + * longer in memory. Each such codetag type has its own section so that we can
> + * unload them individually once unused.
> + */
> +#define MOD_SEPARATE_CODETAG_SECTIONS() \
> + MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
> +
> #endif /* __ASM_GENERIC_CODETAG_LDS_H */
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index 1f0a9ff23a2c..7431757999c5 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -30,6 +30,13 @@ struct alloc_tag {
> struct alloc_tag_counters __percpu *counters;
> } __aligned(8);
>
> +struct alloc_tag_module_section {
> + unsigned long start_addr;
> + unsigned long end_addr;
> + /* used size */
> + unsigned long size;
> +};
> +
> #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>
> #define CODETAG_EMPTY ((void *)1)
> @@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +#define ALLOC_TAG_SECTION_NAME "alloc_tags"
> +
> struct codetag_bytes {
> struct codetag *ct;
> s64 bytes;
> @@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
>
> #define DEFINE_ALLOC_TAG(_alloc_tag) \
> static struct alloc_tag _alloc_tag __used __aligned(8) \
> - __section("alloc_tags") = { \
> + __section(ALLOC_TAG_SECTION_NAME) = { \
> .ct = CODE_TAG_INIT, \
> .counters = &_shared_alloc_tag };
>
> @@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> #define DEFINE_ALLOC_TAG(_alloc_tag) \
> static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
> static struct alloc_tag _alloc_tag __used __aligned(8) \
> - __section("alloc_tags") = { \
> + __section(ALLOC_TAG_SECTION_NAME) = { \
> .ct = CODE_TAG_INIT, \
> .counters = &_alloc_tag_cntr };
>
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index c2a579ccd455..d10bd9810d32 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -35,8 +35,15 @@ struct codetag_type_desc {
> size_t tag_size;
> void (*module_load)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> - bool (*module_unload)(struct codetag_type *cttype,
> + void (*module_unload)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> +#ifdef CONFIG_MODULES
> + void (*module_replaced)(struct module *mod, struct module *new_mod);
> + bool (*needs_section_mem)(struct module *mod, unsigned long size);
> + void *(*alloc_section_mem)(struct module *mod, unsigned long size,
> + unsigned int prepend, unsigned long align);
> + void (*free_section_mem)(struct module *mod, bool used);
> +#endif
> };
>
> struct codetag_iterator {
> @@ -71,11 +78,31 @@ struct codetag_type *
> codetag_register_type(const struct codetag_type_desc *desc);
>
> #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
> +
> +bool codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size);
> +void *codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align);
> +void codetag_free_module_sections(struct module *mod);
> +void codetag_module_replaced(struct module *mod, struct module *new_mod);
> void codetag_load_module(struct module *mod);
> -bool codetag_unload_module(struct module *mod);
> -#else
> +void codetag_unload_module(struct module *mod);
> +
> +#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> +
> +static inline bool
> +codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size) { return false; }
> +static inline void *
> +codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align) { return NULL; }
> +static inline void codetag_free_module_sections(struct module *mod) {}
> +static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
> static inline void codetag_load_module(struct module *mod) {}
> -static inline bool codetag_unload_module(struct module *mod) { return true; }
> -#endif
> +static inline void codetag_unload_module(struct module *mod) {}
> +
> +#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
>
> #endif /* _LINUX_CODETAG_H */
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index ef54733bd7d2..1787686e5cae 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> return 0;
> }
>
> -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> - bool unload_codetags)
> +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> {
> struct module_memory *mem = &mod->mem[type];
> - void *ptr = mem->base;
>
> if (mem->is_rox)
> vfree(mem->rw_copy);
>
> - if (!unload_codetags && mod_mem_type_is_core_data(type))
> - return;
> -
> - execmem_free(ptr);
> + execmem_free(mem->base);
> }
>
> -static void free_mod_mem(struct module *mod, bool unload_codetags)
> +static void free_mod_mem(struct module *mod)
> {
> for_each_mod_mem_type(type) {
> struct module_memory *mod_mem = &mod->mem[type];
> @@ -1280,25 +1275,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
> /* Free lock-classes; relies on the preceding sync_rcu(). */
> lockdep_free_key_range(mod_mem->base, mod_mem->size);
> if (mod_mem->size)
> - module_memory_free(mod, type, unload_codetags);
> + module_memory_free(mod, type);
> }
>
> /* MOD_DATA hosts mod, so free it at last */
> lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
> - module_memory_free(mod, MOD_DATA, unload_codetags);
> + module_memory_free(mod, MOD_DATA);
> }
>
> /* Free a module, remove from lists, etc. */
> static void free_module(struct module *mod)
> {
> - bool unload_codetags;
> -
> trace_module_free(mod);
>
> - unload_codetags = codetag_unload_module(mod);
> - if (!unload_codetags)
> - pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
> - mod->name);
> + codetag_unload_module(mod);
>
> mod_sysfs_teardown(mod);
>
> @@ -1341,7 +1331,7 @@ static void free_module(struct module *mod)
> kfree(mod->args);
> percpu_modfree(mod);
>
> - free_mod_mem(mod, unload_codetags);
> + free_mod_mem(mod);
> }
>
> void *__symbol_get(const char *symbol)
> @@ -1606,6 +1596,20 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
> if (WARN_ON_ONCE(type == MOD_INVALID))
> continue;
>
> + /*
> + * Do not allocate codetag memory as we load it into
> + * preallocated contiguous memory.
> + */
> + if (codetag_needs_module_section(mod, sname, s->sh_size)) {
> + /*
> + * s->sh_entsize won't be used but populate the
> + * type field to avoid confusion.
> + */
> + s->sh_entsize = ((unsigned long)(type) & SH_ENTSIZE_TYPE_MASK)
> + << SH_ENTSIZE_TYPE_SHIFT;
> + continue;
> + }
> +
> s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
> pr_debug("\t%s\n", sname);
> }
> @@ -2280,6 +2284,7 @@ static int move_module(struct module *mod, struct load_info *info)
> int i;
> enum mod_mem_type t = 0;
> int ret = -ENOMEM;
> + bool codetag_section_found = false;
>
> for_each_mod_mem_type(type) {
> if (!mod->mem[type].size) {
> @@ -2291,7 +2296,7 @@ static int move_module(struct module *mod, struct load_info *info)
> ret = module_memory_alloc(mod, type);
> if (ret) {
> t = type;
> - goto out_enomem;
> + goto out_err;
> }
> }
>
> @@ -2300,15 +2305,33 @@ static int move_module(struct module *mod, struct load_info *info)
> for (i = 0; i < info->hdr->e_shnum; i++) {
> void *dest;
> Elf_Shdr *shdr = &info->sechdrs[i];
> - enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
> - unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
> + const char *sname;
> unsigned long addr;
>
> if (!(shdr->sh_flags & SHF_ALLOC))
> continue;
>
> - addr = (unsigned long)mod->mem[type].base + offset;
> - dest = mod->mem[type].rw_copy + offset;
> + sname = info->secstrings + shdr->sh_name;
> + /*
> + * Load codetag sections separately as they might still be used
> + * after module unload.
> + */
> + if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
> + dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
> + arch_mod_section_prepend(mod, i), shdr->sh_addralign);
> + if (IS_ERR(dest)) {
> + ret = PTR_ERR(dest);
> + goto out_err;
> + }
> + addr = (unsigned long)dest;
> + codetag_section_found = true;
> + } else {
> + enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
> + unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
> +
> + addr = (unsigned long)mod->mem[type].base + offset;
> + dest = mod->mem[type].rw_copy + offset;
> + }
>
> if (shdr->sh_type != SHT_NOBITS) {
> /*
> @@ -2320,7 +2343,7 @@ static int move_module(struct module *mod, struct load_info *info)
> if (i == info->index.mod &&
> (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
> ret = -ENOEXEC;
> - goto out_enomem;
> + goto out_err;
> }
> memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
> }
> @@ -2336,9 +2359,12 @@ static int move_module(struct module *mod, struct load_info *info)
> }
>
> return 0;
> -out_enomem:
> +out_err:
> for (t--; t >= 0; t--)
> - module_memory_free(mod, t, true);
> + module_memory_free(mod, t);
> + if (codetag_section_found)
> + codetag_free_module_sections(mod);
> +
> return ret;
> }
>
> @@ -2459,6 +2485,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> /* Module has been copied to its final place now: return it. */
> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> kmemleak_load_module(mod, info);
> + codetag_module_replaced(info->mod, mod);
> +
> return mod;
> }
>
> @@ -2468,7 +2496,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
> percpu_modfree(mod);
> module_arch_freeing_init(mod);
>
> - free_mod_mem(mod, true);
> + free_mod_mem(mod);
> }
>
> int __weak module_finalize(const Elf_Ehdr *hdr,
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 435aa837e550..d9f51169ffeb 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> #include <linux/alloc_tag.h>
> +#include <linux/execmem.h>
> #include <linux/fs.h>
> #include <linux/gfp.h>
> #include <linux/module.h>
> @@ -9,6 +10,7 @@
> #include <linux/seq_file.h>
>
> #define ALLOCINFO_FILE_NAME "allocinfo"
> +#define MODULE_ALLOC_TAG_VMAP_SIZE (100000UL * sizeof(struct alloc_tag))
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> static bool mem_profiling_support __meminitdata = true;
> @@ -174,31 +176,226 @@ static void __init procfs_init(void)
> }
> }
>
> -static bool alloc_tag_module_unload(struct codetag_type *cttype,
> - struct codetag_module *cmod)
> +#ifdef CONFIG_MODULES
> +
> +static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
> +/* A dummy object used to indicate an unloaded module */
> +static struct module unloaded_mod;
> +/* A dummy object used to indicate a module prepended area */
> +static struct module prepend_mod;
> +
> +static struct alloc_tag_module_section module_tags;
> +
> +static bool needs_section_mem(struct module *mod, unsigned long size)
> {
> - struct codetag_iterator iter = codetag_get_ct_iter(cttype);
> - struct alloc_tag_counters counter;
> - bool module_unused = true;
> - struct alloc_tag *tag;
> - struct codetag *ct;
> + return size >= sizeof(struct alloc_tag);
> +}
> +
> +static struct alloc_tag *find_used_tag(struct alloc_tag *from, struct alloc_tag *to)
> +{
> + while (from <= to) {
> + struct alloc_tag_counters counter;
>
> - for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
> - if (iter.cmod != cmod)
> + counter = alloc_tag_read(from);
> + if (counter.bytes)
> + return from;
> + from++;
> + }
> +
> + return NULL;
> +}
> +
> +/* Called with mod_area_mt locked */
> +static void clean_unused_module_areas_locked(void)
> +{
> + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> + struct module *val;
> +
> + mas_for_each(&mas, val, module_tags.size) {
> + if (val != &unloaded_mod)
> continue;
>
> - tag = ct_to_alloc_tag(ct);
> - counter = alloc_tag_read(tag);
> + /* Release area if all tags are unused */
> + if (!find_used_tag((struct alloc_tag *)(module_tags.start_addr + mas.index),
> + (struct alloc_tag *)(module_tags.start_addr + mas.last)))
> + mas_erase(&mas);
> + }
> +}
> +
> +/* Called with mod_area_mt locked */
> +static bool find_aligned_area(struct ma_state *mas, unsigned long section_size,
> + unsigned long size, unsigned int prepend, unsigned long align)
> +{
> + bool cleanup_done = false;
> +
> +repeat:
> + /* Try finding exact size and hope the start is aligned */
> + if (!mas_empty_area(mas, 0, section_size - 1, prepend + size)) {
> + if (IS_ALIGNED(mas->index + prepend, align))
> + return true;
> +
> + /* Try finding larger area to align later */
> + mas_reset(mas);
> + if (!mas_empty_area(mas, 0, section_size - 1,
> + size + prepend + align - 1))
> + return true;
> + }
>
> - if (WARN(counter.bytes,
> - "%s:%u module %s func:%s has %llu allocated at module unload",
> - ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> - module_unused = false;
> + /* No free area, try cleanup stale data and repeat the search once */
> + if (!cleanup_done) {
> + clean_unused_module_areas_locked();
> + cleanup_done = true;
> + mas_reset(mas);
> + goto repeat;
> }
>
> - return module_unused;
> + return false;
> +}
> +
> +static void *reserve_module_tags(struct module *mod, unsigned long size,
> + unsigned int prepend, unsigned long align)
> +{
> + unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
> + MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
> + unsigned long offset;
> + void *ret = NULL;
> +
> + /* If no tags return NULL */
> + if (size < sizeof(struct alloc_tag))
> + return NULL;
> +
> + /*
> + * align is always power of 2, so we can use IS_ALIGNED and ALIGN.
> + * align 0 or 1 means no alignment, to simplify set to 1.
> + */
> + if (!align)
> + align = 1;
> +
> + mas_lock(&mas);
> + if (!find_aligned_area(&mas, section_size, size, prepend, align)) {
> + ret = ERR_PTR(-ENOMEM);
> + goto unlock;
> + }
> +
> + /* Mark found area as reserved */
> + offset = mas.index;
> + offset += prepend;
> + offset = ALIGN(offset, align);
> + if (offset != mas.index) {
> + unsigned long pad_start = mas.index;
> +
> + mas.last = offset - 1;
> + mas_store(&mas, &prepend_mod);
> + if (mas_is_err(&mas)) {
> + ret = ERR_PTR(xa_err(mas.node));
> + goto unlock;
> + }
> + mas.index = offset;
> + mas.last = offset + size - 1;
> + mas_store(&mas, mod);
> + if (mas_is_err(&mas)) {
> + mas.index = pad_start;
> + mas_erase(&mas);
> + ret = ERR_PTR(xa_err(mas.node));
> + }
> + } else {
> + mas.last = offset + size - 1;
> + mas_store(&mas, mod);
> + if (mas_is_err(&mas))
> + ret = ERR_PTR(xa_err(mas.node));
> + }
> +unlock:
> + mas_unlock(&mas);
> +
> + if (IS_ERR(ret))
> + return ret;
> +
> + if (module_tags.size < offset + size)
> + module_tags.size = offset + size;
> +
> + return (struct alloc_tag *)(module_tags.start_addr + offset);
> }
>
> +static void release_module_tags(struct module *mod, bool used)
> +{
> + MA_STATE(mas, &mod_area_mt, module_tags.size, module_tags.size);
> + struct alloc_tag *tag;
> + struct module *val;
> +
> + mas_lock(&mas);
> + mas_for_each_rev(&mas, val, 0)
> + if (val == mod)
> + break;
> +
> + if (!val) /* module not found */
> + goto out;
> +
> + if (!used)
> + goto release_area;
> +
> + /* Find out if the area is used */
> + tag = find_used_tag((struct alloc_tag *)(module_tags.start_addr + mas.index),
> + (struct alloc_tag *)(module_tags.start_addr + mas.last));
> + if (tag) {
> + struct alloc_tag_counters counter = alloc_tag_read(tag);
> +
> + pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
> + tag->ct.filename, tag->ct.lineno, tag->ct.modname,
> + tag->ct.function, counter.bytes);
> + } else {
> + used = false;
> + }
> +release_area:
> + mas_store(&mas, used ? &unloaded_mod : NULL);
> + val = mas_prev_range(&mas, 0);
> + if (val == &prepend_mod)
> + mas_store(&mas, NULL);
> +out:
> + mas_unlock(&mas);
> +}
> +
> +static void replace_module(struct module *mod, struct module *new_mod)
> +{
> + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> + struct module *val;
> +
> + mas_lock(&mas);
> + mas_for_each(&mas, val, module_tags.size) {
> + if (val != mod)
> + continue;
> +
> + mas_store_gfp(&mas, new_mod, GFP_KERNEL);
> + break;
> + }
> + mas_unlock(&mas);
> +}
> +
> +static int __init alloc_mod_tags_mem(void)
> +{
> + /* Allocate space to copy allocation tags */
> + module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
> + MODULE_ALLOC_TAG_VMAP_SIZE);
> + if (!module_tags.start_addr)
> + return -ENOMEM;
> +
> + module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
> +
> + return 0;
> +}
> +
> +static void __init free_mod_tags_mem(void)
> +{
> + execmem_free((void *)module_tags.start_addr);
> + module_tags.start_addr = 0;
> +}
> +
> +#else /* CONFIG_MODULES */
> +
> +static inline int alloc_mod_tags_mem(void) { return 0; }
> +static inline void free_mod_tags_mem(void) {}
> +
> +#endif /* CONFIG_MODULES */
> +
> static int __init setup_early_mem_profiling(char *str)
> {
> bool enable;
> @@ -274,14 +471,26 @@ static inline void sysctl_init(void) {}
> static int __init alloc_tag_init(void)
> {
> const struct codetag_type_desc desc = {
> - .section = "alloc_tags",
> - .tag_size = sizeof(struct alloc_tag),
> - .module_unload = alloc_tag_module_unload,
> + .section = ALLOC_TAG_SECTION_NAME,
> + .tag_size = sizeof(struct alloc_tag),
> +#ifdef CONFIG_MODULES
> + .needs_section_mem = needs_section_mem,
> + .alloc_section_mem = reserve_module_tags,
> + .free_section_mem = release_module_tags,
> + .module_replaced = replace_module,
> +#endif
> };
> + int res;
> +
> + res = alloc_mod_tags_mem();
> + if (res)
> + return res;
>
> alloc_tag_cttype = codetag_register_type(&desc);
> - if (IS_ERR(alloc_tag_cttype))
> + if (IS_ERR(alloc_tag_cttype)) {
> + free_mod_tags_mem();
> return PTR_ERR(alloc_tag_cttype);
> + }
>
> sysctl_init();
> procfs_init();
> diff --git a/lib/codetag.c b/lib/codetag.c
> index d1fbbb7c2ec3..654496952f86 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -207,6 +207,94 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
> }
>
> #ifdef CONFIG_MODULES
> +#define CODETAG_SECTION_PREFIX ".codetag."
> +
> +/* Some codetag types need a separate module section */
> +bool codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size)
> +{
> + const char *type_name;
> + struct codetag_type *cttype;
> + bool ret = false;
> +
> + if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
> + return false;
> +
> + type_name = name + strlen(CODETAG_SECTION_PREFIX);
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (strcmp(type_name, cttype->desc.section) == 0) {
> + if (!cttype->desc.needs_section_mem)
> + break;
> +
> + down_write(&cttype->mod_lock);
> + ret = cttype->desc.needs_section_mem(mod, size);
> + up_write(&cttype->mod_lock);
> + break;
> + }
> + }
> + mutex_unlock(&codetag_lock);
> +
> + return ret;
> +}
> +
> +void *codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align)
> +{
> + const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
> + struct codetag_type *cttype;
> + void *ret = NULL;
> +
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (strcmp(type_name, cttype->desc.section) == 0) {
> + if (WARN_ON(!cttype->desc.alloc_section_mem))
> + break;
> +
> + down_write(&cttype->mod_lock);
> + ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
> + up_write(&cttype->mod_lock);
> + break;
> + }
> + }
> + mutex_unlock(&codetag_lock);
> +
> + return ret;
> +}
> +
> +void codetag_free_module_sections(struct module *mod)
> +{
> + struct codetag_type *cttype;
> +
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (!cttype->desc.free_section_mem)
> + continue;
> +
> + down_write(&cttype->mod_lock);
> + cttype->desc.free_section_mem(mod, false);
> + up_write(&cttype->mod_lock);
> + }
> + mutex_unlock(&codetag_lock);
> +}
> +
> +void codetag_module_replaced(struct module *mod, struct module *new_mod)
> +{
> + struct codetag_type *cttype;
> +
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (!cttype->desc.module_replaced)
> + continue;
> +
> + down_write(&cttype->mod_lock);
> + cttype->desc.module_replaced(mod, new_mod);
> + up_write(&cttype->mod_lock);
> + }
> + mutex_unlock(&codetag_lock);
> +}
> +
> void codetag_load_module(struct module *mod)
> {
> struct codetag_type *cttype;
> @@ -220,13 +308,12 @@ void codetag_load_module(struct module *mod)
> mutex_unlock(&codetag_lock);
> }
>
> -bool codetag_unload_module(struct module *mod)
> +void codetag_unload_module(struct module *mod)
> {
> struct codetag_type *cttype;
> - bool unload_ok = true;
>
> if (!mod)
> - return true;
> + return;
>
> /* await any module's kfree_rcu() operations to complete */
> kvfree_rcu_barrier();
> @@ -246,18 +333,17 @@ bool codetag_unload_module(struct module *mod)
> }
> if (found) {
> if (cttype->desc.module_unload)
> - if (!cttype->desc.module_unload(cttype, cmod))
> - unload_ok = false;
> + cttype->desc.module_unload(cttype, cmod);
>
> cttype->count -= range_size(cttype, &cmod->range);
> idr_remove(&cttype->mod_idr, mod_id);
> kfree(cmod);
> }
> up_write(&cttype->mod_lock);
> + if (found && cttype->desc.free_section_mem)
> + cttype->desc.free_section_mem(mod, true);
> }
> mutex_unlock(&codetag_lock);
> -
> - return unload_ok;
> }
> #endif /* CONFIG_MODULES */
>
> diff --git a/scripts/module.lds.S b/scripts/module.lds.S
> index 3f43edef813c..711c6e029936 100644
> --- a/scripts/module.lds.S
> +++ b/scripts/module.lds.S
> @@ -50,7 +50,7 @@ SECTIONS {
> .data : {
> *(.data .data.[0-9a-zA-Z_]*)
> *(.data..L*)
> - CODETAG_SECTIONS()
> + MOD_CODETAG_SECTIONS()
> }
>
> .rodata : {
> @@ -59,9 +59,10 @@ SECTIONS {
> }
> #else
> .data : {
> - CODETAG_SECTIONS()
> + MOD_CODETAG_SECTIONS()
> }
> #endif
> + MOD_SEPARATE_CODETAG_SECTIONS()
> }
>
> /* bring in arch-specific sections */
> --
> 2.47.0.105.g07ac214952-goog
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed
2024-10-23 17:07 ` [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed Suren Baghdasaryan
@ 2024-10-23 18:28 ` Pasha Tatashin
0 siblings, 0 replies; 17+ messages in thread
From: Pasha Tatashin @ 2024-10-23 18:28 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> The memory reserved for module tags does not need to be backed by
> physical pages until there are tags to store there. Change the way
> we reserve this memory to allocate only virtual area for the tags
> and populate it with physical pages as needed when we load a module.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 6/6] alloc_tag: support for page allocation tag compression
2024-10-23 17:07 ` [PATCH v4 6/6] alloc_tag: support for page allocation tag compression Suren Baghdasaryan
@ 2024-10-23 18:29 ` Pasha Tatashin
2024-10-24 3:48 ` Suren Baghdasaryan
0 siblings, 1 reply; 17+ messages in thread
From: Pasha Tatashin @ 2024-10-23 18:29 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Implement support for storing page allocation tag references directly
> in the page flags instead of page extensions. sysctl.vm.mem_profiling
> boot parameter it extended to provide a way for a user to request this
> mode. Enabling compression eliminates memory overhead caused by page_ext
> and results in better performance for page allocations. However this
> mode will not work if the number of available page flag bits is
> insufficient to address all kernel allocations. Such a condition can
> happen during boot or when loading a module. If this condition is
> detected, memory allocation profiling gets disabled with an appropriate
> warning. By default compression mode is disabled.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Thank you very much Suren for doing this work. This is a very
significant improvement for the fleet users.
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references
2024-10-23 17:07 ` [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references Suren Baghdasaryan
2024-10-23 17:35 ` Pasha Tatashin
@ 2024-10-23 21:00 ` Andrew Morton
2024-10-23 21:09 ` Suren Baghdasaryan
1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2024-10-23 21:00 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team
On Wed, 23 Oct 2024 10:07:58 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> To simplify later changes to page tag references, introduce new
> pgtag_ref_handle type. This allows easy replacement of page_ext
> as a storage of page allocation tags.
>
> ...
>
> static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
> {
> + union pgtag_ref_handle handle;
> + union codetag_ref ref;
> struct alloc_tag *tag;
> - union codetag_ref *ref;
>
> tag = pgalloc_tag_get(&old->page);
> if (!tag)
> return;
>
> - ref = get_page_tag_ref(&new->page);
> - if (!ref)
> + if (!get_page_tag_ref(&new->page, &ref, &handle))
> return;
>
> /* Clear the old ref to the original allocation tag. */
> clear_page_tag_ref(&old->page);
> /* Decrement the counters of the tag on get_new_folio. */
> - alloc_tag_sub(ref, folio_nr_pages(new));
> -
> - __alloc_tag_ref_set(ref, tag);
> -
> - put_page_tag_ref(ref);
> + alloc_tag_sub(&ref, folio_nr_pages(new));
mm-stable has folio_size(new) here, fixed up.
I think we already discussed this, but there's a crazy amount of
inlining here. pgalloc_tag_split() is huge, and has four callsites.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references
2024-10-23 21:00 ` Andrew Morton
@ 2024-10-23 21:09 ` Suren Baghdasaryan
2024-10-24 16:25 ` Suren Baghdasaryan
0 siblings, 1 reply; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-23 21:09 UTC (permalink / raw)
To: Andrew Morton
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 2:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 23 Oct 2024 10:07:58 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > To simplify later changes to page tag references, introduce new
> > pgtag_ref_handle type. This allows easy replacement of page_ext
> > as a storage of page allocation tags.
> >
> > ...
> >
> > static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
> > {
> > + union pgtag_ref_handle handle;
> > + union codetag_ref ref;
> > struct alloc_tag *tag;
> > - union codetag_ref *ref;
> >
> > tag = pgalloc_tag_get(&old->page);
> > if (!tag)
> > return;
> >
> > - ref = get_page_tag_ref(&new->page);
> > - if (!ref)
> > + if (!get_page_tag_ref(&new->page, &ref, &handle))
> > return;
> >
> > /* Clear the old ref to the original allocation tag. */
> > clear_page_tag_ref(&old->page);
> > /* Decrement the counters of the tag on get_new_folio. */
> > - alloc_tag_sub(ref, folio_nr_pages(new));
> > -
> > - __alloc_tag_ref_set(ref, tag);
> > -
> > - put_page_tag_ref(ref);
> > + alloc_tag_sub(&ref, folio_nr_pages(new));
>
> mm-stable has folio_size(new) here, fixed up.
Oh, right. You merged that patch tonight and I formatted my patchset
yesterday :)
Thanks for the fixup.
>
> > I think we already discussed this, but there's a crazy amount of
> inlining here. pgalloc_tag_split() is huge, and has four callsites.
I must have missed that discussion but I am happy to uninline this
function. I think splitting is a heavy enough operation that this
uninlining would not be noticeable.
Thanks!
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 6/6] alloc_tag: support for page allocation tag compression
2024-10-23 18:29 ` Pasha Tatashin
@ 2024-10-24 3:48 ` Suren Baghdasaryan
0 siblings, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-24 3:48 UTC (permalink / raw)
To: Pasha Tatashin
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, souravpanda, keescook,
dennis, jhubbard, urezki, hch, petr.pavlu, samitolvanen, da.gomez,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 11:30 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Wed, Oct 23, 2024 at 1:08 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > Implement support for storing page allocation tag references directly
> > in the page flags instead of page extensions. sysctl.vm.mem_profiling
> > boot parameter is extended to provide a way for a user to request this
> > mode. Enabling compression eliminates memory overhead caused by page_ext
> > and results in better performance for page allocations. However this
> > mode will not work if the number of available page flag bits is
> > insufficient to address all kernel allocations. Such a condition can
> > happen during boot or when loading a module. If this condition is
> > detected, memory allocation profiling gets disabled with an appropriate
> > warning. By default compression mode is disabled.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> Thank you very much Suren for doing this work. This is a very
> significant improvement for the fleet users.
>
> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Thank you for the reviews and I'm glad it's useful for others as well!
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references
2024-10-23 21:09 ` Suren Baghdasaryan
@ 2024-10-24 16:25 ` Suren Baghdasaryan
0 siblings, 0 replies; 17+ messages in thread
From: Suren Baghdasaryan @ 2024-10-24 16:25 UTC (permalink / raw)
To: Andrew Morton
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, urezki, hch, petr.pavlu,
samitolvanen, da.gomez, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim,
rientjes, minchan, kaleshsingh, linux-doc, linux-kernel,
linux-arch, linux-mm, maple-tree, linux-modules, kernel-team
On Wed, Oct 23, 2024 at 2:09 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Oct 23, 2024 at 2:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 23 Oct 2024 10:07:58 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > To simplify later changes to page tag references, introduce new
> > > pgtag_ref_handle type. This allows easy replacement of page_ext
> > > as a storage of page allocation tags.
> > >
> > > ...
> > >
> > > static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
> > > {
> > > + union pgtag_ref_handle handle;
> > > + union codetag_ref ref;
> > > struct alloc_tag *tag;
> > > - union codetag_ref *ref;
> > >
> > > tag = pgalloc_tag_get(&old->page);
> > > if (!tag)
> > > return;
> > >
> > > - ref = get_page_tag_ref(&new->page);
> > > - if (!ref)
> > > + if (!get_page_tag_ref(&new->page, &ref, &handle))
> > > return;
> > >
> > > /* Clear the old ref to the original allocation tag. */
> > > clear_page_tag_ref(&old->page);
> > > /* Decrement the counters of the tag on get_new_folio. */
> > > - alloc_tag_sub(ref, folio_nr_pages(new));
> > > -
> > > - __alloc_tag_ref_set(ref, tag);
> > > -
> > > - put_page_tag_ref(ref);
> > > + alloc_tag_sub(&ref, folio_nr_pages(new));
> >
> > mm-stable has folio_size(new) here, fixed up.
>
> Oh, right. You merged that patch tonight and I formatted my patchset
> yesterday :)
> Thanks for the fixup.
>
> >
> > I think we already discussed this, but there's a crazy amount of
> > inlining here. pgalloc_tag_split() is huge, and has four callsites.
>
> I must have missed that discussion but I am happy to uninline this
> function. I think splitting is a heavy enough operation that this
> uninlining would not be noticeable.
Posted requested uninlining at
https://lore.kernel.org/all/20241024162318.1640781-1-surenb@google.com/
> Thanks!
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2024-10-23 17:07 [PATCH v4 0/6] page allocation tag compression Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 1/6] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
2024-10-23 17:24 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 2/6] alloc_tag: introduce shutdown_mem_profiling helper function Suren Baghdasaryan
2024-10-23 17:26 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 3/6] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
2024-10-23 18:05 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 4/6] alloc_tag: populate memory for module tags as needed Suren Baghdasaryan
2024-10-23 18:28 ` Pasha Tatashin
2024-10-23 17:07 ` [PATCH v4 5/6] alloc_tag: introduce pgtag_ref_handle to abstract page tag references Suren Baghdasaryan
2024-10-23 17:35 ` Pasha Tatashin
2024-10-23 21:00 ` Andrew Morton
2024-10-23 21:09 ` Suren Baghdasaryan
2024-10-24 16:25 ` Suren Baghdasaryan
2024-10-23 17:07 ` [PATCH v4 6/6] alloc_tag: support for page allocation tag compression Suren Baghdasaryan
2024-10-23 18:29 ` Pasha Tatashin
2024-10-24 3:48 ` Suren Baghdasaryan