* [PATCH 0/5] page allocation tag compression
@ 2024-08-19 15:15 Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 1/5] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 15:15 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
This patchset implements several improvements:
1. Gracefully handles module unloading while there are still live
allocations from that module;
2. Provides an option to reduce the memory overhead of storing page
allocation references by indexing allocation tags (the encoding is
sketched after the patch summaries);
3. Provides an option to store page allocation tag references in the
page flags, removing the dependency on page extensions and eliminating
the memory overhead of storing page allocation references (~0.2% of
total system memory);
4. Improves page allocation performance when CONFIG_MEM_ALLOC_PROFILING
is enabled by eliminating the page extension lookup. Page allocation
performance overhead is reduced from 14% to 5.5%.
Patch #1 copies module tags into virtually contiguous memory which
serves two purposes:
- Lets us deal with the situation when a module is unloaded while there
are still live allocations from that module. Since we are using a copy
of the tags, we can safely unload the module. Used space and gaps in
this contiguous memory are managed using a maple tree.
- Enables simple indexing of the tags in the later patches.
The size of the preallocated virtually contiguous memory can be
configured using the max_module_alloc_tags kernel parameter.
Patch #2 is a code cleanup to simplify later changes.
Patch #3 abstracts the page allocation tag reference to simplify later
changes.
Patch #4 lets us control page allocation tag reference sizes and
introduces tag indexing.
Patch #5 adds a config to store page allocation tag references inside
page flags if they fit.
Patchset applies to mm-unstable.
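To illustrate the compression idea behind patches #4 and #5, here is a
minimal userspace sketch (not the kernel code; the names tag_table,
tag_to_idx and idx_to_tag are made up for illustration): because all
tags live in contiguous memory, a small integer index is enough to
recover the tag pointer, and such an index is what can be packed into a
narrow page_ext field or into the page flags instead of an 8-byte
pointer.

	#include <stdio.h>

	/* Stand-in for struct alloc_tag. */
	struct tag { const char *site; };

	/* One contiguous table of tags, analogous to the alloc_tags section. */
	static struct tag tag_table[] = {
		{ "mm/page_alloc.c:1234" },
		{ "fs/inode.c:567" },
	};

	/* Reserve index 0 for "no tag", similar to CODETAG_ID_NULL in patch #4. */
	static unsigned short tag_to_idx(const struct tag *t)
	{
		return t ? (unsigned short)(t - tag_table) + 1 : 0;
	}

	static const struct tag *idx_to_tag(unsigned short idx)
	{
		return idx ? &tag_table[idx - 1] : NULL;
	}

	int main(void)
	{
		unsigned short ref = tag_to_idx(&tag_table[1]); /* fits in 16 bits */

		printf("%s\n", idx_to_tag(ref)->site);          /* prints fs/inode.c:567 */
		return 0;
	}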
Suren Baghdasaryan (5):
alloc_tag: load module tags into separate continuous memory
alloc_tag: eliminate alloc_tag_ref_set
alloc_tag: introduce pgalloc_tag_ref to abstract page tag references
alloc_tag: make page allocation tag reference size configurable
alloc_tag: config to store page allocation tag refs in page flags
.../admin-guide/kernel-parameters.txt | 4 +
include/asm-generic/codetag.lds.h | 19 ++
include/linux/alloc_tag.h | 46 ++-
include/linux/codetag.h | 38 ++-
include/linux/mmzone.h | 3 +
include/linux/page-flags-layout.h | 10 +-
include/linux/pgalloc_tag.h | 257 ++++++++++++---
kernel/module/main.c | 67 ++--
lib/Kconfig.debug | 36 ++-
lib/alloc_tag.c | 300 ++++++++++++++++--
lib/codetag.c | 105 +++++-
mm/mm_init.c | 1 +
mm/page_ext.c | 2 +-
scripts/module.lds.S | 5 +-
14 files changed, 759 insertions(+), 134 deletions(-)
base-commit: 651c8c1d735983040bec4f71d0e2e690f3c0fc2d
--
2.46.0.184.g6999bdac58-goog
* [PATCH 1/5] alloc_tag: load module tags into separate continuous memory
2024-08-19 15:15 [PATCH 0/5] page allocation tag compression Suren Baghdasaryan
@ 2024-08-19 15:15 ` Suren Baghdasaryan
2024-08-20 7:13 ` Mike Rapoport
2024-08-20 15:31 ` Liam R. Howlett
2024-08-19 15:15 ` [PATCH 2/5] alloc_tag: eliminate alloc_tag_ref_set Suren Baghdasaryan
` (3 subsequent siblings)
4 siblings, 2 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 15:15 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
When a module gets unloaded there is a possibility that some of the
allocations it made are still in use and therefore the allocation tags
corresponding to these allocations are still referenced. As such, the
memory for these tags can't be freed. This is currently handled as an
abnormal situation and the module's data section is not unloaded.
To handle this situation without keeping the module's data in memory,
allow codetags with a longer lifespan than the module to be loaded into
their own separate memory. The in-use memory areas and the gaps left
after module unloading in this separate memory are tracked using a
maple tree. Allocation tags arrange their separate memory so that it is
virtually contiguous, which allows simple allocation tag indexing later
in this patchset. The size of this virtually contiguous memory is set
to store up to 100000 allocation tags, and the max_module_alloc_tags
kernel parameter is introduced to change this size.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
.../admin-guide/kernel-parameters.txt | 4 +
include/asm-generic/codetag.lds.h | 19 ++
include/linux/alloc_tag.h | 13 +-
include/linux/codetag.h | 35 ++-
kernel/module/main.c | 67 +++--
lib/alloc_tag.c | 245 ++++++++++++++++--
lib/codetag.c | 101 +++++++-
scripts/module.lds.S | 5 +-
8 files changed, 429 insertions(+), 60 deletions(-)
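As an illustration of the new parameter (the value 200000 below is
arbitrary), a machine that loads many tagged modules could boot with

	max_module_alloc_tags=200000

on the kernel command line to double the default reservation of 100000
tags; the reserved area scales as the tag count multiplied by
sizeof(struct alloc_tag).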
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d0d141d50638..17f9f811a9c0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3205,6 +3205,10 @@
devices can be requested on-demand with the
/dev/loop-control interface.
+
+ max_module_alloc_tags= [KNL] Max supported number of allocation tags
+ in modules.
+
mce [X86-32] Machine Check Exception
mce=option [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
index 64f536b80380..372c320c5043 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -11,4 +11,23 @@
#define CODETAG_SECTIONS() \
SECTION_WITH_BOUNDARIES(alloc_tags)
+/*
+ * Module codetags which aren't used after module unload, therefore have the
+ * same lifespan as the module and can be safely unloaded with the module.
+ */
+#define MOD_CODETAG_SECTIONS()
+
+#define MOD_SEPARATE_CODETAG_SECTION(_name) \
+ .codetag.##_name : { \
+ SECTION_WITH_BOUNDARIES(_name) \
+ }
+
+/*
+ * For codetags which might be used after module unload, therefore might stay
+ * longer in memory. Each such codetag type has its own section so that we can
+ * unload them individually once unused.
+ */
+#define MOD_SEPARATE_CODETAG_SECTIONS() \
+ MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
+
#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 8c61ccd161ba..99cbc7f086ad 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_module_section {
+ unsigned long start_addr;
+ unsigned long end_addr;
+ /* used size */
+ unsigned long size;
+};
+
#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
#define CODETAG_EMPTY ((void *)1)
@@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
#ifdef CONFIG_MEM_ALLOC_PROFILING
+#define ALLOC_TAG_SECTION_NAME "alloc_tags"
+
struct codetag_bytes {
struct codetag *ct;
s64 bytes;
@@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_shared_alloc_tag };
@@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_alloc_tag_cntr };
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..c4a3dd60205e 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,8 +35,13 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- bool (*module_unload)(struct codetag_type *cttype,
+ void (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
+ void (*module_replaced)(struct module *mod, struct module *new_mod);
+ bool (*needs_section_mem)(struct module *mod, unsigned long size);
+ void *(*alloc_section_mem)(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align);
+ void (*free_section_mem)(struct module *mod, bool unused);
};
struct codetag_iterator {
@@ -71,11 +76,31 @@ struct codetag_type *
codetag_register_type(const struct codetag_type_desc *desc);
#if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
+
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size);
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align);
+void codetag_free_module_sections(struct module *mod);
+void codetag_module_replaced(struct module *mod, struct module *new_mod);
void codetag_load_module(struct module *mod);
-bool codetag_unload_module(struct module *mod);
-#else
+void codetag_unload_module(struct module *mod);
+
+#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
+
+static inline bool
+codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size) { return false; }
+static inline void *
+codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align) { return NULL; }
+static inline void codetag_free_module_sections(struct module *mod) {}
+static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
static inline void codetag_load_module(struct module *mod) {}
-static inline bool codetag_unload_module(struct module *mod) { return true; }
-#endif
+static inline void codetag_unload_module(struct module *mod) {}
+
+#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 71396e297499..d195d788835c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1225,18 +1225,12 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
return 0;
}
-static void module_memory_free(struct module *mod, enum mod_mem_type type,
- bool unload_codetags)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
{
- void *ptr = mod->mem[type].base;
-
- if (!unload_codetags && mod_mem_type_is_core_data(type))
- return;
-
- execmem_free(ptr);
+ execmem_free(mod->mem[type].base);
}
-static void free_mod_mem(struct module *mod, bool unload_codetags)
+static void free_mod_mem(struct module *mod)
{
for_each_mod_mem_type(type) {
struct module_memory *mod_mem = &mod->mem[type];
@@ -1247,25 +1241,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
- module_memory_free(mod, type, unload_codetags);
+ module_memory_free(mod, type);
}
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- module_memory_free(mod, MOD_DATA, unload_codetags);
+ module_memory_free(mod, MOD_DATA);
}
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
- bool unload_codetags;
-
trace_module_free(mod);
- unload_codetags = codetag_unload_module(mod);
- if (!unload_codetags)
- pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
- mod->name);
+ codetag_unload_module(mod);
mod_sysfs_teardown(mod);
@@ -1308,7 +1297,7 @@ static void free_module(struct module *mod)
kfree(mod->args);
percpu_modfree(mod);
- free_mod_mem(mod, unload_codetags);
+ free_mod_mem(mod);
}
void *__symbol_get(const char *symbol)
@@ -1573,6 +1562,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
if (WARN_ON_ONCE(type == MOD_INVALID))
continue;
+ /*
+ * Do not allocate codetag memory as we load it into
+ * preallocated contiguous memory. s->sh_entsize will
+ * not be used for this section, so leave it as is.
+ */
+ if (codetag_needs_module_section(mod, sname, s->sh_size))
+ continue;
+
s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
pr_debug("\t%s\n", sname);
}
@@ -2247,6 +2244,7 @@ static int move_module(struct module *mod, struct load_info *info)
int i;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
+ bool codetag_section_found = false;
for_each_mod_mem_type(type) {
if (!mod->mem[type].size) {
@@ -2257,7 +2255,7 @@ static int move_module(struct module *mod, struct load_info *info)
ret = module_memory_alloc(mod, type);
if (ret) {
t = type;
- goto out_enomem;
+ goto out_err;
}
}
@@ -2267,11 +2265,27 @@ static int move_module(struct module *mod, struct load_info *info)
void *dest;
Elf_Shdr *shdr = &info->sechdrs[i];
enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
+ const char *sname;
if (!(shdr->sh_flags & SHF_ALLOC))
continue;
- dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
+ sname = info->secstrings + shdr->sh_name;
+ /*
+ * Load codetag sections separately as they might still be used
+ * after module unload.
+ */
+ if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
+ dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
+ arch_mod_section_prepend(mod, i), shdr->sh_addralign);
+ if (IS_ERR(dest)) {
+ ret = PTR_ERR(dest);
+ goto out_err;
+ }
+ codetag_section_found = true;
+ } else {
+ dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
+ }
if (shdr->sh_type != SHT_NOBITS) {
/*
@@ -2283,7 +2297,7 @@ static int move_module(struct module *mod, struct load_info *info)
if (i == info->index.mod &&
(WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
ret = -ENOEXEC;
- goto out_enomem;
+ goto out_err;
}
memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
}
@@ -2299,9 +2313,12 @@ static int move_module(struct module *mod, struct load_info *info)
}
return 0;
-out_enomem:
+out_err:
for (t--; t >= 0; t--)
- module_memory_free(mod, t, true);
+ module_memory_free(mod, t);
+ if (codetag_section_found)
+ codetag_free_module_sections(mod);
+
return ret;
}
@@ -2422,6 +2439,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
kmemleak_load_module(mod, info);
+ codetag_module_replaced(info->mod, mod);
+
return mod;
}
@@ -2431,7 +2450,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
percpu_modfree(mod);
module_arch_freeing_init(mod);
- free_mod_mem(mod, true);
+ free_mod_mem(mod);
}
int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 81e5f9a70f22..f33784f48dd2 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/alloc_tag.h>
+#include <linux/execmem.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/module.h>
@@ -9,6 +10,12 @@
#include <linux/seq_file.h>
static struct codetag_type *alloc_tag_cttype;
+static struct alloc_tag_module_section module_tags;
+static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
+/* A dummy object used to indicate an unloaded module */
+static struct module unloaded_mod;
+/* A dummy object used to indicate a module prepended area */
+static struct module prepend_mod;
DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
EXPORT_SYMBOL(_shared_alloc_tag);
@@ -149,29 +156,198 @@ static void __init procfs_init(void)
proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
}
-static bool alloc_tag_module_unload(struct codetag_type *cttype,
- struct codetag_module *cmod)
+static bool needs_section_mem(struct module *mod, unsigned long size)
{
- struct codetag_iterator iter = codetag_get_ct_iter(cttype);
- struct alloc_tag_counters counter;
- bool module_unused = true;
- struct alloc_tag *tag;
- struct codetag *ct;
+ return size >= sizeof(struct alloc_tag);
+}
+
+/* Called under RCU read lock */
+static void clean_unused_module_areas(void)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ struct module *val;
+
+ mas_for_each(&mas, val, module_tags.size) {
+ if (val == &unloaded_mod) {
+ struct alloc_tag *tag;
+ struct alloc_tag *last;
+ bool unused = true;
+
+ tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
+ last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
+ while (tag <= last) {
+ struct alloc_tag_counters counter;
+
+ counter = alloc_tag_read(tag);
+ if (counter.bytes) {
+ unused = false;
+ break;
+ }
+ tag++;
+ }
+ if (unused) {
+ mtree_store_range(&mod_area_mt, mas.index,
+ mas.last, NULL, GFP_KERNEL);
+ }
+ }
+ }
+}
+
+static void *reserve_module_tags(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align)
+{
+ unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
+ MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
+ bool cleanup_done = false;
+ unsigned long offset;
+ void *ret;
+
+ /* If no tags return NULL */
+ if (size < sizeof(struct alloc_tag))
+ return NULL;
+
+ /*
+ * align is always power of 2, so we can use IS_ALIGNED and ALIGN.
+ * align 0 or 1 means no alignment, to simplify set to 1.
+ */
+ if (!align)
+ align = 1;
+
+ rcu_read_lock();
+repeat:
+ /* Try finding exact size and hope the start is aligned */
+ if (mas_empty_area(&mas, 0, section_size - 1, prepend + size))
+ goto cleanup;
+
+ if (IS_ALIGNED(mas.index + prepend, align))
+ goto found;
+
+ /* Try finding larger area to align later */
+ mas_reset(&mas);
+ if (!mas_empty_area(&mas, 0, section_size - 1,
+ size + prepend + align - 1))
+ goto found;
+
+cleanup:
+ /* No free area, try cleanup stale data and repeat the search once */
+ if (!cleanup_done) {
+ clean_unused_module_areas();
+ cleanup_done = true;
+ mas_reset(&mas);
+ goto repeat;
+ } else {
+ ret = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+found:
+ offset = mas.index;
+ offset += prepend;
+ offset = ALIGN(offset, align);
+
+ if (mtree_insert_range(&mod_area_mt, offset, offset + size - 1, mod,
+ GFP_KERNEL)) {
+ ret = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ if (offset != mas.index) {
+ if (mtree_insert_range(&mod_area_mt, mas.index, offset - 1,
+ &prepend_mod, GFP_KERNEL)) {
+ mtree_store_range(&mod_area_mt, offset, offset + size - 1,
+ NULL, GFP_KERNEL);
+ ret = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+ }
+
+ if (module_tags.size < offset + size)
+ module_tags.size = offset + size;
+
+ ret = (struct alloc_tag *)(module_tags.start_addr + offset);
+out:
+ rcu_read_unlock();
+
+ return ret;
+}
+
+static void release_module_tags(struct module *mod, bool unused)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ unsigned long padding_start;
+ bool padding_found = false;
+ struct module *val;
+
+ if (unused)
+ return;
+
+ unused = true;
+ rcu_read_lock();
+ mas_for_each(&mas, val, module_tags.size) {
+ struct alloc_tag *tag;
+ struct alloc_tag *last;
+
+ if (val == &prepend_mod) {
+ padding_start = mas.index;
+ padding_found = true;
+ continue;
+ }
- for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
- if (iter.cmod != cmod)
+ if (val != mod) {
+ padding_found = false;
continue;
+ }
+
+ tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
+ last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
+ while (tag <= last) {
+ struct alloc_tag_counters counter;
+
+ counter = alloc_tag_read(tag);
+ if (counter.bytes) {
+ struct codetag *ct = &tag->ct;
- tag = ct_to_alloc_tag(ct);
- counter = alloc_tag_read(tag);
+ pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
+ ct->filename, ct->lineno, ct->modname,
+ ct->function, counter.bytes);
+ unused = false;
+ break;
+ }
+ tag++;
+ }
+ if (unused) {
+ mtree_store_range(&mod_area_mt,
+ padding_found ? padding_start : mas.index,
+ mas.last, NULL, GFP_KERNEL);
+ } else {
+ /* Release the padding and mark the module unloaded */
+ if (padding_found)
+ mtree_store_range(&mod_area_mt, padding_start,
+ mas.index - 1, NULL, GFP_KERNEL);
+ mtree_store_range(&mod_area_mt, mas.index, mas.last,
+ &unloaded_mod, GFP_KERNEL);
+ }
- if (WARN(counter.bytes,
- "%s:%u module %s func:%s has %llu allocated at module unload",
- ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
- module_unused = false;
+ break;
}
+ rcu_read_unlock();
+}
+
+static void replace_module(struct module *mod, struct module *new_mod)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ struct module *val;
- return module_unused;
+ rcu_read_lock();
+ mas_for_each(&mas, val, module_tags.size) {
+ if (val != mod)
+ continue;
+
+ mtree_store_range(&mod_area_mt, mas.index, mas.last,
+ new_mod, GFP_KERNEL);
+ break;
+ }
+ rcu_read_unlock();
}
#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
@@ -252,17 +428,46 @@ static void __init sysctl_init(void)
static inline void sysctl_init(void) {}
#endif /* CONFIG_SYSCTL */
+static unsigned long max_module_alloc_tags __initdata = 100000;
+
+static int __init set_max_module_alloc_tags(char *arg)
+{
+ if (!arg)
+ return -EINVAL;
+
+ return kstrtoul(arg, 0, &max_module_alloc_tags);
+}
+early_param("max_module_alloc_tags", set_max_module_alloc_tags);
+
static int __init alloc_tag_init(void)
{
const struct codetag_type_desc desc = {
- .section = "alloc_tags",
- .tag_size = sizeof(struct alloc_tag),
- .module_unload = alloc_tag_module_unload,
+ .section = ALLOC_TAG_SECTION_NAME,
+ .tag_size = sizeof(struct alloc_tag),
+ .needs_section_mem = needs_section_mem,
+ .alloc_section_mem = reserve_module_tags,
+ .free_section_mem = release_module_tags,
+ .module_replaced = replace_module,
};
+ unsigned long module_tags_mem_sz;
+ module_tags_mem_sz = max_module_alloc_tags * sizeof(struct alloc_tag);
+ pr_info("%lu bytes reserved for module allocation tags\n", module_tags_mem_sz);
+
+ /* Allocate space to copy allocation tags */
+ module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
+ module_tags_mem_sz);
+ if (!module_tags.start_addr)
+ return -ENOMEM;
+
+ module_tags.end_addr = module_tags.start_addr + module_tags_mem_sz;
+ mt_set_in_rcu(&mod_area_mt);
alloc_tag_cttype = codetag_register_type(&desc);
- if (IS_ERR(alloc_tag_cttype))
+ if (IS_ERR(alloc_tag_cttype)) {
+ execmem_free((void *)module_tags.start_addr);
+ module_tags.start_addr = 0;
return PTR_ERR(alloc_tag_cttype);
+ }
sysctl_init();
procfs_init();
diff --git a/lib/codetag.c b/lib/codetag.c
index 5ace625f2328..d602a81bdc03 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -126,6 +126,7 @@ static inline size_t range_size(const struct codetag_type *cttype,
}
#ifdef CONFIG_MODULES
+
static void *get_symbol(struct module *mod, const char *prefix, const char *name)
{
DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
@@ -155,6 +156,94 @@ static struct codetag_range get_section_range(struct module *mod,
};
}
+#define CODETAG_SECTION_PREFIX ".codetag."
+
+/* Some codetag types need a separate module section */
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size)
+{
+ const char *type_name;
+ struct codetag_type *cttype;
+ bool ret = false;
+
+ if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
+ return false;
+
+ type_name = name + strlen(CODETAG_SECTION_PREFIX);
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (strcmp(type_name, cttype->desc.section) == 0) {
+ if (!cttype->desc.needs_section_mem)
+ break;
+
+ down_write(&cttype->mod_lock);
+ ret = cttype->desc.needs_section_mem(mod, size);
+ up_write(&cttype->mod_lock);
+ break;
+ }
+ }
+ mutex_unlock(&codetag_lock);
+
+ return ret;
+}
+
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align)
+{
+ const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
+ struct codetag_type *cttype;
+ void *ret = NULL;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (strcmp(type_name, cttype->desc.section) == 0) {
+ if (WARN_ON(!cttype->desc.alloc_section_mem))
+ break;
+
+ down_write(&cttype->mod_lock);
+ ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
+ up_write(&cttype->mod_lock);
+ break;
+ }
+ }
+ mutex_unlock(&codetag_lock);
+
+ return ret;
+}
+
+void codetag_free_module_sections(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (!cttype->desc.free_section_mem)
+ continue;
+
+ down_write(&cttype->mod_lock);
+ cttype->desc.free_section_mem(mod, true);
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
+
+void codetag_module_replaced(struct module *mod, struct module *new_mod)
+{
+ struct codetag_type *cttype;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (!cttype->desc.module_replaced)
+ continue;
+
+ down_write(&cttype->mod_lock);
+ cttype->desc.module_replaced(mod, new_mod);
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
+
static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
{
struct codetag_range range;
@@ -212,13 +301,12 @@ void codetag_load_module(struct module *mod)
mutex_unlock(&codetag_lock);
}
-bool codetag_unload_module(struct module *mod)
+void codetag_unload_module(struct module *mod)
{
struct codetag_type *cttype;
- bool unload_ok = true;
if (!mod)
- return true;
+ return;
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
@@ -235,18 +323,17 @@ bool codetag_unload_module(struct module *mod)
}
if (found) {
if (cttype->desc.module_unload)
- if (!cttype->desc.module_unload(cttype, cmod))
- unload_ok = false;
+ cttype->desc.module_unload(cttype, cmod);
cttype->count -= range_size(cttype, &cmod->range);
idr_remove(&cttype->mod_idr, mod_id);
kfree(cmod);
}
up_write(&cttype->mod_lock);
+ if (found && cttype->desc.free_section_mem)
+ cttype->desc.free_section_mem(mod, false);
}
mutex_unlock(&codetag_lock);
-
- return unload_ok;
}
#else /* CONFIG_MODULES */
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 3f43edef813c..711c6e029936 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -50,7 +50,7 @@ SECTIONS {
.data : {
*(.data .data.[0-9a-zA-Z_]*)
*(.data..L*)
- CODETAG_SECTIONS()
+ MOD_CODETAG_SECTIONS()
}
.rodata : {
@@ -59,9 +59,10 @@ SECTIONS {
}
#else
.data : {
- CODETAG_SECTIONS()
+ MOD_CODETAG_SECTIONS()
}
#endif
+ MOD_SEPARATE_CODETAG_SECTIONS()
}
/* bring in arch-specific sections */
--
2.46.0.184.g6999bdac58-goog
* [PATCH 2/5] alloc_tag: eliminate alloc_tag_ref_set
2024-08-19 15:15 [PATCH 0/5] page allocation tag compression Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 1/5] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
@ 2024-08-19 15:15 ` Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 3/5] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 15:15 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
To simplify further refactoring, open-code alloc_tag_ref_set() at its
only two call sites.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 25 ++-----------------------
include/linux/pgalloc_tag.h | 12 +++++++++++-
2 files changed, 13 insertions(+), 24 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 99cbc7f086ad..21e3098220e3 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -143,36 +143,15 @@ static inline void alloc_tag_add_check(union codetag_ref *ref, struct alloc_tag
static inline void alloc_tag_sub_check(union codetag_ref *ref) {}
#endif
-/* Caller should verify both ref and tag to be valid */
-static inline void __alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
-{
- ref->ct = &tag->ct;
- /*
- * We need in increment the call counter every time we have a new
- * allocation or when we split a large allocation into smaller ones.
- * Each new reference for every sub-allocation needs to increment call
- * counter because when we free each part the counter will be decremented.
- */
- this_cpu_inc(tag->counters->calls);
-}
-
-static inline void alloc_tag_ref_set(union codetag_ref *ref, struct alloc_tag *tag)
-{
- alloc_tag_add_check(ref, tag);
- if (!ref || !tag)
- return;
-
- __alloc_tag_ref_set(ref, tag);
-}
-
static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
{
alloc_tag_add_check(ref, tag);
if (!ref || !tag)
return;
- __alloc_tag_ref_set(ref, tag);
+ ref->ct = &tag->ct;
this_cpu_add(tag->counters->bytes, bytes);
+ this_cpu_inc(tag->counters->calls);
}
static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 207f0c83c8e9..244a328dff62 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -103,7 +103,17 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
page_ext = page_ext_next(page_ext);
for (i = 1; i < nr; i++) {
/* Set new reference to point to the original tag */
- alloc_tag_ref_set(codetag_ref_from_page_ext(page_ext), tag);
+ ref = codetag_ref_from_page_ext(page_ext);
+ alloc_tag_add_check(ref, tag);
+ if (ref) {
+ ref->ct = &tag->ct;
+ /*
+ * We need in increment the call counter every time we split a
+ * large allocation into smaller ones because when we free each
+ * part the counter will be decremented.
+ */
+ this_cpu_inc(tag->counters->calls);
+ }
page_ext = page_ext_next(page_ext);
}
out:
--
2.46.0.184.g6999bdac58-goog
* [PATCH 3/5] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references
2024-08-19 15:15 [PATCH 0/5] page allocation tag compression Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 1/5] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 2/5] alloc_tag: eliminate alloc_tag_ref_set Suren Baghdasaryan
@ 2024-08-19 15:15 ` Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 4/5] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
4 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 15:15 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
To simplify later changes to page tag references, introduce new
pgalloc_tag_ref and pgtag_ref_handle types. This allows easy
replacement of page_ext as the storage for page allocation tags.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/pgalloc_tag.h | 144 +++++++++++++++++++++++-------------
lib/alloc_tag.c | 3 +-
2 files changed, 95 insertions(+), 52 deletions(-)
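The conversion below moves callers from dereferencing a codetag_ref
pointer in place to working on a local copy behind an opaque handle.
The pattern, as used for example by clear_page_tag_ref() in this patch,
is:

	pgtag_ref_handle handle;
	union codetag_ref ref;

	/* Snapshot the stored reference and get a handle to it. */
	handle = get_page_tag_ref(page, &ref);
	if (handle) {
		/* Modify the local copy... */
		set_codetag_empty(&ref);
		/* ...write it back, then release the handle. */
		update_page_tag_ref(handle, &ref);
		put_page_tag_ref(handle);
	}

With page_ext as the backing store the handle is simply the
pgalloc_tag_ref pointer, but the indirection lets later patches swap in
other storage without touching the callers.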
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 244a328dff62..c76b629d0206 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -9,48 +9,76 @@
#ifdef CONFIG_MEM_ALLOC_PROFILING
+typedef union codetag_ref pgalloc_tag_ref;
+
+static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ ref->ct = pgref->ct;
+}
+
+static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ pgref->ct = ref->ct;
+}
#include <linux/page_ext.h>
extern struct page_ext_operations page_alloc_tagging_ops;
-static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+static inline pgalloc_tag_ref *pgref_from_page_ext(struct page_ext *page_ext)
{
- return (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+ return (pgalloc_tag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
}
-static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+static inline struct page_ext *page_ext_from_pgref(pgalloc_tag_ref *pgref)
{
- return (void *)ref - page_alloc_tagging_ops.offset;
+ return (void *)pgref - page_alloc_tagging_ops.offset;
}
+typedef pgalloc_tag_ref *pgtag_ref_handle;
+
/* Should be called only if mem_alloc_profiling_enabled() */
-static inline union codetag_ref *get_page_tag_ref(struct page *page)
+static inline pgtag_ref_handle get_page_tag_ref(struct page *page, union codetag_ref *ref)
{
if (page) {
struct page_ext *page_ext = page_ext_get(page);
- if (page_ext)
- return codetag_ref_from_page_ext(page_ext);
+ if (page_ext) {
+ pgalloc_tag_ref *pgref = pgref_from_page_ext(page_ext);
+
+ read_pgref(pgref, ref);
+ return pgref;
+ }
}
return NULL;
}
-static inline void put_page_tag_ref(union codetag_ref *ref)
+static inline void put_page_tag_ref(pgtag_ref_handle pgref)
{
- if (WARN_ON(!ref))
+ if (WARN_ON(!pgref))
return;
- page_ext_put(page_ext_from_codetag_ref(ref));
+ page_ext_put(page_ext_from_pgref(pgref));
+}
+
+static inline void update_page_tag_ref(pgtag_ref_handle pgref, union codetag_ref *ref)
+{
+ if (WARN_ON(!pgref || !ref))
+ return;
+
+ write_pgref(pgref, ref);
}
static inline void clear_page_tag_ref(struct page *page)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
+ pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ handle = get_page_tag_ref(page, &ref);
+ if (handle) {
+ set_codetag_empty(&ref);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -59,11 +87,14 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ handle = get_page_tag_ref(page, &ref);
+ if (handle) {
+ alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -71,53 +102,58 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- if (ref) {
- alloc_tag_sub(ref, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ handle = get_page_tag_ref(page, &ref);
+ if (handle) {
+ alloc_tag_sub(&ref, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
{
- int i;
- struct page_ext *first_page_ext;
- struct page_ext *page_ext;
- union codetag_ref *ref;
+ pgtag_ref_handle first_pgref;
+ union codetag_ref first_ref;
struct alloc_tag *tag;
+ int i;
if (!mem_alloc_profiling_enabled())
return;
- first_page_ext = page_ext = page_ext_get(page);
- if (unlikely(!page_ext))
+ first_pgref = get_page_tag_ref(page, &first_ref);
+ if (unlikely(!first_pgref))
return;
- ref = codetag_ref_from_page_ext(page_ext);
- if (!ref->ct)
+ if (!first_ref.ct)
goto out;
- tag = ct_to_alloc_tag(ref->ct);
- page_ext = page_ext_next(page_ext);
+ tag = ct_to_alloc_tag(first_ref.ct);
for (i = 1; i < nr; i++) {
- /* Set new reference to point to the original tag */
- ref = codetag_ref_from_page_ext(page_ext);
- alloc_tag_add_check(ref, tag);
- if (ref) {
- ref->ct = &tag->ct;
+ pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ page++;
+ handle = get_page_tag_ref(page, &ref);
+ if (handle) {
+ /* Set new reference to point to the original tag */
+ alloc_tag_add_check(&ref, tag);
+ ref.ct = &tag->ct;
/*
* We need to increment the call counter every time we split a
* large allocation into smaller ones because when we free each
* part the counter will be decremented.
*/
this_cpu_inc(tag->counters->calls);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
- page_ext = page_ext_next(page_ext);
}
out:
- page_ext_put(first_page_ext);
+ put_page_tag_ref(first_pgref);
}
static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
@@ -125,13 +161,15 @@ static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
struct alloc_tag *tag = NULL;
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- alloc_tag_sub_check(ref);
- if (ref) {
- if (ref->ct)
- tag = ct_to_alloc_tag(ref->ct);
- put_page_tag_ref(ref);
+ pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ handle = get_page_tag_ref(page, &ref);
+ if (handle) {
+ alloc_tag_sub_check(&ref);
+ if (ref.ct)
+ tag = ct_to_alloc_tag(ref.ct);
+ put_page_tag_ref(handle);
}
}
@@ -146,8 +184,12 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
#else /* CONFIG_MEM_ALLOC_PROFILING */
-static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
-static inline void put_page_tag_ref(union codetag_ref *ref) {}
+typedef void *pgtag_ref_handle;
+
+static inline pgtag_ref_handle
+get_page_tag_ref(struct page *page, union codetag_ref *ref) { return NULL; }
+static inline void put_page_tag_ref(pgtag_ref_handle handle) {}
+static inline void update_page_tag_ref(pgtag_ref_handle handle, union codetag_ref *ref) {}
static inline void clear_page_tag_ref(struct page *page) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr) {}
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index f33784f48dd2..a1d80d2ef512 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -5,6 +5,7 @@
#include <linux/gfp.h>
#include <linux/module.h>
#include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
#include <linux/proc_fs.h>
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
@@ -397,7 +398,7 @@ static __init void init_page_alloc_tagging(void)
}
struct page_ext_operations page_alloc_tagging_ops = {
- .size = sizeof(union codetag_ref),
+ .size = sizeof(pgalloc_tag_ref),
.need = need_page_alloc_tagging,
.init = init_page_alloc_tagging,
};
--
2.46.0.184.g6999bdac58-goog
* [PATCH 4/5] alloc_tag: make page allocation tag reference size configurable
2024-08-19 15:15 [PATCH 0/5] page allocation tag compression Suren Baghdasaryan
` (2 preceding siblings ...)
2024-08-19 15:15 ` [PATCH 3/5] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
@ 2024-08-19 15:15 ` Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
4 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 15:15 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
Introduce CONFIG_PGALLOC_TAG_REF_BITS to control the size of the
page allocation tag references. When the size is configured to be
smaller than a direct pointer, the tags are looked up using an index
stored as the tag reference.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 10 +++++-
include/linux/codetag.h | 3 ++
include/linux/pgalloc_tag.h | 69 +++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 11 ++++++
lib/alloc_tag.c | 50 ++++++++++++++++++++++++++-
lib/codetag.c | 4 +--
mm/mm_init.c | 1 +
7 files changed, 144 insertions(+), 4 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 21e3098220e3..b5cf24517333 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,8 +30,16 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_kernel_section {
+ struct alloc_tag *first_tag;
+ unsigned long count;
+};
+
struct alloc_tag_module_section {
- unsigned long start_addr;
+ union {
+ unsigned long start_addr;
+ struct alloc_tag *first_tag;
+ };
unsigned long end_addr;
/* used size */
unsigned long size;
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c4a3dd60205e..dafc59838d87 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -13,6 +13,9 @@ struct codetag_module;
struct seq_buf;
struct module;
+#define CODETAG_SECTION_START_PREFIX "__start_"
+#define CODETAG_SECTION_STOP_PREFIX "__stop_"
+
/*
* An instance of this structure is created in a special ELF section at every
* code location being tagged. At runtime, the special section is treated as
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index c76b629d0206..80b8801cb90b 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -9,7 +9,18 @@
#ifdef CONFIG_MEM_ALLOC_PROFILING
+#if !defined(CONFIG_PGALLOC_TAG_REF_BITS) || CONFIG_PGALLOC_TAG_REF_BITS > 32
+#define PGALLOC_TAG_DIRECT_REF
typedef union codetag_ref pgalloc_tag_ref;
+#else /* !defined(CONFIG_PGALLOC_TAG_REF_BITS) || CONFIG_PGALLOC_TAG_REF_BITS > 32 */
+#if CONFIG_PGALLOC_TAG_REF_BITS > 16
+typedef u32 pgalloc_tag_ref;
+#else
+typedef u16 pgalloc_tag_ref;
+#endif
+#endif /* !defined(CONFIG_PGALLOC_TAG_REF_BITS) || CONFIG_PGALLOC_TAG_REF_BITS > 32 */
+
+#ifdef PGALLOC_TAG_DIRECT_REF
static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
{
@@ -20,6 +31,63 @@ static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
{
pgref->ct = ref->ct;
}
+
+static inline void alloc_tag_sec_init(void) {}
+
+#else /* PGALLOC_TAG_DIRECT_REF */
+
+extern struct alloc_tag_kernel_section kernel_tags;
+extern struct alloc_tag_module_section module_tags;
+
+#define CODETAG_ID_NULL 0
+#define CODETAG_ID_EMPTY 1
+#define CODETAG_ID_FIRST 2
+
+static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ pgalloc_tag_ref idx = *pgref;
+
+ switch (idx) {
+ case (CODETAG_ID_NULL):
+ ref->ct = NULL;
+ break;
+ case (CODETAG_ID_EMPTY):
+ set_codetag_empty(ref);
+ break;
+ default:
+ idx -= CODETAG_ID_FIRST;
+ ref->ct = idx < kernel_tags.count ?
+ &kernel_tags.first_tag[idx].ct :
+ &module_tags.first_tag[idx - kernel_tags.count].ct;
+ }
+}
+
+static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ struct alloc_tag *tag;
+
+ if (!ref->ct) {
+ *pgref = CODETAG_ID_NULL;
+ return;
+ }
+
+ if (is_codetag_empty(ref)) {
+ *pgref = CODETAG_ID_EMPTY;
+ return;
+ }
+
+ tag = ct_to_alloc_tag(ref->ct);
+ if (tag >= kernel_tags.first_tag && tag < kernel_tags.first_tag + kernel_tags.count) {
+ *pgref = CODETAG_ID_FIRST + (tag - kernel_tags.first_tag);
+ return;
+ }
+
+ *pgref = CODETAG_ID_FIRST + kernel_tags.count + (tag - module_tags.first_tag);
+}
+
+void __init alloc_tag_sec_init(void);
+
+#endif /* PGALLOC_TAG_DIRECT_REF */
#include <linux/page_ext.h>
extern struct page_ext_operations page_alloc_tagging_ops;
@@ -197,6 +265,7 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
+static inline void alloc_tag_sec_init(void) {}
#endif /* CONFIG_MEM_ALLOC_PROFILING */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..253f9c2028da 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1000,6 +1000,17 @@ config MEM_ALLOC_PROFILING_DEBUG
Adds warnings with helpful error messages for memory allocation
profiling.
+config PGALLOC_TAG_REF_BITS
+ int "Number of bits for page allocation tag reference (10-64)"
+ range 10 64
+ default "64"
+ depends on MEM_ALLOC_PROFILING
+ help
+ Number of bits used to encode a page allocation tag reference.
+
+ Smaller number results in less memory overhead but limits the number of
+ allocations which can be tagged (including allocations from modules).
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index a1d80d2ef512..d0da206d539e 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -3,6 +3,7 @@
#include <linux/execmem.h>
#include <linux/fs.h>
#include <linux/gfp.h>
+#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/page_ext.h>
#include <linux/pgalloc_tag.h>
@@ -11,13 +12,14 @@
#include <linux/seq_file.h>
static struct codetag_type *alloc_tag_cttype;
-static struct alloc_tag_module_section module_tags;
static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
/* A dummy object used to indicate an unloaded module */
static struct module unloaded_mod;
/* A dummy object used to indicate a module prepended area */
static struct module prepend_mod;
+struct alloc_tag_module_section module_tags;
+
DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
EXPORT_SYMBOL(_shared_alloc_tag);
@@ -157,6 +159,33 @@ static void __init procfs_init(void)
proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
}
+#ifndef PGALLOC_TAG_DIRECT_REF
+
+#define SECTION_START(NAME) (CODETAG_SECTION_START_PREFIX NAME)
+#define SECTION_STOP(NAME) (CODETAG_SECTION_STOP_PREFIX NAME)
+
+struct alloc_tag_kernel_section kernel_tags = { NULL, 0 };
+
+void __init alloc_tag_sec_init(void)
+{
+ struct alloc_tag *last_codetag;
+
+ kernel_tags.first_tag = (struct alloc_tag *)kallsyms_lookup_name(
+ SECTION_START(ALLOC_TAG_SECTION_NAME));
+ last_codetag = (struct alloc_tag *)kallsyms_lookup_name(
+ SECTION_STOP(ALLOC_TAG_SECTION_NAME));
+ kernel_tags.count = last_codetag - kernel_tags.first_tag;
+}
+
+static inline unsigned long alloc_tag_align(unsigned long val)
+{
+ if (val % sizeof(struct alloc_tag) == 0)
+ return val;
+ return ((val / sizeof(struct alloc_tag)) + 1) * sizeof(struct alloc_tag);
+}
+
+#endif /* PGALLOC_TAG_DIRECT_REF */
+
static bool needs_section_mem(struct module *mod, unsigned long size)
{
return size >= sizeof(struct alloc_tag);
@@ -214,6 +243,21 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
if (!align)
align = 1;
+#ifndef PGALLOC_TAG_DIRECT_REF
+ /*
+ * If alloc_tag size is not a multiple of required alignment tag
+ * indexing does not work.
+ */
+ if (!IS_ALIGNED(sizeof(struct alloc_tag), align)) {
+ pr_err("%s: alignment %lu is incompatible with allocation tag indexing (CONFIG_PGALLOC_TAG_REF_BITS)",
+ mod->name, align);
+ return ERR_PTR(-EINVAL);
+ }
+
+ /* Ensure prepend consumes multiple of alloc_tag-sized blocks */
+ if (prepend)
+ prepend = alloc_tag_align(prepend);
+#endif /* PGALLOC_TAG_DIRECT_REF */
rcu_read_lock();
repeat:
/* Try finding exact size and hope the start is aligned */
@@ -462,6 +506,10 @@ static int __init alloc_tag_init(void)
return -ENOMEM;
module_tags.end_addr = module_tags.start_addr + module_tags_mem_sz;
+#ifndef PGALLOC_TAG_DIRECT_REF
+ /* Ensure the base is alloc_tag aligned */
+ module_tags.start_addr = alloc_tag_align(module_tags.start_addr);
+#endif
mt_set_in_rcu(&mod_area_mt);
alloc_tag_cttype = codetag_register_type(&desc);
if (IS_ERR(alloc_tag_cttype)) {
diff --git a/lib/codetag.c b/lib/codetag.c
index d602a81bdc03..53585518a103 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -151,8 +151,8 @@ static struct codetag_range get_section_range(struct module *mod,
const char *section)
{
return (struct codetag_range) {
- get_symbol(mod, "__start_", section),
- get_symbol(mod, "__stop_", section),
+ get_symbol(mod, CODETAG_SECTION_START_PREFIX, section),
+ get_symbol(mod, CODETAG_SECTION_STOP_PREFIX, section),
};
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba5607aaf19..231a95782455 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2650,6 +2650,7 @@ void __init mm_core_init(void)
report_meminit();
kmsan_init_shadow();
stack_depot_early_init();
+ alloc_tag_sec_init();
mem_init();
kmem_cache_init();
/*
--
2.46.0.184.g6999bdac58-goog
* [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-08-19 15:15 [PATCH 0/5] page allocation tag compression Suren Baghdasaryan
` (3 preceding siblings ...)
2024-08-19 15:15 ` [PATCH 4/5] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
@ 2024-08-19 15:15 ` Suren Baghdasaryan
2024-08-19 19:34 ` Matthew Wilcox
4 siblings, 1 reply; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 15:15 UTC (permalink / raw)
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
references directly in the page flags. This removes the dependency on
page_ext and results in better performance for page allocations as
well as reduced page_ext memory overhead.
CONFIG_PGALLOC_TAG_REF_BITS controls the number of bits that must
be available in the page flags to store the references. If the number
of available page flag bits is insufficient, the build will fail and
either CONFIG_PGALLOC_TAG_REF_BITS has to be lowered or
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS disabled.
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/mmzone.h | 3 ++
include/linux/page-flags-layout.h | 10 +++++--
include/linux/pgalloc_tag.h | 48 +++++++++++++++++++++++++++++++
lib/Kconfig.debug | 27 +++++++++++++++--
lib/alloc_tag.c | 4 +++
mm/page_ext.c | 2 +-
6 files changed, 89 insertions(+), 5 deletions(-)
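As a usage sketch (illustrative values, not a recommendation), the
options from patches #4 and #5 could be combined to pack 16-bit tag
indexes into the page flags:

	CONFIG_MEM_ALLOC_PROFILING=y
	CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=y
	CONFIG_PGALLOC_TAG_REF_BITS=16

With PGALLOC_TAG_USE_PAGEFLAGS=y the PGALLOC_TAG_REF_BITS default
already becomes 16, so the last line only matters when picking a
different width; whether the chosen width actually fits depends on the
rest of the page flags layout for the target configuration, and the
build fails if it does not.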
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 17506e4a2835..0dd2b42f7cb6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1085,6 +1085,7 @@ static inline bool zone_is_empty(struct zone *zone)
#define KASAN_TAG_PGOFF (LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH)
#define LRU_GEN_PGOFF (KASAN_TAG_PGOFF - LRU_GEN_WIDTH)
#define LRU_REFS_PGOFF (LRU_GEN_PGOFF - LRU_REFS_WIDTH)
+#define ALLOC_TAG_REF_PGOFF (LRU_REFS_PGOFF - ALLOC_TAG_REF_WIDTH)
/*
* Define the bit shifts to access each section. For non-existent
@@ -1096,6 +1097,7 @@ static inline bool zone_is_empty(struct zone *zone)
#define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0))
#define LAST_CPUPID_PGSHIFT (LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0))
#define KASAN_TAG_PGSHIFT (KASAN_TAG_PGOFF * (KASAN_TAG_WIDTH != 0))
+#define ALLOC_TAG_REF_PGSHIFT (ALLOC_TAG_REF_PGOFF * (ALLOC_TAG_REF_WIDTH != 0))
/* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */
#ifdef NODE_NOT_IN_PAGE_FLAGS
@@ -1116,6 +1118,7 @@ static inline bool zone_is_empty(struct zone *zone)
#define LAST_CPUPID_MASK ((1UL << LAST_CPUPID_SHIFT) - 1)
#define KASAN_TAG_MASK ((1UL << KASAN_TAG_WIDTH) - 1)
#define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1)
+#define ALLOC_TAG_REF_MASK ((1UL << ALLOC_TAG_REF_WIDTH) - 1)
static inline enum zone_type page_zonenum(const struct page *page)
{
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 7d79818dc065..21bba7c8c965 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -5,6 +5,12 @@
#include <linux/numa.h>
#include <generated/bounds.h>
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+#define ALLOC_TAG_REF_WIDTH CONFIG_PGALLOC_TAG_REF_BITS
+#else
+#define ALLOC_TAG_REF_WIDTH 0
+#endif
+
/*
* When a memory allocation must conform to specific limitations (such
* as being suitable for DMA) the caller will pass in hints to the
@@ -91,7 +97,7 @@
#endif
#if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \
- KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
+ KASAN_TAG_WIDTH + ALLOC_TAG_REF_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
#else
#define LAST_CPUPID_WIDTH 0
@@ -102,7 +108,7 @@
#endif
#if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \
- KASAN_TAG_WIDTH + LAST_CPUPID_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
+ KASAN_TAG_WIDTH + ALLOC_TAG_REF_WIDTH + LAST_CPUPID_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
#error "Not enough bits in page flags"
#endif
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 80b8801cb90b..da95c09bcdf1 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -88,6 +88,52 @@ static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
void __init alloc_tag_sec_init(void);
#endif /* PGALLOC_TAG_DIRECT_REF */
+
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
+typedef struct page *pgtag_ref_handle;
+
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline pgtag_ref_handle get_page_tag_ref(struct page *page,
+ union codetag_ref *ref)
+{
+ if (page) {
+ pgalloc_tag_ref pgref;
+
+ pgref = (page->flags >> ALLOC_TAG_REF_PGSHIFT) & ALLOC_TAG_REF_MASK;
+ read_pgref(&pgref, ref);
+ return page;
+ }
+
+ return NULL;
+}
+
+static inline void put_page_tag_ref(pgtag_ref_handle page)
+{
+ WARN_ON(!page);
+}
+
+static inline void update_page_tag_ref(pgtag_ref_handle page, union codetag_ref *ref)
+{
+ unsigned long old_flags, flags, val;
+ pgalloc_tag_ref pgref;
+
+ if (WARN_ON(!page || !ref))
+ return;
+
+ write_pgref(&pgref, ref);
+ val = (unsigned long)pgref;
+ val = (val & ALLOC_TAG_REF_MASK) << ALLOC_TAG_REF_PGSHIFT;
+ do {
+ old_flags = READ_ONCE(page->flags);
+ flags = old_flags;
+ flags &= ~(ALLOC_TAG_REF_MASK << ALLOC_TAG_REF_PGSHIFT);
+ flags |= val;
+ } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));
+}
+
+#else /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
#include <linux/page_ext.h>
extern struct page_ext_operations page_alloc_tagging_ops;
@@ -136,6 +182,8 @@ static inline void update_page_tag_ref(pgtag_ref_handle pgref, union codetag_ref
write_pgref(pgref, ref);
}
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
static inline void clear_page_tag_ref(struct page *page)
{
if (mem_alloc_profiling_enabled()) {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 253f9c2028da..9fc8c1981f27 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -979,7 +979,7 @@ config MEM_ALLOC_PROFILING
depends on PROC_FS
depends on !DEBUG_FORCE_WEAK_PER_CPU
select CODE_TAGGING
- select PAGE_EXTENSION
+ select PAGE_EXTENSION if !PGALLOC_TAG_USE_PAGEFLAGS
select SLAB_OBJ_EXT
help
Track allocation source code and record total allocation size
@@ -1000,10 +1000,26 @@ config MEM_ALLOC_PROFILING_DEBUG
Adds warnings with helpful error messages for memory allocation
profiling.
+config PGALLOC_TAG_USE_PAGEFLAGS
+ bool "Use pageflags to encode page allocation tag reference"
+ default n
+ depends on MEM_ALLOC_PROFILING
+ help
+ When set, page allocation tag references are encoded inside page
+ flags, otherwise they are encoded in page extensions.
+
+ Setting this flag reduces memory and performance overhead of memory
+ allocation profiling but also limits how many allocations can be
+	  tagged. The number of bits is set by PGALLOC_TAG_REF_BITS and the
+	  references must fit in the page flags field.
+
+ Say N if unsure.
+
config PGALLOC_TAG_REF_BITS
int "Number of bits for page allocation tag reference (10-64)"
range 10 64
- default "64"
+ default "16" if PGALLOC_TAG_USE_PAGEFLAGS
+ default "64" if !PGALLOC_TAG_USE_PAGEFLAGS
depends on MEM_ALLOC_PROFILING
help
Number of bits used to encode a page allocation tag reference.
@@ -1011,6 +1027,13 @@ config PGALLOC_TAG_REF_BITS
Smaller number results in less memory overhead but limits the number of
allocations which can be tagged (including allocations from modules).
+ If PGALLOC_TAG_USE_PAGEFLAGS is set, the number of requested bits should
+ fit inside the page flags.
+
+	  If PGALLOC_TAG_USE_PAGEFLAGS is not set, the number of bits used to store
+	  a reference is rounded up to the nearest basic type size. If set higher than 32,
+	  a direct pointer to the allocation tag is stored instead, for performance reasons.
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d0da206d539e..1fbe80e68fdb 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -432,6 +432,8 @@ static int __init setup_early_mem_profiling(char *str)
}
early_param("sysctl.vm.mem_profiling", setup_early_mem_profiling);
+#ifndef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
static __init bool need_page_alloc_tagging(void)
{
return mem_profiling_support;
@@ -448,6 +450,8 @@ struct page_ext_operations page_alloc_tagging_ops = {
};
EXPORT_SYMBOL(page_alloc_tagging_ops);
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
#ifdef CONFIG_SYSCTL
static struct ctl_table memory_allocation_profiling_sysctls[] = {
{
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 641d93f6af4c..5f993c271ee7 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -83,7 +83,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
&page_idle_ops,
#endif
-#ifdef CONFIG_MEM_ALLOC_PROFILING
+#if defined(CONFIG_MEM_ALLOC_PROFILING) && !defined(CONFIG_PGALLOC_TAG_USE_PAGEFLAGS)
&page_alloc_tagging_ops,
#endif
#ifdef CONFIG_PAGE_TABLE_CHECK
--
2.46.0.184.g6999bdac58-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-08-19 15:15 ` [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
@ 2024-08-19 19:34 ` Matthew Wilcox
2024-08-19 20:39 ` Suren Baghdasaryan
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2024-08-19 19:34 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, liam.howlett, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Aug 19, 2024 at 08:15:11AM -0700, Suren Baghdasaryan wrote:
> @@ -91,7 +97,7 @@
> #endif
>
> #if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \
> - KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
> + KASAN_TAG_WIDTH + ALLOC_TAG_REF_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
> #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
> #else
> #define LAST_CPUPID_WIDTH 0
So if ALLOC_TAG_REF_WIDTH is big enough, it's going to force last_cpupid
into struct page. That will misalign struct page and disable HVO --
with no warning!
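(For reference, the fallback being described is roughly the existing mechanism
below; a sketch of page-flags-layout.h / struct page, not part of this patch:)

	#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
	#define LAST_CPUPID_NOT_IN_PAGE_FLAGS
	#endif

	/* in struct page */
	#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
		int _last_cpupid;	/* last_cpupid no longer fits in page->flags */
	#endif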
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-08-19 19:34 ` Matthew Wilcox
@ 2024-08-19 20:39 ` Suren Baghdasaryan
2024-08-19 20:40 ` Matthew Wilcox
0 siblings, 1 reply; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-19 20:39 UTC (permalink / raw)
To: Matthew Wilcox
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, liam.howlett, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Aug 19, 2024 at 12:34 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Aug 19, 2024 at 08:15:11AM -0700, Suren Baghdasaryan wrote:
> > @@ -91,7 +97,7 @@
> > #endif
> >
> > #if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \
> > - KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
> > + KASAN_TAG_WIDTH + ALLOC_TAG_REF_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
> > #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
> > #else
> > #define LAST_CPUPID_WIDTH 0
>
> So if ALLOC_TAG_REF_WIDTH is big enough, it's going to force last_cpupid
> into struct page.
Thanks for taking a look!
Yes, but how is this field different from say KASAN_TAG_WIDTH which
can also force last_cpupid out of page flags?
> That will misalign struct page and disable HVO -- with no warning!
mminit_verify_pageflags_layout already has a mminit_dprintk() to
indicate this condition. Is that not enough?
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-08-19 20:39 ` Suren Baghdasaryan
@ 2024-08-19 20:40 ` Matthew Wilcox
2024-08-20 1:46 ` Andrew Morton
0 siblings, 1 reply; 16+ messages in thread
From: Matthew Wilcox @ 2024-08-19 20:40 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, liam.howlett, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Aug 19, 2024 at 01:39:16PM -0700, Suren Baghdasaryan wrote:
> On Mon, Aug 19, 2024 at 12:34 PM Matthew Wilcox <willy@infradead.org> wrote:
> > So if ALLOC_TAG_REF_WIDTH is big enough, it's going to force last_cpupid
> > into struct page.
>
> Thanks for taking a look!
> Yes, but how is this field different from say KASAN_TAG_WIDTH which
> can also force last_cpupid out of page flags?
Because KASAN isn't for production use?
> > That will misalign struct page and disable HVO -- with no warning!
>
> mminit_verify_pageflags_layout already has a mminit_dprintk() to
> indicate this condition. Is that not enough?
Fair.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-08-19 20:40 ` Matthew Wilcox
@ 2024-08-20 1:46 ` Andrew Morton
2024-08-20 2:21 ` Suren Baghdasaryan
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2024-08-20 1:46 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Suren Baghdasaryan, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, jhubbard, yuzhao,
vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, 19 Aug 2024 21:40:34 +0100 Matthew Wilcox <willy@infradead.org> wrote:
> On Mon, Aug 19, 2024 at 01:39:16PM -0700, Suren Baghdasaryan wrote:
> > On Mon, Aug 19, 2024 at 12:34 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > So if ALLOC_TAG_REF_WIDTH is big enough, it's going to force last_cpupid
> > > into struct page.
> >
> > Thanks for taking a look!
> > Yes, but how is this field different from say KASAN_TAG_WIDTH which
> > can also force last_cpupid out of page flags?
>
> Because KASAN isn't for production use?
>
> > > That will misalign struct page and disable HVO -- with no warning!
> >
> > mminit_verify_pageflags_layout already has a mminit_dprintk() to
> > indicate this condition. Is that not enough?
>
> Fair.
Is a BUILD_BUG_ON() feasible here?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-08-20 1:46 ` Andrew Morton
@ 2024-08-20 2:21 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-20 2:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, jhubbard, yuzhao,
vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Aug 19, 2024 at 6:46 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 19 Aug 2024 21:40:34 +0100 Matthew Wilcox <willy@infradead.org> wrote:
>
> > On Mon, Aug 19, 2024 at 01:39:16PM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Aug 19, 2024 at 12:34 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > So if ALLOC_TAG_REF_WIDTH is big enough, it's going to force last_cpupid
> > > > into struct page.
> > >
> > > Thanks for taking a look!
> > > Yes, but how is this field different from say KASAN_TAG_WIDTH which
> > > can also force last_cpupid out of page flags?
> >
> > Because KASAN isn't for production use?
> >
> > > > That will misalign struct page and disable HVO -- with no warning!
> > >
> > > mminit_verify_pageflags_layout already has a mminit_dprintk() to
> > > indicate this condition. Is that not enough?
> >
> > Fair.
>
> Is a BUILD_BUG_ON() feasible here?
We could, but I didn't think we should prevent people from having such
a configuration if that's what they need...
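(Since the widths involved are preprocessor constants, a hard stop would more
naturally be an #error in page-flags-layout.h than a BUILD_BUG_ON(). A purely
hypothetical guard, not part of the posted series, might look like:

	#if defined(CONFIG_PGALLOC_TAG_USE_PAGEFLAGS) && \
	    defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
	#error "PGALLOC_TAG_REF_BITS evicts last_cpupid from page flags"
	#endif

which is exactly the kind of configuration the reply above argues should remain
allowed.)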
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] alloc_tag: load module tags into separate continuous memory
2024-08-19 15:15 ` [PATCH 1/5] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
@ 2024-08-20 7:13 ` Mike Rapoport
2024-08-20 7:26 ` Suren Baghdasaryan
2024-08-20 11:10 ` Kent Overstreet
2024-08-20 15:31 ` Liam R. Howlett
1 sibling, 2 replies; 16+ messages in thread
From: Mike Rapoport @ 2024-08-20 7:13 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Aug 19, 2024 at 08:15:07AM -0700, Suren Baghdasaryan wrote:
> When a module gets unloaded there is a possibility that some of the
> allocations it made are still used and therefore the allocation tags
> corresponding to these allocations are still referenced. As such, the
> memory for these tags can't be freed. This is currently handled as an
> abnormal situation and module's data section is not being unloaded.
> To handle this situation without keeping module's data in memory,
> allow codetags with longer lifespan than the module to be loaded into
> their own separate memory. The in-use memory areas and gaps after
> module unloading in this separate memory are tracked using maple trees.
> Allocation tags arrange their separate memory so that it is virtually
> contiguous and that will allow simple allocation tag indexing later on
> in this patchset. The size of this virtually contiguous memory is set
> to store up to 100000 allocation tags and max_module_alloc_tags kernel
> parameter is introduced to change this size.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> .../admin-guide/kernel-parameters.txt | 4 +
> include/asm-generic/codetag.lds.h | 19 ++
> include/linux/alloc_tag.h | 13 +-
> include/linux/codetag.h | 35 ++-
> kernel/module/main.c | 67 +++--
> lib/alloc_tag.c | 245 ++++++++++++++++--
> lib/codetag.c | 101 +++++++-
> scripts/module.lds.S | 5 +-
> 8 files changed, 429 insertions(+), 60 deletions(-)
...
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index c2a579ccd455..c4a3dd60205e 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -35,8 +35,13 @@ struct codetag_type_desc {
> size_t tag_size;
> void (*module_load)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> - bool (*module_unload)(struct codetag_type *cttype,
> + void (*module_unload)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> + void (*module_replaced)(struct module *mod, struct module *new_mod);
> + bool (*needs_section_mem)(struct module *mod, unsigned long size);
> + void *(*alloc_section_mem)(struct module *mod, unsigned long size,
> + unsigned int prepend, unsigned long align);
> + void (*free_section_mem)(struct module *mod, bool unused);
> };
>
> struct codetag_iterator {
> @@ -71,11 +76,31 @@ struct codetag_type *
> codetag_register_type(const struct codetag_type_desc *desc);
>
> #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
> +
> +bool codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size);
> +void *codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align);
> +void codetag_free_module_sections(struct module *mod);
> +void codetag_module_replaced(struct module *mod, struct module *new_mod);
> void codetag_load_module(struct module *mod);
> -bool codetag_unload_module(struct module *mod);
> -#else
> +void codetag_unload_module(struct module *mod);
> +
> +#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> +
> +static inline bool
> +codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size) { return false; }
> +static inline void *
> +codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align) { return NULL; }
> +static inline void codetag_free_module_sections(struct module *mod) {}
> +static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
> static inline void codetag_load_module(struct module *mod) {}
> -static inline bool codetag_unload_module(struct module *mod) { return true; }
> -#endif
> +static inline void codetag_unload_module(struct module *mod) {}
> +
> +#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
Maybe I'm missing something, but can't alloc_tag::module_unload() just copy
the tags that cannot be freed somewhere outside of module sections and then
free the module?
The heavy lifting would be localized to alloc_tags rather than spread all
over.
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] alloc_tag: load module tags into separate continuous memory
2024-08-20 7:13 ` Mike Rapoport
@ 2024-08-20 7:26 ` Suren Baghdasaryan
2024-08-20 11:10 ` Kent Overstreet
1 sibling, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-20 7:26 UTC (permalink / raw)
To: Mike Rapoport
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Tue, Aug 20, 2024 at 12:13 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Mon, Aug 19, 2024 at 08:15:07AM -0700, Suren Baghdasaryan wrote:
> > When a module gets unloaded there is a possibility that some of the
> > allocations it made are still used and therefore the allocation tags
> > corresponding to these allocations are still referenced. As such, the
> > memory for these tags can't be freed. This is currently handled as an
> > abnormal situation and module's data section is not being unloaded.
> > To handle this situation without keeping module's data in memory,
> > allow codetags with longer lifespan than the module to be loaded into
> > their own separate memory. The in-use memory areas and gaps after
> > module unloading in this separate memory are tracked using maple trees.
> > Allocation tags arrange their separate memory so that it is virtually
> > contiguous and that will allow simple allocation tag indexing later on
> > in this patchset. The size of this virtually contiguous memory is set
> > to store up to 100000 allocation tags and max_module_alloc_tags kernel
> > parameter is introduced to change this size.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > .../admin-guide/kernel-parameters.txt | 4 +
> > include/asm-generic/codetag.lds.h | 19 ++
> > include/linux/alloc_tag.h | 13 +-
> > include/linux/codetag.h | 35 ++-
> > kernel/module/main.c | 67 +++--
> > lib/alloc_tag.c | 245 ++++++++++++++++--
> > lib/codetag.c | 101 +++++++-
> > scripts/module.lds.S | 5 +-
> > 8 files changed, 429 insertions(+), 60 deletions(-)
>
> ...
>
> > diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> > index c2a579ccd455..c4a3dd60205e 100644
> > --- a/include/linux/codetag.h
> > +++ b/include/linux/codetag.h
> > @@ -35,8 +35,13 @@ struct codetag_type_desc {
> > size_t tag_size;
> > void (*module_load)(struct codetag_type *cttype,
> > struct codetag_module *cmod);
> > - bool (*module_unload)(struct codetag_type *cttype,
> > + void (*module_unload)(struct codetag_type *cttype,
> > struct codetag_module *cmod);
> > + void (*module_replaced)(struct module *mod, struct module *new_mod);
> > + bool (*needs_section_mem)(struct module *mod, unsigned long size);
> > + void *(*alloc_section_mem)(struct module *mod, unsigned long size,
> > + unsigned int prepend, unsigned long align);
> > + void (*free_section_mem)(struct module *mod, bool unused);
> > };
> >
> > struct codetag_iterator {
> > @@ -71,11 +76,31 @@ struct codetag_type *
> > codetag_register_type(const struct codetag_type_desc *desc);
> >
> > #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
> > +
> > +bool codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size);
> > +void *codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align);
> > +void codetag_free_module_sections(struct module *mod);
> > +void codetag_module_replaced(struct module *mod, struct module *new_mod);
> > void codetag_load_module(struct module *mod);
> > -bool codetag_unload_module(struct module *mod);
> > -#else
> > +void codetag_unload_module(struct module *mod);
> > +
> > +#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> > +
> > +static inline bool
> > +codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size) { return false; }
> > +static inline void *
> > +codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align) { return NULL; }
> > +static inline void codetag_free_module_sections(struct module *mod) {}
> > +static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
> > static inline void codetag_load_module(struct module *mod) {}
> > -static inline bool codetag_unload_module(struct module *mod) { return true; }
> > -#endif
> > +static inline void codetag_unload_module(struct module *mod) {}
> > +
> > +#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
>
> Maybe I'm missing something, but can't alloc_tag::module_unload() just copy
> the tags that cannot be freed somewhere outside of module sections and then
> free the module?
>
> The heavy lifting would be localized to alloc_tags rather than spread all
> over.
Hi Mike,
We can't copy those tags because allocations already have references
to them. We would have to find and update those references to point to
the new locations of these tags. That means potentially scanning all
page extensions/pages in the system and updating their tag references
in some race-free fashion. So it is far from trivial.
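Concretely, the reference that pins a tag is just a raw pointer into the tag
area (a sketch based on the existing codetag definitions, not something this
series adds):

	union codetag_ref {
		struct codetag *ct;
	};

	/* set at allocation time, one per tracked page/slab object: */
	ref->ct = &tag->ct;

so relocating a live tag would mean finding and rewriting every such ref.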
Thanks,
Suren.
>
> --
> Sincerely yours,
> Mike.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] alloc_tag: load module tags into separate continuous memory
2024-08-20 7:13 ` Mike Rapoport
2024-08-20 7:26 ` Suren Baghdasaryan
@ 2024-08-20 11:10 ` Kent Overstreet
1 sibling, 0 replies; 16+ messages in thread
From: Kent Overstreet @ 2024-08-20 11:10 UTC (permalink / raw)
To: Mike Rapoport
Cc: Suren Baghdasaryan, akpm, corbet, arnd, mcgrof, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Tue, Aug 20, 2024 at 10:13:07AM GMT, Mike Rapoport wrote:
> On Mon, Aug 19, 2024 at 08:15:07AM -0700, Suren Baghdasaryan wrote:
> > When a module gets unloaded there is a possibility that some of the
> > allocations it made are still used and therefore the allocation tags
> > corresponding to these allocations are still referenced. As such, the
> > memory for these tags can't be freed. This is currently handled as an
> > abnormal situation and module's data section is not being unloaded.
> > To handle this situation without keeping module's data in memory,
> > allow codetags with longer lifespan than the module to be loaded into
> > their own separate memory. The in-use memory areas and gaps after
> > module unloading in this separate memory are tracked using maple trees.
> > Allocation tags arrange their separate memory so that it is virtually
> > contiguous and that will allow simple allocation tag indexing later on
> > in this patchset. The size of this virtually contiguous memory is set
> > to store up to 100000 allocation tags and max_module_alloc_tags kernel
> > parameter is introduced to change this size.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > .../admin-guide/kernel-parameters.txt | 4 +
> > include/asm-generic/codetag.lds.h | 19 ++
> > include/linux/alloc_tag.h | 13 +-
> > include/linux/codetag.h | 35 ++-
> > kernel/module/main.c | 67 +++--
> > lib/alloc_tag.c | 245 ++++++++++++++++--
> > lib/codetag.c | 101 +++++++-
> > scripts/module.lds.S | 5 +-
> > 8 files changed, 429 insertions(+), 60 deletions(-)
>
> ...
>
> > diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> > index c2a579ccd455..c4a3dd60205e 100644
> > --- a/include/linux/codetag.h
> > +++ b/include/linux/codetag.h
> > @@ -35,8 +35,13 @@ struct codetag_type_desc {
> > size_t tag_size;
> > void (*module_load)(struct codetag_type *cttype,
> > struct codetag_module *cmod);
> > - bool (*module_unload)(struct codetag_type *cttype,
> > + void (*module_unload)(struct codetag_type *cttype,
> > struct codetag_module *cmod);
> > + void (*module_replaced)(struct module *mod, struct module *new_mod);
> > + bool (*needs_section_mem)(struct module *mod, unsigned long size);
> > + void *(*alloc_section_mem)(struct module *mod, unsigned long size,
> > + unsigned int prepend, unsigned long align);
> > + void (*free_section_mem)(struct module *mod, bool unused);
> > };
> >
> > struct codetag_iterator {
> > @@ -71,11 +76,31 @@ struct codetag_type *
> > codetag_register_type(const struct codetag_type_desc *desc);
> >
> > #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
> > +
> > +bool codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size);
> > +void *codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align);
> > +void codetag_free_module_sections(struct module *mod);
> > +void codetag_module_replaced(struct module *mod, struct module *new_mod);
> > void codetag_load_module(struct module *mod);
> > -bool codetag_unload_module(struct module *mod);
> > -#else
> > +void codetag_unload_module(struct module *mod);
> > +
> > +#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> > +
> > +static inline bool
> > +codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size) { return false; }
> > +static inline void *
> > +codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align) { return NULL; }
> > +static inline void codetag_free_module_sections(struct module *mod) {}
> > +static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
> > static inline void codetag_load_module(struct module *mod) {}
> > -static inline bool codetag_unload_module(struct module *mod) { return true; }
> > -#endif
> > +static inline void codetag_unload_module(struct module *mod) {}
> > +
> > +#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
>
> Maybe I'm missing something, but can't alloc_tag::module_unload() just copy
> the tags that cannot be freed somewhere outside of module sections and then
> free the module?
>
> The heavy lifting would be localized to alloc_tags rather than spread all
> over.
The reason they can't be freed is that they're referenced by
slab/page allocations - and they can't be moved either.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] alloc_tag: load module tags into separate continuous memory
2024-08-19 15:15 ` [PATCH 1/5] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
2024-08-20 7:13 ` Mike Rapoport
@ 2024-08-20 15:31 ` Liam R. Howlett
2024-08-20 18:35 ` Suren Baghdasaryan
1 sibling, 1 reply; 16+ messages in thread
From: Liam R. Howlett @ 2024-08-20 15:31 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
* Suren Baghdasaryan <surenb@google.com> [240819 11:15]:
> When a module gets unloaded there is a possibility that some of the
> allocations it made are still used and therefore the allocation tags
> corresponding to these allocations are still referenced. As such, the
> memory for these tags can't be freed. This is currently handled as an
> abnormal situation and module's data section is not being unloaded.
> To handle this situation without keeping module's data in memory,
> allow codetags with longer lifespan than the module to be loaded into
> their own separate memory. The in-use memory areas and gaps after
> module unloading in this separate memory are tracked using maple trees.
> Allocation tags arrange their separate memory so that it is virtually
> contiguous and that will allow simple allocation tag indexing later on
> in this patchset. The size of this virtually contiguous memory is set
> to store up to 100000 allocation tags and max_module_alloc_tags kernel
> parameter is introduced to change this size.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> .../admin-guide/kernel-parameters.txt | 4 +
> include/asm-generic/codetag.lds.h | 19 ++
> include/linux/alloc_tag.h | 13 +-
> include/linux/codetag.h | 35 ++-
> kernel/module/main.c | 67 +++--
> lib/alloc_tag.c | 245 ++++++++++++++++--
> lib/codetag.c | 101 +++++++-
> scripts/module.lds.S | 5 +-
> 8 files changed, 429 insertions(+), 60 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index d0d141d50638..17f9f811a9c0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3205,6 +3205,10 @@
> devices can be requested on-demand with the
> /dev/loop-control interface.
>
> +
> + max_module_alloc_tags= [KNL] Max supported number of allocation tags
> + in modules.
> +
> mce [X86-32] Machine Check Exception
>
> mce=option [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
> diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
> index 64f536b80380..372c320c5043 100644
> --- a/include/asm-generic/codetag.lds.h
> +++ b/include/asm-generic/codetag.lds.h
> @@ -11,4 +11,23 @@
> #define CODETAG_SECTIONS() \
> SECTION_WITH_BOUNDARIES(alloc_tags)
>
> +/*
> + * Module codetags which aren't used after module unload, therefore have the
> + * same lifespan as the module and can be safely unloaded with the module.
> + */
> +#define MOD_CODETAG_SECTIONS()
> +
> +#define MOD_SEPARATE_CODETAG_SECTION(_name) \
> + .codetag.##_name : { \
> + SECTION_WITH_BOUNDARIES(_name) \
> + }
> +
> +/*
> + * For codetags which might be used after module unload, therefore might stay
> + * longer in memory. Each such codetag type has its own section so that we can
> + * unload them individually once unused.
> + */
> +#define MOD_SEPARATE_CODETAG_SECTIONS() \
> + MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
> +
> #endif /* __ASM_GENERIC_CODETAG_LDS_H */
> diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> index 8c61ccd161ba..99cbc7f086ad 100644
> --- a/include/linux/alloc_tag.h
> +++ b/include/linux/alloc_tag.h
> @@ -30,6 +30,13 @@ struct alloc_tag {
> struct alloc_tag_counters __percpu *counters;
> } __aligned(8);
>
> +struct alloc_tag_module_section {
> + unsigned long start_addr;
> + unsigned long end_addr;
> + /* used size */
> + unsigned long size;
> +};
> +
> #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
>
> #define CODETAG_EMPTY ((void *)1)
> @@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> +#define ALLOC_TAG_SECTION_NAME "alloc_tags"
> +
> struct codetag_bytes {
> struct codetag *ct;
> s64 bytes;
> @@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
>
> #define DEFINE_ALLOC_TAG(_alloc_tag) \
> static struct alloc_tag _alloc_tag __used __aligned(8) \
> - __section("alloc_tags") = { \
> + __section(ALLOC_TAG_SECTION_NAME) = { \
> .ct = CODE_TAG_INIT, \
> .counters = &_shared_alloc_tag };
>
> @@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> #define DEFINE_ALLOC_TAG(_alloc_tag) \
> static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
> static struct alloc_tag _alloc_tag __used __aligned(8) \
> - __section("alloc_tags") = { \
> + __section(ALLOC_TAG_SECTION_NAME) = { \
> .ct = CODE_TAG_INIT, \
> .counters = &_alloc_tag_cntr };
>
> diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> index c2a579ccd455..c4a3dd60205e 100644
> --- a/include/linux/codetag.h
> +++ b/include/linux/codetag.h
> @@ -35,8 +35,13 @@ struct codetag_type_desc {
> size_t tag_size;
> void (*module_load)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> - bool (*module_unload)(struct codetag_type *cttype,
> + void (*module_unload)(struct codetag_type *cttype,
> struct codetag_module *cmod);
> + void (*module_replaced)(struct module *mod, struct module *new_mod);
> + bool (*needs_section_mem)(struct module *mod, unsigned long size);
> + void *(*alloc_section_mem)(struct module *mod, unsigned long size,
> + unsigned int prepend, unsigned long align);
> + void (*free_section_mem)(struct module *mod, bool unused);
> };
>
> struct codetag_iterator {
> @@ -71,11 +76,31 @@ struct codetag_type *
> codetag_register_type(const struct codetag_type_desc *desc);
>
> #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
> +
> +bool codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size);
> +void *codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align);
> +void codetag_free_module_sections(struct module *mod);
> +void codetag_module_replaced(struct module *mod, struct module *new_mod);
> void codetag_load_module(struct module *mod);
> -bool codetag_unload_module(struct module *mod);
> -#else
> +void codetag_unload_module(struct module *mod);
> +
> +#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> +
> +static inline bool
> +codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size) { return false; }
> +static inline void *
> +codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align) { return NULL; }
> +static inline void codetag_free_module_sections(struct module *mod) {}
> +static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
> static inline void codetag_load_module(struct module *mod) {}
> -static inline bool codetag_unload_module(struct module *mod) { return true; }
> -#endif
> +static inline void codetag_unload_module(struct module *mod) {}
> +
> +#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
>
> #endif /* _LINUX_CODETAG_H */
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index 71396e297499..d195d788835c 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1225,18 +1225,12 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> return 0;
> }
>
> -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> - bool unload_codetags)
> +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> {
> - void *ptr = mod->mem[type].base;
> -
> - if (!unload_codetags && mod_mem_type_is_core_data(type))
> - return;
> -
> - execmem_free(ptr);
> + execmem_free(mod->mem[type].base);
> }
>
> -static void free_mod_mem(struct module *mod, bool unload_codetags)
> +static void free_mod_mem(struct module *mod)
> {
> for_each_mod_mem_type(type) {
> struct module_memory *mod_mem = &mod->mem[type];
> @@ -1247,25 +1241,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
> /* Free lock-classes; relies on the preceding sync_rcu(). */
> lockdep_free_key_range(mod_mem->base, mod_mem->size);
> if (mod_mem->size)
> - module_memory_free(mod, type, unload_codetags);
> + module_memory_free(mod, type);
> }
>
> /* MOD_DATA hosts mod, so free it at last */
> lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
> - module_memory_free(mod, MOD_DATA, unload_codetags);
> + module_memory_free(mod, MOD_DATA);
> }
>
> /* Free a module, remove from lists, etc. */
> static void free_module(struct module *mod)
> {
> - bool unload_codetags;
> -
> trace_module_free(mod);
>
> - unload_codetags = codetag_unload_module(mod);
> - if (!unload_codetags)
> - pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
> - mod->name);
> + codetag_unload_module(mod);
>
> mod_sysfs_teardown(mod);
>
> @@ -1308,7 +1297,7 @@ static void free_module(struct module *mod)
> kfree(mod->args);
> percpu_modfree(mod);
>
> - free_mod_mem(mod, unload_codetags);
> + free_mod_mem(mod);
> }
>
> void *__symbol_get(const char *symbol)
> @@ -1573,6 +1562,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
> if (WARN_ON_ONCE(type == MOD_INVALID))
> continue;
>
> + /*
> + * Do not allocate codetag memory as we load it into
> + * preallocated contiguous memory. s->sh_entsize will
> + * not be used for this section, so leave it as is.
> + */
> + if (codetag_needs_module_section(mod, sname, s->sh_size))
> + continue;
> +
> s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
> pr_debug("\t%s\n", sname);
> }
> @@ -2247,6 +2244,7 @@ static int move_module(struct module *mod, struct load_info *info)
> int i;
> enum mod_mem_type t = 0;
> int ret = -ENOMEM;
> + bool codetag_section_found = false;
>
> for_each_mod_mem_type(type) {
> if (!mod->mem[type].size) {
> @@ -2257,7 +2255,7 @@ static int move_module(struct module *mod, struct load_info *info)
> ret = module_memory_alloc(mod, type);
> if (ret) {
> t = type;
> - goto out_enomem;
> + goto out_err;
> }
> }
>
> @@ -2267,11 +2265,27 @@ static int move_module(struct module *mod, struct load_info *info)
> void *dest;
> Elf_Shdr *shdr = &info->sechdrs[i];
> enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
> + const char *sname;
>
> if (!(shdr->sh_flags & SHF_ALLOC))
> continue;
>
> - dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
> + sname = info->secstrings + shdr->sh_name;
> + /*
> + * Load codetag sections separately as they might still be used
> + * after module unload.
> + */
> + if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
> + dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
> + arch_mod_section_prepend(mod, i), shdr->sh_addralign);
> + if (IS_ERR(dest)) {
> + ret = PTR_ERR(dest);
> + goto out_err;
> + }
> + codetag_section_found = true;
> + } else {
> + dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
> + }
>
> if (shdr->sh_type != SHT_NOBITS) {
> /*
> @@ -2283,7 +2297,7 @@ static int move_module(struct module *mod, struct load_info *info)
> if (i == info->index.mod &&
> (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
> ret = -ENOEXEC;
> - goto out_enomem;
> + goto out_err;
> }
> memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
> }
> @@ -2299,9 +2313,12 @@ static int move_module(struct module *mod, struct load_info *info)
> }
>
> return 0;
> -out_enomem:
> +out_err:
> for (t--; t >= 0; t--)
> - module_memory_free(mod, t, true);
> + module_memory_free(mod, t);
> + if (codetag_section_found)
> + codetag_free_module_sections(mod);
> +
> return ret;
> }
>
> @@ -2422,6 +2439,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> /* Module has been copied to its final place now: return it. */
> mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> kmemleak_load_module(mod, info);
> + codetag_module_replaced(info->mod, mod);
> +
> return mod;
> }
>
> @@ -2431,7 +2450,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
> percpu_modfree(mod);
> module_arch_freeing_init(mod);
>
> - free_mod_mem(mod, true);
> + free_mod_mem(mod);
> }
>
> int __weak module_finalize(const Elf_Ehdr *hdr,
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 81e5f9a70f22..f33784f48dd2 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> #include <linux/alloc_tag.h>
> +#include <linux/execmem.h>
> #include <linux/fs.h>
> #include <linux/gfp.h>
> #include <linux/module.h>
> @@ -9,6 +10,12 @@
> #include <linux/seq_file.h>
>
> static struct codetag_type *alloc_tag_cttype;
> +static struct alloc_tag_module_section module_tags;
> +static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
> +/* A dummy object used to indicate an unloaded module */
> +static struct module unloaded_mod;
> +/* A dummy object used to indicate a module prepended area */
> +static struct module prepend_mod;
>
> DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> EXPORT_SYMBOL(_shared_alloc_tag);
> @@ -149,29 +156,198 @@ static void __init procfs_init(void)
> proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
> }
>
> -static bool alloc_tag_module_unload(struct codetag_type *cttype,
> - struct codetag_module *cmod)
> +static bool needs_section_mem(struct module *mod, unsigned long size)
> {
> - struct codetag_iterator iter = codetag_get_ct_iter(cttype);
> - struct alloc_tag_counters counter;
> - bool module_unused = true;
> - struct alloc_tag *tag;
> - struct codetag *ct;
> + return size >= sizeof(struct alloc_tag);
> +}
> +
> +/* Called under RCU read lock */
> +static void clean_unused_module_areas(void)
> +{
> + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> + struct module *val;
> +
> + mas_for_each(&mas, val, module_tags.size) {
> + if (val == &unloaded_mod) {
> + struct alloc_tag *tag;
> + struct alloc_tag *last;
> + bool unused = true;
> +
> + tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
> + last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
> + while (tag <= last) {
> + struct alloc_tag_counters counter;
> +
> + counter = alloc_tag_read(tag);
> + if (counter.bytes) {
> + unused = false;
> + break;
> + }
> + tag++;
> + }
> + if (unused) {
> + mtree_store_range(&mod_area_mt, mas.index,
> + mas.last, NULL, GFP_KERNEL);
There are also mtree_erase() and mas_erase() functions, but there is no
problem doing it this way.
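For example (a sketch with the same end result):

	mtree_erase(&mod_area_mt, mas.index);

or, with the mas already positioned and the spinlock held:

	mas_erase(&mas);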
> + }
> + }
> + }
> +}
> +
> +static void *reserve_module_tags(struct module *mod, unsigned long size,
> + unsigned int prepend, unsigned long align)
> +{
> + unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
> + MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
> + bool cleanup_done = false;
> + unsigned long offset;
> + void *ret;
> +
> + /* If no tags return NULL */
> + if (size < sizeof(struct alloc_tag))
> + return NULL;
> +
> + /*
> + * align is always power of 2, so we can use IS_ALIGNED and ALIGN.
> + * align 0 or 1 means no alignment, to simplify set to 1.
> + */
> + if (!align)
> + align = 1;
> +
> + rcu_read_lock();
> +repeat:
> + /* Try finding exact size and hope the start is aligned */
> + if (mas_empty_area(&mas, 0, section_size - 1, prepend + size))
> + goto cleanup;
> +
> + if (IS_ALIGNED(mas.index + prepend, align))
> + goto found;
> +
> + /* Try finding larger area to align later */
> + mas_reset(&mas);
> + if (!mas_empty_area(&mas, 0, section_size - 1,
> + size + prepend + align - 1))
> + goto found;
> +
> +cleanup:
> + /* No free area, try cleanup stale data and repeat the search once */
> + if (!cleanup_done) {
> + clean_unused_module_areas();
> + cleanup_done = true;
> + mas_reset(&mas);
> + goto repeat;
> + } else {
> + ret = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> +
> +found:
> + offset = mas.index;
> + offset += prepend;
> + offset = ALIGN(offset, align);
> +
> + if (mtree_insert_range(&mod_area_mt, offset, offset + size - 1, mod,
> + GFP_KERNEL)) {
> + ret = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> +
> + if (offset != mas.index) {
> + if (mtree_insert_range(&mod_area_mt, mas.index, offset - 1,
> + &prepend_mod, GFP_KERNEL)) {
> + mtree_store_range(&mod_area_mt, offset, offset + size - 1,
> + NULL, GFP_KERNEL);
Aren't you giving up the protection of knowing you haven't raced with
another store to the same location if you use mtree_store_range()
on the failure path of mtree_insert_range()?
> + ret = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> + }
mtree_insert and mtree_store do the locking for you, but will re-walk
the tree to the location for the store. If you use the advanced mas_
operations, you can store without a re-walk but need to take the
spinlock and check mas_is_err(&mas) yourself.
mas.last = offset - 1;
mas_lock(&mas);
mas_insert(&mas, &prepend_mod);
mas_unlock(&mas);
if (mas_is_err(&mas))
	ret = xa_err(mas.node);
What you have works though.
> +
> + if (module_tags.size < offset + size)
> + module_tags.size = offset + size;
> +
> + ret = (struct alloc_tag *)(module_tags.start_addr + offset);
> +out:
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> +static void release_module_tags(struct module *mod, bool unused)
> +{
> + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> + unsigned long padding_start;
> + bool padding_found = false;
> + struct module *val;
> +
> + if (unused)
> + return;
> +
> + unused = true;
> + rcu_read_lock();
> + mas_for_each(&mas, val, module_tags.size) {
> + struct alloc_tag *tag;
> + struct alloc_tag *last;
> +
> + if (val == &prepend_mod) {
> + padding_start = mas.index;
> + padding_found = true;
> + continue;
> + }
>
> - for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
> - if (iter.cmod != cmod)
> + if (val != mod) {
> + padding_found = false;
> continue;
> + }
> +
> + tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
> + last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
> + while (tag <= last) {
> + struct alloc_tag_counters counter;
> +
> + counter = alloc_tag_read(tag);
> + if (counter.bytes) {
> + struct codetag *ct = &tag->ct;
>
> - tag = ct_to_alloc_tag(ct);
> - counter = alloc_tag_read(tag);
> + pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
> + ct->filename, ct->lineno, ct->modname,
> + ct->function, counter.bytes);
> + unused = false;
> + break;
> + }
> + tag++;
> + }
> + if (unused) {
> + mtree_store_range(&mod_area_mt,
> + padding_found ? padding_start : mas.index,
> + mas.last, NULL, GFP_KERNEL);
> + } else {
> + /* Release the padding and mark the module unloaded */
> + if (padding_found)
> + mtree_store_range(&mod_area_mt, padding_start,
> + mas.index - 1, NULL, GFP_KERNEL);
> + mtree_store_range(&mod_area_mt, mas.index, mas.last,
> + &unloaded_mod, GFP_KERNEL);
> + }
>
> - if (WARN(counter.bytes,
> - "%s:%u module %s func:%s has %llu allocated at module unload",
> - ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> - module_unused = false;
> + break;
> }
> + rcu_read_unlock();
You may want to use a reverse iterator here; I don't think we have had a
use for one yet:

#define mas_for_each_rev(__mas, __entry, __min) \
	while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)

===================================

MA_STATE(mas, &mod_area_mt, module_tags.size, module_tags.size);
...
rcu_read_lock();
mas_for_each_rev(&mas, val, 0)
	if (val == mod)
		break;

if (!val) /* no module found */
	goto out;

..unused check loop here..

mas_lock(&mas); /* spinlock */
mas_store(&mas, unused ? NULL : &unloaded_mod);
val = mas_prev_range(&mas, 0);
if (val == &prepend_mod)
	mas_store(&mas, NULL);
mas_unlock(&mas);
out:
rcu_read_unlock();
> +}
> +
> +static void replace_module(struct module *mod, struct module *new_mod)
> +{
> + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> + struct module *val;
>
> - return module_unused;
> + rcu_read_lock();
> + mas_for_each(&mas, val, module_tags.size) {
> + if (val != mod)
> + continue;
> +
> + mtree_store_range(&mod_area_mt, mas.index, mas.last,
> + new_mod, GFP_KERNEL);
> + break;
> + }
> + rcu_read_unlock();
> }
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> @@ -252,17 +428,46 @@ static void __init sysctl_init(void)
> static inline void sysctl_init(void) {}
> #endif /* CONFIG_SYSCTL */
>
> +static unsigned long max_module_alloc_tags __initdata = 100000;
> +
> +static int __init set_max_module_alloc_tags(char *arg)
> +{
> + if (!arg)
> + return -EINVAL;
> +
> + return kstrtoul(arg, 0, &max_module_alloc_tags);
> +}
> +early_param("max_module_alloc_tags", set_max_module_alloc_tags);
> +
> static int __init alloc_tag_init(void)
> {
> const struct codetag_type_desc desc = {
> - .section = "alloc_tags",
> - .tag_size = sizeof(struct alloc_tag),
> - .module_unload = alloc_tag_module_unload,
> + .section = ALLOC_TAG_SECTION_NAME,
> + .tag_size = sizeof(struct alloc_tag),
> + .needs_section_mem = needs_section_mem,
> + .alloc_section_mem = reserve_module_tags,
> + .free_section_mem = release_module_tags,
> + .module_replaced = replace_module,
> };
> + unsigned long module_tags_mem_sz;
>
> + module_tags_mem_sz = max_module_alloc_tags * sizeof(struct alloc_tag);
> + pr_info("%lu bytes reserved for module allocation tags\n", module_tags_mem_sz);
> +
> + /* Allocate space to copy allocation tags */
> + module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
> + module_tags_mem_sz);
> + if (!module_tags.start_addr)
> + return -ENOMEM;
> +
> + module_tags.end_addr = module_tags.start_addr + module_tags_mem_sz;
> + mt_set_in_rcu(&mod_area_mt);
If you are not toggling rcu-safe, then you can use the init of the tree
to set rcu on by setting MT_FLAGS_USE_RCU.
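i.e., something like (sketch):

	static struct maple_tree mod_area_mt =
		MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE | MT_FLAGS_USE_RCU);

which would let you drop the mt_set_in_rcu() call.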
> alloc_tag_cttype = codetag_register_type(&desc);
> - if (IS_ERR(alloc_tag_cttype))
> + if (IS_ERR(alloc_tag_cttype)) {
> + execmem_free((void *)module_tags.start_addr);
> + module_tags.start_addr = 0;
> return PTR_ERR(alloc_tag_cttype);
> + }
>
> sysctl_init();
> procfs_init();
> diff --git a/lib/codetag.c b/lib/codetag.c
> index 5ace625f2328..d602a81bdc03 100644
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -126,6 +126,7 @@ static inline size_t range_size(const struct codetag_type *cttype,
> }
>
> #ifdef CONFIG_MODULES
> +
> static void *get_symbol(struct module *mod, const char *prefix, const char *name)
> {
> DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
> @@ -155,6 +156,94 @@ static struct codetag_range get_section_range(struct module *mod,
> };
> }
>
> +#define CODETAG_SECTION_PREFIX ".codetag."
> +
> +/* Some codetag types need a separate module section */
> +bool codetag_needs_module_section(struct module *mod, const char *name,
> + unsigned long size)
> +{
> + const char *type_name;
> + struct codetag_type *cttype;
> + bool ret = false;
> +
> + if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
> + return false;
> +
> + type_name = name + strlen(CODETAG_SECTION_PREFIX);
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (strcmp(type_name, cttype->desc.section) == 0) {
> + if (!cttype->desc.needs_section_mem)
> + break;
> +
> + down_write(&cttype->mod_lock);
> + ret = cttype->desc.needs_section_mem(mod, size);
> + up_write(&cttype->mod_lock);
> + break;
> + }
> + }
> + mutex_unlock(&codetag_lock);
> +
> + return ret;
> +}
> +
> +void *codetag_alloc_module_section(struct module *mod, const char *name,
> + unsigned long size, unsigned int prepend,
> + unsigned long align)
> +{
> + const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
> + struct codetag_type *cttype;
> + void *ret = NULL;
> +
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (strcmp(type_name, cttype->desc.section) == 0) {
> + if (WARN_ON(!cttype->desc.alloc_section_mem))
> + break;
> +
> + down_write(&cttype->mod_lock);
> + ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
> + up_write(&cttype->mod_lock);
> + break;
> + }
> + }
> + mutex_unlock(&codetag_lock);
> +
> + return ret;
> +}
> +
> +void codetag_free_module_sections(struct module *mod)
> +{
> + struct codetag_type *cttype;
> +
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (!cttype->desc.free_section_mem)
> + continue;
> +
> + down_write(&cttype->mod_lock);
> + cttype->desc.free_section_mem(mod, true);
> + up_write(&cttype->mod_lock);
> + }
> + mutex_unlock(&codetag_lock);
> +}
> +
> +void codetag_module_replaced(struct module *mod, struct module *new_mod)
> +{
> + struct codetag_type *cttype;
> +
> + mutex_lock(&codetag_lock);
> + list_for_each_entry(cttype, &codetag_types, link) {
> + if (!cttype->desc.module_replaced)
> + continue;
> +
> + down_write(&cttype->mod_lock);
> + cttype->desc.module_replaced(mod, new_mod);
> + up_write(&cttype->mod_lock);
> + }
> + mutex_unlock(&codetag_lock);
> +}
> +
> static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
> {
> struct codetag_range range;
> @@ -212,13 +301,12 @@ void codetag_load_module(struct module *mod)
> mutex_unlock(&codetag_lock);
> }
>
> -bool codetag_unload_module(struct module *mod)
> +void codetag_unload_module(struct module *mod)
> {
> struct codetag_type *cttype;
> - bool unload_ok = true;
>
> if (!mod)
> - return true;
> + return;
>
> mutex_lock(&codetag_lock);
> list_for_each_entry(cttype, &codetag_types, link) {
> @@ -235,18 +323,17 @@ bool codetag_unload_module(struct module *mod)
> }
> if (found) {
> if (cttype->desc.module_unload)
> - if (!cttype->desc.module_unload(cttype, cmod))
> - unload_ok = false;
> + cttype->desc.module_unload(cttype, cmod);
>
> cttype->count -= range_size(cttype, &cmod->range);
> idr_remove(&cttype->mod_idr, mod_id);
> kfree(cmod);
> }
> up_write(&cttype->mod_lock);
> + if (found && cttype->desc.free_section_mem)
> + cttype->desc.free_section_mem(mod, false);
> }
> mutex_unlock(&codetag_lock);
> -
> - return unload_ok;
> }
>
> #else /* CONFIG_MODULES */
> diff --git a/scripts/module.lds.S b/scripts/module.lds.S
> index 3f43edef813c..711c6e029936 100644
> --- a/scripts/module.lds.S
> +++ b/scripts/module.lds.S
> @@ -50,7 +50,7 @@ SECTIONS {
> .data : {
> *(.data .data.[0-9a-zA-Z_]*)
> *(.data..L*)
> - CODETAG_SECTIONS()
> + MOD_CODETAG_SECTIONS()
> }
>
> .rodata : {
> @@ -59,9 +59,10 @@ SECTIONS {
> }
> #else
> .data : {
> - CODETAG_SECTIONS()
> + MOD_CODETAG_SECTIONS()
> }
> #endif
> + MOD_SEPARATE_CODETAG_SECTIONS()
> }
>
> /* bring in arch-specific sections */
> --
> 2.46.0.184.g6999bdac58-goog
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/5] alloc_tag: load module tags into separate continuous memory
2024-08-20 15:31 ` Liam R. Howlett
@ 2024-08-20 18:35 ` Suren Baghdasaryan
0 siblings, 0 replies; 16+ messages in thread
From: Suren Baghdasaryan @ 2024-08-20 18:35 UTC (permalink / raw)
To: Liam R. Howlett, Suren Baghdasaryan, akpm, kent.overstreet,
corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx, bp,
xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Tue, Aug 20, 2024 at 8:31 AM 'Liam R. Howlett' via kernel-team
<kernel-team@android.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [240819 11:15]:
> > When a module gets unloaded there is a possibility that some of the
> > allocations it made are still used and therefore the allocation tags
> > corresponding to these allocations are still referenced. As such, the
> > memory for these tags can't be freed. This is currently handled as an
> > abnormal situation and module's data section is not being unloaded.
> > To handle this situation without keeping module's data in memory,
> > allow codetags with longer lifespan than the module to be loaded into
> > their own separate memory. The in-use memory areas and gaps after
> > module unloading in this separate memory are tracked using maple trees.
> > Allocation tags arrange their separate memory so that it is virtually
> > contiguous and that will allow simple allocation tag indexing later on
> > in this patchset. The size of this virtually contiguous memory is set
> > to store up to 100000 allocation tags and max_module_alloc_tags kernel
> > parameter is introduced to change this size.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > .../admin-guide/kernel-parameters.txt | 4 +
> > include/asm-generic/codetag.lds.h | 19 ++
> > include/linux/alloc_tag.h | 13 +-
> > include/linux/codetag.h | 35 ++-
> > kernel/module/main.c | 67 +++--
> > lib/alloc_tag.c | 245 ++++++++++++++++--
> > lib/codetag.c | 101 +++++++-
> > scripts/module.lds.S | 5 +-
> > 8 files changed, 429 insertions(+), 60 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index d0d141d50638..17f9f811a9c0 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3205,6 +3205,10 @@
> > devices can be requested on-demand with the
> > /dev/loop-control interface.
> >
> > +
> > + max_module_alloc_tags= [KNL] Max supported number of allocation tags
> > + in modules.
> > +
> > mce [X86-32] Machine Check Exception
> >
> > mce=option [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
> > diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
> > index 64f536b80380..372c320c5043 100644
> > --- a/include/asm-generic/codetag.lds.h
> > +++ b/include/asm-generic/codetag.lds.h
> > @@ -11,4 +11,23 @@
> > #define CODETAG_SECTIONS() \
> > SECTION_WITH_BOUNDARIES(alloc_tags)
> >
> > +/*
> > + * Module codetags which aren't used after module unload, therefore have the
> > + * same lifespan as the module and can be safely unloaded with the module.
> > + */
> > +#define MOD_CODETAG_SECTIONS()
> > +
> > +#define MOD_SEPARATE_CODETAG_SECTION(_name) \
> > + .codetag.##_name : { \
> > + SECTION_WITH_BOUNDARIES(_name) \
> > + }
> > +
> > +/*
> > + * For codetags which might be used after module unload, therefore might stay
> > + * longer in memory. Each such codetag type has its own section so that we can
> > + * unload them individually once unused.
> > + */
> > +#define MOD_SEPARATE_CODETAG_SECTIONS() \
> > + MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
> > +
> > #endif /* __ASM_GENERIC_CODETAG_LDS_H */
> > diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> > index 8c61ccd161ba..99cbc7f086ad 100644
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -30,6 +30,13 @@ struct alloc_tag {
> > struct alloc_tag_counters __percpu *counters;
> > } __aligned(8);
> >
> > +struct alloc_tag_module_section {
> > + unsigned long start_addr;
> > + unsigned long end_addr;
> > + /* used size */
> > + unsigned long size;
> > +};
> > +
> > #ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
> >
> > #define CODETAG_EMPTY ((void *)1)
> > @@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
> >
> > #ifdef CONFIG_MEM_ALLOC_PROFILING
> >
> > +#define ALLOC_TAG_SECTION_NAME "alloc_tags"
> > +
> > struct codetag_bytes {
> > struct codetag *ct;
> > s64 bytes;
> > @@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> >
> > #define DEFINE_ALLOC_TAG(_alloc_tag) \
> > static struct alloc_tag _alloc_tag __used __aligned(8) \
> > - __section("alloc_tags") = { \
> > + __section(ALLOC_TAG_SECTION_NAME) = { \
> > .ct = CODE_TAG_INIT, \
> > .counters = &_shared_alloc_tag };
> >
> > @@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> > #define DEFINE_ALLOC_TAG(_alloc_tag) \
> > static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
> > static struct alloc_tag _alloc_tag __used __aligned(8) \
> > - __section("alloc_tags") = { \
> > + __section(ALLOC_TAG_SECTION_NAME) = { \
> > .ct = CODE_TAG_INIT, \
> > .counters = &_alloc_tag_cntr };
> >
> > diff --git a/include/linux/codetag.h b/include/linux/codetag.h
> > index c2a579ccd455..c4a3dd60205e 100644
> > --- a/include/linux/codetag.h
> > +++ b/include/linux/codetag.h
> > @@ -35,8 +35,13 @@ struct codetag_type_desc {
> > size_t tag_size;
> > void (*module_load)(struct codetag_type *cttype,
> > struct codetag_module *cmod);
> > - bool (*module_unload)(struct codetag_type *cttype,
> > + void (*module_unload)(struct codetag_type *cttype,
> > struct codetag_module *cmod);
> > + void (*module_replaced)(struct module *mod, struct module *new_mod);
> > + bool (*needs_section_mem)(struct module *mod, unsigned long size);
> > + void *(*alloc_section_mem)(struct module *mod, unsigned long size,
> > + unsigned int prepend, unsigned long align);
> > + void (*free_section_mem)(struct module *mod, bool unused);
> > };
> >
> > struct codetag_iterator {
> > @@ -71,11 +76,31 @@ struct codetag_type *
> > codetag_register_type(const struct codetag_type_desc *desc);
> >
> > #if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
> > +
> > +bool codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size);
> > +void *codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align);
> > +void codetag_free_module_sections(struct module *mod);
> > +void codetag_module_replaced(struct module *mod, struct module *new_mod);
> > void codetag_load_module(struct module *mod);
> > -bool codetag_unload_module(struct module *mod);
> > -#else
> > +void codetag_unload_module(struct module *mod);
> > +
> > +#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> > +
> > +static inline bool
> > +codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size) { return false; }
> > +static inline void *
> > +codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align) { return NULL; }
> > +static inline void codetag_free_module_sections(struct module *mod) {}
> > +static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
> > static inline void codetag_load_module(struct module *mod) {}
> > -static inline bool codetag_unload_module(struct module *mod) { return true; }
> > -#endif
> > +static inline void codetag_unload_module(struct module *mod) {}
> > +
> > +#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
> >
> > #endif /* _LINUX_CODETAG_H */
> > diff --git a/kernel/module/main.c b/kernel/module/main.c
> > index 71396e297499..d195d788835c 100644
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -1225,18 +1225,12 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> > return 0;
> > }
> >
> > -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> > - bool unload_codetags)
> > +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> > {
> > - void *ptr = mod->mem[type].base;
> > -
> > - if (!unload_codetags && mod_mem_type_is_core_data(type))
> > - return;
> > -
> > - execmem_free(ptr);
> > + execmem_free(mod->mem[type].base);
> > }
> >
> > -static void free_mod_mem(struct module *mod, bool unload_codetags)
> > +static void free_mod_mem(struct module *mod)
> > {
> > for_each_mod_mem_type(type) {
> > struct module_memory *mod_mem = &mod->mem[type];
> > @@ -1247,25 +1241,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
> > /* Free lock-classes; relies on the preceding sync_rcu(). */
> > lockdep_free_key_range(mod_mem->base, mod_mem->size);
> > if (mod_mem->size)
> > - module_memory_free(mod, type, unload_codetags);
> > + module_memory_free(mod, type);
> > }
> >
> > /* MOD_DATA hosts mod, so free it at last */
> > lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
> > - module_memory_free(mod, MOD_DATA, unload_codetags);
> > + module_memory_free(mod, MOD_DATA);
> > }
> >
> > /* Free a module, remove from lists, etc. */
> > static void free_module(struct module *mod)
> > {
> > - bool unload_codetags;
> > -
> > trace_module_free(mod);
> >
> > - unload_codetags = codetag_unload_module(mod);
> > - if (!unload_codetags)
> > - pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
> > - mod->name);
> > + codetag_unload_module(mod);
> >
> > mod_sysfs_teardown(mod);
> >
> > @@ -1308,7 +1297,7 @@ static void free_module(struct module *mod)
> > kfree(mod->args);
> > percpu_modfree(mod);
> >
> > - free_mod_mem(mod, unload_codetags);
> > + free_mod_mem(mod);
> > }
> >
> > void *__symbol_get(const char *symbol)
> > @@ -1573,6 +1562,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
> > if (WARN_ON_ONCE(type == MOD_INVALID))
> > continue;
> >
> > + /*
> > + * Do not allocate codetag memory as we load it into
> > + * preallocated contiguous memory. s->sh_entsize will
> > + * not be used for this section, so leave it as is.
> > + */
> > + if (codetag_needs_module_section(mod, sname, s->sh_size))
> > + continue;
> > +
> > s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
> > pr_debug("\t%s\n", sname);
> > }
> > @@ -2247,6 +2244,7 @@ static int move_module(struct module *mod, struct load_info *info)
> > int i;
> > enum mod_mem_type t = 0;
> > int ret = -ENOMEM;
> > + bool codetag_section_found = false;
> >
> > for_each_mod_mem_type(type) {
> > if (!mod->mem[type].size) {
> > @@ -2257,7 +2255,7 @@ static int move_module(struct module *mod, struct load_info *info)
> > ret = module_memory_alloc(mod, type);
> > if (ret) {
> > t = type;
> > - goto out_enomem;
> > + goto out_err;
> > }
> > }
> >
> > @@ -2267,11 +2265,27 @@ static int move_module(struct module *mod, struct load_info *info)
> > void *dest;
> > Elf_Shdr *shdr = &info->sechdrs[i];
> > enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
> > + const char *sname;
> >
> > if (!(shdr->sh_flags & SHF_ALLOC))
> > continue;
> >
> > - dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
> > + sname = info->secstrings + shdr->sh_name;
> > + /*
> > + * Load codetag sections separately as they might still be used
> > + * after module unload.
> > + */
> > + if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
> > + dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
> > + arch_mod_section_prepend(mod, i), shdr->sh_addralign);
> > + if (IS_ERR(dest)) {
> > + ret = PTR_ERR(dest);
> > + goto out_err;
> > + }
> > + codetag_section_found = true;
> > + } else {
> > + dest = mod->mem[type].base + (shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK);
> > + }
> >
> > if (shdr->sh_type != SHT_NOBITS) {
> > /*
> > @@ -2283,7 +2297,7 @@ static int move_module(struct module *mod, struct load_info *info)
> > if (i == info->index.mod &&
> > (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
> > ret = -ENOEXEC;
> > - goto out_enomem;
> > + goto out_err;
> > }
> > memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
> > }
> > @@ -2299,9 +2313,12 @@ static int move_module(struct module *mod, struct load_info *info)
> > }
> >
> > return 0;
> > -out_enomem:
> > +out_err:
> > for (t--; t >= 0; t--)
> > - module_memory_free(mod, t, true);
> > + module_memory_free(mod, t);
> > + if (codetag_section_found)
> > + codetag_free_module_sections(mod);
> > +
> > return ret;
> > }
> >
> > @@ -2422,6 +2439,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
> > /* Module has been copied to its final place now: return it. */
> > mod = (void *)info->sechdrs[info->index.mod].sh_addr;
> > kmemleak_load_module(mod, info);
> > + codetag_module_replaced(info->mod, mod);
> > +
> > return mod;
> > }
> >
> > @@ -2431,7 +2450,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
> > percpu_modfree(mod);
> > module_arch_freeing_init(mod);
> >
> > - free_mod_mem(mod, true);
> > + free_mod_mem(mod);
> > }
> >
> > int __weak module_finalize(const Elf_Ehdr *hdr,
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index 81e5f9a70f22..f33784f48dd2 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -1,5 +1,6 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > #include <linux/alloc_tag.h>
> > +#include <linux/execmem.h>
> > #include <linux/fs.h>
> > #include <linux/gfp.h>
> > #include <linux/module.h>
> > @@ -9,6 +10,12 @@
> > #include <linux/seq_file.h>
> >
> > static struct codetag_type *alloc_tag_cttype;
> > +static struct alloc_tag_module_section module_tags;
> > +static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
> > +/* A dummy object used to indicate an unloaded module */
> > +static struct module unloaded_mod;
> > +/* A dummy object used to indicate a module prepended area */
> > +static struct module prepend_mod;
> >
> > DEFINE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> > EXPORT_SYMBOL(_shared_alloc_tag);
> > @@ -149,29 +156,198 @@ static void __init procfs_init(void)
> > proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
> > }
> >
> > -static bool alloc_tag_module_unload(struct codetag_type *cttype,
> > - struct codetag_module *cmod)
> > +static bool needs_section_mem(struct module *mod, unsigned long size)
> > {
> > - struct codetag_iterator iter = codetag_get_ct_iter(cttype);
> > - struct alloc_tag_counters counter;
> > - bool module_unused = true;
> > - struct alloc_tag *tag;
> > - struct codetag *ct;
> > + return size >= sizeof(struct alloc_tag);
> > +}
> > +
> > +/* Called under RCU read lock */
> > +static void clean_unused_module_areas(void)
> > +{
> > + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> > + struct module *val;
> > +
> > + mas_for_each(&mas, val, module_tags.size) {
> > + if (val == &unloaded_mod) {
> > + struct alloc_tag *tag;
> > + struct alloc_tag *last;
> > + bool unused = true;
> > +
> > + tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
> > + last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
> > + while (tag <= last) {
> > + struct alloc_tag_counters counter;
> > +
> > + counter = alloc_tag_read(tag);
> > + if (counter.bytes) {
> > + unused = false;
> > + break;
> > + }
> > + tag++;
> > + }
> > + if (unused) {
> > + mtree_store_range(&mod_area_mt, mas.index,
> > + mas.last, NULL, GFP_KERNEL);
>
> There are also mtree_erase() and mas_erase() for this, but no problem
> doing it this way.
Yeah, mtree_erase() seems clearer. I'll use that.
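Roughly like this, I think (untested sketch, only the write in the cleanup
loop changes):

			if (unused) {
				/* erases the whole stale range; any index inside it works */
				mtree_erase(&mod_area_mt, mas.index);
			}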
>
> > + }
> > + }
> > + }
> > +}
> > +
> > +static void *reserve_module_tags(struct module *mod, unsigned long size,
> > + unsigned int prepend, unsigned long align)
> > +{
> > + unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
> > + MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
> > + bool cleanup_done = false;
> > + unsigned long offset;
> > + void *ret;
> > +
> > + /* If no tags return NULL */
> > + if (size < sizeof(struct alloc_tag))
> > + return NULL;
> > +
> > + /*
> > + * align is always power of 2, so we can use IS_ALIGNED and ALIGN.
> > + * align 0 or 1 means no alignment, to simplify set to 1.
> > + */
> > + if (!align)
> > + align = 1;
> > +
> > + rcu_read_lock();
> > +repeat:
> > + /* Try finding exact size and hope the start is aligned */
> > + if (mas_empty_area(&mas, 0, section_size - 1, prepend + size))
> > + goto cleanup;
> > +
> > + if (IS_ALIGNED(mas.index + prepend, align))
> > + goto found;
> > +
> > + /* Try finding larger area to align later */
> > + mas_reset(&mas);
> > + if (!mas_empty_area(&mas, 0, section_size - 1,
> > + size + prepend + align - 1))
> > + goto found;
> > +
> > +cleanup:
> > + /* No free area, try cleanup stale data and repeat the search once */
> > + if (!cleanup_done) {
> > + clean_unused_module_areas();
> > + cleanup_done = true;
> > + mas_reset(&mas);
> > + goto repeat;
> > + } else {
> > + ret = ERR_PTR(-ENOMEM);
> > + goto out;
> > + }
> > +
> > +found:
> > + offset = mas.index;
> > + offset += prepend;
> > + offset = ALIGN(offset, align);
> > +
> > + if (mtree_insert_range(&mod_area_mt, offset, offset + size - 1, mod,
> > + GFP_KERNEL)) {
> > + ret = ERR_PTR(-ENOMEM);
> > + goto out;
> > + }
> > +
> > + if (offset != mas.index) {
> > + if (mtree_insert_range(&mod_area_mt, mas.index, offset - 1,
> > + &prepend_mod, GFP_KERNEL)) {
> > + mtree_store_range(&mod_area_mt, offset, offset + size - 1,
> > + NULL, GFP_KERNEL);
>
> Aren't you giving up the guarantee that you haven't raced with another
> store to the same location if you use mtree_store_range() on the
> failure path of mtree_insert_range()?
All functions using mod_area_mt are synchronized using
cttype->mod_lock (see codetag_alloc_module_section() for an example),
so we should not be racing.
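For reference, the callback is only reached under that lock; from the
codetag_alloc_module_section() hunk quoted further down:

	down_write(&cttype->mod_lock);
	ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
	up_write(&cttype->mod_lock);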
>
> > + ret = ERR_PTR(-ENOMEM);
> > + goto out;
> > + }
> > + }
>
> mtree_insert and mtree_store do the locking for you, but will re-walk
> the tree to the location for the store. If you use the advanced mas_
> operations, you can store without a re-walk but need to take the
> spinlock and check mas_is_err(&mas) yourself.
>
> 	mas.last = offset - 1;
> 	mas_lock(&mas);
> 	mas_insert(&mas, &prepend_mod);
> 	mas_unlock(&mas);
> 	if (mas_is_err(&mas))
> 		ret = xa_err(mas.node);
>
> What you have works though.
Thanks for the tip. I'll try using this advanced API.
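If I follow, the prepend path would then look something like this (untested
sketch assembled from the calls you listed; the error is taken from mas.node
as you suggest, and the range already inserted for "mod" is rolled back as
before):

	if (offset != mas.index) {
		mas.last = offset - 1;
		mas_lock(&mas);
		mas_insert(&mas, &prepend_mod);
		mas_unlock(&mas);
		if (mas_is_err(&mas)) {
			/* drop the range we just inserted for this module */
			mtree_store_range(&mod_area_mt, offset, offset + size - 1,
					  NULL, GFP_KERNEL);
			ret = ERR_PTR(xa_err(mas.node));
			goto out;
		}
	}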
>
>
> > +
> > + if (module_tags.size < offset + size)
> > + module_tags.size = offset + size;
> > +
> > + ret = (struct alloc_tag *)(module_tags.start_addr + offset);
> > +out:
> > + rcu_read_unlock();
> > +
> > + return ret;
> > +}
> > +
> > +static void release_module_tags(struct module *mod, bool unused)
> > +{
> > + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> > + unsigned long padding_start;
> > + bool padding_found = false;
> > + struct module *val;
> > +
> > + if (unused)
> > + return;
> > +
> > + unused = true;
> > + rcu_read_lock();
> > + mas_for_each(&mas, val, module_tags.size) {
> > + struct alloc_tag *tag;
> > + struct alloc_tag *last;
> > +
> > + if (val == &prepend_mod) {
> > + padding_start = mas.index;
> > + padding_found = true;
> > + continue;
> > + }
> >
> > - for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
> > - if (iter.cmod != cmod)
> > + if (val != mod) {
> > + padding_found = false;
> > continue;
> > + }
> > +
> > + tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
> > + last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
> > + while (tag <= last) {
> > + struct alloc_tag_counters counter;
> > +
> > + counter = alloc_tag_read(tag);
> > + if (counter.bytes) {
> > + struct codetag *ct = &tag->ct;
> >
> > - tag = ct_to_alloc_tag(ct);
> > - counter = alloc_tag_read(tag);
> > + pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
> > + ct->filename, ct->lineno, ct->modname,
> > + ct->function, counter.bytes);
> > + unused = false;
> > + break;
> > + }
> > + tag++;
> > + }
> > + if (unused) {
> > + mtree_store_range(&mod_area_mt,
> > + padding_found ? padding_start : mas.index,
> > + mas.last, NULL, GFP_KERNEL);
> > + } else {
> > + /* Release the padding and mark the module unloaded */
> > + if (padding_found)
> > + mtree_store_range(&mod_area_mt, padding_start,
> > + mas.index - 1, NULL, GFP_KERNEL);
> > + mtree_store_range(&mod_area_mt, mas.index, mas.last,
> > + &unloaded_mod, GFP_KERNEL);
> > + }
> >
> > - if (WARN(counter.bytes,
> > - "%s:%u module %s func:%s has %llu allocated at module unload",
> > - ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
> > - module_unused = false;
> > + break;
> > }
> > + rcu_read_unlock();
>
> You may want to use a reverse iterator here; I don't think we have had
> a use for one yet:
>
> #define mas_for_each_rev(__mas, __entry, __min) \
> 	while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
>
> ===================================
>
> 	MA_STATE(mas, &mod_area_mt, module_tags.size, module_tags.size);
>
> 	...
>
> 	rcu_read_lock();
> 	mas_for_each_rev(&mas, val, 0)
> 		if (val == mod)
> 			break;
>
> 	if (!val) /* no module found */
> 		goto out;
>
> 	..unused check loop here..
>
> 	mas_lock(&mas); /* spinlock */
> 	mas_store(&mas, unused ? NULL : &unloaded_mod);
> 	val = mas_prev_range(&mas, 0);
> 	if (val == &prepend_mod)
> 		mas_store(&mas, NULL);
> 	mas_unlock(&mas);
>
> out:
> 	rcu_read_unlock();
>
I see. Ok, that should simplify this loop. I'll try that out.
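So, just to check I read this right, the tail of release_module_tags() would
end up roughly like this (untested sketch, using the mas_for_each_rev()
helper you defined and with MA_STATE initialised at module_tags.size as you
show; the in-use scan stays as in the current patch):

	rcu_read_lock();
	mas_for_each_rev(&mas, val, 0)
		if (val == mod)
			break;
	if (!val)	/* no tag range recorded for this module */
		goto out;

	/* ... scan tags in [mas.index, mas.last] and clear "unused"
	 * if any counter.bytes is non-zero, as done today ... */

	mas_lock(&mas);
	mas_store(&mas, unused ? NULL : &unloaded_mod);
	val = mas_prev_range(&mas, 0);
	if (val == &prepend_mod)
		mas_store(&mas, NULL);
	mas_unlock(&mas);
out:
	rcu_read_unlock();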
>
> > +}
> > +
> > +static void replace_module(struct module *mod, struct module *new_mod)
> > +{
> > + MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
> > + struct module *val;
> >
> > - return module_unused;
> > + rcu_read_lock();
> > + mas_for_each(&mas, val, module_tags.size) {
> > + if (val != mod)
> > + continue;
> > +
> > + mtree_store_range(&mod_area_mt, mas.index, mas.last,
> > + new_mod, GFP_KERNEL);
> > + break;
> > + }
> > + rcu_read_unlock();
> > }
> >
> > #ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> > @@ -252,17 +428,46 @@ static void __init sysctl_init(void)
> > static inline void sysctl_init(void) {}
> > #endif /* CONFIG_SYSCTL */
> >
> > +static unsigned long max_module_alloc_tags __initdata = 100000;
> > +
> > +static int __init set_max_module_alloc_tags(char *arg)
> > +{
> > + if (!arg)
> > + return -EINVAL;
> > +
> > + return kstrtoul(arg, 0, &max_module_alloc_tags);
> > +}
> > +early_param("max_module_alloc_tags", set_max_module_alloc_tags);
> > +
> > static int __init alloc_tag_init(void)
> > {
> > const struct codetag_type_desc desc = {
> > - .section = "alloc_tags",
> > - .tag_size = sizeof(struct alloc_tag),
> > - .module_unload = alloc_tag_module_unload,
> > + .section = ALLOC_TAG_SECTION_NAME,
> > + .tag_size = sizeof(struct alloc_tag),
> > + .needs_section_mem = needs_section_mem,
> > + .alloc_section_mem = reserve_module_tags,
> > + .free_section_mem = release_module_tags,
> > + .module_replaced = replace_module,
> > };
> > + unsigned long module_tags_mem_sz;
> >
> > + module_tags_mem_sz = max_module_alloc_tags * sizeof(struct alloc_tag);
> > + pr_info("%lu bytes reserved for module allocation tags\n", module_tags_mem_sz);
> > +
> > + /* Allocate space to copy allocation tags */
> > + module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
> > + module_tags_mem_sz);
> > + if (!module_tags.start_addr)
> > + return -ENOMEM;
> > +
> > + module_tags.end_addr = module_tags.start_addr + module_tags_mem_sz;
> > + mt_set_in_rcu(&mod_area_mt);
>
> If you are not toggling RCU safety at runtime, you can enable it at
> tree init time by setting MT_FLAGS_USE_RCU in the tree flags.
Ack.
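i.e. the declaration would simply become (sketch):

	static struct maple_tree mod_area_mt =
		MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE | MT_FLAGS_USE_RCU);

and the mt_set_in_rcu() call in alloc_tag_init() can go away.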
Thanks for the feedback, Liam!
>
> > alloc_tag_cttype = codetag_register_type(&desc);
> > - if (IS_ERR(alloc_tag_cttype))
> > + if (IS_ERR(alloc_tag_cttype)) {
> > + execmem_free((void *)module_tags.start_addr);
> > + module_tags.start_addr = 0;
> > return PTR_ERR(alloc_tag_cttype);
> > + }
> >
> > sysctl_init();
> > procfs_init();
> > diff --git a/lib/codetag.c b/lib/codetag.c
> > index 5ace625f2328..d602a81bdc03 100644
> > --- a/lib/codetag.c
> > +++ b/lib/codetag.c
> > @@ -126,6 +126,7 @@ static inline size_t range_size(const struct codetag_type *cttype,
> > }
> >
> > #ifdef CONFIG_MODULES
> > +
> > static void *get_symbol(struct module *mod, const char *prefix, const char *name)
> > {
> > DECLARE_SEQ_BUF(sb, KSYM_NAME_LEN);
> > @@ -155,6 +156,94 @@ static struct codetag_range get_section_range(struct module *mod,
> > };
> > }
> >
> > +#define CODETAG_SECTION_PREFIX ".codetag."
> > +
> > +/* Some codetag types need a separate module section */
> > +bool codetag_needs_module_section(struct module *mod, const char *name,
> > + unsigned long size)
> > +{
> > + const char *type_name;
> > + struct codetag_type *cttype;
> > + bool ret = false;
> > +
> > + if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
> > + return false;
> > +
> > + type_name = name + strlen(CODETAG_SECTION_PREFIX);
> > + mutex_lock(&codetag_lock);
> > + list_for_each_entry(cttype, &codetag_types, link) {
> > + if (strcmp(type_name, cttype->desc.section) == 0) {
> > + if (!cttype->desc.needs_section_mem)
> > + break;
> > +
> > + down_write(&cttype->mod_lock);
> > + ret = cttype->desc.needs_section_mem(mod, size);
> > + up_write(&cttype->mod_lock);
> > + break;
> > + }
> > + }
> > + mutex_unlock(&codetag_lock);
> > +
> > + return ret;
> > +}
> > +
> > +void *codetag_alloc_module_section(struct module *mod, const char *name,
> > + unsigned long size, unsigned int prepend,
> > + unsigned long align)
> > +{
> > + const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
> > + struct codetag_type *cttype;
> > + void *ret = NULL;
> > +
> > + mutex_lock(&codetag_lock);
> > + list_for_each_entry(cttype, &codetag_types, link) {
> > + if (strcmp(type_name, cttype->desc.section) == 0) {
> > + if (WARN_ON(!cttype->desc.alloc_section_mem))
> > + break;
> > +
> > + down_write(&cttype->mod_lock);
> > + ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
> > + up_write(&cttype->mod_lock);
> > + break;
> > + }
> > + }
> > + mutex_unlock(&codetag_lock);
> > +
> > + return ret;
> > +}
> > +
> > +void codetag_free_module_sections(struct module *mod)
> > +{
> > + struct codetag_type *cttype;
> > +
> > + mutex_lock(&codetag_lock);
> > + list_for_each_entry(cttype, &codetag_types, link) {
> > + if (!cttype->desc.free_section_mem)
> > + continue;
> > +
> > + down_write(&cttype->mod_lock);
> > + cttype->desc.free_section_mem(mod, true);
> > + up_write(&cttype->mod_lock);
> > + }
> > + mutex_unlock(&codetag_lock);
> > +}
> > +
> > +void codetag_module_replaced(struct module *mod, struct module *new_mod)
> > +{
> > + struct codetag_type *cttype;
> > +
> > + mutex_lock(&codetag_lock);
> > + list_for_each_entry(cttype, &codetag_types, link) {
> > + if (!cttype->desc.module_replaced)
> > + continue;
> > +
> > + down_write(&cttype->mod_lock);
> > + cttype->desc.module_replaced(mod, new_mod);
> > + up_write(&cttype->mod_lock);
> > + }
> > + mutex_unlock(&codetag_lock);
> > +}
> > +
> > static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
> > {
> > struct codetag_range range;
> > @@ -212,13 +301,12 @@ void codetag_load_module(struct module *mod)
> > mutex_unlock(&codetag_lock);
> > }
> >
> > -bool codetag_unload_module(struct module *mod)
> > +void codetag_unload_module(struct module *mod)
> > {
> > struct codetag_type *cttype;
> > - bool unload_ok = true;
> >
> > if (!mod)
> > - return true;
> > + return;
> >
> > mutex_lock(&codetag_lock);
> > list_for_each_entry(cttype, &codetag_types, link) {
> > @@ -235,18 +323,17 @@ bool codetag_unload_module(struct module *mod)
> > }
> > if (found) {
> > if (cttype->desc.module_unload)
> > - if (!cttype->desc.module_unload(cttype, cmod))
> > - unload_ok = false;
> > + cttype->desc.module_unload(cttype, cmod);
> >
> > cttype->count -= range_size(cttype, &cmod->range);
> > idr_remove(&cttype->mod_idr, mod_id);
> > kfree(cmod);
> > }
> > up_write(&cttype->mod_lock);
> > + if (found && cttype->desc.free_section_mem)
> > + cttype->desc.free_section_mem(mod, false);
> > }
> > mutex_unlock(&codetag_lock);
> > -
> > - return unload_ok;
> > }
> >
> > #else /* CONFIG_MODULES */
> > diff --git a/scripts/module.lds.S b/scripts/module.lds.S
> > index 3f43edef813c..711c6e029936 100644
> > --- a/scripts/module.lds.S
> > +++ b/scripts/module.lds.S
> > @@ -50,7 +50,7 @@ SECTIONS {
> > .data : {
> > *(.data .data.[0-9a-zA-Z_]*)
> > *(.data..L*)
> > - CODETAG_SECTIONS()
> > + MOD_CODETAG_SECTIONS()
> > }
> >
> > .rodata : {
> > @@ -59,9 +59,10 @@ SECTIONS {
> > }
> > #else
> > .data : {
> > - CODETAG_SECTIONS()
> > + MOD_CODETAG_SECTIONS()
> > }
> > #endif
> > + MOD_SEPARATE_CODETAG_SECTIONS()
> > }
> >
> > /* bring in arch-specific sections */
> > --
> > 2.46.0.184.g6999bdac58-goog
> >
>
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
^ permalink raw reply [flat|nested] 16+ messages in thread
Thread overview: 16+ messages
2024-08-19 15:15 [PATCH 0/5] page allocation tag compression Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 1/5] alloc_tag: load module tags into separate continuous memory Suren Baghdasaryan
2024-08-20 7:13 ` Mike Rapoport
2024-08-20 7:26 ` Suren Baghdasaryan
2024-08-20 11:10 ` Kent Overstreet
2024-08-20 15:31 ` Liam R. Howlett
2024-08-20 18:35 ` Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 2/5] alloc_tag: eliminate alloc_tag_ref_set Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 3/5] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 4/5] alloc_tag: make page allocation tag reference size configurable Suren Baghdasaryan
2024-08-19 15:15 ` [PATCH 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
2024-08-19 19:34 ` Matthew Wilcox
2024-08-19 20:39 ` Suren Baghdasaryan
2024-08-19 20:40 ` Matthew Wilcox
2024-08-20 1:46 ` Andrew Morton
2024-08-20 2:21 ` Suren Baghdasaryan