* [PATCH v3 0/5] page allocation tag compression
From: Suren Baghdasaryan @ 2024-10-14 20:36 UTC
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
This patchset implements several improvements:
1. Gracefully handles module unloading while there are still live
allocations made from that module;
2. Provides an option to store page allocation tag references in the
page flags, removing the dependency on page extensions and eliminating
the memory overhead of storing page allocation references (~0.2% of
total system memory). This also improves page allocation performance
when CONFIG_MEM_ALLOC_PROFILING is enabled by eliminating the page
extension lookup. Page allocation performance overhead is reduced from
41% to 5.5%.
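To illustrate the page-flags encoding used by patch #5, here is a
standalone sketch with a hypothetical bit offset and width (in the
kernel, alloc_tag_ref_offs and alloc_tag_ref_mask are computed from the
page flag layout, not hardcoded as below):
#include <stdint.h>
#include <stdio.h>
/* Hypothetical placement: a 16-bit tag reference packed into otherwise
 * unused high bits of the page flags word. */
static const int ref_offs = 40;
static const uint64_t ref_mask = (1ULL << 16) - 1;
static uint64_t set_ref(uint64_t flags, uint16_t ref)
{
	flags &= ~(ref_mask << ref_offs);	/* clear the old reference */
	return flags | ((uint64_t)ref << ref_offs);
}
static uint16_t get_ref(uint64_t flags)
{
	return (flags >> ref_offs) & ref_mask;
}
int main(void)
{
	uint64_t flags = 0x1068;		/* arbitrary flag bits */
	flags = set_ref(flags, 42);		/* store tag id 42 */
	printf("ref=%u flags=0x%llx\n", get_ref(flags),
	       (unsigned long long)flags);
	return 0;
}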
Patch #1 introduces mas_for_each_rev() helper function.
Patch #2 copies module tags into virtually contiguous memory which
serves two purposes:
- Lets us deal with the situation when a module is unloaded while there
are still live allocations from that module. Since we use a copy of the
tags, the module itself can be safely unloaded. Used areas and gaps in
this contiguous memory are managed using a maple tree.
- Enables simple indexing of the tags in the later patches.
Patch #3 changes the way we allocate virtually contiguous memory for
module tags: only the virtual area is reserved, and physical pages are
populated as needed at module load time.
Patch #4 abstracts the page allocation tag reference to simplify later
changes.
Patch #5 adds a config to store page allocation tag references inside
page flags if they fit. If the number of available page flag bits is
insufficient to address all kernel allocations, profiling falls back
to using page extensions with an appropriate warning.
Patchset applies to mm-unstable.
Changes since v2 [1]:
- removed extra configs, leaving only CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
yes/no option, per Andrew Morton
- populate physical memory for module tags only as needed,
per Pasha Tatashin
[1] https://lore.kernel.org/all/20240902044128.664075-1-surenb@google.com/
Suren Baghdasaryan (5):
maple_tree: add mas_for_each_rev() helper
alloc_tag: load module tags into separate contiguous memory
alloc_tag: populate memory for module tags as needed
alloc_tag: introduce pgalloc_tag_ref to abstract page tag references
alloc_tag: config to store page allocation tag refs in page flags
include/asm-generic/codetag.lds.h | 19 ++
include/linux/alloc_tag.h | 21 +-
include/linux/codetag.h | 40 ++-
include/linux/execmem.h | 11 +
include/linux/maple_tree.h | 14 ++
include/linux/mm.h | 25 +-
include/linux/page-flags-layout.h | 7 +
include/linux/pgalloc_tag.h | 278 ++++++++++++++++++---
include/linux/vmalloc.h | 9 +
kernel/module/main.c | 74 ++++--
lib/Kconfig.debug | 19 ++
lib/alloc_tag.c | 394 ++++++++++++++++++++++++++++--
lib/codetag.c | 104 +++++++-
mm/execmem.c | 16 ++
mm/mm_init.c | 5 +-
mm/vmalloc.c | 4 +-
scripts/module.lds.S | 5 +-
17 files changed, 931 insertions(+), 114 deletions(-)
base-commit: 828d7267c42c2aab3877c08b4bb00b1e56769557
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v3 1/5] maple_tree: add mas_for_each_rev() helper
From: Suren Baghdasaryan @ 2024-10-14 20:36 UTC
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb, Liam R. Howlett
Add a mas_for_each_rev() helper to iterate over maple tree entries in
reverse order.
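For illustration, a minimal usage sketch (not part of this patch;
mt_tree, struct foo and target are hypothetical) that walks entries
from the highest index down, mirroring how a later patch in this series
uses the helper:
	MA_STATE(mas, &mt_tree, ULONG_MAX, ULONG_MAX);
	struct foo *entry;
	mas_lock(&mas);
	mas_for_each_rev(&mas, entry, 0) {
		/* entry spans [mas.index, mas.last]; highest range first */
		if (entry == target)
			break;
	}
	mas_unlock(&mas);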
Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/maple_tree.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index c2c11004085e..e7e2caa1a95a 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
#define mas_for_each(__mas, __entry, __max) \
while (((__entry) = mas_find((__mas), (__max))) != NULL)
+/**
+ * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
+ * @__mas: Maple Tree operation state (maple_state)
+ * @__entry: Entry retrieved from the tree
+ * @__min: minimum index to retrieve from the tree
+ *
+ * When returned, mas->index and mas->last will hold the entire range for the
+ * entry.
+ *
+ * Note: may return the zero entry.
+ */
+#define mas_for_each_rev(__mas, __entry, __min) \
+ while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
+
#ifdef CONFIG_DEBUG_MAPLE_TREE
enum mt_dump_format {
mt_dump_dec,
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory
From: Suren Baghdasaryan @ 2024-10-14 20:36 UTC
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
When a module gets unloaded there is a possibility that some of the
allocations it made are still used and therefore the allocation tags
corresponding to these allocations are still referenced. As such, the
memory for these tags can't be freed. This is currently handled as an
abnormal situation and the module's data section is not unloaded.
To handle this situation without keeping the module's data in memory,
allow codetags with a longer lifespan than the module to be loaded into
their own separate memory. The in-use areas and the gaps left after
module unloading in this separate memory are tracked using a maple
tree. Allocation tags arrange their separate memory so that it is
virtually contiguous, which allows simple allocation tag indexing later
in this patchset. The size of this virtually contiguous memory is set
to hold up to 100000 allocation tags.
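The bookkeeping, in simplified form (a sketch of reserve_module_tags()
and release_module_tags() below; locking, alignment and prepend
handling omitted):
/* mod_area_mt maps offsets in the contiguous tag memory to their
 * owning module; unloaded_mod is a sentinel for areas whose module is
 * gone but whose tags are still referenced. */
static void *reserve(struct module *mod, unsigned long size)
{
	MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
	if (mas_empty_area(&mas, 0, section_size - 1, size))
		return ERR_PTR(-ENOMEM);	/* no gap large enough */
	mas.last = mas.index + size - 1;
	mas_store(&mas, mod);			/* claim [index, last] */
	return (void *)(module_tags.start_addr + mas.index);
}
static void release(struct ma_state *mas, bool tags_still_used)
{
	/* after locating the module's range in the tree: */
	mas_store(mas, tags_still_used ? &unloaded_mod : NULL);
}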
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/asm-generic/codetag.lds.h | 19 +++
include/linux/alloc_tag.h | 13 +-
include/linux/codetag.h | 37 ++++-
kernel/module/main.c | 74 +++++----
lib/alloc_tag.c | 251 +++++++++++++++++++++++++++---
lib/codetag.c | 100 +++++++++++-
scripts/module.lds.S | 5 +-
7 files changed, 438 insertions(+), 61 deletions(-)
diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
index 64f536b80380..372c320c5043 100644
--- a/include/asm-generic/codetag.lds.h
+++ b/include/asm-generic/codetag.lds.h
@@ -11,4 +11,23 @@
#define CODETAG_SECTIONS() \
SECTION_WITH_BOUNDARIES(alloc_tags)
+/*
+ * Module codetags which aren't used after module unload, therefore have the
+ * same lifespan as the module and can be safely unloaded with the module.
+ */
+#define MOD_CODETAG_SECTIONS()
+
+#define MOD_SEPARATE_CODETAG_SECTION(_name) \
+ .codetag.##_name : { \
+ SECTION_WITH_BOUNDARIES(_name) \
+ }
+
+/*
+ * For codetags which might be used after module unload, therefore might stay
+ * longer in memory. Each such codetag type has its own section so that we can
+ * unload them individually once unused.
+ */
+#define MOD_SEPARATE_CODETAG_SECTIONS() \
+ MOD_SEPARATE_CODETAG_SECTION(alloc_tags)
+
#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 1f0a9ff23a2c..7431757999c5 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,13 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_module_section {
+ unsigned long start_addr;
+ unsigned long end_addr;
+ /* used size */
+ unsigned long size;
+};
+
#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
#define CODETAG_EMPTY ((void *)1)
@@ -54,6 +61,8 @@ static inline void set_codetag_empty(union codetag_ref *ref) {}
#ifdef CONFIG_MEM_ALLOC_PROFILING
+#define ALLOC_TAG_SECTION_NAME "alloc_tags"
+
struct codetag_bytes {
struct codetag *ct;
s64 bytes;
@@ -76,7 +85,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_shared_alloc_tag };
@@ -85,7 +94,7 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
#define DEFINE_ALLOC_TAG(_alloc_tag) \
static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
static struct alloc_tag _alloc_tag __used __aligned(8) \
- __section("alloc_tags") = { \
+ __section(ALLOC_TAG_SECTION_NAME) = { \
.ct = CODE_TAG_INIT, \
.counters = &_alloc_tag_cntr };
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index c2a579ccd455..fb4e7adfa746 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -35,8 +35,15 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- bool (*module_unload)(struct codetag_type *cttype,
+ void (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
+#ifdef CONFIG_MODULES
+ void (*module_replaced)(struct module *mod, struct module *new_mod);
+ bool (*needs_section_mem)(struct module *mod, unsigned long size);
+ void *(*alloc_section_mem)(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align);
+ void (*free_section_mem)(struct module *mod, bool unused);
+#endif
};
struct codetag_iterator {
@@ -71,11 +78,31 @@ struct codetag_type *
codetag_register_type(const struct codetag_type_desc *desc);
#if defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES)
+
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size);
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align);
+void codetag_free_module_sections(struct module *mod);
+void codetag_module_replaced(struct module *mod, struct module *new_mod);
void codetag_load_module(struct module *mod);
-bool codetag_unload_module(struct module *mod);
-#else
+void codetag_unload_module(struct module *mod);
+
+#else /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
+
+static inline bool
+codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size) { return false; }
+static inline void *
+codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align) { return NULL; }
+static inline void codetag_free_module_sections(struct module *mod) {}
+static inline void codetag_module_replaced(struct module *mod, struct module *new_mod) {}
static inline void codetag_load_module(struct module *mod) {}
-static inline bool codetag_unload_module(struct module *mod) { return true; }
-#endif
+static inline void codetag_unload_module(struct module *mod) {}
+
+#endif /* defined(CONFIG_CODE_TAGGING) && defined(CONFIG_MODULES) */
#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index ef54733bd7d2..5569dd052511 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
return 0;
}
-static void module_memory_free(struct module *mod, enum mod_mem_type type,
- bool unload_codetags)
+static void module_memory_free(struct module *mod, enum mod_mem_type type)
{
struct module_memory *mem = &mod->mem[type];
- void *ptr = mem->base;
if (mem->is_rox)
vfree(mem->rw_copy);
- if (!unload_codetags && mod_mem_type_is_core_data(type))
- return;
-
- execmem_free(ptr);
+ execmem_free(mem->base);
}
-static void free_mod_mem(struct module *mod, bool unload_codetags)
+static void free_mod_mem(struct module *mod)
{
for_each_mod_mem_type(type) {
struct module_memory *mod_mem = &mod->mem[type];
@@ -1280,25 +1275,20 @@ static void free_mod_mem(struct module *mod, bool unload_codetags)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
- module_memory_free(mod, type, unload_codetags);
+ module_memory_free(mod, type);
}
/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- module_memory_free(mod, MOD_DATA, unload_codetags);
+ module_memory_free(mod, MOD_DATA);
}
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
- bool unload_codetags;
-
trace_module_free(mod);
- unload_codetags = codetag_unload_module(mod);
- if (!unload_codetags)
- pr_warn("%s: memory allocation(s) from the module still alive, cannot unload cleanly\n",
- mod->name);
+ codetag_unload_module(mod);
mod_sysfs_teardown(mod);
@@ -1341,7 +1331,7 @@ static void free_module(struct module *mod)
kfree(mod->args);
percpu_modfree(mod);
- free_mod_mem(mod, unload_codetags);
+ free_mod_mem(mod);
}
void *__symbol_get(const char *symbol)
@@ -1606,6 +1596,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i
if (WARN_ON_ONCE(type == MOD_INVALID))
continue;
+ /*
+ * Do not allocate codetag memory as we load it into
+ * preallocated contiguous memory. s->sh_entsize will
+ * not be used for this section, so leave it as is.
+ */
+ if (codetag_needs_module_section(mod, sname, s->sh_size))
+ continue;
+
s->sh_entsize = module_get_offset_and_type(mod, type, s, i);
pr_debug("\t%s\n", sname);
}
@@ -2280,6 +2278,7 @@ static int move_module(struct module *mod, struct load_info *info)
int i;
enum mod_mem_type t = 0;
int ret = -ENOMEM;
+ bool codetag_section_found = false;
for_each_mod_mem_type(type) {
if (!mod->mem[type].size) {
@@ -2291,7 +2290,7 @@ static int move_module(struct module *mod, struct load_info *info)
ret = module_memory_alloc(mod, type);
if (ret) {
t = type;
- goto out_enomem;
+ goto out_err;
}
}
@@ -2300,15 +2299,33 @@ static int move_module(struct module *mod, struct load_info *info)
for (i = 0; i < info->hdr->e_shnum; i++) {
void *dest;
Elf_Shdr *shdr = &info->sechdrs[i];
- enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
- unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
+ const char *sname;
unsigned long addr;
if (!(shdr->sh_flags & SHF_ALLOC))
continue;
- addr = (unsigned long)mod->mem[type].base + offset;
- dest = mod->mem[type].rw_copy + offset;
+ sname = info->secstrings + shdr->sh_name;
+ /*
+ * Load codetag sections separately as they might still be used
+ * after module unload.
+ */
+ if (codetag_needs_module_section(mod, sname, shdr->sh_size)) {
+ dest = codetag_alloc_module_section(mod, sname, shdr->sh_size,
+ arch_mod_section_prepend(mod, i), shdr->sh_addralign);
+ if (IS_ERR(dest)) {
+ ret = PTR_ERR(dest);
+ goto out_err;
+ }
+ addr = (unsigned long)dest;
+ codetag_section_found = true;
+ } else {
+ enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT;
+ unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
+
+ addr = (unsigned long)mod->mem[type].base + offset;
+ dest = mod->mem[type].rw_copy + offset;
+ }
if (shdr->sh_type != SHT_NOBITS) {
/*
@@ -2320,7 +2337,7 @@ static int move_module(struct module *mod, struct load_info *info)
if (i == info->index.mod &&
(WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
ret = -ENOEXEC;
- goto out_enomem;
+ goto out_err;
}
memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
}
@@ -2336,9 +2353,12 @@ static int move_module(struct module *mod, struct load_info *info)
}
return 0;
-out_enomem:
+out_err:
for (t--; t >= 0; t--)
- module_memory_free(mod, t, true);
+ module_memory_free(mod, t);
+ if (codetag_section_found)
+ codetag_free_module_sections(mod);
+
return ret;
}
@@ -2459,6 +2479,8 @@ static struct module *layout_and_allocate(struct load_info *info, int flags)
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
kmemleak_load_module(mod, info);
+ codetag_module_replaced(info->mod, mod);
+
return mod;
}
@@ -2468,7 +2490,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
percpu_modfree(mod);
module_arch_freeing_init(mod);
- free_mod_mem(mod, true);
+ free_mod_mem(mod);
}
int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 81e5f9a70f22..b10e7f17eeda 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/alloc_tag.h>
+#include <linux/execmem.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/module.h>
@@ -149,31 +150,231 @@ static void __init procfs_init(void)
proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
}
-static bool alloc_tag_module_unload(struct codetag_type *cttype,
- struct codetag_module *cmod)
+#ifdef CONFIG_MODULES
+
+static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
+/* A dummy object used to indicate an unloaded module */
+static struct module unloaded_mod;
+/* A dummy object used to indicate a module prepended area */
+static struct module prepend_mod;
+
+static struct alloc_tag_module_section module_tags;
+
+static bool needs_section_mem(struct module *mod, unsigned long size)
+{
+ return size >= sizeof(struct alloc_tag);
+}
+
+/* Called under RCU read lock */
+static void clean_unused_module_areas_locked(void)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ struct module *val;
+
+ mas_for_each(&mas, val, module_tags.size) {
+ if (val == &unloaded_mod) {
+ struct alloc_tag *tag;
+ struct alloc_tag *last;
+ bool unused = true;
+
+ tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
+ last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
+ while (tag <= last) {
+ struct alloc_tag_counters counter;
+
+ counter = alloc_tag_read(tag);
+ if (counter.bytes) {
+ unused = false;
+ break;
+ }
+ tag++;
+ }
+ if (unused)
+ mas_erase(&mas);
+ }
+ }
+}
+
+static void *reserve_module_tags(struct module *mod, unsigned long size,
+ unsigned int prepend, unsigned long align)
+{
+ unsigned long section_size = module_tags.end_addr - module_tags.start_addr;
+ MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
+ bool cleanup_done = false;
+ unsigned long offset;
+ void *ret;
+
+ /* If no tags return NULL */
+ if (size < sizeof(struct alloc_tag))
+ return NULL;
+
+ /*
+ * align is always power of 2, so we can use IS_ALIGNED and ALIGN.
+ * align 0 or 1 means no alignment, to simplify set to 1.
+ */
+ if (!align)
+ align = 1;
+
+ mas_lock(&mas);
+repeat:
+ /* Try finding exact size and hope the start is aligned */
+ if (mas_empty_area(&mas, 0, section_size - 1, prepend + size))
+ goto cleanup;
+
+ if (IS_ALIGNED(mas.index + prepend, align))
+ goto found;
+
+ /* Try finding larger area to align later */
+ mas_reset(&mas);
+ if (!mas_empty_area(&mas, 0, section_size - 1,
+ size + prepend + align - 1))
+ goto found;
+
+cleanup:
+ /* No free area, try cleanup stale data and repeat the search once */
+ if (!cleanup_done) {
+ clean_unused_module_areas_locked();
+ cleanup_done = true;
+ mas_reset(&mas);
+ goto repeat;
+ } else {
+ ret = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+found:
+ offset = mas.index;
+ offset += prepend;
+ offset = ALIGN(offset, align);
+
+ if (offset != mas.index) {
+ unsigned long pad_start = mas.index;
+
+ mas.last = offset - 1;
+ mas_store(&mas, &prepend_mod);
+ if (mas_is_err(&mas)) {
+ ret = ERR_PTR(xa_err(mas.node));
+ goto out;
+ }
+ mas.index = offset;
+ mas.last = offset + size - 1;
+ mas_store(&mas, mod);
+ if (mas_is_err(&mas)) {
+ ret = ERR_PTR(xa_err(mas.node));
+ mas.index = pad_start;
+ mas_erase(&mas);
+ goto out;
+ }
+
+ } else {
+ mas.last = offset + size - 1;
+ mas_store(&mas, mod);
+ if (mas_is_err(&mas)) {
+ ret = ERR_PTR(xa_err(mas.node));
+ goto out;
+ }
+ }
+
+ if (module_tags.size < offset + size)
+ module_tags.size = offset + size;
+
+ ret = (struct alloc_tag *)(module_tags.start_addr + offset);
+out:
+ mas_unlock(&mas);
+
+ return ret;
+}
+
+static void release_module_tags(struct module *mod, bool unused)
{
- struct codetag_iterator iter = codetag_get_ct_iter(cttype);
- struct alloc_tag_counters counter;
- bool module_unused = true;
+ MA_STATE(mas, &mod_area_mt, module_tags.size, module_tags.size);
+ struct alloc_tag *last;
struct alloc_tag *tag;
- struct codetag *ct;
+ struct module *val;
- for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
- if (iter.cmod != cmod)
- continue;
+ if (unused)
+ return;
+
+ mas_lock(&mas);
+ mas_for_each_rev(&mas, val, 0)
+ if (val == mod)
+ break;
+
+ if (!val) /* module not found */
+ goto out;
+
+ tag = (struct alloc_tag *)(module_tags.start_addr + mas.index);
+ last = (struct alloc_tag *)(module_tags.start_addr + mas.last);
+
+ unused = true;
+ while (tag <= last) {
+ struct alloc_tag_counters counter;
- tag = ct_to_alloc_tag(ct);
counter = alloc_tag_read(tag);
+ if (counter.bytes) {
+ struct codetag *ct = &tag->ct;
+
+ pr_info("%s:%u module %s func:%s has %llu allocated at module unload\n",
+ ct->filename, ct->lineno, ct->modname,
+ ct->function, counter.bytes);
+ unused = false;
+ break;
+ }
+ tag++;
+ }
- if (WARN(counter.bytes,
- "%s:%u module %s func:%s has %llu allocated at module unload",
- ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
- module_unused = false;
+ mas_store(&mas, unused ? NULL : &unloaded_mod);
+ val = mas_prev_range(&mas, 0);
+ if (val == &prepend_mod)
+ mas_store(&mas, NULL);
+out:
+ mas_unlock(&mas);
+}
+
+static void replace_module(struct module *mod, struct module *new_mod)
+{
+ MA_STATE(mas, &mod_area_mt, 0, module_tags.size);
+ struct module *val;
+
+ mas_lock(&mas);
+ mas_for_each(&mas, val, module_tags.size) {
+ if (val != mod)
+ continue;
+
+ mas_store_gfp(&mas, new_mod, GFP_KERNEL);
+ break;
}
+ mas_unlock(&mas);
+}
- return module_unused;
+#define MODULE_ALLOC_TAG_VMAP_SIZE (100000 * sizeof(struct alloc_tag))
+
+static int __init alloc_mod_tags_mem(void)
+{
+ /* Allocate space to copy allocation tags */
+ module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
+ MODULE_ALLOC_TAG_VMAP_SIZE);
+ if (!module_tags.start_addr)
+ return -ENOMEM;
+
+ module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
+
+ return 0;
+}
+
+static void __init free_mod_tags_mem(void)
+{
+ execmem_free((void *)module_tags.start_addr);
+ module_tags.start_addr = 0;
}
+#else /* CONFIG_MODULES */
+
+static inline int alloc_mod_tags_mem(void) { return 0; }
+static inline void free_mod_tags_mem(void) {}
+
+#endif /* CONFIG_MODULES */
+
#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
static bool mem_profiling_support __meminitdata = true;
#else
@@ -255,14 +456,26 @@ static inline void sysctl_init(void) {}
static int __init alloc_tag_init(void)
{
const struct codetag_type_desc desc = {
- .section = "alloc_tags",
- .tag_size = sizeof(struct alloc_tag),
- .module_unload = alloc_tag_module_unload,
+ .section = ALLOC_TAG_SECTION_NAME,
+ .tag_size = sizeof(struct alloc_tag),
+#ifdef CONFIG_MODULES
+ .needs_section_mem = needs_section_mem,
+ .alloc_section_mem = reserve_module_tags,
+ .free_section_mem = release_module_tags,
+ .module_replaced = replace_module,
+#endif
};
+ int res;
+
+ res = alloc_mod_tags_mem();
+ if (res)
+ return res;
alloc_tag_cttype = codetag_register_type(&desc);
- if (IS_ERR(alloc_tag_cttype))
+ if (IS_ERR(alloc_tag_cttype)) {
+ free_mod_tags_mem();
return PTR_ERR(alloc_tag_cttype);
+ }
sysctl_init();
procfs_init();
diff --git a/lib/codetag.c b/lib/codetag.c
index d1fbbb7c2ec3..3f3a4f4ca72d 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -207,6 +207,94 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
}
#ifdef CONFIG_MODULES
+#define CODETAG_SECTION_PREFIX ".codetag."
+
+/* Some codetag types need a separate module section */
+bool codetag_needs_module_section(struct module *mod, const char *name,
+ unsigned long size)
+{
+ const char *type_name;
+ struct codetag_type *cttype;
+ bool ret = false;
+
+ if (strncmp(name, CODETAG_SECTION_PREFIX, strlen(CODETAG_SECTION_PREFIX)))
+ return false;
+
+ type_name = name + strlen(CODETAG_SECTION_PREFIX);
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (strcmp(type_name, cttype->desc.section) == 0) {
+ if (!cttype->desc.needs_section_mem)
+ break;
+
+ down_write(&cttype->mod_lock);
+ ret = cttype->desc.needs_section_mem(mod, size);
+ up_write(&cttype->mod_lock);
+ break;
+ }
+ }
+ mutex_unlock(&codetag_lock);
+
+ return ret;
+}
+
+void *codetag_alloc_module_section(struct module *mod, const char *name,
+ unsigned long size, unsigned int prepend,
+ unsigned long align)
+{
+ const char *type_name = name + strlen(CODETAG_SECTION_PREFIX);
+ struct codetag_type *cttype;
+ void *ret = NULL;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (strcmp(type_name, cttype->desc.section) == 0) {
+ if (WARN_ON(!cttype->desc.alloc_section_mem))
+ break;
+
+ down_write(&cttype->mod_lock);
+ ret = cttype->desc.alloc_section_mem(mod, size, prepend, align);
+ up_write(&cttype->mod_lock);
+ break;
+ }
+ }
+ mutex_unlock(&codetag_lock);
+
+ return ret;
+}
+
+void codetag_free_module_sections(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (!cttype->desc.free_section_mem)
+ continue;
+
+ down_write(&cttype->mod_lock);
+ cttype->desc.free_section_mem(mod, true);
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
+
+void codetag_module_replaced(struct module *mod, struct module *new_mod)
+{
+ struct codetag_type *cttype;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ if (!cttype->desc.module_replaced)
+ continue;
+
+ down_write(&cttype->mod_lock);
+ cttype->desc.module_replaced(mod, new_mod);
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
+
void codetag_load_module(struct module *mod)
{
struct codetag_type *cttype;
@@ -220,13 +308,12 @@ void codetag_load_module(struct module *mod)
mutex_unlock(&codetag_lock);
}
-bool codetag_unload_module(struct module *mod)
+void codetag_unload_module(struct module *mod)
{
struct codetag_type *cttype;
- bool unload_ok = true;
if (!mod)
- return true;
+ return;
/* await any module's kfree_rcu() operations to complete */
kvfree_rcu_barrier();
@@ -246,18 +333,17 @@ bool codetag_unload_module(struct module *mod)
}
if (found) {
if (cttype->desc.module_unload)
- if (!cttype->desc.module_unload(cttype, cmod))
- unload_ok = false;
+ cttype->desc.module_unload(cttype, cmod);
cttype->count -= range_size(cttype, &cmod->range);
idr_remove(&cttype->mod_idr, mod_id);
kfree(cmod);
}
up_write(&cttype->mod_lock);
+ if (found && cttype->desc.free_section_mem)
+ cttype->desc.free_section_mem(mod, false);
}
mutex_unlock(&codetag_lock);
-
- return unload_ok;
}
#endif /* CONFIG_MODULES */
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 3f43edef813c..711c6e029936 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -50,7 +50,7 @@ SECTIONS {
.data : {
*(.data .data.[0-9a-zA-Z_]*)
*(.data..L*)
- CODETAG_SECTIONS()
+ MOD_CODETAG_SECTIONS()
}
.rodata : {
@@ -59,9 +59,10 @@ SECTIONS {
}
#else
.data : {
- CODETAG_SECTIONS()
+ MOD_CODETAG_SECTIONS()
}
#endif
+ MOD_SEPARATE_CODETAG_SECTIONS()
}
/* bring in arch-specific sections */
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v3 3/5] alloc_tag: populate memory for module tags as needed
From: Suren Baghdasaryan @ 2024-10-14 20:36 UTC
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
The memory reserved for module tags does not need to be backed by
physical pages until there are tags to store there. Change the way
we reserve this memory: allocate only the virtual area for the tags
and populate it with physical pages as needed when a module is loaded.
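In outline (a condensed sketch of alloc_mod_tags_mem() and
vm_module_tags_grow() below; error handling trimmed):
/* Boot: reserve the virtual range only, with no backing pages yet. */
vm_module_tags = execmem_vmap(EXECMEM_MODULE_DATA, MODULE_ALLOC_TAG_VMAP_SIZE);
/* Module load: back just the newly used part of the range. */
nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN, NUMA_NO_NODE,
				 more_pages, next_page);
vmap_pages_range(addr, addr + (nr << PAGE_SHIFT), PAGE_KERNEL,
		 next_page, PAGE_SHIFT);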
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/execmem.h | 11 ++++++
include/linux/vmalloc.h | 9 +++++
lib/alloc_tag.c | 84 +++++++++++++++++++++++++++++++++--------
mm/execmem.c | 16 ++++++++
mm/vmalloc.c | 4 +-
5 files changed, 106 insertions(+), 18 deletions(-)
diff --git a/include/linux/execmem.h b/include/linux/execmem.h
index 7436aa547818..a159a073270a 100644
--- a/include/linux/execmem.h
+++ b/include/linux/execmem.h
@@ -127,6 +127,17 @@ void *execmem_alloc(enum execmem_type type, size_t size);
*/
void execmem_free(void *ptr);
+/**
+ * execmem_vmap - create virtual mapping for executable memory
+ * @type: type of the allocation
+ * @size: size of the virtual mapping in bytes
+ *
+ * Maps virtually contiguous area that can be populated with executable code.
+ *
+ * Return: the area descriptor on success or %NULL on failure.
+ */
+struct vm_struct *execmem_vmap(enum execmem_type type, size_t size);
+
/**
* execmem_update_copy - copy an update to executable memory
* @dst: destination address to update
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9a012cd4fad2..9d64cc6f24d1 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -202,6 +202,9 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
+int vmap_pages_range(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages, unsigned int page_shift);
+
/*
* Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
* and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
@@ -239,6 +242,12 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
unsigned long flags,
unsigned long start, unsigned long end,
const void *caller);
+struct vm_struct *__get_vm_area_node(unsigned long size,
+ unsigned long align, unsigned long shift,
+ unsigned long flags, unsigned long start,
+ unsigned long end, int node, gfp_t gfp_mask,
+ const void *caller);
+
void free_vm_area(struct vm_struct *area);
extern struct vm_struct *remove_vm_area(const void *addr);
extern struct vm_struct *find_vm_area(const void *addr);
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index b10e7f17eeda..648f32d52b8d 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -8,6 +8,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
+#include <linux/vmalloc.h>
static struct codetag_type *alloc_tag_cttype;
@@ -153,6 +154,7 @@ static void __init procfs_init(void)
#ifdef CONFIG_MODULES
static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
+static struct vm_struct *vm_module_tags;
/* A dummy object used to indicate an unloaded module */
static struct module unloaded_mod;
/* A dummy object used to indicate a module prepended area */
@@ -195,6 +197,25 @@ static void clean_unused_module_areas_locked(void)
}
}
+static int vm_module_tags_grow(unsigned long addr, unsigned long bytes)
+{
+ struct page **next_page = vm_module_tags->pages + vm_module_tags->nr_pages;
+ unsigned long more_pages = ALIGN(bytes, PAGE_SIZE) >> PAGE_SHIFT;
+ unsigned long nr;
+
+ nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN,
+ NUMA_NO_NODE, more_pages, next_page);
+ if (nr != more_pages)
+ return -ENOMEM;
+
+ vm_module_tags->nr_pages += nr;
+ if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
+ PAGE_KERNEL, next_page, PAGE_SHIFT) < 0)
+ return -ENOMEM;
+
+ return 0;
+}
+
static void *reserve_module_tags(struct module *mod, unsigned long size,
unsigned int prepend, unsigned long align)
{
@@ -202,7 +223,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
bool cleanup_done = false;
unsigned long offset;
- void *ret;
+ void *ret = NULL;
/* If no tags return NULL */
if (size < sizeof(struct alloc_tag))
@@ -239,7 +260,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
goto repeat;
} else {
ret = ERR_PTR(-ENOMEM);
- goto out;
+ goto unlock;
}
found:
@@ -254,7 +275,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
mas_store(&mas, &prepend_mod);
if (mas_is_err(&mas)) {
ret = ERR_PTR(xa_err(mas.node));
- goto out;
+ goto unlock;
}
mas.index = offset;
mas.last = offset + size - 1;
@@ -263,7 +284,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
ret = ERR_PTR(xa_err(mas.node));
mas.index = pad_start;
mas_erase(&mas);
- goto out;
+ goto unlock;
}
} else {
@@ -271,18 +292,33 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
mas_store(&mas, mod);
if (mas_is_err(&mas)) {
ret = ERR_PTR(xa_err(mas.node));
- goto out;
+ goto unlock;
}
}
+unlock:
+ mas_unlock(&mas);
+ if (IS_ERR(ret))
+ return ret;
- if (module_tags.size < offset + size)
- module_tags.size = offset + size;
+ if (module_tags.size < offset + size) {
+ unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;
- ret = (struct alloc_tag *)(module_tags.start_addr + offset);
-out:
- mas_unlock(&mas);
+ module_tags.size = offset + size;
+ if (phys_size < module_tags.size) {
+ int grow_res;
+
+ grow_res = vm_module_tags_grow(module_tags.start_addr + phys_size,
+ module_tags.size - phys_size);
+ if (grow_res) {
+ static_branch_disable(&mem_alloc_profiling_key);
+ pr_warn("Failed to allocate tags memory for module %s. Memory profiling is disabled!\n",
+ mod->name);
+ return ERR_PTR(grow_res);
+ }
+ }
+ }
- return ret;
+ return (struct alloc_tag *)(module_tags.start_addr + offset);
}
static void release_module_tags(struct module *mod, bool unused)
@@ -351,12 +387,23 @@ static void replace_module(struct module *mod, struct module *new_mod)
static int __init alloc_mod_tags_mem(void)
{
- /* Allocate space to copy allocation tags */
- module_tags.start_addr = (unsigned long)execmem_alloc(EXECMEM_MODULE_DATA,
- MODULE_ALLOC_TAG_VMAP_SIZE);
- if (!module_tags.start_addr)
+ /* Map space to copy allocation tags */
+ vm_module_tags = execmem_vmap(EXECMEM_MODULE_DATA, MODULE_ALLOC_TAG_VMAP_SIZE);
+ if (!vm_module_tags) {
+ pr_err("Failed to map %lu bytes for module allocation tags\n",
+ MODULE_ALLOC_TAG_VMAP_SIZE);
+ module_tags.start_addr = 0;
return -ENOMEM;
+ }
+ vm_module_tags->pages = kmalloc_array(get_vm_area_size(vm_module_tags) >> PAGE_SHIFT,
+ sizeof(struct page *), GFP_KERNEL | __GFP_ZERO);
+ if (!vm_module_tags->pages) {
+ free_vm_area(vm_module_tags);
+ return -ENOMEM;
+ }
+
+ module_tags.start_addr = (unsigned long)vm_module_tags->addr;
module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
return 0;
@@ -364,8 +411,13 @@ static int __init alloc_mod_tags_mem(void)
static void __init free_mod_tags_mem(void)
{
- execmem_free((void *)module_tags.start_addr);
+ int i;
+
module_tags.start_addr = 0;
+ for (i = 0; i < vm_module_tags->nr_pages; i++)
+ __free_page(vm_module_tags->pages[i]);
+ kfree(vm_module_tags->pages);
+ free_vm_area(vm_module_tags);
}
#else /* CONFIG_MODULES */
diff --git a/mm/execmem.c b/mm/execmem.c
index 97706d8ed720..eb346f4eaaff 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -366,6 +366,22 @@ void execmem_free(void *ptr)
vfree(ptr);
}
+struct vm_struct *execmem_vmap(enum execmem_type type, size_t size)
+{
+ struct execmem_range *range = &execmem_info->ranges[type];
+ struct vm_struct *area;
+
+ area = __get_vm_area_node(size, range->alignment, PAGE_SHIFT, VM_ALLOC,
+ range->start, range->end, NUMA_NO_NODE,
+ GFP_KERNEL, __builtin_return_address(0));
+ if (!area && range->fallback_start)
+ area = __get_vm_area_node(size, range->alignment, PAGE_SHIFT, VM_ALLOC,
+ range->fallback_start, range->fallback_end,
+ NUMA_NO_NODE, GFP_KERNEL, __builtin_return_address(0));
+
+ return area;
+}
+
void *execmem_update_copy(void *dst, const void *src, size_t size)
{
return text_poke_copy(dst, src, size);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 74c0a5eae210..7ed39d104201 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -653,7 +653,7 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
* RETURNS:
* 0 on success, -errno on failure.
*/
-static int vmap_pages_range(unsigned long addr, unsigned long end,
+int vmap_pages_range(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
int err;
@@ -3106,7 +3106,7 @@ static void clear_vm_uninitialized_flag(struct vm_struct *vm)
vm->flags &= ~VM_UNINITIALIZED;
}
-static struct vm_struct *__get_vm_area_node(unsigned long size,
+struct vm_struct *__get_vm_area_node(unsigned long size,
unsigned long align, unsigned long shift, unsigned long flags,
unsigned long start, unsigned long end, int node,
gfp_t gfp_mask, const void *caller)
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v3 4/5] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references
From: Suren Baghdasaryan @ 2024-10-14 20:36 UTC
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
To simplify later changes to page tag references, introduce the new
pgalloc_tag_ref and pgtag_ref_handle types. This makes it easy to
replace page_ext as the storage for page allocation tag references.
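The resulting caller pattern is a read-copy-update sequence (a sketch
of what the converted helpers below do):
union pgtag_ref_handle handle;
union codetag_ref ref;
if (get_page_tag_ref(page, &ref, &handle)) {	/* copy ref out, pin storage */
	/* ... modify the local copy of the reference ... */
	update_page_tag_ref(handle, &ref);	/* write the copy back */
	put_page_tag_ref(handle);		/* unpin the storage */
}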
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/mm.h | 25 +++++----
include/linux/pgalloc_tag.h | 108 ++++++++++++++++++++++++------------
lib/alloc_tag.c | 3 +-
3 files changed, 88 insertions(+), 48 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5cd22303fbc0..8efb4a6a1a70 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4180,37 +4180,38 @@ static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new
return;
for (i = nr_pages; i < (1 << old_order); i += nr_pages) {
- union codetag_ref *ref = get_page_tag_ref(folio_page(folio, i));
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
+ if (get_page_tag_ref(folio_page(folio, i), &ref, &handle)) {
/* Set new reference to point to the original tag */
- alloc_tag_ref_set(ref, tag);
- put_page_tag_ref(ref);
+ alloc_tag_ref_set(&ref, tag);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
static inline void pgalloc_tag_copy(struct folio *new, struct folio *old)
{
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
struct alloc_tag *tag;
- union codetag_ref *ref;
tag = pgalloc_tag_get(&old->page);
if (!tag)
return;
- ref = get_page_tag_ref(&new->page);
- if (!ref)
+ if (!get_page_tag_ref(&new->page, &ref, &handle))
return;
/* Clear the old ref to the original allocation tag. */
clear_page_tag_ref(&old->page);
/* Decrement the counters of the tag on get_new_folio. */
- alloc_tag_sub(ref, folio_nr_pages(new));
-
- __alloc_tag_ref_set(ref, tag);
-
- put_page_tag_ref(ref);
+ alloc_tag_sub(&ref, folio_nr_pages(new));
+ __alloc_tag_ref_set(&ref, tag);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
#else /* !CONFIG_MEM_ALLOC_PROFILING */
static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order)
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 59a3deb792a8..bc65710ee1f9 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -9,48 +9,83 @@
#ifdef CONFIG_MEM_ALLOC_PROFILING
+typedef union codetag_ref pgalloc_tag_ref;
+
+static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ ref->ct = pgref->ct;
+}
+
+static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ pgref->ct = ref->ct;
+}
+
+union pgtag_ref_handle {
+ pgalloc_tag_ref *pgref; /* reference in page extension */
+};
+
#include <linux/page_ext.h>
extern struct page_ext_operations page_alloc_tagging_ops;
-static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+static inline pgalloc_tag_ref *pgref_from_page_ext(struct page_ext *page_ext)
{
- return (union codetag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
+ return (pgalloc_tag_ref *)page_ext_data(page_ext, &page_alloc_tagging_ops);
}
-static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+static inline struct page_ext *page_ext_from_pgref(pgalloc_tag_ref *pgref)
{
- return (void *)ref - page_alloc_tagging_ops.offset;
+ return (void *)pgref - page_alloc_tagging_ops.offset;
}
/* Should be called only if mem_alloc_profiling_enabled() */
-static inline union codetag_ref *get_page_tag_ref(struct page *page)
+static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
+ union pgtag_ref_handle *handle)
{
- if (page) {
- struct page_ext *page_ext = page_ext_get(page);
+ struct page_ext *page_ext;
+ pgalloc_tag_ref *pgref;
- if (page_ext)
- return codetag_ref_from_page_ext(page_ext);
- }
- return NULL;
+ if (!page)
+ return false;
+
+ page_ext = page_ext_get(page);
+ if (!page_ext)
+ return false;
+
+ pgref = pgref_from_page_ext(page_ext);
+ read_pgref(pgref, ref);
+ handle->pgref = pgref;
+ return true;
+}
+
+static inline void put_page_tag_ref(union pgtag_ref_handle handle)
+{
+ if (WARN_ON(!handle.pgref))
+ return;
+
+ page_ext_put(page_ext_from_pgref(handle.pgref));
}
-static inline void put_page_tag_ref(union codetag_ref *ref)
+static inline void update_page_tag_ref(union pgtag_ref_handle handle,
+ union codetag_ref *ref)
{
- if (WARN_ON(!ref))
+ if (WARN_ON(!handle.pgref || !ref))
return;
- page_ext_put(page_ext_from_codetag_ref(ref));
+ write_pgref(handle.pgref, ref);
}
static inline void clear_page_tag_ref(struct page *page)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- set_codetag_empty(ref);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ set_codetag_empty(&ref);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -59,11 +94,13 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_add(&ref, task->alloc_tag, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -71,11 +108,13 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
{
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
- if (ref) {
- alloc_tag_sub(ref, PAGE_SIZE * nr);
- put_page_tag_ref(ref);
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_sub(&ref, PAGE_SIZE * nr);
+ update_page_tag_ref(handle, &ref);
+ put_page_tag_ref(handle);
}
}
}
@@ -85,13 +124,14 @@ static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
struct alloc_tag *tag = NULL;
if (mem_alloc_profiling_enabled()) {
- union codetag_ref *ref = get_page_tag_ref(page);
-
- alloc_tag_sub_check(ref);
- if (ref) {
- if (ref->ct)
- tag = ct_to_alloc_tag(ref->ct);
- put_page_tag_ref(ref);
+ union pgtag_ref_handle handle;
+ union codetag_ref ref;
+
+ if (get_page_tag_ref(page, &ref, &handle)) {
+ alloc_tag_sub_check(&ref);
+ if (ref.ct)
+ tag = ct_to_alloc_tag(ref.ct);
+ put_page_tag_ref(handle);
}
}
@@ -106,8 +146,6 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
#else /* CONFIG_MEM_ALLOC_PROFILING */
-static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
-static inline void put_page_tag_ref(union codetag_ref *ref) {}
static inline void clear_page_tag_ref(struct page *page) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int nr) {}
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 648f32d52b8d..2ef762acc203 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -5,6 +5,7 @@
#include <linux/gfp.h>
#include <linux/module.h>
#include <linux/page_ext.h>
+#include <linux/pgalloc_tag.h>
#include <linux/proc_fs.h>
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
@@ -474,7 +475,7 @@ static __init void init_page_alloc_tagging(void)
}
struct page_ext_operations page_alloc_tagging_ops = {
- .size = sizeof(union codetag_ref),
+ .size = sizeof(pgalloc_tag_ref),
.need = need_page_alloc_tagging,
.init = init_page_alloc_tagging,
};
--
2.47.0.rc1.288.g06298d1525-goog
* [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
From: Suren Baghdasaryan @ 2024-10-14 20:36 UTC
To: akpm
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team,
surenb
Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
references directly in the page flags. This eliminates the memory
overhead caused by page_ext and results in better performance
for page allocations.
If the number of available page flag bits is insufficient to
address all kernel allocations, profiling falls back to using
page extensions, with a warning advising that this config be
disabled.
If dynamically loaded modules add so many tags that they can no
longer be addressed with the available page flag bits, memory
profiling is disabled and a warning is issued.
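As a rough illustration of the bit budget (a standalone sketch with
made-up widths; the real values come from page-flags-layout.h for a
given config):
#include <stdio.h>
int main(void)
{
	int non_flag_bits = 10 + 6 + 3;	/* hypothetical section/node/zone etc. widths */
	int nr_pageflags = 24;		/* hypothetical NR_PAGEFLAGS */
	int unused = 64 - (non_flag_bits + nr_pageflags);
	int nr_tags = 5000;		/* hypothetical number of kernel tags */
	int needed = 0;
	for (int v = nr_tags + 2; v; v >>= 1)	/* +2 for the NULL/EMPTY ids */
		needed++;			/* bits to address all ids */
	printf("unused=%d needed=%d -> %s\n", unused, needed,
	       unused >= needed ? "refs in page flags" : "fall back to page_ext");
	return 0;
}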
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
include/linux/alloc_tag.h | 10 +-
include/linux/codetag.h | 3 +
include/linux/page-flags-layout.h | 7 ++
include/linux/pgalloc_tag.h | 184 +++++++++++++++++++++++++++++-
lib/Kconfig.debug | 19 +++
lib/alloc_tag.c | 90 ++++++++++++++-
lib/codetag.c | 4 +-
mm/mm_init.c | 5 +-
8 files changed, 310 insertions(+), 12 deletions(-)
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 7431757999c5..4f811ec0ffe0 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,8 +30,16 @@ struct alloc_tag {
struct alloc_tag_counters __percpu *counters;
} __aligned(8);
+struct alloc_tag_kernel_section {
+ struct alloc_tag *first_tag;
+ unsigned long count;
+};
+
struct alloc_tag_module_section {
- unsigned long start_addr;
+ union {
+ unsigned long start_addr;
+ struct alloc_tag *first_tag;
+ };
unsigned long end_addr;
/* used size */
unsigned long size;
diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index fb4e7adfa746..401fc297eeda 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -13,6 +13,9 @@ struct codetag_module;
struct seq_buf;
struct module;
+#define CODETAG_SECTION_START_PREFIX "__start_"
+#define CODETAG_SECTION_STOP_PREFIX "__stop_"
+
/*
* An instance of this structure is created in a special ELF section at every
* code location being tagged. At runtime, the special section is treated as
diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h
index 7d79818dc065..4f5c9e979bb9 100644
--- a/include/linux/page-flags-layout.h
+++ b/include/linux/page-flags-layout.h
@@ -111,5 +111,12 @@
ZONES_WIDTH - LRU_GEN_WIDTH - SECTIONS_WIDTH - \
NODES_WIDTH - KASAN_TAG_WIDTH - LAST_CPUPID_WIDTH)
+#define NR_NON_PAGEFLAG_BITS (SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH + \
+ LAST_CPUPID_SHIFT + KASAN_TAG_WIDTH + \
+ LRU_GEN_WIDTH + LRU_REFS_WIDTH)
+
+#define NR_UNUSED_PAGEFLAG_BITS (BITS_PER_LONG - \
+ (NR_NON_PAGEFLAG_BITS + NR_PAGEFLAGS))
+
#endif
#endif /* _LINUX_PAGE_FLAGS_LAYOUT */
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index bc65710ee1f9..69976cf747a6 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -9,6 +9,93 @@
#ifdef CONFIG_MEM_ALLOC_PROFILING
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
+typedef u16 pgalloc_tag_ref;
+
+extern struct alloc_tag_kernel_section kernel_tags;
+
+#define CODETAG_ID_NULL 0
+#define CODETAG_ID_EMPTY 1
+#define CODETAG_ID_FIRST 2
+
+#ifdef CONFIG_MODULES
+
+extern struct alloc_tag_module_section module_tags;
+
+static inline struct codetag *get_module_ct(pgalloc_tag_ref pgref)
+{
+ return &module_tags.first_tag[pgref - kernel_tags.count].ct;
+}
+
+static inline pgalloc_tag_ref get_module_pgref(struct alloc_tag *tag)
+{
+ return CODETAG_ID_FIRST + kernel_tags.count + (tag - module_tags.first_tag);
+}
+
+#else /* CONFIG_MODULES */
+
+static inline struct codetag *get_module_ct(pgalloc_tag_ref pgref)
+{
+ pr_warn("invalid page tag reference %lu\n", (unsigned long)pgref);
+ return NULL;
+}
+
+static inline pgalloc_tag_ref get_module_pgref(struct alloc_tag *tag)
+{
+ pr_warn("invalid page tag 0x%lx\n", (unsigned long)tag);
+ return CODETAG_ID_NULL;
+}
+
+#endif /* CONFIG_MODULES */
+
+static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ pgalloc_tag_ref pgref_val = *pgref;
+
+ switch (pgref_val) {
+ case (CODETAG_ID_NULL):
+ ref->ct = NULL;
+ break;
+ case (CODETAG_ID_EMPTY):
+ set_codetag_empty(ref);
+ break;
+ default:
+ pgref_val -= CODETAG_ID_FIRST;
+ ref->ct = pgref_val < kernel_tags.count ?
+ &kernel_tags.first_tag[pgref_val].ct :
+ get_module_ct(pgref_val);
+ break;
+ }
+}
+
+static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
+{
+ struct alloc_tag *tag;
+
+ if (!ref->ct) {
+ *pgref = CODETAG_ID_NULL;
+ return;
+ }
+
+ if (is_codetag_empty(ref)) {
+ *pgref = CODETAG_ID_EMPTY;
+ return;
+ }
+
+ tag = ct_to_alloc_tag(ref->ct);
+ if (tag >= kernel_tags.first_tag && tag < kernel_tags.first_tag + kernel_tags.count) {
+ *pgref = CODETAG_ID_FIRST + (tag - kernel_tags.first_tag);
+ return;
+ }
+
+ *pgref = get_module_pgref(tag);
+}
+
+void __init alloc_tag_sec_init(void);
+
+#else /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
typedef union codetag_ref pgalloc_tag_ref;
static inline void read_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
@@ -21,8 +108,13 @@ static inline void write_pgref(pgalloc_tag_ref *pgref, union codetag_ref *ref)
pgref->ct = ref->ct;
}
+static inline void alloc_tag_sec_init(void) {}
+
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
union pgtag_ref_handle {
pgalloc_tag_ref *pgref; /* reference in page extension */
+ struct page *page; /* reference in page flags */
};
#include <linux/page_ext.h>
@@ -40,8 +132,8 @@ static inline struct page_ext *page_ext_from_pgref(pgalloc_tag_ref *pgref)
}
/* Should be called only if mem_alloc_profiling_enabled() */
-static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
- union pgtag_ref_handle *handle)
+static inline bool page_ext_get_page_tag_ref(struct page *page, union codetag_ref *ref,
+ union pgtag_ref_handle *handle)
{
struct page_ext *page_ext;
pgalloc_tag_ref *pgref;
@@ -59,7 +151,7 @@ static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
return true;
}
-static inline void put_page_tag_ref(union pgtag_ref_handle handle)
+static inline void page_ext_put_page_tag_ref(union pgtag_ref_handle handle)
{
if (WARN_ON(!handle.pgref))
return;
@@ -67,8 +159,8 @@ static inline void put_page_tag_ref(union pgtag_ref_handle handle)
page_ext_put(page_ext_from_pgref(handle.pgref));
}
-static inline void update_page_tag_ref(union pgtag_ref_handle handle,
- union codetag_ref *ref)
+static inline void page_ext_update_page_tag_ref(union pgtag_ref_handle handle,
+ union codetag_ref *ref)
{
if (WARN_ON(!handle.pgref || !ref))
return;
@@ -76,6 +168,87 @@ static inline void update_page_tag_ref(union pgtag_ref_handle handle,
write_pgref(handle.pgref, ref);
}
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
+DECLARE_STATIC_KEY_TRUE(mem_profiling_use_pageflags);
+extern unsigned long alloc_tag_ref_mask;
+extern int alloc_tag_ref_offs;
+
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
+ union pgtag_ref_handle *handle)
+{
+ pgalloc_tag_ref pgref;
+
+ if (!static_key_enabled(&mem_profiling_use_pageflags))
+ return page_ext_get_page_tag_ref(page, ref, handle);
+
+ if (!page)
+ return false;
+
+ pgref = (page->flags >> alloc_tag_ref_offs) & alloc_tag_ref_mask;
+ read_pgref(&pgref, ref);
+ handle->page = page;
+ return true;
+}
+
+static inline void put_page_tag_ref(union pgtag_ref_handle handle)
+{
+ if (!static_key_enabled(&mem_profiling_use_pageflags)) {
+ page_ext_put_page_tag_ref(handle);
+ return;
+ }
+
+ WARN_ON(!handle.page);
+}
+
+static inline void update_page_tag_ref(union pgtag_ref_handle handle, union codetag_ref *ref)
+{
+ unsigned long old_flags, flags, val;
+ struct page *page = handle.page;
+ pgalloc_tag_ref pgref;
+
+ if (!static_key_enabled(&mem_profiling_use_pageflags)) {
+ page_ext_update_page_tag_ref(handle, ref);
+ return;
+ }
+
+ if (WARN_ON(!page || !ref))
+ return;
+
+ write_pgref(&pgref, ref);
+ val = (unsigned long)pgref;
+ val = (val & alloc_tag_ref_mask) << alloc_tag_ref_offs;
+ /* Replace only the pgref bits; other page flags may change concurrently */
+ do {
+ old_flags = READ_ONCE(page->flags);
+ flags = old_flags;
+ flags &= ~(alloc_tag_ref_mask << alloc_tag_ref_offs);
+ flags |= val;
+ } while (unlikely(!try_cmpxchg(&page->flags, &old_flags, flags)));
+}
+
+#else /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
+/* Should be called only if mem_alloc_profiling_enabled() */
+static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
+ union pgtag_ref_handle *handle)
+{
+ return page_ext_get_page_tag_ref(page, ref, handle);
+}
+
+static inline void put_page_tag_ref(union pgtag_ref_handle handle)
+{
+ page_ext_put_page_tag_ref(handle);
+}
+
+static inline void update_page_tag_ref(union pgtag_ref_handle handle,
+ union codetag_ref *ref)
+{
+ page_ext_update_page_tag_ref(handle, ref);
+}
+
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
static inline void clear_page_tag_ref(struct page *page)
{
if (mem_alloc_profiling_enabled()) {
@@ -152,6 +325,7 @@ static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
+static inline void alloc_tag_sec_init(void) {}
#endif /* CONFIG_MEM_ALLOC_PROFILING */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7315f643817a..f4c272c30719 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1017,6 +1017,25 @@ config MEM_ALLOC_PROFILING_DEBUG
Adds warnings with helpful error messages for memory allocation
profiling.
+config PGALLOC_TAG_USE_PAGEFLAGS
+ bool "Use pageflags to encode page allocation tag reference"
+ default n
+ depends on MEM_ALLOC_PROFILING
+ help
+ When set, page allocation tag references are encoded inside page
+ flags if they fit, otherwise they are encoded in page extensions.
+
+ Setting this flag reduces the memory and performance overhead of
+ memory allocation profiling, but it requires enough unused page flag
+ bits to address all kernel allocations. If it is enabled but there
+ are not enough unused page flag bits, profiling will fall back to
+ using page extensions and a warning suggesting that this config be
+ disabled will be issued.
+ If dynamically loaded modules add so many tags that they can no
+ longer be addressed with the available page flag bits, memory
+ profiling will be disabled and a warning will be issued.
+
+ Say N if unsure.
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 2ef762acc203..341a5c826449 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -3,6 +3,7 @@
#include <linux/execmem.h>
#include <linux/fs.h>
#include <linux/gfp.h>
+#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/page_ext.h>
#include <linux/pgalloc_tag.h>
@@ -152,6 +153,42 @@ static void __init procfs_init(void)
proc_create_seq("allocinfo", 0400, NULL, &allocinfo_seq_op);
}
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+
+DEFINE_STATIC_KEY_TRUE(mem_profiling_use_pageflags);
+unsigned long alloc_tag_ref_mask;
+int alloc_tag_ref_offs;
+
+#define SECTION_START(NAME) (CODETAG_SECTION_START_PREFIX NAME)
+#define SECTION_STOP(NAME) (CODETAG_SECTION_STOP_PREFIX NAME)
+
+struct alloc_tag_kernel_section kernel_tags = { NULL, 0 };
+
+void __init alloc_tag_sec_init(void)
+{
+ struct alloc_tag *last_codetag;
+
+ kernel_tags.first_tag = (struct alloc_tag *)kallsyms_lookup_name(
+ SECTION_START(ALLOC_TAG_SECTION_NAME));
+ last_codetag = (struct alloc_tag *)kallsyms_lookup_name(
+ SECTION_STOP(ALLOC_TAG_SECTION_NAME));
+ kernel_tags.count = last_codetag - kernel_tags.first_tag;
+
+ /* Check if kernel tags fit into page flags */
+ if (kernel_tags.count > (1UL << NR_UNUSED_PAGEFLAG_BITS)) {
+ static_branch_disable(&mem_profiling_use_pageflags);
+ pr_warn("%lu kernel tags do not fit into %d available page flag bits, page extensions will be used instead!\n",
+ kernel_tags.count, NR_UNUSED_PAGEFLAG_BITS);
+ } else {
+ alloc_tag_ref_offs = (LRU_REFS_PGOFF - NR_UNUSED_PAGEFLAG_BITS);
+ alloc_tag_ref_mask = ((1UL << NR_UNUSED_PAGEFLAG_BITS) - 1);
+ pr_info("All unused page flags (%d bits) are used to store page tag references!\n",
+ NR_UNUSED_PAGEFLAG_BITS);
+ }
+}
+
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
#ifdef CONFIG_MODULES
static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
@@ -161,7 +198,29 @@ static struct module unloaded_mod;
/* A dummy object used to indicate a module prepended area */
static struct module prepend_mod;
-static struct alloc_tag_module_section module_tags;
+struct alloc_tag_module_section module_tags;
+
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+/* Round val up to the next multiple of sizeof(struct alloc_tag) */
+static inline unsigned long alloc_tag_align(unsigned long val)
+{
+ if (val % sizeof(struct alloc_tag) == 0)
+ return val;
+ return ((val / sizeof(struct alloc_tag)) + 1) * sizeof(struct alloc_tag);
+}
+
+static inline bool tags_addressable(void)
+{
+ unsigned long tag_idx_count = CODETAG_ID_FIRST + kernel_tags.count +
+ module_tags.size / sizeof(struct alloc_tag);
+
+ return tag_idx_count < (1UL << NR_UNUSED_PAGEFLAG_BITS);
+}
+
+#else /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
+
+static inline bool tags_addressable(void) { return true; }
+
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
static bool needs_section_mem(struct module *mod, unsigned long size)
{
@@ -237,6 +296,21 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
if (!align)
align = 1;
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+ /*
+ * If alloc_tag size is not a multiple of required alignment tag
+ * indexing does not work.
+ */
+ if (!IS_ALIGNED(sizeof(struct alloc_tag), align)) {
+ pr_err("%s: alignment %lu is incompatible with allocation tag indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS",
+ mod->name, align);
+ return ERR_PTR(-EINVAL);
+ }
+
+ /* Ensure prepend consumes multiple of alloc_tag-sized blocks */
+ if (prepend)
+ prepend = alloc_tag_align(prepend);
+#endif /* CONFIG_PGALLOC_TAG_USE_PAGEFLAGS */
mas_lock(&mas);
repeat:
/* Try finding exact size and hope the start is aligned */
@@ -305,6 +379,12 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;
module_tags.size = offset + size;
+ if (!tags_addressable() && mem_alloc_profiling_enabled()) {
+ static_branch_disable(&mem_alloc_profiling_key);
+ pr_warn("With module %s there are too many tags to fit in %d page flag bits. Memory profiling is disabled!\n",
+ mod->name, NR_UNUSED_PAGEFLAG_BITS);
+ }
+
if (phys_size < module_tags.size) {
int grow_res;
@@ -406,6 +486,10 @@ static int __init alloc_mod_tags_mem(void)
module_tags.start_addr = (unsigned long)vm_module_tags->addr;
module_tags.end_addr = module_tags.start_addr + MODULE_ALLOC_TAG_VMAP_SIZE;
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+ /* Ensure the base is alloc_tag aligned */
+ module_tags.start_addr = alloc_tag_align(module_tags.start_addr);
+#endif
return 0;
}
@@ -467,6 +551,10 @@ early_param("sysctl.vm.mem_profiling", setup_early_mem_profiling);
static __init bool need_page_alloc_tagging(void)
{
+#ifdef CONFIG_PGALLOC_TAG_USE_PAGEFLAGS
+ if (static_key_enabled(&mem_profiling_use_pageflags))
+ return false;
+#endif
return mem_profiling_support;
}
diff --git a/lib/codetag.c b/lib/codetag.c
index 3f3a4f4ca72d..cda5a852ef30 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -149,8 +149,8 @@ static struct codetag_range get_section_range(struct module *mod,
const char *section)
{
return (struct codetag_range) {
- get_symbol(mod, "__start_", section),
- get_symbol(mod, "__stop_", section),
+ get_symbol(mod, CODETAG_SECTION_START_PREFIX, section),
+ get_symbol(mod, CODETAG_SECTION_STOP_PREFIX, section),
};
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba5607aaf19..1c205b0a86ed 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -83,8 +83,7 @@ void __init mminit_verify_pageflags_layout(void)
unsigned long or_mask, add_mask;
shift = BITS_PER_LONG;
- width = shift - SECTIONS_WIDTH - NODES_WIDTH - ZONES_WIDTH
- - LAST_CPUPID_SHIFT - KASAN_TAG_WIDTH - LRU_GEN_WIDTH - LRU_REFS_WIDTH;
+ width = shift - NR_NON_PAGEFLAG_BITS;
mminit_dprintk(MMINIT_TRACE, "pageflags_layout_widths",
"Section %d Node %d Zone %d Lastcpupid %d Kasantag %d Gen %d Tier %d Flags %d\n",
SECTIONS_WIDTH,
@@ -2639,7 +2638,7 @@ void __init mm_core_init(void)
BUILD_BUG_ON(MAX_ZONELISTS > 2);
build_all_zonelists(NULL);
page_alloc_init_cpuhp();
-
+ alloc_tag_sec_init();
/*
* page_ext requires contiguous pages,
* bigger than MAX_PAGE_ORDER unless SPARSEMEM.
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH v3 0/5] page allocation tag compression
2024-10-14 20:36 [PATCH v3 0/5] page allocation tag compression Suren Baghdasaryan
` (4 preceding siblings ...)
2024-10-14 20:36 ` [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
@ 2024-10-14 23:32 ` Andrew Morton
2024-10-15 1:48 ` Suren Baghdasaryan
5 siblings, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2024-10-14 23:32 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, 14 Oct 2024 13:36:41 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> Patch #2 copies module tags into virtually contiguous memory which
> serves two purposes:
> - Lets us deal with the situation when module is unloaded while there
> are still live allocations from that module. Since we are using a copy
> version of the tags we can safely unload the module. Space and gaps in
> this contiguous memory are managed using a maple tree.
Does this make "lib: alloc_tag_module_unload must wait for pending
kfree_rcu calls" unneeded? If so, that patch was cc:stable
(justifiably), so what to do about that?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-14 20:36 ` [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
@ 2024-10-14 23:48 ` Yosry Ahmed
2024-10-14 23:53 ` John Hubbard
0 siblings, 1 reply; 47+ messages in thread
From: Yosry Ahmed @ 2024-10-14 23:48 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> references directly in the page flags. This eliminates memory
> overhead caused by page_ext and results in better performance
> for page allocations.
> If the number of available page flag bits is insufficient to
> address all kernel allocations, profiling falls back to using
> page extensions with an appropriate warning to disable this
> config.
> If dynamically loaded modules add enough tags that they can't
> be addressed anymore with available page flag bits, memory
> profiling gets disabled and a warning is issued.
Just curious, why do we need a config option? If there are enough bits
in page flags, why not use them automatically or fallback to page_ext
otherwise?
Is the reason dynamically loadable modules, where the user may
know in advance that the page flags won't be enough with the modules
loaded?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory
2024-10-14 20:36 ` [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
@ 2024-10-14 23:51 ` Andrew Morton
2024-10-15 2:10 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Andrew Morton @ 2024-10-14 23:51 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, 14 Oct 2024 13:36:43 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> When a module gets unloaded there is a possibility that some of the
> allocations it made are still used and therefore the allocation tags
> corresponding to these allocations are still referenced. As such, the
> memory for these tags can't be freed. This is currently handled as an
> abnormal situation and the module's data section is not unloaded.
> To handle this situation without keeping module's data in memory,
> allow codetags with longer lifespan than the module to be loaded into
> their own separate memory. The in-use memory areas and gaps after
> module unloading in this separate memory are tracked using maple trees.
> Allocation tags arrange their separate memory so that it is virtually
> contiguous and that will allow simple allocation tag indexing later on
> in this patchset. The size of this virtually contiguous memory is set
> to store up to 100000 allocation tags.
>
> ...
>
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> return 0;
> }
>
> -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> - bool unload_codetags)
> +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> {
> struct module_memory *mem = &mod->mem[type];
> - void *ptr = mem->base;
>
> if (mem->is_rox)
> vfree(mem->rw_copy);
>
> - if (!unload_codetags && mod_mem_type_is_core_data(type))
> - return;
> -
> - execmem_free(ptr);
> + execmem_free(mem->base);
> }
The changes around here are dependent upon Mike's "module: make
module_memory_{alloc,free} more self-contained", which is no longer in
mm-unstable. I assume Mike is working on a v2 so I'll park this series
for now.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-14 23:48 ` Yosry Ahmed
@ 2024-10-14 23:53 ` John Hubbard
2024-10-14 23:56 ` Yosry Ahmed
2024-10-15 7:32 ` David Hildenbrand
0 siblings, 2 replies; 47+ messages in thread
From: John Hubbard @ 2024-10-14 23:53 UTC (permalink / raw)
To: Yosry Ahmed, Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
>> references directly in the page flags. This eliminates memory
>> overhead caused by page_ext and results in better performance
>> for page allocations.
>> If the number of available page flag bits is insufficient to
>> address all kernel allocations, profiling falls back to using
>> page extensions with an appropriate warning to disable this
>> config.
>> If dynamically loaded modules add enough tags that they can't
>> be addressed anymore with available page flag bits, memory
>> profiling gets disabled and a warning is issued.
>
> Just curious, why do we need a config option? If there are enough bits
> in page flags, why not use them automatically or fallback to page_ext
> otherwise?
Or better yet, *always* fall back to page_ext, thus leaving the
scarce and valuable page flags available for other features?
Sorry Suren, to keep coming back to this suggestion, I know
I'm driving you crazy here! But I just keep thinking it through
and failing to see why this feature deserves to consume so
many page flags.
thanks,
--
John Hubbard
>
> Is the reason dynamically loadable modules, where the user may
> know in advance that the page flags won't be enough with the modules
> loaded?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-14 23:53 ` John Hubbard
@ 2024-10-14 23:56 ` Yosry Ahmed
2024-10-15 0:03 ` John Hubbard
2024-10-15 7:32 ` David Hildenbrand
1 sibling, 1 reply; 47+ messages in thread
From: Yosry Ahmed @ 2024-10-14 23:56 UTC (permalink / raw)
To: John Hubbard
Cc: Suren Baghdasaryan, akpm, kent.overstreet, corbet, arnd, mcgrof,
rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb, david,
vbabka, mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Oct 14, 2024 at 4:53 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> > On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> >> references directly in the page flags. This eliminates memory
> >> overhead caused by page_ext and results in better performance
> >> for page allocations.
> >> If the number of available page flag bits is insufficient to
> >> address all kernel allocations, profiling falls back to using
> >> page extensions with an appropriate warning to disable this
> >> config.
> >> If dynamically loaded modules add enough tags that they can't
> >> be addressed anymore with available page flag bits, memory
> >> profiling gets disabled and a warning is issued.
> >
> > Just curious, why do we need a config option? If there are enough bits
> > in page flags, why not use them automatically or fallback to page_ext
> > otherwise?
>
> Or better yet, *always* fall back to page_ext, thus leaving the
> scarce and valuable page flags available for other features?
>
> Sorry Suren, to keep coming back to this suggestion, I know
> I'm driving you crazy here! But I just keep thinking it through
> and failing to see why this feature deserves to consume so
> many page flags.
I think we already always use page_ext today. My understanding is that
the purpose of this series is to give the option to avoid using
page_ext if there are enough unused page flags anyway, which reduces
memory waste and improves performance.
My question is just why not have that be the default behavior without
a config option: use page flags if there are enough unused bits,
otherwise use page_ext.
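In other words, decide at boot, which is essentially what
alloc_tag_sec_init() in patch 5 already does:

	/* Check if kernel tags fit into page flags */
	if (kernel_tags.count > (1UL << NR_UNUSED_PAGEFLAG_BITS)) {
		static_branch_disable(&mem_profiling_use_pageflags);
		pr_warn("%lu kernel tags do not fit into %d available page flag bits, page extensions will be used instead!\n",
			kernel_tags.count, NR_UNUSED_PAGEFLAG_BITS);
	}

so the question is really whether that decision needs a config gate at
all.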
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-14 23:56 ` Yosry Ahmed
@ 2024-10-15 0:03 ` John Hubbard
2024-10-15 1:40 ` Matthew Wilcox
2024-10-15 1:58 ` Suren Baghdasaryan
0 siblings, 2 replies; 47+ messages in thread
From: John Hubbard @ 2024-10-15 0:03 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Suren Baghdasaryan, akpm, kent.overstreet, corbet, arnd, mcgrof,
rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb, david,
vbabka, mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On 10/14/24 4:56 PM, Yosry Ahmed wrote:
> On Mon, Oct 14, 2024 at 4:53 PM John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
>>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>
>>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
>>>> references directly in the page flags. This eliminates memory
>>>> overhead caused by page_ext and results in better performance
>>>> for page allocations.
>>>> If the number of available page flag bits is insufficient to
>>>> address all kernel allocations, profiling falls back to using
>>>> page extensions with an appropriate warning to disable this
>>>> config.
>>>> If dynamically loaded modules add enough tags that they can't
>>>> be addressed anymore with available page flag bits, memory
>>>> profiling gets disabled and a warning is issued.
>>>
>>> Just curious, why do we need a config option? If there are enough bits
>>> in page flags, why not use them automatically or fallback to page_ext
>>> otherwise?
>>
>> Or better yet, *always* fall back to page_ext, thus leaving the
>> scarce and valuable page flags available for other features?
>>
>> Sorry Suren, to keep coming back to this suggestion, I know
>> I'm driving you crazy here! But I just keep thinking it through
>> and failing to see why this feature deserves to consume so
>> many page flags.
>
> I think we already always use page_ext today. My understanding is that
> the purpose of this series is to give the option to avoid using
> page_ext if there are enough unused page flags anyway, which reduces
> memory waste and improves performance.
>
> My question is just why not have that be the default behavior without
> a config option: use page flags if there are enough unused bits,
> otherwise use page_ext.
I agree that if you're going to implement this feature at all, then
keying off of CONFIG_MEM_ALLOC_PROFILING seems sufficient, and no
need to add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS on top of that.
thanks,
--
John Hubbard
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 0:03 ` John Hubbard
@ 2024-10-15 1:40 ` Matthew Wilcox
2024-10-15 2:03 ` Suren Baghdasaryan
2024-10-15 1:58 ` Suren Baghdasaryan
1 sibling, 1 reply; 47+ messages in thread
From: Matthew Wilcox @ 2024-10-15 1:40 UTC (permalink / raw)
To: John Hubbard
Cc: Yosry Ahmed, Suren Baghdasaryan, akpm, kent.overstreet, corbet,
arnd, mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
david, vbabka, mhocko, hannes, roman.gushchin, dave, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Oct 14, 2024 at 05:03:32PM -0700, John Hubbard wrote:
> > > Or better yet, *always* fall back to page_ext, thus leaving the
> > > scarce and valuable page flags available for other features?
> > >
> > > Sorry Suren, to keep coming back to this suggestion, I know
> > > I'm driving you crazy here! But I just keep thinking it through
> > > and failing to see why this feature deserves to consume so
> > > many page flags.
> >
> > I think we already always use page_ext today. My understanding is that
> > the purpose of this series is to give the option to avoid using
> > page_ext if there are enough unused page flags anyway, which reduces
> > memory waste and improves performance.
> >
> > My question is just why not have that be the default behavior without
> > a config option: use page flags if there are enough unused bits,
> > otherwise use page_ext.
>
> I agree that if you're going to implement this feature at all, then
> keying off of CONFIG_MEM_ALLOC_PROFILING seems sufficient, and no
> need to add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS on top of that.
Maybe the right idea is to use all the bits which are unused in this
configuration for the first N callsites, then use page_ext for all the
ones larger than N. It doesn't save any memory, but it does have the
performance improvement.
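Something like the following, maybe. A rough, untested sketch on top of
the helpers in this series; PGREF_OVERFLOW is a made-up name for an id
pattern reserved to mean "did not fit into page flags", and only the
read side is shown (the write side would store PGREF_OVERFLOW in the
flags and update page_ext):

	/* all-ones in the page flag bits: the real id lives in page_ext */
	#define PGREF_OVERFLOW	((1UL << NR_UNUSED_PAGEFLAG_BITS) - 1)

	static inline bool get_page_tag_ref(struct page *page, union codetag_ref *ref,
					    union pgtag_ref_handle *handle)
	{
		pgalloc_tag_ref pgref;

		if (!page)
			return false;

		pgref = (page->flags >> alloc_tag_ref_offs) & alloc_tag_ref_mask;
		if (pgref != PGREF_OVERFLOW) {
			/* early callsite, id fits into page flags */
			read_pgref(&pgref, ref);
			handle->page = page;
			return true;
		}
		/* late callsite, fall back to the page_ext copy */
		return page_ext_get_page_tag_ref(page, ref, handle);
	}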
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 0/5] page allocation tag compression
2024-10-14 23:32 ` [PATCH v3 0/5] page allocation tag compression Andrew Morton
@ 2024-10-15 1:48 ` Suren Baghdasaryan
2024-10-15 16:26 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 1:48 UTC (permalink / raw)
To: Andrew Morton
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Oct 14, 2024 at 4:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 14 Oct 2024 13:36:41 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > Patch #2 copies module tags into virtually contiguous memory which
> > serves two purposes:
> > - Lets us deal with the situation when module is unloaded while there
> > are still live allocations from that module. Since we are using a copy
> > version of the tags we can safely unload the module. Space and gaps in
> > this contiguous memory are managed using a maple tree.
>
> Does this make "lib: alloc_tag_module_unload must wait for pending
> kfree_rcu calls" unneeded?
With this change we can unload a module even when tags from that
module are still in use. However "lib: alloc_tag_module_unload must
wait for pending kfree_rcu calls" would still be useful because it
will allow us to release the memory occupied by the module's tags and let
other modules use that memory.
> If so, that patch was cc:stable (justifiably), so what to do about that?
Now that I posted this patchset I'll work on backporting "lib:
alloc_tag_module_unload must wait for pending kfree_rcu calls" and its
prerequisites to 6.10 and 6.11. I'll try to get backports out
tomorrow.
I don't think we need to backport this patchset to pre-6.12 kernels
since this is an improvement and not a bug fix. But if it's needed I
can backport it as well.
Thanks,
Suren.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 0:03 ` John Hubbard
2024-10-15 1:40 ` Matthew Wilcox
@ 2024-10-15 1:58 ` Suren Baghdasaryan
2024-10-15 8:10 ` Yosry Ahmed
1 sibling, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 1:58 UTC (permalink / raw)
To: John Hubbard
Cc: Yosry Ahmed, akpm, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Oct 14, 2024 at 5:03 PM 'John Hubbard' via kernel-team
<kernel-team@android.com> wrote:
>
> On 10/14/24 4:56 PM, Yosry Ahmed wrote:
> > On Mon, Oct 14, 2024 at 4:53 PM John Hubbard <jhubbard@nvidia.com> wrote:
> >>
> >> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> >>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>
> >>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> >>>> references directly in the page flags. This eliminates memory
> >>>> overhead caused by page_ext and results in better performance
> >>>> for page allocations.
> >>>> If the number of available page flag bits is insufficient to
> >>>> address all kernel allocations, profiling falls back to using
> >>>> page extensions with an appropriate warning to disable this
> >>>> config.
> >>>> If dynamically loaded modules add enough tags that they can't
> >>>> be addressed anymore with available page flag bits, memory
> >>>> profiling gets disabled and a warning is issued.
> >>>
> >>> Just curious, why do we need a config option? If there are enough bits
> >>> in page flags, why not use them automatically or fallback to page_ext
> >>> otherwise?
> >>
> >> Or better yet, *always* fall back to page_ext, thus leaving the
> >> scarce and valuable page flags available for other features?
> >>
> >> Sorry Suren, to keep coming back to this suggestion, I know
> >> I'm driving you crazy here! But I just keep thinking it through
> >> and failing to see why this feature deserves to consume so
> >> many page flags.
> >
> > I think we already always use page_ext today. My understanding is that
> > the purpose of this series is to give the option to avoid using
> > page_ext if there are enough unused page flags anyway, which reduces
> > memory waste and improves performance.
> >
> > My question is just why not have that be the default behavior with a
> > config option, use page flags if there are enough unused bits,
> > otherwise use page_ext.
>
> I agree that if you're going to implement this feature at all, then
> keying off of CONFIG_MEM_ALLOC_PROFILING seems sufficient, and no
> need to add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS on top of that.
Yosry's original guess was correct. If not for loadable modules we
could get away with having no CONFIG_PGALLOC_TAG_USE_PAGEFLAGS. We
could try to fit codetag references into page flags and if they do not
fit we could fall back to page_ext. That works fine when we have a
predetermined number of tags. But loadable modules make this number
variable at runtime and there is a possibility we run out of page flag
bits at runtime. In that case, the current patchset disables memory
profiling and issues a warning that the user should disable
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to avoid this situation. I expect
this to be a rare case but it can happen and we have to provide a way
for a user to avoid running out of bits, which is where
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS would be used.
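For reference, the runtime guard this patchset adds in
reserve_module_tags() boils down to:

	if (!tags_addressable() && mem_alloc_profiling_enabled()) {
		static_branch_disable(&mem_alloc_profiling_key);
		pr_warn("With module %s there are too many tags to fit in %d page flag bits. Memory profiling is disabled!\n",
			mod->name, NR_UNUSED_PAGEFLAG_BITS);
	}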
Thanks,
Suren.
>
> thanks,
> --
> John Hubbard
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 1:40 ` Matthew Wilcox
@ 2024-10-15 2:03 ` Suren Baghdasaryan
0 siblings, 0 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 2:03 UTC (permalink / raw)
To: Matthew Wilcox
Cc: John Hubbard, Yosry Ahmed, akpm, kent.overstreet, corbet, arnd,
mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
david, vbabka, mhocko, hannes, roman.gushchin, dave, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Oct 14, 2024 at 6:40 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Oct 14, 2024 at 05:03:32PM -0700, John Hubbard wrote:
> > > > Or better yet, *always* fall back to page_ext, thus leaving the
> > > > scarce and valuable page flags available for other features?
> > > >
> > > > Sorry Suren, to keep coming back to this suggestion, I know
> > > > I'm driving you crazy here! But I just keep thinking it through
> > > > and failing to see why this feature deserves to consume so
> > > > many page flags.
> > >
> > > I think we already always use page_ext today. My understanding is that
> > > the purpose of this series is to give the option to avoid using
> > > page_ext if there are enough unused page flags anyway, which reduces
> > > memory waste and improves performance.
> > >
> > > My question is just why not have that be the default behavior with a
> > > config option, use page flags if there are enough unused bits,
> > > otherwise use page_ext.
> >
> > I agree that if you're going to implement this feature at all, then
> > keying off of CONFIG_MEM_ALLOC_PROFILING seems sufficient, and no
> > need to add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS on top of that.
>
> Maybe the right idea is to use all the bits which are unused in this
> configuration for the first N callsites, then use page_ext for all the
> ones larger than N. It doesn't save any memory, but it does have the
> performance improvement.
Thanks for the idea! This would be more complicated and likely would
require additional branching instructions in the allocation path. I'll
think more about the details but would prefer to get the simple solution
first before adding more complexity.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory
2024-10-14 23:51 ` Andrew Morton
@ 2024-10-15 2:10 ` Suren Baghdasaryan
2024-10-15 21:08 ` Shakeel Butt
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 2:10 UTC (permalink / raw)
To: Andrew Morton
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Oct 14, 2024 at 4:51 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 14 Oct 2024 13:36:43 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > When a module gets unloaded there is a possibility that some of the
> > allocations it made are still used and therefore the allocation tags
> > corresponding to these allocations are still referenced. As such, the
> > memory for these tags can't be freed. This is currently handled as an
> > abnormal situation and the module's data section is not unloaded.
> > To handle this situation without keeping module's data in memory,
> > allow codetags with longer lifespan than the module to be loaded into
> > their own separate memory. The in-use memory areas and gaps after
> > module unloading in this separate memory are tracked using maple trees.
> > Allocation tags arrange their separate memory so that it is virtually
> > contiguous and that will allow simple allocation tag indexing later on
> > in this patchset. The size of this virtually contiguous memory is set
> > to store up to 100000 allocation tags.
> >
> > ...
> >
> > --- a/kernel/module/main.c
> > +++ b/kernel/module/main.c
> > @@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> > return 0;
> > }
> >
> > -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> > - bool unload_codetags)
> > +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> > {
> > struct module_memory *mem = &mod->mem[type];
> > - void *ptr = mem->base;
> >
> > if (mem->is_rox)
> > vfree(mem->rw_copy);
> >
> > - if (!unload_codetags && mod_mem_type_is_core_data(type))
> > - return;
> > -
> > - execmem_free(ptr);
> > + execmem_free(mem->base);
> > }
>
> The changes around here are dependent upon Mike's "module: make
> module_memory_{alloc,free} more self-contained", which is no longer in
> mm-unstable. I assume Mike is working on a v2 so I'll park this series
> for now.
Looks like the last update on Mike's patchset was back in May. Let me
check with Mike if he is planning to get it out soon. I would like my
patchset to get into 6.12 if possible.
Thanks,
Suren.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-14 23:53 ` John Hubbard
2024-10-14 23:56 ` Yosry Ahmed
@ 2024-10-15 7:32 ` David Hildenbrand
2024-10-15 14:59 ` Suren Baghdasaryan
1 sibling, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2024-10-15 7:32 UTC (permalink / raw)
To: John Hubbard, Yosry Ahmed, Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On 15.10.24 01:53, John Hubbard wrote:
> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>
>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
>>> references directly in the page flags. This eliminates memory
>>> overhead caused by page_ext and results in better performance
>>> for page allocations.
>>> If the number of available page flag bits is insufficient to
>>> address all kernel allocations, profiling falls back to using
>>> page extensions with an appropriate warning to disable this
>>> config.
>>> If dynamically loaded modules add enough tags that they can't
>>> be addressed anymore with available page flag bits, memory
>>> profiling gets disabled and a warning is issued.
>>
>> Just curious, why do we need a config option? If there are enough bits
>> in page flags, why not use them automatically or fallback to page_ext
>> otherwise?
>
> Or better yet, *always* fall back to page_ext, thus leaving the
> scarce and valuable page flags available for other features?
>
> Sorry Suren, to keep coming back to this suggestion, I know
> I'm driving you crazy here! But I just keep thinking it through
> and failing to see why this feature deserves to consume so
> many page flags.
My 2 cents: there is nothing wrong with consuming unused page flags in
a configuration. No need to let them stay unused :)
The real issue starts once another feature wants to make use of some of
them ... in such a configuration there would be fewer bits available for
allocation tags and the performance of allocation tags might
consequently get worse again.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 1:58 ` Suren Baghdasaryan
@ 2024-10-15 8:10 ` Yosry Ahmed
2024-10-15 15:06 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Yosry Ahmed @ 2024-10-15 8:10 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: John Hubbard, akpm, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Oct 14, 2024 at 6:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Oct 14, 2024 at 5:03 PM 'John Hubbard' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > On 10/14/24 4:56 PM, Yosry Ahmed wrote:
> > > On Mon, Oct 14, 2024 at 4:53 PM John Hubbard <jhubbard@nvidia.com> wrote:
> > >>
> > >> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> > >>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >>>>
> > >>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> > >>>> references directly in the page flags. This eliminates memory
> > >>>> overhead caused by page_ext and results in better performance
> > >>>> for page allocations.
> > >>>> If the number of available page flag bits is insufficient to
> > >>>> address all kernel allocations, profiling falls back to using
> > >>>> page extensions with an appropriate warning to disable this
> > >>>> config.
> > >>>> If dynamically loaded modules add enough tags that they can't
> > >>>> be addressed anymore with available page flag bits, memory
> > >>>> profiling gets disabled and a warning is issued.
> > >>>
> > >>> Just curious, why do we need a config option? If there are enough bits
> > >>> in page flags, why not use them automatically or fallback to page_ext
> > >>> otherwise?
> > >>
> > >> Or better yet, *always* fall back to page_ext, thus leaving the
> > >> scarce and valuable page flags available for other features?
> > >>
> > >> Sorry Suren, to keep coming back to this suggestion, I know
> > >> I'm driving you crazy here! But I just keep thinking it through
> > >> and failing to see why this feature deserves to consume so
> > >> many page flags.
> > >
> > > I think we already always use page_ext today. My understanding is that
> > > the purpose of this series is to give the option to avoid using
> > > page_ext if there are enough unused page flags anyway, which reduces
> > > memory waste and improves performance.
> > >
> > > My question is just why not have that be the default behavior with a
> > > config option, use page flags if there are enough unused bits,
> > > otherwise use page_ext.
> >
> > I agree that if you're going to implement this feature at all, then
> > keying off of CONFIG_MEM_ALLOC_PROFILING seems sufficient, and no
> > need to add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS on top of that.
>
> Yosry's original guess was correct. If not for loadable modules we
> could get away with having no CONFIG_PGALLOC_TAG_USE_PAGEFLAGS. We
> could try to fit codetag references into page flags and if they do not
> fit we could fall back to page_ext. That works fine when we have a
> predetermined number of tags. But loadable modules make this number
> variable at runtime and there is a possibility we run out of page flag
> bits at runtime. In that case, the current patchset disables memory
> profiling and issues a warning that the user should disable
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to avoid this situation. I expect
> this to be a rare case but it can happen and we have to provide a way
> for a user to avoid running out of bits, which is where
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS would be used.
If this is in fact a rare case, I think it may make more sense for the
config to be on by default, with the config description clearly
stating that it is intended to be turned off only if the loadable
modules warning fires.
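i.e. on top of patch 5, something like:

 config PGALLOC_TAG_USE_PAGEFLAGS
 	bool "Use pageflags to encode page allocation tag reference"
-	default n
+	default y
 	depends on MEM_ALLOC_PROFILING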
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 3/5] alloc_tag: populate memory for module tags as needed
2024-10-14 20:36 ` [PATCH v3 3/5] alloc_tag: populate memory for module tags as needed Suren Baghdasaryan
@ 2024-10-15 12:15 ` Mike Rapoport
2024-10-15 14:49 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Mike Rapoport @ 2024-10-15 12:15 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Oct 14, 2024 at 01:36:44PM -0700, Suren Baghdasaryan wrote:
> The memory reserved for module tags does not need to be backed by
> physical pages until there are tags to store there. Change the way
> we reserve this memory to allocate only virtual area for the tags
> and populate it with physical pages as needed when we load a module.
>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
> include/linux/execmem.h | 11 ++++++
> include/linux/vmalloc.h | 9 +++++
> lib/alloc_tag.c | 84 +++++++++++++++++++++++++++++++++--------
> mm/execmem.c | 16 ++++++++
> mm/vmalloc.c | 4 +-
> 5 files changed, 106 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/execmem.h b/include/linux/execmem.h
> index 7436aa547818..a159a073270a 100644
> --- a/include/linux/execmem.h
> +++ b/include/linux/execmem.h
> @@ -127,6 +127,17 @@ void *execmem_alloc(enum execmem_type type, size_t size);
> */
> void execmem_free(void *ptr);
>
> +/**
> + * execmem_vmap - create virtual mapping for executable memory
> + * @type: type of the allocation
> + * @size: size of the virtual mapping in bytes
> + *
> + * Maps virtually contiguous area that can be populated with executable code.
> + *
> + * Return: the area descriptor on success or %NULL on failure.
> + */
> +struct vm_struct *execmem_vmap(enum execmem_type type, size_t size);
> +
I think it's better to limit it to EXECMEM_MODULE_DATA
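e.g. drop the type argument and hardcode it. An untested sketch, reusing
the __get_vm_area_node() declaration from this patch and assuming the
execmem_info->ranges layout already used in mm/execmem.c:

	struct vm_struct *execmem_vmap(size_t size)
	{
		struct execmem_range *range = &execmem_info->ranges[EXECMEM_MODULE_DATA];

		return __get_vm_area_node(size, range->alignment, PAGE_SHIFT,
					  VM_NO_GUARD, range->start, range->end,
					  NUMA_NO_NODE, GFP_KERNEL,
					  __builtin_return_address(0));
	}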
> /**
> * execmem_update_copy - copy an update to executable memory
> * @dst: destination address to update
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 9a012cd4fad2..9d64cc6f24d1 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -202,6 +202,9 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
> extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
> unsigned long pgoff);
>
> +int vmap_pages_range(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages, unsigned int page_shift);
> +
>
> /*
> * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
> * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
> @@ -239,6 +242,12 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
> unsigned long flags,
> unsigned long start, unsigned long end,
> const void *caller);
> +struct vm_struct *__get_vm_area_node(unsigned long size,
> + unsigned long align, unsigned long shift,
> + unsigned long flags, unsigned long start,
> + unsigned long end, int node, gfp_t gfp_mask,
> + const void *caller);
> +
This is not used outside mm/, let's put it into mm/internal.h
> void free_vm_area(struct vm_struct *area);
> extern struct vm_struct *remove_vm_area(const void *addr);
> extern struct vm_struct *find_vm_area(const void *addr);
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index b10e7f17eeda..648f32d52b8d 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -8,6 +8,7 @@
> #include <linux/proc_fs.h>
> #include <linux/seq_buf.h>
> #include <linux/seq_file.h>
> +#include <linux/vmalloc.h>
>
> static struct codetag_type *alloc_tag_cttype;
>
> @@ -153,6 +154,7 @@ static void __init procfs_init(void)
> #ifdef CONFIG_MODULES
>
> static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
> +static struct vm_struct *vm_module_tags;
> /* A dummy object used to indicate an unloaded module */
> static struct module unloaded_mod;
> /* A dummy object used to indicate a module prepended area */
> @@ -195,6 +197,25 @@ static void clean_unused_module_areas_locked(void)
> }
> }
>
> +static int vm_module_tags_grow(unsigned long addr, unsigned long bytes)
> +{
> + struct page **next_page = vm_module_tags->pages + vm_module_tags->nr_pages;
> + unsigned long more_pages = ALIGN(bytes, PAGE_SIZE) >> PAGE_SHIFT;
> + unsigned long nr;
> +
> + nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN,
> + NUMA_NO_NODE, more_pages, next_page);
> + if (nr != more_pages)
> + return -ENOMEM;
> +
> + vm_module_tags->nr_pages += nr;
> + if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
> + PAGE_KERNEL, next_page, PAGE_SHIFT) < 0)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> static void *reserve_module_tags(struct module *mod, unsigned long size,
> unsigned int prepend, unsigned long align)
> {
> @@ -202,7 +223,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
> bool cleanup_done = false;
> unsigned long offset;
> - void *ret;
> + void *ret = NULL;
>
> /* If no tags return NULL */
> if (size < sizeof(struct alloc_tag))
> @@ -239,7 +260,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> goto repeat;
> } else {
> ret = ERR_PTR(-ENOMEM);
> - goto out;
> + goto unlock;
> }
>
> found:
> @@ -254,7 +275,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> mas_store(&mas, &prepend_mod);
> if (mas_is_err(&mas)) {
> ret = ERR_PTR(xa_err(mas.node));
> - goto out;
> + goto unlock;
> }
> mas.index = offset;
> mas.last = offset + size - 1;
> @@ -263,7 +284,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> ret = ERR_PTR(xa_err(mas.node));
> mas.index = pad_start;
> mas_erase(&mas);
> - goto out;
> + goto unlock;
> }
>
> } else {
> @@ -271,18 +292,33 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> mas_store(&mas, mod);
> if (mas_is_err(&mas)) {
> ret = ERR_PTR(xa_err(mas.node));
> - goto out;
> + goto unlock;
> }
> }
> +unlock:
> + mas_unlock(&mas);
> + if (IS_ERR(ret))
> + return ret;
>
> - if (module_tags.size < offset + size)
> - module_tags.size = offset + size;
> + if (module_tags.size < offset + size) {
> + unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;
>
> - ret = (struct alloc_tag *)(module_tags.start_addr + offset);
> -out:
> - mas_unlock(&mas);
> + module_tags.size = offset + size;
> + if (phys_size < module_tags.size) {
> + int grow_res;
> +
> + grow_res = vm_module_tags_grow(module_tags.start_addr + phys_size,
> + module_tags.size - phys_size);
> + if (grow_res) {
> + static_branch_disable(&mem_alloc_profiling_key);
> + pr_warn("Failed to allocate tags memory for module %s. Memory profiling is disabled!\n",
> + mod->name);
> + return ERR_PTR(grow_res);
> + }
> + }
> + }
The diff for reserve_module_tags() is hard to read, and the function itself
becomes really complex to follow with all the gotos back and forth.
Maybe it's possible to split out some parts of it as helpers?
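For example (just a sketch, the helper name is made up), the tail that
populates the physical pages could become its own helper:

	static int vm_module_tags_populate(void)
	{
		unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;

		if (phys_size < module_tags.size)
			return vm_module_tags_grow(module_tags.start_addr + phys_size,
						   module_tags.size - phys_size);
		return 0;
	}

so that reserve_module_tags() itself is left with just the maple tree
bookkeeping.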
> - return ret;
> + return (struct alloc_tag *)(module_tags.start_addr + offset);
> }
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 3/5] alloc_tag: populate memory for module tags as needed
2024-10-15 12:15 ` Mike Rapoport
@ 2024-10-15 14:49 ` Suren Baghdasaryan
0 siblings, 0 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 14:49 UTC (permalink / raw)
To: Mike Rapoport
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Tue, Oct 15, 2024 at 5:19 AM 'Mike Rapoport' via kernel-team
<kernel-team@android.com> wrote:
>
> On Mon, Oct 14, 2024 at 01:36:44PM -0700, Suren Baghdasaryan wrote:
> > The memory reserved for module tags does not need to be backed by
> > physical pages until there are tags to store there. Change the way
> > we reserve this memory to allocate only virtual area for the tags
> > and populate it with physical pages as needed when we load a module.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> > include/linux/execmem.h | 11 ++++++
> > include/linux/vmalloc.h | 9 +++++
> > lib/alloc_tag.c | 84 +++++++++++++++++++++++++++++++++--------
> > mm/execmem.c | 16 ++++++++
> > mm/vmalloc.c | 4 +-
> > 5 files changed, 106 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/execmem.h b/include/linux/execmem.h
> > index 7436aa547818..a159a073270a 100644
> > --- a/include/linux/execmem.h
> > +++ b/include/linux/execmem.h
> > @@ -127,6 +127,17 @@ void *execmem_alloc(enum execmem_type type, size_t size);
> > */
> > void execmem_free(void *ptr);
> >
> > +/**
> > + * execmem_vmap - create virtual mapping for executable memory
> > + * @type: type of the allocation
> > + * @size: size of the virtual mapping in bytes
> > + *
> > + * Maps virtually contiguous area that can be populated with executable code.
> > + *
> > + * Return: the area descriptor on success or %NULL on failure.
> > + */
> > +struct vm_struct *execmem_vmap(enum execmem_type type, size_t size);
> > +
>
> I think it's better to limit it to EXECMEM_MODULE_DATA
Ack.
>
> > /**
> > * execmem_update_copy - copy an update to executable memory
> > * @dst: destination address to update
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 9a012cd4fad2..9d64cc6f24d1 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -202,6 +202,9 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
> > extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
> > unsigned long pgoff);
> >
> > +int vmap_pages_range(unsigned long addr, unsigned long end,
> > + pgprot_t prot, struct page **pages, unsigned int page_shift);
> > +
> >
> > /*
> > * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
> > * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
> > @@ -239,6 +242,12 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
> > unsigned long flags,
> > unsigned long start, unsigned long end,
> > const void *caller);
> > +struct vm_struct *__get_vm_area_node(unsigned long size,
> > + unsigned long align, unsigned long shift,
> > + unsigned long flags, unsigned long start,
> > + unsigned long end, int node, gfp_t gfp_mask,
> > + const void *caller);
> > +
>
> This is not used outside mm/, let's put it into mm/internal.h
Ack.
>
> > void free_vm_area(struct vm_struct *area);
> > extern struct vm_struct *remove_vm_area(const void *addr);
> > extern struct vm_struct *find_vm_area(const void *addr);
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index b10e7f17eeda..648f32d52b8d 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -8,6 +8,7 @@
> > #include <linux/proc_fs.h>
> > #include <linux/seq_buf.h>
> > #include <linux/seq_file.h>
> > +#include <linux/vmalloc.h>
> >
> > static struct codetag_type *alloc_tag_cttype;
> >
> > @@ -153,6 +154,7 @@ static void __init procfs_init(void)
> > #ifdef CONFIG_MODULES
> >
> > static struct maple_tree mod_area_mt = MTREE_INIT(mod_area_mt, MT_FLAGS_ALLOC_RANGE);
> > +static struct vm_struct *vm_module_tags;
> > /* A dummy object used to indicate an unloaded module */
> > static struct module unloaded_mod;
> > /* A dummy object used to indicate a module prepended area */
> > @@ -195,6 +197,25 @@ static void clean_unused_module_areas_locked(void)
> > }
> > }
> >
> > +static int vm_module_tags_grow(unsigned long addr, unsigned long bytes)
> > +{
> > + struct page **next_page = vm_module_tags->pages + vm_module_tags->nr_pages;
> > + unsigned long more_pages = ALIGN(bytes, PAGE_SIZE) >> PAGE_SHIFT;
> > + unsigned long nr;
> > +
> > + nr = alloc_pages_bulk_array_node(GFP_KERNEL | __GFP_NOWARN,
> > + NUMA_NO_NODE, more_pages, next_page);
> > + if (nr != more_pages)
> > + return -ENOMEM;
> > +
> > + vm_module_tags->nr_pages += nr;
> > + if (vmap_pages_range(addr, addr + (nr << PAGE_SHIFT),
> > + PAGE_KERNEL, next_page, PAGE_SHIFT) < 0)
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > static void *reserve_module_tags(struct module *mod, unsigned long size,
> > unsigned int prepend, unsigned long align)
> > {
> > @@ -202,7 +223,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> > MA_STATE(mas, &mod_area_mt, 0, section_size - 1);
> > bool cleanup_done = false;
> > unsigned long offset;
> > - void *ret;
> > + void *ret = NULL;
> >
> > /* If no tags return NULL */
> > if (size < sizeof(struct alloc_tag))
> > @@ -239,7 +260,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> > goto repeat;
> > } else {
> > ret = ERR_PTR(-ENOMEM);
> > - goto out;
> > + goto unlock;
> > }
> >
> > found:
> > @@ -254,7 +275,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> > mas_store(&mas, &prepend_mod);
> > if (mas_is_err(&mas)) {
> > ret = ERR_PTR(xa_err(mas.node));
> > - goto out;
> > + goto unlock;
> > }
> > mas.index = offset;
> > mas.last = offset + size - 1;
> > @@ -263,7 +284,7 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> > ret = ERR_PTR(xa_err(mas.node));
> > mas.index = pad_start;
> > mas_erase(&mas);
> > - goto out;
> > + goto unlock;
> > }
> >
> > } else {
> > @@ -271,18 +292,33 @@ static void *reserve_module_tags(struct module *mod, unsigned long size,
> > mas_store(&mas, mod);
> > if (mas_is_err(&mas)) {
> > ret = ERR_PTR(xa_err(mas.node));
> > - goto out;
> > + goto unlock;
> > }
> > }
> > +unlock:
> > + mas_unlock(&mas);
> > + if (IS_ERR(ret))
> > + return ret;
> >
> > - if (module_tags.size < offset + size)
> > - module_tags.size = offset + size;
> > + if (module_tags.size < offset + size) {
> > + unsigned long phys_size = vm_module_tags->nr_pages << PAGE_SHIFT;
> >
> > - ret = (struct alloc_tag *)(module_tags.start_addr + offset);
> > -out:
> > - mas_unlock(&mas);
> > + module_tags.size = offset + size;
> > + if (phys_size < module_tags.size) {
> > + int grow_res;
> > +
> > + grow_res = vm_module_tags_grow(module_tags.start_addr + phys_size,
> > + module_tags.size - phys_size);
> > + if (grow_res) {
> > + static_branch_disable(&mem_alloc_profiling_key);
> > + pr_warn("Failed to allocate tags memory for module %s. Memory profiling is disabled!\n",
> > + mod->name);
> > + return ERR_PTR(grow_res);
> > + }
> > + }
> > + }
>
> The diff for reserve_module_tags() is hard to read, and the function itself
> becomes really complex to follow with all the gotos back and forth.
> Maybe it's possible to split out some parts of it as helpers?
Got it. Will refactor this function to make it easier to review.
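Something along these lines, as an untested sketch (mod_area_insert()
is a made-up name), so the error paths become straight-line returns:

/* Untested sketch: wrap the maple-tree store + error check that is
 * currently repeated in reserve_module_tags(). */
static int mod_area_insert(struct ma_state *mas, unsigned long offset,
			   unsigned long size, struct module *mod)
{
	mas->index = offset;
	mas->last = offset + size - 1;
	mas_store(mas, mod);

	return mas_is_err(mas) ? xa_err(mas->node) : 0;
}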
Thanks for the prompt review, Mike!
>
> > - return ret;
> > + return (struct alloc_tag *)(module_tags.start_addr + offset);
> > }
> >
>
> --
> Sincerely yours,
> Mike.
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 7:32 ` David Hildenbrand
@ 2024-10-15 14:59 ` Suren Baghdasaryan
2024-10-15 15:42 ` David Hildenbrand
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 14:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: John Hubbard, Yosry Ahmed, akpm, kent.overstreet, corbet, arnd,
mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
vbabka, mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Tue, Oct 15, 2024 at 12:32 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.10.24 01:53, John Hubbard wrote:
> > On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> >> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>
> >>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> >>> references directly in the page flags. This eliminates memory
> >>> overhead caused by page_ext and results in better performance
> >>> for page allocations.
> >>> If the number of available page flag bits is insufficient to
> >>> address all kernel allocations, profiling falls back to using
> >>> page extensions with an appropriate warning to disable this
> >>> config.
> >>> If dynamically loaded modules add enough tags that they can't
> >>> be addressed anymore with available page flag bits, memory
> >>> profiling gets disabled and a warning is issued.
> >>
> >> Just curious, why do we need a config option? If there are enough bits
> >> in page flags, why not use them automatically or fallback to page_ext
> >> otherwise?
> >
> > Or better yet, *always* fall back to page_ext, thus leaving the
> > scarce and valuable page flags available for other features?
> >
> > Sorry Suren, to keep coming back to this suggestion, I know
> > I'm driving you crazy here! But I just keep thinking it through
> > and failing to see why this feature deserves to consume so
> > many page flags.
>
> My 2 cents: there is nothing wrong about consuming unused page flags in
> a configuration. No need to let them stay unused in a configuration :)
>
> The real issue starts once another feature wants to make use of some of
> them ... in such configuration there would be less available for
> allocation tags and the performance of allocations tags might
> consequently get worse again.
Thanks for the input and indeed this is the case. If this happens, we
will get a warning telling us that page flags could not be used and
that page_ext will be used instead. I think that's the best I can do
given that page flag bits are a limited resource.
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 8:10 ` Yosry Ahmed
@ 2024-10-15 15:06 ` Suren Baghdasaryan
0 siblings, 0 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 15:06 UTC (permalink / raw)
To: Yosry Ahmed
Cc: John Hubbard, akpm, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Tue, Oct 15, 2024 at 1:10 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Mon, Oct 14, 2024 at 6:58 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Oct 14, 2024 at 5:03 PM 'John Hubbard' via kernel-team
> > <kernel-team@android.com> wrote:
> > >
> > > On 10/14/24 4:56 PM, Yosry Ahmed wrote:
> > > > On Mon, Oct 14, 2024 at 4:53 PM John Hubbard <jhubbard@nvidia.com> wrote:
> > > >>
> > > >> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> > > >>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >>>>
> > > >>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> > > >>>> references directly in the page flags. This eliminates memory
> > > >>>> overhead caused by page_ext and results in better performance
> > > >>>> for page allocations.
> > > >>>> If the number of available page flag bits is insufficient to
> > > >>>> address all kernel allocations, profiling falls back to using
> > > >>>> page extensions with an appropriate warning to disable this
> > > >>>> config.
> > > >>>> If dynamically loaded modules add enough tags that they can't
> > > >>>> be addressed anymore with available page flag bits, memory
> > > >>>> profiling gets disabled and a warning is issued.
> > > >>>
> > > >>> Just curious, why do we need a config option? If there are enough bits
> > > >>> in page flags, why not use them automatically or fallback to page_ext
> > > >>> otherwise?
> > > >>
> > > >> Or better yet, *always* fall back to page_ext, thus leaving the
> > > >> scarce and valuable page flags available for other features?
> > > >>
> > > >> Sorry Suren, to keep coming back to this suggestion, I know
> > > >> I'm driving you crazy here! But I just keep thinking it through
> > > >> and failing to see why this feature deserves to consume so
> > > >> many page flags.
> > > >
> > > > I think we already always use page_ext today. My understanding is that
> > > > the purpose of this series is to give the option to avoid using
> > > > page_ext if there are enough unused page flags anyway, which reduces
> > > > memory waste and improves performance.
> > > >
> > > > My question is just why not have that be the default behavior with a
> > > > config option, use page flags if there are enough unused bits,
> > > > otherwise use page_ext.
> > >
> > > I agree that if you're going to implement this feature at all, then
> > > keying off of CONFIG_MEM_ALLOC_PROFILING seems sufficient, and no
> > > need to add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS on top of that.
> >
> > Yosry's original guess was correct. If not for loadable modules we
> > could get away with having no CONFIG_PGALLOC_TAG_USE_PAGEFLAGS. We
> > could try to fit codetag references into page flags and if they do not
> > fit we could fall back to page_ext. That works fine when we have a
> > predetermined number of tags. But loadable modules make this number
> > variable at runtime and there is a possibility we run out of page flag
> > bits at runtime. In that case, the current patchset disables memory
> > profiling and issues a warning that the user should disable
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to avoid this situation. I expect
> > this to be a rare case but it can happen and we have to provide a way
> > for a user to avoid running out of bits, which is where
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS would be used.
>
> If this is in fact a rare case, I think it may make more sense for the
> config to be on by default, and the config description would clearly
> state that it is intended to be turned off only if the loadable
> modules warning fires.
Just to be clear, I think running out of page flag bits at runtime (as
a result of module loading) will be a rare case. That's because the
core kernel has many more allocations than a typical module. Not
having enough bits for the core kernel might be the most prevalent
case, at least on large systems.
That said, your suggestion makes sense. Since we have to keep the
CONFIG_PAGE_EXTENSION dependency (for the cases when we have to fall
back to page_ext), there is no advantage to keeping
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS disabled. I'll change the default in
the next version. Thanks for the suggestion!
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 14:59 ` Suren Baghdasaryan
@ 2024-10-15 15:42 ` David Hildenbrand
2024-10-15 15:58 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2024-10-15 15:42 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: John Hubbard, Yosry Ahmed, akpm, kent.overstreet, corbet, arnd,
mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
vbabka, mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On 15.10.24 16:59, Suren Baghdasaryan wrote:
> On Tue, Oct 15, 2024 at 12:32 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 15.10.24 01:53, John Hubbard wrote:
>>> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
>>>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>
>>>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
>>>>> references directly in the page flags. This eliminates memory
>>>>> overhead caused by page_ext and results in better performance
>>>>> for page allocations.
>>>>> If the number of available page flag bits is insufficient to
>>>>> address all kernel allocations, profiling falls back to using
>>>>> page extensions with an appropriate warning to disable this
>>>>> config.
>>>>> If dynamically loaded modules add enough tags that they can't
>>>>> be addressed anymore with available page flag bits, memory
>>>>> profiling gets disabled and a warning is issued.
>>>>
>>>> Just curious, why do we need a config option? If there are enough bits
>>>> in page flags, why not use them automatically or fallback to page_ext
>>>> otherwise?
>>>
>>> Or better yet, *always* fall back to page_ext, thus leaving the
>>> scarce and valuable page flags available for other features?
>>>
>>> Sorry Suren, to keep coming back to this suggestion, I know
>>> I'm driving you crazy here! But I just keep thinking it through
>>> and failing to see why this feature deserves to consume so
>>> many page flags.
>>
>> My 2 cents: there is nothing wrong about consuming unused page flags in
>> a configuration. No need to let them stay unused in a configuration :)
>>
>> The real issue starts once another feature wants to make use of some of
>> them ... in such configuration there would be less available for
>> allocation tags and the performance of allocations tags might
>> consequently get worse again.
>
> Thanks for the input and indeed this is the case. If this happens, we
> will get a warning telling us that page flags could not be used and
> page_ext will be used instead. I think that's the best I can do given
> that page flag bits is a limited resource.
Right, I think what John is concerned about (and me as well) is that
once a new feature really needs a page flag, there will be objections
like "no you can't, we need them for allocation tags, otherwise that
feature will be degraded".
So a "The Lord has given, and the Lord has taken away!" mentality might
be required when consuming that many scarce resources: as long as they
are actually unused, use them, but they should not block other
features that really need them.
Does that make sense?
--
Cheers,
David / dhildenb
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 15:42 ` David Hildenbrand
@ 2024-10-15 15:58 ` Suren Baghdasaryan
2024-10-18 13:03 ` Michal Hocko
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 15:58 UTC (permalink / raw)
To: David Hildenbrand
Cc: John Hubbard, Yosry Ahmed, akpm, kent.overstreet, corbet, arnd,
mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
vbabka, mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 15.10.24 16:59, Suren Baghdasaryan wrote:
> > On Tue, Oct 15, 2024 at 12:32 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 15.10.24 01:53, John Hubbard wrote:
> >>> On 10/14/24 4:48 PM, Yosry Ahmed wrote:
> >>>> On Mon, Oct 14, 2024 at 1:37 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>>
> >>>>> Add CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to store allocation tag
> >>>>> references directly in the page flags. This eliminates memory
> >>>>> overhead caused by page_ext and results in better performance
> >>>>> for page allocations.
> >>>>> If the number of available page flag bits is insufficient to
> >>>>> address all kernel allocations, profiling falls back to using
> >>>>> page extensions with an appropriate warning to disable this
> >>>>> config.
> >>>>> If dynamically loaded modules add enough tags that they can't
> >>>>> be addressed anymore with available page flag bits, memory
> >>>>> profiling gets disabled and a warning is issued.
> >>>>
> >>>> Just curious, why do we need a config option? If there are enough bits
> >>>> in page flags, why not use them automatically or fallback to page_ext
> >>>> otherwise?
> >>>
> >>> Or better yet, *always* fall back to page_ext, thus leaving the
> >>> scarce and valuable page flags available for other features?
> >>>
> >>> Sorry Suren, to keep coming back to this suggestion, I know
> >>> I'm driving you crazy here! But I just keep thinking it through
> >>> and failing to see why this feature deserves to consume so
> >>> many page flags.
> >>
> >> My 2 cents: there is nothing wrong about consuming unused page flags in
> >> a configuration. No need to let them stay unused in a configuration :)
> >>
> >> The real issue starts once another feature wants to make use of some of
> >> them ... in such configuration there would be less available for
> >> allocation tags and the performance of allocations tags might
> >> consequently get worse again.
> >
> > Thanks for the input and indeed this is the case. If this happens, we
> > will get a warning telling us that page flags could not be used and
> > page_ext will be used instead. I think that's the best I can do given
> > that page flag bits is a limited resource.
>
> Right, I think what John is concerned about (and me as well) is that
> once a new feature really needs a page flag, there will be objection
> like "no you can't, we need them for allocation tags otherwise that
> feature will be degraded".
I do understand your concern, but IMHO the possibility of degrading a
feature should not be a reason to always operate at degraded capacity
(which is what we have today). If one is really concerned about
possible future regressions, they can set
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
why I'm strongly advocating that we do need
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS, so that the user has control over how
this scarce resource is used.
>
> So a "The Lord has given, and the Lord has taken away!" mentality might
> be required when consuming that many scarce resources, meaning, as long
> as they are actually unused, use them, but it should not block other
> features that really need them.
I agree, and I think that's what I implemented here. If there are
enough page flag bits, we use them; otherwise we automatically fall
back to page_ext.
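Roughly, the boot-time decision amounts to this (sketch;
pgalloc_tag_ref_bits() is a hypothetical stand-in for the patch's
actual accounting):

/* Sketch: choose the backend once, at boot. */
static bool __init need_page_ext_fallback(void)
{
	/* Not enough spare page flag bits to address every tag? */
	return pgalloc_tag_ref_bits() > NR_UNUSED_PAGEFLAG_BITS;
}

If this returns true we keep allocating page extensions as today;
otherwise tag references live directly in page flags.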
>
> Does that make sense?
Absolutely and thank you all for the feedback.
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: [PATCH v3 0/5] page allocation tag compression
2024-10-15 1:48 ` Suren Baghdasaryan
@ 2024-10-15 16:26 ` Suren Baghdasaryan
0 siblings, 0 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 16:26 UTC (permalink / raw)
To: Andrew Morton
Cc: kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, liam.howlett, pasha.tatashin,
souravpanda, keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Mon, Oct 14, 2024 at 6:48 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Oct 14, 2024 at 4:32 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 14 Oct 2024 13:36:41 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > Patch #2 copies module tags into virtually contiguous memory which
> > > serves two purposes:
> > > - Lets us deal with the situation when module is unloaded while there
> > > are still live allocations from that module. Since we are using a copy
> > > version of the tags we can safely unload the module. Space and gaps in
> > > this contiguous memory are managed using a maple tree.
> >
> > Does this make "lib: alloc_tag_module_unload must wait for pending
> > kfree_rcu calls" unneeded?
>
With this change we can unload a module even when tags from that
module are still in use. However, "lib: alloc_tag_module_unload must
wait for pending kfree_rcu calls" would still be useful because it
will allow us to release the memory occupied by a module's tags and
let other modules use that memory.
>
> > If so, that patch was cc:stable (justifyably), so what to do about that?
>
> Now that I posted this patchset I'll work on backporting "lib:
> alloc_tag_module_unload must wait for pending kfree_rcu calls" and its
> prerequisites to 6.10 and 6.11. I'll try to get backports out
> tomorrow.
I prepared 6.10 and 6.11 backports for
https://lore.kernel.org/all/20241007205236.11847-1-fw@strlen.de but
will wait for it to get merged into Linus' tree before posting them to
stable.
Thanks,
Suren.
> I don't think we need to backport this patchset to pre-6.12 kernels
> since this is an improvement and not a bug fix. But if it's needed I
> can backport it as well.
> Thanks,
> Suren.
* Re: [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory
2024-10-15 2:10 ` Suren Baghdasaryan
@ 2024-10-15 21:08 ` Shakeel Butt
2024-10-15 22:59 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Shakeel Butt @ 2024-10-15 21:08 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Andrew Morton, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, jhubbard, yuzhao,
vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Mon, Oct 14, 2024 at 07:10:56PM GMT, Suren Baghdasaryan wrote:
> On Mon, Oct 14, 2024 at 4:51 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 14 Oct 2024 13:36:43 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > > When a module gets unloaded there is a possibility that some of the
> > > allocations it made are still used and therefore the allocation tags
> > > corresponding to these allocations are still referenced. As such, the
> > > memory for these tags can't be freed. This is currently handled as an
> > > abnormal situation and module's data section is not being unloaded.
> > > To handle this situation without keeping module's data in memory,
> > > allow codetags with longer lifespan than the module to be loaded into
> > > their own separate memory. The in-use memory areas and gaps after
> > > module unloading in this separate memory are tracked using maple trees.
> > > Allocation tags arrange their separate memory so that it is virtually
> > > contiguous and that will allow simple allocation tag indexing later on
> > > in this patchset. The size of this virtually contiguous memory is set
> > > to store up to 100000 allocation tags.
> > >
> > > ...
> > >
> > > --- a/kernel/module/main.c
> > > +++ b/kernel/module/main.c
> > > @@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> > > return 0;
> > > }
> > >
> > > -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> > > - bool unload_codetags)
> > > +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> > > {
> > > struct module_memory *mem = &mod->mem[type];
> > > - void *ptr = mem->base;
> > >
> > > if (mem->is_rox)
> > > vfree(mem->rw_copy);
> > >
> > > - if (!unload_codetags && mod_mem_type_is_core_data(type))
> > > - return;
> > > -
> > > - execmem_free(ptr);
> > > + execmem_free(mem->base);
> > > }
> >
> > The changes around here are dependent upon Mike's "module: make
> > module_memory_{alloc,free} more self-contained", which is no longer in
> > mm-unstable. I assume Mike is working on a v2 so I'll park this series
> > for now.
>
> Looks like the last update on Mike's patchset was back in May. Let me
> check with Mike if he is planning to get it out soon. I would like my
> patchset to get into 6.12 if possible.
6.12 or 6.13?
* Re: [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory
2024-10-15 21:08 ` Shakeel Butt
@ 2024-10-15 22:59 ` Suren Baghdasaryan
0 siblings, 0 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-15 22:59 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, kent.overstreet, corbet, arnd, mcgrof, rppt,
paulmck, thuth, tglx, bp, xiongwei.song, ardb, david, vbabka,
mhocko, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, jhubbard, yuzhao,
vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On Tue, Oct 15, 2024 at 2:08 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Mon, Oct 14, 2024 at 07:10:56PM GMT, Suren Baghdasaryan wrote:
> > On Mon, Oct 14, 2024 at 4:51 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Mon, 14 Oct 2024 13:36:43 -0700 Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > > When a module gets unloaded there is a possibility that some of the
> > > > allocations it made are still used and therefore the allocation tags
> > > > corresponding to these allocations are still referenced. As such, the
> > > > memory for these tags can't be freed. This is currently handled as an
> > > > abnormal situation and module's data section is not being unloaded.
> > > > To handle this situation without keeping module's data in memory,
> > > > allow codetags with longer lifespan than the module to be loaded into
> > > > their own separate memory. The in-use memory areas and gaps after
> > > > module unloading in this separate memory are tracked using maple trees.
> > > > Allocation tags arrange their separate memory so that it is virtually
> > > > contiguous and that will allow simple allocation tag indexing later on
> > > > in this patchset. The size of this virtually contiguous memory is set
> > > > to store up to 100000 allocation tags.
> > > >
> > > > ...
> > > >
> > > > --- a/kernel/module/main.c
> > > > +++ b/kernel/module/main.c
> > > > @@ -1254,22 +1254,17 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type)
> > > > return 0;
> > > > }
> > > >
> > > > -static void module_memory_free(struct module *mod, enum mod_mem_type type,
> > > > - bool unload_codetags)
> > > > +static void module_memory_free(struct module *mod, enum mod_mem_type type)
> > > > {
> > > > struct module_memory *mem = &mod->mem[type];
> > > > - void *ptr = mem->base;
> > > >
> > > > if (mem->is_rox)
> > > > vfree(mem->rw_copy);
> > > >
> > > > - if (!unload_codetags && mod_mem_type_is_core_data(type))
> > > > - return;
> > > > -
> > > > - execmem_free(ptr);
> > > > + execmem_free(mem->base);
> > > > }
> > >
> > > The changes around here are dependent upon Mike's "module: make
> > > module_memory_{alloc,free} more self-contained", which is no longer in
> > > mm-unstable. I assume Mike is working on a v2 so I'll park this series
> > > for now.
> >
> > Looks like the last update on Mike's patchset was back in May. Let me
> > check with Mike if he is planning to get it out soon. I would like my
> > patchset to get into 6.12 if possible.
>
> 6.12 or 6.13?
Right, it's 6.13 at this point.
* Re: [PATCH v3 1/5] maple_tree: add mas_for_each_rev() helper
2024-10-14 20:36 ` [PATCH v3 1/5] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
@ 2024-10-16 1:48 ` Liam R. Howlett
2024-10-16 5:33 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Liam R. Howlett @ 2024-10-16 1:48 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth,
tglx, bp, xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
* Suren Baghdasaryan <surenb@google.com> [241014 16:36]:
> Add mas_for_each_rev() function to iterate maple tree nodes in reverse
> order.
>
> Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
I am now sure I added an R-B in a reply to this :)
> ---
> include/linux/maple_tree.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index c2c11004085e..e7e2caa1a95a 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
> #define mas_for_each(__mas, __entry, __max) \
> while (((__entry) = mas_find((__mas), (__max))) != NULL)
>
> +/**
> + * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
> + * @__mas: Maple Tree operation state (maple_state)
> + * @__entry: Entry retrieved from the tree
> + * @__min: minimum index to retrieve from the tree
> + *
> + * When returned, mas->index and mas->last will hold the entire range for the
> + * entry.
> + *
> + * Note: may return the zero entry.
> + */
> +#define mas_for_each_rev(__mas, __entry, __min) \
> + while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
> +
> #ifdef CONFIG_DEBUG_MAPLE_TREE
> enum mt_dump_format {
> mt_dump_dec,
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
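For reference, a minimal (hypothetical) caller walking entries from the
top of the range down:

/* Hypothetical example, not from the patch: a reverse full-range walk.
 * Entries come back highest index first; mas.index/mas.last hold the
 * range of each entry. */
static void walk_tree_rev(struct maple_tree *mt)
{
	MA_STATE(mas, mt, ULONG_MAX, ULONG_MAX);
	void *entry;

	mas_lock(&mas);
	mas_for_each_rev(&mas, entry, 0)
		pr_info("entry %p at [%lx, %lx]\n",
			entry, mas.index, mas.last);
	mas_unlock(&mas);
}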
* Re: [PATCH v3 1/5] maple_tree: add mas_for_each_rev() helper
2024-10-16 1:48 ` Liam R. Howlett
@ 2024-10-16 5:33 ` Suren Baghdasaryan
0 siblings, 0 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-16 5:33 UTC (permalink / raw)
To: Liam R. Howlett, Suren Baghdasaryan, akpm, kent.overstreet,
corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx, bp,
xiongwei.song, ardb, david, vbabka, mhocko, hannes,
roman.gushchin, dave, willy, pasha.tatashin, souravpanda,
keescook, dennis, jhubbard, yuzhao, vvvvvv, rostedt,
iamjoonsoo.kim, rientjes, minchan, kaleshsingh, linux-doc,
linux-kernel, linux-arch, linux-mm, linux-modules, kernel-team
On Tue, Oct 15, 2024 at 6:48 PM 'Liam R. Howlett' via kernel-team
<kernel-team@android.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [241014 16:36]:
> > Add mas_for_each_rev() function to iterate maple tree nodes in reverse
> > order.
> >
> > Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
>
> I am now sure I added an R-B in a reply to this :)
Sorry, I missed it. Will add in the next version. Thanks!
>
> > ---
> > include/linux/maple_tree.h | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > index c2c11004085e..e7e2caa1a95a 100644
> > --- a/include/linux/maple_tree.h
> > +++ b/include/linux/maple_tree.h
> > @@ -592,6 +592,20 @@ static __always_inline void mas_reset(struct ma_state *mas)
> > #define mas_for_each(__mas, __entry, __max) \
> > while (((__entry) = mas_find((__mas), (__max))) != NULL)
> >
> > +/**
> > + * mas_for_each_rev() - Iterate over a range of the maple tree in reverse order.
> > + * @__mas: Maple Tree operation state (maple_state)
> > + * @__entry: Entry retrieved from the tree
> > + * @__min: minimum index to retrieve from the tree
> > + *
> > + * When returned, mas->index and mas->last will hold the entire range for the
> > + * entry.
> > + *
> > + * Note: may return the zero entry.
> > + */
> > +#define mas_for_each_rev(__mas, __entry, __min) \
> > + while (((__entry) = mas_find_rev((__mas), (__min))) != NULL)
> > +
> > #ifdef CONFIG_DEBUG_MAPLE_TREE
> > enum mt_dump_format {
> > mt_dump_dec,
> > --
> > 2.47.0.rc1.288.g06298d1525-goog
> >
>
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-15 15:58 ` Suren Baghdasaryan
@ 2024-10-18 13:03 ` Michal Hocko
2024-10-18 16:04 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2024-10-18 13:03 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
[...]
> > Right, I think what John is concerned about (and me as well) is that
> > once a new feature really needs a page flag, there will be objection
> > like "no you can't, we need them for allocation tags otherwise that
> > feature will be degraded".
>
> I do understand your concern but IMHO the possibility of degrading a
> feature should not be a reason to always operate at degraded capacity
> (which is what we have today). If one is really concerned about
> possible future regression they can set
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> why I'm strongly advocating that we do need
> CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> this scarce resource is used.
I really do not think users will know how/why to set this up and I
wouldn't even bother them with thinking about that at all TBH.
This is an implementation detail. It is fine to reuse unused flag space
as storage as a performance optimization, but why do you want users to
bother with that? Why would they ever want to say N here?
--
Michal Hocko
SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-18 13:03 ` Michal Hocko
@ 2024-10-18 16:04 ` Suren Baghdasaryan
2024-10-18 17:08 ` Michal Hocko
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-18 16:04 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Fri, Oct 18, 2024 at 6:03 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> > On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
> [...]
> > > Right, I think what John is concerned about (and me as well) is that
> > > once a new feature really needs a page flag, there will be objection
> > > like "no you can't, we need them for allocation tags otherwise that
> > > feature will be degraded".
> >
> > I do understand your concern but IMHO the possibility of degrading a
> > feature should not be a reason to always operate at degraded capacity
> > (which is what we have today). If one is really concerned about
> > possible future regression they can set
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> > why I'm strongly advocating that we do need
> > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> > this scarce resource is used.
>
> I really do not think users will know how/why to setup this and I wouldn't
> even bother them thinking about that at all TBH.
>
> This is an implementation detail. It is fine to reuse unused flags space
> as a storage as a performance optimization but why do you want users to
> bother with that? Why would they ever want to say N here?
In this patch you can find a couple of warnings that look like this:

pr_warn("With module %s there are too many tags to fit in %d page flag
bits. Memory profiling is disabled!\n", mod->name,
NR_UNUSED_PAGEFLAG_BITS);

emitted when we run out of page flag bits during module loading, and

pr_err("%s: alignment %lu is incompatible with allocation tag
indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS", mod->name,
align);

emitted when the arch-specific section alignment is incompatible with
alloc_tag indexing.
I'll change the first one to also specifically guide the user to
disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS.
When these happen, memory profiling gets disabled automatically. These
two cases would be the main ones in which the user would want to
disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to keep memory profiling
enabled.
I also think that when we auto-disable memory profiling at runtime like
that, I should make /proc/allocinfo empty so that it's apparent that
profiling is disabled and the user does not consume stale data.
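Something like this (sketch; the function split and the
__allocinfo_start() name are illustrative, standing in for the existing
iterator setup):

/* Sketch: gate the /proc/allocinfo iterator on the profiling static
 * key so the file reads as empty once profiling was auto-disabled. */
static void *allocinfo_start(struct seq_file *m, loff_t *pos)
{
	if (!mem_alloc_profiling_enabled())
		return NULL;	/* nothing to show, avoid stale data */

	return __allocinfo_start(m, pos);
}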
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-18 16:04 ` Suren Baghdasaryan
@ 2024-10-18 17:08 ` Michal Hocko
2024-10-18 17:45 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2024-10-18 17:08 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Fri 18-10-24 09:04:24, Suren Baghdasaryan wrote:
> On Fri, Oct 18, 2024 at 6:03 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> > > On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
> > [...]
> > > > Right, I think what John is concerned about (and me as well) is that
> > > > once a new feature really needs a page flag, there will be objection
> > > > like "no you can't, we need them for allocation tags otherwise that
> > > > feature will be degraded".
> > >
> > > I do understand your concern but IMHO the possibility of degrading a
> > > feature should not be a reason to always operate at degraded capacity
> > > (which is what we have today). If one is really concerned about
> > > possible future regression they can set
> > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> > > why I'm strongly advocating that we do need
> > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> > > this scarce resource is used.
> >
> > I really do not think users will know how/why to setup this and I wouldn't
> > even bother them thinking about that at all TBH.
> >
> > This is an implementation detail. It is fine to reuse unused flags space
> > as a storage as a performance optimization but why do you want users to
> > bother with that? Why would they ever want to say N here?
>
> In this patch you can find a couple of warnings that look like this:
>
> pr_warn("With module %s there are too many tags to fit in %d page flag
> bits. Memory profiling is disabled!\n", mod->name,
> NR_UNUSED_PAGEFLAG_BITS);
> emitted when we run out of page flag bits during a module loading,
>
> pr_err("%s: alignment %lu is incompatible with allocation tag
> indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS", mod->name,
> align);
> emitted when the arch-specific section alignment is incompatible with
> alloc_tag indexing.
You are asking users to work around an implementation issue via
configuration, which sounds like a really bad idea. Why can't you make
the fallback automatic?
--
Michal Hocko
SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-18 17:08 ` Michal Hocko
@ 2024-10-18 17:45 ` Suren Baghdasaryan
2024-10-18 21:57 ` Suren Baghdasaryan
2024-10-21 7:21 ` Michal Hocko
0 siblings, 2 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-18 17:45 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Fri, Oct 18, 2024 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 18-10-24 09:04:24, Suren Baghdasaryan wrote:
> > On Fri, Oct 18, 2024 at 6:03 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> > > > On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
> > > [...]
> > > > > Right, I think what John is concerned about (and me as well) is that
> > > > > once a new feature really needs a page flag, there will be objection
> > > > > like "no you can't, we need them for allocation tags otherwise that
> > > > > feature will be degraded".
> > > >
> > > > I do understand your concern but IMHO the possibility of degrading a
> > > > feature should not be a reason to always operate at degraded capacity
> > > > (which is what we have today). If one is really concerned about
> > > > possible future regression they can set
> > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> > > > why I'm strongly advocating that we do need
> > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> > > > this scarce resource is used.
> > >
> > > I really do not think users will know how/why to setup this and I wouldn't
> > > even bother them thinking about that at all TBH.
> > >
> > > This is an implementation detail. It is fine to reuse unused flags space
> > > as a storage as a performance optimization but why do you want users to
> > > bother with that? Why would they ever want to say N here?
> >
> > In this patch you can find a couple of warnings that look like this:
> >
> > pr_warn("With module %s there are too many tags to fit in %d page flag
> > bits. Memory profiling is disabled!\n", mod->name,
> > NR_UNUSED_PAGEFLAG_BITS);
> > emitted when we run out of page flag bits during a module loading,
> >
> > pr_err("%s: alignment %lu is incompatible with allocation tag
> > indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS", mod->name,
> > align);
> > emitted when the arch-specific section alignment is incompatible with
> > alloc_tag indexing.
>
> You are asking users to workaround implementation issue by configuration
> which sounds like a really bad idea. Why cannot you make the fallback
> automatic?
Automatic fallback is possible during boot, when we decide whether to
enable page extensions or not. So, if during boot we decide to disable
page extensions and use page flags, we can't go back and re-enable
page extensions after boot is complete. Since there is a possibility
that we run out of page flags at runtime when we load a new module,
this leaves a case where we can't reference the module tags and we
can't fall back to page extensions, so we have to disable memory
profiling.
I could keep page extensions always on just in case this happens, but
that's a lot of memory waste to handle a rare case...
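For reference, the runtime failure path then boils down to something
like this (sketch; the bounds check is illustrative, the disable/warn
part is what the patch already does):

/* Sketch: a module's tags no longer fit into the spare page flag
 * bits, so profiling is turned off for good. */
if (new_tag_idx > MAX_PAGEFLAG_TAG_IDX) {	/* hypothetical limit */
	static_branch_disable(&mem_alloc_profiling_key);
	pr_warn("With module %s there are too many tags to fit in %d page flag bits. Memory profiling is disabled!\n",
		mod->name, NR_UNUSED_PAGEFLAG_BITS);
}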
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-18 17:45 ` Suren Baghdasaryan
@ 2024-10-18 21:57 ` Suren Baghdasaryan
2024-10-21 7:26 ` Michal Hocko
2024-10-21 7:21 ` Michal Hocko
1 sibling, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-18 21:57 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Fri, Oct 18, 2024 at 10:45 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Oct 18, 2024 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 18-10-24 09:04:24, Suren Baghdasaryan wrote:
> > > On Fri, Oct 18, 2024 at 6:03 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> > > > > On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
> > > > [...]
> > > > > > Right, I think what John is concerned about (and me as well) is that
> > > > > > once a new feature really needs a page flag, there will be objection
> > > > > > like "no you can't, we need them for allocation tags otherwise that
> > > > > > feature will be degraded".
> > > > >
> > > > > I do understand your concern but IMHO the possibility of degrading a
> > > > > feature should not be a reason to always operate at degraded capacity
> > > > > (which is what we have today). If one is really concerned about
> > > > > possible future regression they can set
> > > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> > > > > why I'm strongly advocating that we do need
> > > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> > > > > this scarce resource is used.
> > > >
> > > > I really do not think users will know how/why to setup this and I wouldn't
> > > > even bother them thinking about that at all TBH.
> > > >
> > > > This is an implementation detail. It is fine to reuse unused flags space
> > > > as a storage as a performance optimization but why do you want users to
> > > > bother with that? Why would they ever want to say N here?
> > >
> > > In this patch you can find a couple of warnings that look like this:
> > >
> > > pr_warn("With module %s there are too many tags to fit in %d page flag
> > > bits. Memory profiling is disabled!\n", mod->name,
> > > NR_UNUSED_PAGEFLAG_BITS);
> > > emitted when we run out of page flag bits during a module loading,
> > >
> > > pr_err("%s: alignment %lu is incompatible with allocation tag
> > > indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS", mod->name,
> > > align);
> > > emitted when the arch-specific section alignment is incompatible with
> > > alloc_tag indexing.
> >
> > You are asking users to workaround implementation issue by configuration
> > which sounds like a really bad idea. Why cannot you make the fallback
> > automatic?
>
> Automatic fallback is possible during boot, when we decide whether to
> enable page extensions or not. So, if during boot we decide to disable
> page extensions and use page flags, we can't go back and re-enable
> page extensions after boot is complete. Since there is a possibility
> that we run out of page flags at runtime when we load a new module,
> this leaves this case when we can't reference the module tags and we
> can't fall back to page extensions, so we have to disable memory
> profiling.
> I could keep page extensions always on just in case this happens but
> that's a lot of memory waste to handle a rare case...
After thinking more about this, I suggest a couple of changes that
IMHO would make configuration simpler:
1. Change CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to an early boot
parameter. Today we have a "mem_profiling" parameter to enable/disable
memory profiling. I suggest adding "mem_profiling_use_pgflags" to
switch from the current behavior of using page extensions to using
page flags. We keep the current behavior of using page extensions as
the default (mem_profiling_use_pgflags=0) because it always works, even
though it has higher overhead.
2. No auto-fallback. If mem_profiling_use_pgflags=1 and we don't have
enough page flags (at boot time or later when we load a module), we
simply disable memory profiling with a warning.
I think this would be less confusing for users and would avoid David's
example where performance is unexpectedly degraded. The default option
will work for everyone as it does today, and advanced users who want to
minimize the overhead can set mem_profiling_use_pgflags=1 to check if
that works for their system.
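Wiring-wise the switch would be trivial (sketch):

/* Sketch: early boot switch instead of the Kconfig option. */
static bool mem_profiling_use_pgflags __ro_after_init;

static int __init setup_mem_profiling_use_pgflags(char *arg)
{
	if (!arg) {	/* bare "mem_profiling_use_pgflags" means on */
		mem_profiling_use_pgflags = true;
		return 0;
	}

	return kstrtobool(arg, &mem_profiling_use_pgflags);
}
early_param("mem_profiling_use_pgflags", setup_mem_profiling_use_pgflags);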
WDYT?
>
> >
> > --
> > Michal Hocko
> > SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-18 17:45 ` Suren Baghdasaryan
2024-10-18 21:57 ` Suren Baghdasaryan
@ 2024-10-21 7:21 ` Michal Hocko
1 sibling, 0 replies; 47+ messages in thread
From: Michal Hocko @ 2024-10-21 7:21 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Fri 18-10-24 10:45:39, Suren Baghdasaryan wrote:
> On Fri, Oct 18, 2024 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 18-10-24 09:04:24, Suren Baghdasaryan wrote:
> > > On Fri, Oct 18, 2024 at 6:03 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> > > > > On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
> > > > [...]
> > > > > > Right, I think what John is concerned about (and me as well) is that
> > > > > > once a new feature really needs a page flag, there will be objection
> > > > > > like "no you can't, we need them for allocation tags otherwise that
> > > > > > feature will be degraded".
> > > > >
> > > > > I do understand your concern but IMHO the possibility of degrading a
> > > > > feature should not be a reason to always operate at degraded capacity
> > > > > (which is what we have today). If one is really concerned about
> > > > > possible future regression they can set
> > > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> > > > > why I'm strongly advocating that we do need
> > > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> > > > > this scarce resource is used.
> > > >
> > > > I really do not think users will know how/why to setup this and I wouldn't
> > > > even bother them thinking about that at all TBH.
> > > >
> > > > This is an implementation detail. It is fine to reuse unused flags space
> > > > as a storage as a performance optimization but why do you want users to
> > > > bother with that? Why would they ever want to say N here?
> > >
> > > In this patch you can find a couple of warnings that look like this:
> > >
> > > pr_warn("With module %s there are too many tags to fit in %d page flag
> > > bits. Memory profiling is disabled!\n", mod->name,
> > > NR_UNUSED_PAGEFLAG_BITS);
> > > emitted when we run out of page flag bits during a module loading,
> > >
> > > pr_err("%s: alignment %lu is incompatible with allocation tag
> > > indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS", mod->name,
> > > align);
> > > emitted when the arch-specific section alignment is incompatible with
> > > alloc_tag indexing.
> >
> > You are asking users to workaround implementation issue by configuration
> > which sounds like a really bad idea. Why cannot you make the fallback
> > automatic?
>
> Automatic fallback is possible during boot, when we decide whether to
> enable page extensions or not. So, if during boot we decide to disable
> page extensions and use page flags, we can't go back and re-enable
> page extensions after boot is complete. Since there is a possibility
> that we run out of page flags at runtime when we load a new module,
> this leaves this case when we can't reference the module tags and we
> can't fall back to page extensions, so we have to disable memory
> profiling.
Right, I do understand (I guess) the challenge. I am just arguing that
it really makes no sense to tell users to recompile the kernel with a
CONFIG_FOO to work around this limitation. Please note that many users
of this feature will simply use precompiled (e.g. distribution)
kernels. Once you force somebody to recompile with
CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n, they are not going back to a more
memory-optimal implementation.
Just my 2 cents
--
Michal Hocko
SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-18 21:57 ` Suren Baghdasaryan
@ 2024-10-21 7:26 ` Michal Hocko
2024-10-21 9:13 ` David Hildenbrand
0 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2024-10-21 7:26 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Fri 18-10-24 14:57:26, Suren Baghdasaryan wrote:
> On Fri, Oct 18, 2024 at 10:45 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Fri, Oct 18, 2024 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Fri 18-10-24 09:04:24, Suren Baghdasaryan wrote:
> > > > On Fri, Oct 18, 2024 at 6:03 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Tue 15-10-24 08:58:59, Suren Baghdasaryan wrote:
> > > > > > On Tue, Oct 15, 2024 at 8:42 AM David Hildenbrand <david@redhat.com> wrote:
> > > > > [...]
> > > > > > > Right, I think what John is concerned about (and me as well) is that
> > > > > > > once a new feature really needs a page flag, there will be objection
> > > > > > > like "no you can't, we need them for allocation tags otherwise that
> > > > > > > feature will be degraded".
> > > > > >
> > > > > > I do understand your concern but IMHO the possibility of degrading a
> > > > > > feature should not be a reason to always operate at degraded capacity
> > > > > > (which is what we have today). If one is really concerned about
> > > > > > possible future regression they can set
> > > > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS=n and keep what we have today. That's
> > > > > > why I'm strongly advocating that we do need
> > > > > > CONFIG_PGALLOC_TAG_USE_PAGEFLAGS so that the user has control over how
> > > > > > this scarce resource is used.
> > > > >
> > > > > I really do not think users will know how/why to setup this and I wouldn't
> > > > > even bother them thinking about that at all TBH.
> > > > >
> > > > > This is an implementation detail. It is fine to reuse unused flags space
> > > > > as a storage as a performance optimization but why do you want users to
> > > > > bother with that? Why would they ever want to say N here?
> > > >
> > > > In this patch you can find a couple of warnings that look like this:
> > > >
> > > > pr_warn("With module %s there are too many tags to fit in %d page flag
> > > > bits. Memory profiling is disabled!\n", mod->name,
> > > > NR_UNUSED_PAGEFLAG_BITS);
> > > > emitted when we run out of page flag bits during a module loading,
> > > >
> > > > pr_err("%s: alignment %lu is incompatible with allocation tag
> > > > indexing, disable CONFIG_PGALLOC_TAG_USE_PAGEFLAGS", mod->name,
> > > > align);
> > > > emitted when the arch-specific section alignment is incompatible with
> > > > alloc_tag indexing.
> > >
> > > You are asking users to workaround implementation issue by configuration
> > > which sounds like a really bad idea. Why cannot you make the fallback
> > > automatic?
> >
> > Automatic fallback is possible during boot, when we decide whether to
> > enable page extensions or not. So, if during boot we decide to disable
> > page extensions and use page flags, we can't go back and re-enable
> > page extensions after boot is complete. Since there is a possibility
> > that we run out of page flags at runtime when we load a new module,
> > this leaves this case when we can't reference the module tags and we
> > can't fall back to page extensions, so we have to disable memory
> > profiling.
> > I could keep page extensions always on just in case this happens but
> > that's a lot of memory waste to handle a rare case...
>
> After thinking more about this, I suggest a couple of changes that
> IMHO would make configuration simpler:
> 1. Change the CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to an early boot
> parameter.
This makes much more sense!
> Today we have a "mem_profiling" parameter to enable/disable
> memory profiling. I suggest adding "mem_profiling_use_pgflags" to
> switch the current behavior of using page extensions to use page
> flags.
I do not want to bikeshed about this, but to me it would make more
sense to have an extension parameter to mem_profiling and call it
something like compress or similar, so that page flags are not really
carved into the naming. The documentation can then explain that the
compression cannot always be guaranteed and might fail, so this is
more of an optimistic and potentially failing optimization that might
need to be dropped in some usage scenarios.
> We keep the current behavior of using page extensions as
> default (mem_profiling_use_pgflags=0) because it always works even
> though it has higher overhead.
Yes this seems to be a safe default.
> 2. No auto-fallback. If mem_profiling_use_pgflags=1 and we don't have
> enough page flags (at boot time or later when we load a module), we
> simply disable memory profiling with a warning.
No strong opinion on this.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 7:26 ` Michal Hocko
@ 2024-10-21 9:13 ` David Hildenbrand
2024-10-21 15:05 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: David Hildenbrand @ 2024-10-21 9:13 UTC (permalink / raw)
To: Michal Hocko, Suren Baghdasaryan
Cc: John Hubbard, Yosry Ahmed, akpm, kent.overstreet, corbet, arnd,
mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
vbabka, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On 21.10.24 at 09:26, Michal Hocko wrote:
> On Fri 18-10-24 14:57:26, Suren Baghdasaryan wrote:
>> On Fri, Oct 18, 2024 at 10:45 AM Suren Baghdasaryan <surenb@google.com> wrote:
>>>
>>> On Fri, Oct 18, 2024 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
>>>
>>> Automatic fallback is possible during boot, when we decide whether to
>>> enable page extensions or not. So, if during boot we decide to disable
>>> page extensions and use page flags, we can't go back and re-enable
>>> page extensions after boot is complete. Since it is possible to run
>>> out of page flags at runtime when we load a new module, this leaves a
>>> case where we can't reference the module tags and can't fall back to
>>> page extensions, so we have to disable memory profiling.
>>> I could keep page extensions always on just in case this happens but
>>> that's a lot of memory waste to handle a rare case...
>>
>> After thinking more about this, I suggest a couple of changes that
>> IMHO would make configuration simpler:
>> 1. Change the CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to an early boot
>> parameter.
>
> This makes much more sense!
>
>> Today we have a "mem_profiling" parameter to enable/disable
>> memory profiling. I suggest adding "mem_profiling_use_pgflags" to
>> switch the current behavior of using page extensions to use page
>> flags.
>
> I do not want to bikeshed about this, but to me it would make more sense
> to have an extension parameter to mem_profiling and call it something
> like compress or similar, so that page flags are not carved into the
> naming. The documentation can then explain that the compression cannot
> always be guaranteed and might fail, so this is more of an optimistic,
> potentially failing optimization that might need to be dropped in some
> usage scenarios.
Maybe we can reuse the existing parameter (e.g., tristate). Only makes sense if
we don't expect too many other modes though :)
>
>> We keep the current behavior of using page extensions as
>> default (mem_profiling_use_pgflags=0) because it always works even
>> though it has higher overhead.
>
> Yes this seems to be a safe default.
Agreed.
>
>> 2. No auto-fallback. If mem_profiling_use_pgflags=1 and we don't have
>> enough page flags (at boot time or later when we load a module), we
>> simply disable memory profiling with a warning.
Sounds reasonable to me.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 9:13 ` David Hildenbrand
@ 2024-10-21 15:05 ` Suren Baghdasaryan
2024-10-21 15:34 ` Michal Hocko
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-21 15:05 UTC (permalink / raw)
To: David Hildenbrand
Cc: Michal Hocko, John Hubbard, Yosry Ahmed, akpm, kent.overstreet,
corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx, bp,
xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave, willy,
liam.howlett, pasha.tatashin, souravpanda, keescook, dennis,
yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes, minchan,
kaleshsingh, linux-doc, linux-kernel, linux-arch, linux-mm,
linux-modules, kernel-team
On Mon, Oct 21, 2024 at 2:13 AM David Hildenbrand <david@redhat.com> wrote:
>
>
>
> On 21.10.24 at 09:26, Michal Hocko wrote:
> > On Fri 18-10-24 14:57:26, Suren Baghdasaryan wrote:
> >> On Fri, Oct 18, 2024 at 10:45 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>
> >>> On Fri, Oct 18, 2024 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>
> >>> Automatic fallback is possible during boot, when we decide whether to
> >>> enable page extensions or not. So, if during boot we decide to disable
> >>> page extensions and use page flags, we can't go back and re-enable
> >>> page extensions after boot is complete. Since it is possible to run
> >>> out of page flags at runtime when we load a new module, this leaves a
> >>> case where we can't reference the module tags and can't fall back to
> >>> page extensions, so we have to disable memory profiling.
> >>> I could keep page extensions always on just in case this happens but
> >>> that's a lot of memory waste to handle a rare case...
> >>
> >> After thinking more about this, I suggest a couple of changes that
> >> IMHO would make configuration simpler:
> >> 1. Change the CONFIG_PGALLOC_TAG_USE_PAGEFLAGS to an early boot
> >> parameter.
> >
> > This makes much more sense!
> >
> >> Today we have a "mem_profiling" parameter to enable/disable
> >> memory profiling. I suggest adding "mem_profiling_use_pgflags" to
> >> switch the current behavior of using page extensions to use page
> >> flags.
> >
> > I do not want to bikeshed about this, but to me it would make more sense
> > to have an extension parameter to mem_profiling and call it something
> > like compress or similar, so that page flags are not carved into the
> > naming. The documentation can then explain that the compression cannot
> > always be guaranteed and might fail, so this is more of an optimistic,
> > potentially failing optimization that might need to be dropped in some
> > usage scenarios.
>
> Maybe we can reuse the existing parameter (e.g., tristate). Only makes sense if
> we don't expect too many other modes though :)
Yeah, I thought about adding new values to "mem_profiling" but it's a
bit complicated. Today it's a tristate:

mem_profiling=0|1|never

0/1 means we disable/enable memory profiling by default, but the user
can toggle it at runtime using a sysctl. This means that we enable
page_ext at boot even when it's set to 0.
"never" means we do not enable page_ext, memory profiling is disabled
and the sysctl to enable it will not be exposed. This is used when a
distribution has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use
it and does not want to waste memory on enabling page_ext.
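To sum it up, the current semantics are:

  value   page_ext at boot   profiling at boot   sysctl exposed
  0       enabled            off                 yes
  1       enabled            on                  yes
  never   not enabled        off                 no
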
I can add another option like "pgflags", but then it also needs to
specify whether we should enable or disable profiling by default
(similar to 0|1 for the page_ext mode). IOW we also need to encode the
default state we want. Something like this:

mem_profiling=0|1|never|pgflags_on|pgflags_off

Would this be acceptable?
>
> >
> >> We keep the current behavior of using page extensions as
> >> default (mem_profiling_use_pgflags=0) because it always works even
> >> though it has higher overhead.
> >
> > Yes this seems to be a safe default.
>
> Agreed.
>
> >
> >> 2. No auto-fallback. If mem_profiling_use_pgflags=1 and we don't have
> >> enough page flags (at boot time or later when we load a module), we
> >> simply disable memory profiling with a warning.
>
> Sounds reasonable to me.
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 15:05 ` Suren Baghdasaryan
@ 2024-10-21 15:34 ` Michal Hocko
2024-10-21 15:41 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2024-10-21 15:34 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
[...]
> Yeah, I thought about adding new values to "mem_profiling" but it's a
> bit complicated. Today it's a tristate:
>
> mem_profiling=0|1|never
>
> 0/1 means we disable/enable memory profiling by default but the user
> can enable it at runtime using a sysctl. This means that we enable
> page_ext at boot even when it's set to 0.
> "never" means we do not enable page_ext, memory profiling is disabled
> and sysctl to enable it will not be exposed. Used when a distribution
> has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
> not want to waste memory on enabling page_ext.
>
> I can add another option like "pgflags" but then it also needs to
> specify whether we should enable or disable profiling by default
> (similar to 0|1 for page_ext mode). IOW we will need to encode also
> the default state we want. Something like this:
>
> mem_profiling=0|1|never|pgflags_on|pgflags_off
>
> Would this be acceptable?
Isn't this overcomplicating it? Why cannot you simply go with
mem_profiling={0|never|1}[,$YOUR_OPTIONS]
While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
or just be ignored if that is not applicable.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 15:34 ` Michal Hocko
@ 2024-10-21 15:41 ` Suren Baghdasaryan
2024-10-21 15:49 ` David Hildenbrand
2024-10-21 15:57 ` Michal Hocko
0 siblings, 2 replies; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-21 15:41 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
> [...]
> > Yeah, I thought about adding new values to "mem_profiling" but it's a
> > bit complicated. Today it's a tristate:
> >
> > mem_profiling=0|1|never
> >
> > 0/1 means we disable/enable memory profiling by default but the user
> > can enable it at runtime using a sysctl. This means that we enable
> > page_ext at boot even when it's set to 0.
> > "never" means we do not enable page_ext, memory profiling is disabled
> > and sysctl to enable it will not be exposed. Used when a distribution
> > has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
> > not want to waste memory on enabling page_ext.
> >
> > I can add another option like "pgflags" but then it also needs to
> > specify whether we should enable or disable profiling by default
> > (similar to 0|1 for page_ext mode). IOW we will need to encode also
> > the default state we want. Something like this:
> >
> > mem_profiling=0|1|never|pgflags_on|pgflags_off
> >
> > Would this be acceptable?
>
> Isn't this overcomplicating it? Why cannot you simply go with
> mem_profiling={0|never|1}[,$YOUR_OPTIONS]
>
> While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
> or just be ignored if that is not applicable.
Oh, you mean having 2 parts in the parameter with supported options being:
mem_profiling=never
mem_profiling=0
mem_profiling=1
mem_profiling=0,pgflags
mem_profiling=1,pgflags
Did I understand correctly? If so then yes, this should work.
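Just to illustrate the parsing (a rough sketch only, not actual code --
all names below are made up for this example):

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/string.h>
#include <linux/types.h>

static bool profiling_on_by_default;    /* "1" vs "0" */
static bool profiling_never;            /* "never": no page_ext, no sysctl */
static bool profiling_compressed;       /* try to use spare page flag bits */

static int __init setup_mem_profiling(char *str)
{
        /* the first token selects the base mode */
        char *opt = strsep(&str, ",");

        if (!opt || !*opt)
                return -EINVAL;

        if (!strcmp(opt, "never"))
                profiling_never = true;
        else if (!strcmp(opt, "1"))
                profiling_on_by_default = true;
        else if (strcmp(opt, "0"))
                return -EINVAL;

        /* the remaining tokens are optional modifiers */
        while ((opt = strsep(&str, ",")) != NULL) {
                if (!strcmp(opt, "compressed"))
                        profiling_compressed = true;
                /* unknown modifiers are simply ignored, as you suggested */
        }
        return 0;
}
early_param("mem_profiling", setup_mem_profiling);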
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 15:41 ` Suren Baghdasaryan
@ 2024-10-21 15:49 ` David Hildenbrand
2024-10-21 15:57 ` Michal Hocko
1 sibling, 0 replies; 47+ messages in thread
From: David Hildenbrand @ 2024-10-21 15:49 UTC (permalink / raw)
To: Suren Baghdasaryan, Michal Hocko
Cc: John Hubbard, Yosry Ahmed, akpm, kent.overstreet, corbet, arnd,
mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
vbabka, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On 21.10.24 17:41, Suren Baghdasaryan wrote:
> On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
>>
>> On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
>> [...]
>>> Yeah, I thought about adding new values to "mem_profiling" but it's a
>>> bit complicated. Today it's a tristate:
>>>
>>> mem_profiling=0|1|never
>>>
>>> 0/1 means we disable/enable memory profiling by default but the user
>>> can enable it at runtime using a sysctl. This means that we enable
>>> page_ext at boot even when it's set to 0.
>>> "never" means we do not enable page_ext, memory profiling is disabled
>>> and sysctl to enable it will not be exposed. Used when a distribution
>>> has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
>>> not want to waste memory on enabling page_ext.
>>>
>>> I can add another option like "pgflags" but then it also needs to
>>> specify whether we should enable or disable profiling by default
>>> (similar to 0|1 for page_ext mode). IOW we will need to encode also
>>> the default state we want. Something like this:
>>>
>>> mem_profiling=0|1|never|pgflags_on|pgflags_off
>>>
>>> Would this be acceptable?
>>
>> Isn't this overcomplicating it? Why cannot you simply go with
>> mem_profiling={0|never|1}[,$YOUR_OPTIONS]
>>
>> While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
>> or just be ignored if that is not applicable.
>
> Oh, you mean having 2 parts in the parameter with supported options being:
>
> mem_profiling=never
> mem_profiling=0
> mem_profiling=1
> mem_profiling=0,pgflags
> mem_profiling=1,pgflags
>
That's also a viable solution indeed.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 15:41 ` Suren Baghdasaryan
2024-10-21 15:49 ` David Hildenbrand
@ 2024-10-21 15:57 ` Michal Hocko
2024-10-21 16:16 ` Suren Baghdasaryan
1 sibling, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2024-10-21 15:57 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Mon 21-10-24 08:41:00, Suren Baghdasaryan wrote:
> On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
> > [...]
> > > Yeah, I thought about adding new values to "mem_profiling" but it's a
> > > bit complicated. Today it's a tristate:
> > >
> > > mem_profiling=0|1|never
> > >
> > > 0/1 means we disable/enable memory profiling by default but the user
> > > can enable it at runtime using a sysctl. This means that we enable
> > > page_ext at boot even when it's set to 0.
> > > "never" means we do not enable page_ext, memory profiling is disabled
> > > and sysctl to enable it will not be exposed. Used when a distribution
> > > has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
> > > not want to waste memory on enabling page_ext.
> > >
> > > I can add another option like "pgflags" but then it also needs to
> > > specify whether we should enable or disable profiling by default
> > > (similar to 0|1 for page_ext mode). IOW we will need to encode also
> > > the default state we want. Something like this:
> > >
> > > mem_profiling=0|1|never|pgflags_on|pgflags_off
> > >
> > > Would this be acceptable?
> >
> > Isn't this overcomplicating it? Why cannot you simply go with
> > mem_profiling={0|never|1}[,$YOUR_OPTIONS]
> >
> > While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
> > or just be ignored if that is not applicable.
>
> Oh, you mean having 2 parts in the parameter with supported options being:
>
> mem_profiling=never
> mem_profiling=0
> mem_profiling=1
> mem_profiling=0,pgflags
> mem_profiling=1,pgflags
>
> Did I understand correctly? If so then yes, this should work.
Yes. I would just not call it pgflags, because that doesn't really tell
anybody but kernel developers what the option does. You could also have
an option to override the default failure strategy (disabling profiling).
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 15:57 ` Michal Hocko
@ 2024-10-21 16:16 ` Suren Baghdasaryan
2024-10-21 16:23 ` Michal Hocko
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-21 16:16 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Mon, Oct 21, 2024 at 8:57 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-10-24 08:41:00, Suren Baghdasaryan wrote:
> > On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
> > > [...]
> > > > Yeah, I thought about adding new values to "mem_profiling" but it's a
> > > > bit complicated. Today it's a tristate:
> > > >
> > > > mem_profiling=0|1|never
> > > >
> > > > 0/1 means we disable/enable memory profiling by default but the user
> > > > can enable it at runtime using a sysctl. This means that we enable
> > > > page_ext at boot even when it's set to 0.
> > > > "never" means we do not enable page_ext, memory profiling is disabled
> > > > and sysctl to enable it will not be exposed. Used when a distribution
> > > > has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
> > > > not want to waste memory on enabling page_ext.
> > > >
> > > > I can add another option like "pgflags" but then it also needs to
> > > > specify whether we should enable or disable profiling by default
> > > > (similar to 0|1 for page_ext mode). IOW we will need to encode also
> > > > the default state we want. Something like this:
> > > >
> > > > mem_profiling=0|1|never|pgflags_on|pgflags_off
> > > >
> > > > Would this be acceptable?
> > >
> > > Isn't this overcomplicating it? Why cannot you simply go with
> > > mem_profiling={0|never|1}[,$YOUR_OPTIONS]
> > >
> > > While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
> > > or just be ignored if that is not applicable.
> >
> > Oh, you mean having 2 parts in the parameter with supported options being:
> >
> > mem_profiling=never
> > mem_profiling=0
> > mem_profiling=1
> > mem_profiling=0,pgflags
> > mem_profiling=1,pgflags
> >
> > Did I understand correctly? If so then yes, this should work.
>
> yes. I would just not call it pgflags because that just doesn't really
> tell what the option is to anybody but kernel developers. You could also
> have an option to override the default (disable profiling) failure strategy.
Ok, how about "compressed" instead? Like this:
mem_profiling=0,compressed
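So the full set of accepted values would become:

mem_profiling=never
mem_profiling=0
mem_profiling=1
mem_profiling=0,compressed
mem_profiling=1,compressed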
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 16:16 ` Suren Baghdasaryan
@ 2024-10-21 16:23 ` Michal Hocko
2024-10-21 16:32 ` Suren Baghdasaryan
0 siblings, 1 reply; 47+ messages in thread
From: Michal Hocko @ 2024-10-21 16:23 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Mon 21-10-24 09:16:14, Suren Baghdasaryan wrote:
> On Mon, Oct 21, 2024 at 8:57 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 21-10-24 08:41:00, Suren Baghdasaryan wrote:
> > > On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
> > > > [...]
> > > > > Yeah, I thought about adding new values to "mem_profiling" but it's a
> > > > > bit complicated. Today it's a tristate:
> > > > >
> > > > > mem_profiling=0|1|never
> > > > >
> > > > > 0/1 means we disable/enable memory profiling by default but the user
> > > > > can enable it at runtime using a sysctl. This means that we enable
> > > > > page_ext at boot even when it's set to 0.
> > > > > "never" means we do not enable page_ext, memory profiling is disabled
> > > > > and sysctl to enable it will not be exposed. Used when a distribution
> > > > > has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
> > > > > not want to waste memory on enabling page_ext.
> > > > >
> > > > > I can add another option like "pgflags" but then it also needs to
> > > > > specify whether we should enable or disable profiling by default
> > > > > (similar to 0|1 for page_ext mode). IOW we will need to encode also
> > > > > the default state we want. Something like this:
> > > > >
> > > > > mem_profiling=0|1|never|pgflags_on|pgflags_off
> > > > >
> > > > > Would this be acceptable?
> > > >
> > > > Isn't this overcomplicating it? Why cannot you simply go with
> > > > mem_profiling={0|never|1}[,$YOUR_OPTIONS]
> > > >
> > > > While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
> > > > or just be ignored if that is not applicable.
> > >
> > > Oh, you mean having 2 parts in the parameter with supported options being:
> > >
> > > mem_profiling=never
> > > mem_profiling=0
> > > mem_profiling=1
> > > mem_profiling=0,pgflags
> > > mem_profiling=1,pgflags
> > >
> > > Did I understand correctly? If so then yes, this should work.
> >
> > yes. I would just not call it pgflags because that just doesn't really
> > tell what the option is to anybody but kernel developers. You could also
> > have an option to override the default (disable profiling) failure strategy.
>
> Ok, how about "compressed" instead? Like this:
>
> mem_profiling=0,compressed
Sounds good to me. And just to repeat, I do not really care about the
specific name, but let's stay away from something as specific as page
flags: that describes the underlying mechanism rather than the purpose,
and it does not tell much to most users outside of kernel developers.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 16:23 ` Michal Hocko
@ 2024-10-21 16:32 ` Suren Baghdasaryan
2024-10-21 18:12 ` John Hubbard
0 siblings, 1 reply; 47+ messages in thread
From: Suren Baghdasaryan @ 2024-10-21 16:32 UTC (permalink / raw)
To: Michal Hocko
Cc: David Hildenbrand, John Hubbard, Yosry Ahmed, akpm,
kent.overstreet, corbet, arnd, mcgrof, rppt, paulmck, thuth, tglx,
bp, xiongwei.song, ardb, vbabka, hannes, roman.gushchin, dave,
willy, liam.howlett, pasha.tatashin, souravpanda, keescook,
dennis, yuzhao, vvvvvv, rostedt, iamjoonsoo.kim, rientjes,
minchan, kaleshsingh, linux-doc, linux-kernel, linux-arch,
linux-mm, linux-modules, kernel-team
On Mon, Oct 21, 2024 at 9:23 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 21-10-24 09:16:14, Suren Baghdasaryan wrote:
> > On Mon, Oct 21, 2024 at 8:57 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 21-10-24 08:41:00, Suren Baghdasaryan wrote:
> > > > On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > >
> > > > > On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
> > > > > [...]
> > > > > > Yeah, I thought about adding new values to "mem_profiling" but it's a
> > > > > > bit complicated. Today it's a tristate:
> > > > > >
> > > > > > mem_profiling=0|1|never
> > > > > >
> > > > > > 0/1 means we disable/enable memory profiling by default but the user
> > > > > > can enable it at runtime using a sysctl. This means that we enable
> > > > > > page_ext at boot even when it's set to 0.
> > > > > > "never" means we do not enable page_ext, memory profiling is disabled
> > > > > > and sysctl to enable it will not be exposed. Used when a distribution
> > > > > > has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
> > > > > > not want to waste memory on enabling page_ext.
> > > > > >
> > > > > > I can add another option like "pgflags" but then it also needs to
> > > > > > specify whether we should enable or disable profiling by default
> > > > > > (similar to 0|1 for page_ext mode). IOW we will need to encode also
> > > > > > the default state we want. Something like this:
> > > > > >
> > > > > > mem_profiling=0|1|never|pgflags_on|pgflags_off
> > > > > >
> > > > > > Would this be acceptable?
> > > > >
> > > > > Isn't this overcomplicating it? Why cannot you simply go with
> > > > > mem_profiling={0|never|1}[,$YOUR_OPTIONS]
> > > > >
> > > > > While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
> > > > > or just be ignored if that is not applicable.
> > > >
> > > > Oh, you mean having 2 parts in the parameter with supported options being:
> > > >
> > > > mem_profiling=never
> > > > mem_profiling=0
> > > > mem_profiling=1
> > > > mem_profiling=0,pgflags
> > > > mem_profiling=1,pgflags
> > > >
> > > > Did I understand correctly? If so then yes, this should work.
> > >
> > > yes. I would just not call it pgflags because that just doesn't really
> > > tell what the option is to anybody but kernel developers. You could also
> > > have an option to override the default (disable profiling) failure strategy.
> >
> > Ok, how about "compressed" instead? Like this:
> >
> > mem_profiling=0,compressed
>
> Sounds good to me. And just to repeat, I do not really care about the
> specific name, but let's stay away from something as specific as page
> flags: that describes the underlying mechanism rather than the purpose,
> and it does not tell much to most users outside of kernel developers.
Understood. Ok, I'll start changing my patchset to incorporate this
feedback and will post the new version this week.
Thanks for the input everyone!
>
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags
2024-10-21 16:32 ` Suren Baghdasaryan
@ 2024-10-21 18:12 ` John Hubbard
0 siblings, 0 replies; 47+ messages in thread
From: John Hubbard @ 2024-10-21 18:12 UTC (permalink / raw)
To: Suren Baghdasaryan, Michal Hocko
Cc: David Hildenbrand, Yosry Ahmed, akpm, kent.overstreet, corbet,
arnd, mcgrof, rppt, paulmck, thuth, tglx, bp, xiongwei.song, ardb,
vbabka, hannes, roman.gushchin, dave, willy, liam.howlett,
pasha.tatashin, souravpanda, keescook, dennis, yuzhao, vvvvvv,
rostedt, iamjoonsoo.kim, rientjes, minchan, kaleshsingh,
linux-doc, linux-kernel, linux-arch, linux-mm, linux-modules,
kernel-team
On 10/21/24 9:32 AM, Suren Baghdasaryan wrote:
> On Mon, Oct 21, 2024 at 9:23 AM Michal Hocko <mhocko@suse.com> wrote:
>> On Mon 21-10-24 09:16:14, Suren Baghdasaryan wrote:
>>> On Mon, Oct 21, 2024 at 8:57 AM Michal Hocko <mhocko@suse.com> wrote:
>>>> On Mon 21-10-24 08:41:00, Suren Baghdasaryan wrote:
>>>>> On Mon, Oct 21, 2024 at 8:34 AM Michal Hocko <mhocko@suse.com> wrote:
>>>>>> On Mon 21-10-24 08:05:16, Suren Baghdasaryan wrote:
>>>>>> [...]
>>>>>>> Yeah, I thought about adding new values to "mem_profiling" but it's a
>>>>>>> bit complicated. Today it's a tristate:
>>>>>>>
>>>>>>> mem_profiling=0|1|never
>>>>>>>
>>>>>>> 0/1 means we disable/enable memory profiling by default but the user
>>>>>>> can enable it at runtime using a sysctl. This means that we enable
>>>>>>> page_ext at boot even when it's set to 0.
>>>>>>> "never" means we do not enable page_ext, memory profiling is disabled
>>>>>>> and sysctl to enable it will not be exposed. Used when a distribution
>>>>>>> has CONFIG_MEM_ALLOC_PROFILING=y but the user does not use it and does
>>>>>>> not want to waste memory on enabling page_ext.
>>>>>>>
>>>>>>> I can add another option like "pgflags" but then it also needs to
>>>>>>> specify whether we should enable or disable profiling by default
>>>>>>> (similar to 0|1 for page_ext mode). IOW we will need to encode also
>>>>>>> the default state we want. Something like this:
>>>>>>>
>>>>>>> mem_profiling=0|1|never|pgflags_on|pgflags_off
>>>>>>>
>>>>>>> Would this be acceptable?
>>>>>>
>>>>>> Isn't this overcomplicating it? Why cannot you simply go with
>>>>>> mem_profiling={0|never|1}[,$YOUR_OPTIONS]
>>>>>>
>>>>>> While $YOUR_OPTIONS could be compress,fallback,ponies and it would apply
>>>>>> or just be ignored if that is not applicable.
>>>>>
>>>>> Oh, you mean having 2 parts in the parameter with supported options being:
>>>>>
>>>>> mem_profiling=never
>>>>> mem_profiling=0
>>>>> mem_profiling=1
>>>>> mem_profiling=0,pgflags
>>>>> mem_profiling=1,pgflags
>>>>>
>>>>> Did I understand correctly? If so then yes, this should work.
>>>>
>>>> yes. I would just not call it pgflags because that just doesn't really
>>>> tell what the option is to anybody but kernel developers. You could also
>>>> have an option to override the default (disable profiling) failure strategy.
>>>
>>> Ok, how about "compressed" instead? Like this:
>>>
>>> mem_profiling=0,compressed
Yes. The configuration options all fit together nicely now, and the
naming seems exactly right as well. And no more "you must rebuild your
kernel" messages. Great!
thanks,
--
John Hubbard
>>
>> Sounds good to me. And just to repeat, I do not really care about the
>> specific name, but let's stay away from something as specific as page
>> flags: that describes the underlying mechanism rather than the purpose,
>> and it does not tell much to most users outside of kernel developers.
>
> Understood. Ok, I'll start changing my patchset to incorporate this
> feedback and will post the new version this week.
> Thanks for the input everyone!
>
>>
>> --
>> Michal Hocko
>> SUSE Labs
^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads: [~2024-10-21 18:13 UTC | newest]
Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-14 20:36 [PATCH v3 0/5] page allocation tag compression Suren Baghdasaryan
2024-10-14 20:36 ` [PATCH v3 1/5] maple_tree: add mas_for_each_rev() helper Suren Baghdasaryan
2024-10-16 1:48 ` Liam R. Howlett
2024-10-16 5:33 ` Suren Baghdasaryan
2024-10-14 20:36 ` [PATCH v3 2/5] alloc_tag: load module tags into separate contiguous memory Suren Baghdasaryan
2024-10-14 23:51 ` Andrew Morton
2024-10-15 2:10 ` Suren Baghdasaryan
2024-10-15 21:08 ` Shakeel Butt
2024-10-15 22:59 ` Suren Baghdasaryan
2024-10-14 20:36 ` [PATCH v3 3/5] alloc_tag: populate memory for module tags as needed Suren Baghdasaryan
2024-10-15 12:15 ` Mike Rapoport
2024-10-15 14:49 ` Suren Baghdasaryan
2024-10-14 20:36 ` [PATCH v3 4/5] alloc_tag: introduce pgalloc_tag_ref to abstract page tag references Suren Baghdasaryan
2024-10-14 20:36 ` [PATCH v3 5/5] alloc_tag: config to store page allocation tag refs in page flags Suren Baghdasaryan
2024-10-14 23:48 ` Yosry Ahmed
2024-10-14 23:53 ` John Hubbard
2024-10-14 23:56 ` Yosry Ahmed
2024-10-15 0:03 ` John Hubbard
2024-10-15 1:40 ` Matthew Wilcox
2024-10-15 2:03 ` Suren Baghdasaryan
2024-10-15 1:58 ` Suren Baghdasaryan
2024-10-15 8:10 ` Yosry Ahmed
2024-10-15 15:06 ` Suren Baghdasaryan
2024-10-15 7:32 ` David Hildenbrand
2024-10-15 14:59 ` Suren Baghdasaryan
2024-10-15 15:42 ` David Hildenbrand
2024-10-15 15:58 ` Suren Baghdasaryan
2024-10-18 13:03 ` Michal Hocko
2024-10-18 16:04 ` Suren Baghdasaryan
2024-10-18 17:08 ` Michal Hocko
2024-10-18 17:45 ` Suren Baghdasaryan
2024-10-18 21:57 ` Suren Baghdasaryan
2024-10-21 7:26 ` Michal Hocko
2024-10-21 9:13 ` David Hildenbrand
2024-10-21 15:05 ` Suren Baghdasaryan
2024-10-21 15:34 ` Michal Hocko
2024-10-21 15:41 ` Suren Baghdasaryan
2024-10-21 15:49 ` David Hildenbrand
2024-10-21 15:57 ` Michal Hocko
2024-10-21 16:16 ` Suren Baghdasaryan
2024-10-21 16:23 ` Michal Hocko
2024-10-21 16:32 ` Suren Baghdasaryan
2024-10-21 18:12 ` John Hubbard
2024-10-21 7:21 ` Michal Hocko
2024-10-14 23:32 ` [PATCH v3 0/5] page allocation tag compression Andrew Morton
2024-10-15 1:48 ` Suren Baghdasaryan
2024-10-15 16:26 ` Suren Baghdasaryan