* [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
@ 2022-04-13 15:33 Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Changes v2 => v3:
1. Use __vmalloc_huge in alloc_large_system_hash.
2. Use EXPORT_SYMBOL_GPL for new functions. (Christoph Hellwig)
3. Add more description about the issues and changes. (Christoph Hellwig,
Rick Edgecombe).
Changes v1 => v2:
1. Add vmalloc_huge(). (Christoph Hellwig)
2. Add module_alloc_huge(). (Christoph Hellwig)
3. Add Fixes tag and Link tag. (Thorsten Leemhuis)
Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and using it for bpf_prog_pack
has caused some issues [1], as many users of vmalloc are not yet ready to
handle huge pages. To enable a smoother transition to huge page backed
vmalloc memory, this set replaces the VM_NO_HUGE_VMAP flag with a new
opt-in flag, VM_ALLOW_HUGE_VMAP. More discussion of this topic can be
found at [2].
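Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
Patch 2 uses __vmalloc_huge() in alloc_large_system_hash().
Patch 3 introduces module_alloc_huge().
Patch 4 uses module_alloc_huge() in bpf_prog_pack.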
[1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/
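The difference between the old opt-out policy and the new opt-in policy can be modelled in plain user-space C. This is only a sketch: the two predicates mirror the condition changed in __vmalloc_node_range() by patch 1, but the function names (may_use_huge_old, may_use_huge_new, demo_policies) are hypothetical and do not exist in the kernel. Both flags use bit 0x400, as in the include/linux/vmalloc.h hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Both flags occupy the same bit; only the meaning of the bit changes. */
#define VM_NO_HUGE_VMAP    0x00000400UL /* old: caller opts out of huge pages */
#define VM_ALLOW_HUGE_VMAP 0x00000400UL /* new: caller opts in to huge pages */

/* Old policy: huge mappings are used unless the caller opted out. */
static bool may_use_huge_old(bool vmap_allow_huge, unsigned long vm_flags)
{
	return vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP);
}

/* New policy: huge mappings are used only if the caller opted in. */
static bool may_use_huge_new(bool vmap_allow_huge, unsigned long vm_flags)
{
	return vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP);
}

/* Hypothetical walk-through of the cases that matter for the series. */
void demo_policies(void)
{
	/* A caller that passes no flags, e.g. plain vmalloc(). */
	assert(may_use_huge_old(true, 0));   /* old: huge pages by default */
	assert(!may_use_huge_new(true, 0));  /* new: small pages by default */

	/* A caller that sets the flag explicitly. */
	assert(!may_use_huge_old(true, VM_NO_HUGE_VMAP));
	assert(may_use_huge_new(true, VM_ALLOW_HUGE_VMAP));

	/* Without arch support neither policy uses huge pages. */
	assert(!may_use_huge_old(false, 0));
	assert(!may_use_huge_new(false, VM_ALLOW_HUGE_VMAP));
}
```

Under the new policy an unaudited vmalloc() caller keeps PAGE_SIZE mappings, which is why callers such as the s390 one in patch 1 can drop their explicit opt-out.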
Song Liu (4):
vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
page_alloc: use __vmalloc_huge for large system hash
module: introduce module_alloc_huge
bpf: use module_alloc_huge for bpf_prog_pack
arch/Kconfig | 6 ++----
arch/powerpc/kernel/module.c | 2 +-
arch/s390/kvm/pv.c | 2 +-
arch/x86/kernel/module.c | 21 +++++++++++++++++++++
include/linux/moduleloader.h | 5 +++++
include/linux/vmalloc.h | 5 +++--
kernel/bpf/core.c | 7 ++++---
kernel/module.c | 5 +++++
mm/page_alloc.c | 2 +-
mm/vmalloc.c | 34 ++++++++++++++++++++++++++++------
10 files changed, 71 insertions(+), 18 deletions(-)
--
2.30.2
* [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-13 15:33 ` Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Huge page backed vmalloc memory could benefit performance in many cases.
However, some users of vmalloc may not be ready to handle huge pages for
various reasons: hardware constraints, potential page splits, etc.
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
pages. However, it is not easy to track down all the users that require
the opt-out, as the allocations are passed through different stacks and
may cause issues in different layers.
To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag,
VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages can ask
for them specifically.
Also, replace vmalloc_no_huge() with the opt-in helpers vmalloc_huge() and
__vmalloc_huge().
Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Signed-off-by: Song Liu <song@kernel.org>
---
arch/Kconfig | 6 ++----
arch/powerpc/kernel/module.c | 2 +-
arch/s390/kvm/pv.c | 2 +-
include/linux/vmalloc.h | 5 +++--
mm/vmalloc.c | 34 ++++++++++++++++++++++++++++------
5 files changed, 35 insertions(+), 14 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..31c4fdc4a4ba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,10 +854,8 @@ config HAVE_ARCH_HUGE_VMAP
#
# Archs that select this would be capable of PMD-sized vmaps (i.e.,
-# arch_vmap_pmd_supported() returns true), and they must make no assumptions
-# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
-# can be used to prohibit arch-specific allocations from using hugepages to
-# help with this (e.g., modules may require it).
+# arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
+# must be used to enable allocations to use hugepages.
#
config HAVE_ARCH_HUGE_VMALLOC
depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 40a583e9d3c7..97a76a8619fb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
* too.
*/
return __vmalloc_node_range(size, 1, start, end, gfp, prot,
- VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
+ VM_FLUSH_RESET_PERMS,
NUMA_NO_NODE, __builtin_return_address(0));
}
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..8afede243903 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -142,7 +142,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
* using large pages for the virtual memory area.
* This is a hardware limitation.
*/
- kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
+ kvm->arch.pv.stor_var = vmalloc(vlen);
if (!kvm->arch.pv.stor_var)
goto out_err;
return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3b1df7da402d..20205c4e3b23 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block; /* in notifier.h */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
-#define VM_NO_HUGE_VMAP 0x00000400 /* force PAGE_SIZE pte mapping */
+#define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
const void *caller) __alloc_size(1);
void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller) __alloc_size(1);
-void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
+void *vmalloc_huge(unsigned long size) __alloc_size(1);
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..1dac30c0ea41 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3106,7 +3106,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
return NULL;
}
- if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
+ if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
unsigned long size_per_node;
/*
@@ -3273,21 +3273,43 @@ void *vmalloc(unsigned long size)
EXPORT_SYMBOL(vmalloc);
/**
- * vmalloc_no_huge - allocate virtually contiguous memory using small pages
+ * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
* @size: allocation size
*
- * Allocate enough non-huge pages to cover @size from the page level
+ * Allocate enough pages to cover @size from the page level
+ * allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
+ *
+ * Return: pointer to the allocated memory or %NULL on error
+ */
+void *vmalloc_huge(unsigned long size)
+{
+ return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ GFP_KERNEL, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+ NUMA_NO_NODE, __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(vmalloc_huge);
+
+/**
+ * __vmalloc_huge - allocate virtually contiguous memory, allow huge pages
+ * @size: allocation size
+ * @gfp_mask: flags for the page level allocator
+ *
+ * Allocate enough pages to cover @size from the page level
* allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_no_huge(unsigned long size)
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask)
{
return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
- GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
+ gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
NUMA_NO_NODE, __builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_no_huge);
+EXPORT_SYMBOL_GPL(__vmalloc_huge);
/**
* vzalloc - allocate virtually contiguous memory with zero fill
--
2.30.2
* [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash
2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-13 15:33 ` Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 3/4] module: introduce module_alloc_huge Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Use __vmalloc_huge() in alloc_large_system_hash() so that large system
hashes (>= PMD_SIZE) can benefit from huge pages. Note that
__vmalloc_huge() only allocates huge pages on systems with
HAVE_ARCH_HUGE_VMALLOC.
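As a rough illustration of when a request can end up PMD-mapped, the sketch below models the eligibility check in user space. It is an assumption-laden simplification, not the kernel code: pick_map_shift() and its parameters are hypothetical names, 4 KiB base pages and 2 MiB PMD mappings are assumed (x86_64), and the per-node division is my reading of the size_per_node variable visible in the __vmalloc_node_range() context of patch 1.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12UL               /* assumption: 4 KiB base pages */
#define PMD_SHIFT  21UL               /* assumption: 2 MiB PMD mappings */
#define PMD_SIZE   (1UL << PMD_SHIFT)

/*
 * Hypothetical model: return the mapping shift a request would get.
 * arch_huge_vmalloc stands in for HAVE_ARCH_HUGE_VMALLOC support and
 * allow_huge_flag for VM_ALLOW_HUGE_VMAP being set by the caller.
 */
unsigned long pick_map_shift(unsigned long size, unsigned int online_nodes,
			     bool arch_huge_vmalloc, bool allow_huge_flag)
{
	unsigned long size_per_node;

	assert(online_nodes > 0);

	/* Both the architecture and the caller must agree to huge pages. */
	if (!arch_huge_vmalloc || !allow_huge_flag)
		return PAGE_SHIFT;

	/* With no node preference the request may be spread across nodes. */
	size_per_node = size / online_nodes;

	return size_per_node >= PMD_SIZE ? PMD_SHIFT : PAGE_SHIFT;
}
```

For example, pick_map_shift(PMD_SIZE, 1, true, true) yields PMD_SHIFT, while the same size on two nodes falls back to PAGE_SHIFT in this model, which is consistent with patch 4 sizing its probe allocation as BPF_HPAGE_SIZE * num_online_nodes().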
Signed-off-by: Song Liu <song@kernel.org>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e5b4488a0c5..20d38b8482c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8919,7 +8919,7 @@ void *__init alloc_large_system_hash(const char *tablename,
table = memblock_alloc_raw(size,
SMP_CACHE_BYTES);
} else if (get_order(size) >= MAX_ORDER || hashdist) {
- table = __vmalloc(size, gfp_flags);
+ table = __vmalloc_huge(size, gfp_flags);
virt = true;
if (table)
huge = is_vm_area_hugepages(table);
--
2.30.2
* [PATCH v3 bpf 3/4] module: introduce module_alloc_huge
2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
@ 2022-04-13 15:33 ` Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Introduce module_alloc_huge, which allocates huge page backed memory in
module memory space. The primary user of this memory is bpf_prog_pack
(multiple BPF programs sharing a huge page).
Signed-off-by: Song Liu <song@kernel.org>
---
arch/x86/kernel/module.c | 21 +++++++++++++++++++++
include/linux/moduleloader.h | 5 +++++
kernel/module.c | 5 +++++
3 files changed, 31 insertions(+)
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b98ffcf4d250..63f6a16c70dc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
return p;
}
+void *module_alloc_huge(unsigned long size)
+{
+ gfp_t gfp_mask = GFP_KERNEL;
+ void *p;
+
+ if (PAGE_ALIGN(size) > MODULES_LEN)
+ return NULL;
+
+ p = __vmalloc_node_range(size, MODULE_ALIGN,
+ MODULES_VADDR + get_module_load_offset(),
+ MODULES_END, gfp_mask, PAGE_KERNEL,
+ VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+ NUMA_NO_NODE, __builtin_return_address(0));
+ if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+ vfree(p);
+ return NULL;
+ }
+
+ return p;
+}
+
#ifdef CONFIG_X86_32
int apply_relocate(Elf32_Shdr *sechdrs,
const char *strtab,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d34743a88938 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
sections. Returns NULL on failure. */
void *module_alloc(unsigned long size);
+/* Allocator used for allocating memory in module memory space. If size is
+ * at least PMD_SIZE, allow using huge pages. Returns NULL on failure.
+ */
+void *module_alloc_huge(unsigned long size);
+
/* Free memory returned from module_alloc. */
void module_memfree(void *module_region);
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788fd965..b2c6cb682a7d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
NUMA_NO_NODE, __builtin_return_address(0));
}
+void * __weak module_alloc_huge(unsigned long size)
+{
+ return vmalloc_huge(size);
+}
+
bool __weak module_init_section(const char *name)
{
return strstarts(name, ".init");
--
2.30.2
* [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack
2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
` (2 preceding siblings ...)
2022-04-13 15:33 ` [PATCH v3 bpf 3/4] module: introduce module_alloc_huge Song Liu
@ 2022-04-13 15:33 ` Song Liu
3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
PMD_SIZE pages. This benefits system performance by reducing the iTLB
miss rate.
Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
VM_FLUSH_RESET_PERMS does not work with huge pages yet.
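The permission handling described above can be sketched as a small user-space state model. Everything here is hypothetical (struct pack_model and the pack_* helpers are not kernel APIs), and it assumes 4 KiB pages and a 2 MiB pack. It only illustrates the ordering: the pack is made read-only and executable when created, and once its last chunk is released it is made non-executable and writable again before being freed, since the free path no longer resets permissions on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE      4096UL               /* assumption: 4 KiB pages */
#define BPF_HPAGE_SIZE (2UL * 1024 * 1024)  /* assumption: 2 MiB pack */

/* Hypothetical stand-in for the state of one bpf_prog_pack. */
struct pack_model {
	bool writable;
	bool executable;
	bool freed;
	size_t chunks_in_use;
};

/* Page count passed to the set_memory_*() calls: size / PAGE_SIZE. */
size_t pack_num_pages(size_t pack_size)
{
	assert(pack_size % PAGE_SIZE == 0);
	return pack_size / PAGE_SIZE;
}

/* Mirrors alloc_new_pack(): set_memory_ro() followed by set_memory_x(). */
void pack_model_init(struct pack_model *p, size_t chunks_in_use)
{
	p->writable = false;
	p->executable = true;
	p->freed = false;
	p->chunks_in_use = chunks_in_use;
}

/*
 * Release one chunk. When the pack becomes empty, restore NX and RW
 * before freeing, as bpf_prog_pack_free() does after this patch.
 * Returns true if the pack was torn down.
 */
bool pack_model_release_chunk(struct pack_model *p)
{
	assert(!p->freed && p->chunks_in_use > 0);

	if (--p->chunks_in_use > 0)
		return false;

	p->executable = false; /* set_memory_nx() */
	p->writable = true;    /* set_memory_rw() */
	p->freed = true;       /* module_memfree() */
	return true;
}
```

With these assumptions a single pack covers 512 base pages, which is the count each set_memory_*() call in the patch would receive for a 2 MiB pack.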
Signed-off-by: Song Liu <song@kernel.org>
---
kernel/bpf/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..b2a634d0f842 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
void *ptr;
size = BPF_HPAGE_SIZE * num_online_nodes();
- ptr = module_alloc(size);
+ ptr = module_alloc_huge(size);
/* Test whether we can get huge pages. If not just use PAGE_SIZE
* packs.
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
GFP_KERNEL);
if (!pack)
return NULL;
- pack->ptr = module_alloc(bpf_prog_pack_size);
+ pack->ptr = module_alloc_huge(bpf_prog_pack_size);
if (!pack->ptr) {
kfree(pack);
return NULL;
@@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
list_add_tail(&pack->list, &pack_list);
- set_vm_flush_reset_perms(pack->ptr);
set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
return pack;
@@ -970,6 +969,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
bpf_prog_chunk_count(), 0) == 0) {
list_del(&pack->list);
+ set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+ set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
module_memfree(pack->ptr);
kfree(pack);
}
--
2.30.2