* [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
@ 2022-04-11 23:18 Song Liu
2022-04-11 23:18 ` [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Song Liu @ 2022-04-11 23:18 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Changes v1 => v2:
1. Add vmalloc_huge(). (Christoph Hellwig)
2. Add module_alloc_huge(). (Christoph Hellwig)
3. Add Fixes tag and Link tag. (Thorsten Leemhuis)
Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and using it for bpf_prog_pack has
caused some issues [1], as many users of vmalloc are not yet ready to
handle huge pages. To enable a smoother transition to huge page backed
vmalloc memory, this set replaces the VM_NO_HUGE_VMAP flag with a new
opt-in flag, VM_ALLOW_HUGE_VMAP. More discussion about this topic can be
found at [2].
Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
Patch 2 introduces module_alloc_huge().
Patch 3 uses module_alloc_huge() in bpf_prog_pack.
[1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/
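The switch from an opt-out to an opt-in flag described above can be sketched in userspace as follows. This is only an illustration of the inverted check in __vmalloc_node_range(), not kernel code: the SKETCH_* names and both helper functions are hypothetical, and the real function applies further size checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Both flags occupy the same bit in include/linux/vmalloc.h (patch 1). */
#define SKETCH_VM_NO_HUGE_VMAP    0x00000400UL  /* old flag: opt out */
#define SKETCH_VM_ALLOW_HUGE_VMAP 0x00000400UL  /* new flag: opt in  */

static_assert(SKETCH_VM_NO_HUGE_VMAP == SKETCH_VM_ALLOW_HUGE_VMAP,
	      "patch 1 reuses the bit with inverted meaning");

/* Old rule: huge mappings are tried unless the caller opted out. */
static bool huge_allowed_opt_out(bool arch_allows, unsigned long vm_flags)
{
	return arch_allows && !(vm_flags & SKETCH_VM_NO_HUGE_VMAP);
}

/* New rule: huge mappings are tried only if the caller opted in. */
static bool huge_allowed_opt_in(bool arch_allows, unsigned long vm_flags)
{
	return arch_allows && (vm_flags & SKETCH_VM_ALLOW_HUGE_VMAP);
}
```

With no flags set, a caller that was never audited for huge pages gets them under the old rule but not under the new one, which is the point of the series.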
Song Liu (3):
vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
module: introduce module_alloc_huge
bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
arch/Kconfig | 6 ++----
arch/powerpc/kernel/module.c | 2 +-
arch/s390/kvm/pv.c | 2 +-
arch/x86/kernel/module.c | 21 +++++++++++++++++++
include/linux/moduleloader.h | 5 +++++
include/linux/vmalloc.h | 4 ++--
kernel/bpf/core.c | 9 +++++----
kernel/module.c | 8 ++++++++
mm/vmalloc.c | 39 +++++++++++++++++++-----------------
9 files changed, 66 insertions(+), 30 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
2022-04-11 23:18 [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-11 23:18 ` Song Liu
2022-04-11 23:18 ` [PATCH v2 bpf 2/3] module: introduce module_alloc_huge Song Liu
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2022-04-11 23:18 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Huge page backed vmalloc memory could benefit performance in many cases.
Since some users of vmalloc may not be ready to handle huge pages,
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge
pages. However, it is not easy to add VM_NO_HUGE_VMAP to all the users
that may try to allocate >= PMD_SIZE pages, but are not ready to handle
huge pages properly.
Replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that
users that benefit from huge pages can ask for them specifically.
Also, replace vmalloc_no_huge() with the opt-in helper vmalloc_huge().
Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Signed-off-by: Song Liu <song@kernel.org>
---
arch/Kconfig | 6 ++----
arch/powerpc/kernel/module.c | 2 +-
arch/s390/kvm/pv.c | 2 +-
include/linux/vmalloc.h | 4 ++--
mm/vmalloc.c | 39 +++++++++++++++++++-----------------
5 files changed, 27 insertions(+), 26 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..31c4fdc4a4ba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,10 +854,8 @@ config HAVE_ARCH_HUGE_VMAP
#
# Archs that select this would be capable of PMD-sized vmaps (i.e.,
-# arch_vmap_pmd_supported() returns true), and they must make no assumptions
-# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
-# can be used to prohibit arch-specific allocations from using hugepages to
-# help with this (e.g., modules may require it).
+# arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
+# must be used to enable allocations to use hugepages.
#
config HAVE_ARCH_HUGE_VMALLOC
depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 40a583e9d3c7..97a76a8619fb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
* too.
*/
return __vmalloc_node_range(size, 1, start, end, gfp, prot,
- VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
+ VM_FLUSH_RESET_PERMS,
NUMA_NO_NODE, __builtin_return_address(0));
}
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..8afede243903 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -142,7 +142,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
* using large pages for the virtual memory area.
* This is a hardware limitation.
*/
- kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
+ kvm->arch.pv.stor_var = vmalloc(vlen);
if (!kvm->arch.pv.stor_var)
goto out_err;
return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3b1df7da402d..1024517d7937 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block; /* in notifier.h */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
-#define VM_NO_HUGE_VMAP 0x00000400 /* force PAGE_SIZE pte mapping */
+#define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
const void *caller) __alloc_size(1);
void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller) __alloc_size(1);
-void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
+extern void *vmalloc_huge(unsigned long size) __alloc_size(1);
extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..7cc2be6a7554 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3106,7 +3106,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
return NULL;
}
- if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
+ if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
unsigned long size_per_node;
/*
@@ -3272,23 +3272,6 @@ void *vmalloc(unsigned long size)
}
EXPORT_SYMBOL(vmalloc);
-/**
- * vmalloc_no_huge - allocate virtually contiguous memory using small pages
- * @size: allocation size
- *
- * Allocate enough non-huge pages to cover @size from the page level
- * allocator and map them into contiguous kernel virtual space.
- *
- * Return: pointer to the allocated memory or %NULL on error
- */
-void *vmalloc_no_huge(unsigned long size)
-{
- return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
- GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
- NUMA_NO_NODE, __builtin_return_address(0));
-}
-EXPORT_SYMBOL(vmalloc_no_huge);
-
/**
* vzalloc - allocate virtually contiguous memory with zero fill
* @size: allocation size
@@ -3347,6 +3330,26 @@ void *vmalloc_node(unsigned long size, int node)
}
EXPORT_SYMBOL(vmalloc_node);
+/**
+ * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
+ * @size: allocation size
+ *
+ * Allocate enough pages to cover @size from the page level
+ * allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
+ *
+ * Return: pointer to the allocated memory or %NULL on error
+ */
+void *vmalloc_huge(unsigned long size)
+{
+ return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ GFP_KERNEL, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+ NUMA_NO_NODE, __builtin_return_address(0));
+
+}
+EXPORT_SYMBOL(vmalloc_huge);
+
/**
* vzalloc_node - allocate memory on a specific node with zero fill
* @size: allocation size
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 bpf 2/3] module: introduce module_alloc_huge
2022-04-11 23:18 [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-11 23:18 ` [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-11 23:18 ` Song Liu
2022-04-11 23:18 ` [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
2022-04-12 15:38 ` [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Luis Chamberlain
3 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2022-04-11 23:18 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Introduce module_alloc_huge, which allocates huge page backed memory in
module memory space. The primary user of this memory is bpf_prog_pack
(multiple BPF programs sharing a huge page).
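The x86 implementation in this patch first rejects requests whose page-aligned size exceeds the module region (the PAGE_ALIGN(size) > MODULES_LEN test). A minimal userspace sketch of that guard, assuming a 4 KiB base page; the SKETCH_* macro and both helpers are hypothetical, and the region length is passed in because MODULES_LEN is arch-specific:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed base page size */

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two for the mask trick below");

/* Round size up to the next page boundary (no overflow handling). */
static unsigned long sketch_page_align(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* True if the page-aligned request still fits in a region of region_len bytes. */
static bool sketch_fits_region(unsigned long size, unsigned long region_len)
{
	return sketch_page_align(size) <= region_len;
}
```

Aligning before comparing matters for requests just under the limit: a size one byte past the last full page rounds up to a whole extra page and is rejected.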
Signed-off-by: Song Liu <song@kernel.org>
---
arch/x86/kernel/module.c | 21 +++++++++++++++++++++
include/linux/moduleloader.h | 5 +++++
kernel/module.c | 8 ++++++++
3 files changed, 34 insertions(+)
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b98ffcf4d250..63f6a16c70dc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
return p;
}
+void *module_alloc_huge(unsigned long size)
+{
+ gfp_t gfp_mask = GFP_KERNEL;
+ void *p;
+
+ if (PAGE_ALIGN(size) > MODULES_LEN)
+ return NULL;
+
+ p = __vmalloc_node_range(size, MODULE_ALIGN,
+ MODULES_VADDR + get_module_load_offset(),
+ MODULES_END, gfp_mask, PAGE_KERNEL,
+ VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+ NUMA_NO_NODE, __builtin_return_address(0));
+ if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+ vfree(p);
+ return NULL;
+ }
+
+ return p;
+}
+
#ifdef CONFIG_X86_32
int apply_relocate(Elf32_Shdr *sechdrs,
const char *strtab,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d34743a88938 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
sections. Returns NULL on failure. */
void *module_alloc(unsigned long size);
+/* Allocator used for allocating memory in module memory space. If size is
+ * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
+ */
+void *module_alloc_huge(unsigned long size);
+
/* Free memory returned from module_alloc. */
void module_memfree(void *module_region);
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788fd965..2af20ac3209c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2839,6 +2839,14 @@ void * __weak module_alloc(unsigned long size)
NUMA_NO_NODE, __builtin_return_address(0));
}
+void * __weak module_alloc_huge(unsigned long size)
+{
+ return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ GFP_KERNEL, PAGE_KERNEL_EXEC,
+ VM_FLUSH_RESET_PERMS | VM_ALLOW_HUGE_VMAP,
+ NUMA_NO_NODE, __builtin_return_address(0));
+}
+
bool __weak module_init_section(const char *name)
{
return strstarts(name, ".init");
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-11 23:18 [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-11 23:18 ` [PATCH v2 bpf 1/3] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-11 23:18 ` [PATCH v2 bpf 2/3] module: introduce module_alloc_huge Song Liu
@ 2022-04-11 23:18 ` Song Liu
2022-04-12 15:38 ` [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Luis Chamberlain
3 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2022-04-11 23:18 UTC (permalink / raw)
To: bpf, netdev, linux-mm, linux-kernel
Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
imbrenda, mcgrof, Song Liu
Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that
BPF programs sit on PMD_SIZE pages. This benefits system performance by
reducing iTLB miss rate.
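As a rough illustration of the iTLB argument, the number of translations needed to cover a pack can be computed for both mapping sizes. The sketch below assumes the usual x86_64 sizes (4 KiB base pages, 2 MiB PMD mappings); the SKETCH_* names and the helper are hypothetical and do not appear in the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE (4UL << 10)  /* assumed 4 KiB base page */
#define SKETCH_PMD_SIZE  (2UL << 20)  /* assumed 2 MiB PMD mapping */

static_assert(SKETCH_PMD_SIZE % SKETCH_PAGE_SIZE == 0,
	      "PMD size must be a multiple of the base page size");

/* Number of mappings of map_size bytes needed to cover len bytes, rounded up. */
static unsigned long mappings_needed(unsigned long len, unsigned long map_size)
{
	return (len + map_size - 1) / map_size;
}
```

Under these assumptions, one 2 MiB pack needs 512 separate 4 KiB translations but only a single PMD-sized one, so programs packed into it compete for far fewer iTLB entries.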
Signed-off-by: Song Liu <song@kernel.org>
---
kernel/bpf/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..fd45bdd80a75 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
void *ptr;
size = BPF_HPAGE_SIZE * num_online_nodes();
- ptr = module_alloc(size);
+ ptr = module_alloc_huge(size);
/* Test whether we can get huge pages. If not just use PAGE_SIZE
* packs.
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
GFP_KERNEL);
if (!pack)
return NULL;
- pack->ptr = module_alloc(bpf_prog_pack_size);
+ pack->ptr = module_alloc_huge(bpf_prog_pack_size);
if (!pack->ptr) {
kfree(pack);
return NULL;
@@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
list_add_tail(&pack->list, &pack_list);
- set_vm_flush_reset_perms(pack->ptr);
set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
return pack;
@@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
bpf_prog_chunk_count(), 0) == 0) {
list_del(&pack->list);
- module_memfree(pack->ptr);
+ set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+ set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+ vfree(pack->ptr);
kfree(pack);
}
out:
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-11 23:35 Song Liu
@ 2022-04-11 23:35 ` Song Liu
2022-04-12 4:20 ` Christoph Hellwig
2022-04-12 17:20 ` Edgecombe, Rick P
0 siblings, 2 replies; 12+ messages in thread
From: Song Liu @ 2022-04-11 23:35 UTC (permalink / raw)
To: bpf, linux-mm, linux-kernel
Cc: ast, daniel, andrii, akpm, rick.p.edgecombe, hch, imbrenda,
mcgrof, Song Liu
Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that
BPF programs sit on PMD_SIZE pages. This benefits system performance by
reducing iTLB miss rate.
Signed-off-by: Song Liu <song@kernel.org>
---
kernel/bpf/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..fd45bdd80a75 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
void *ptr;
size = BPF_HPAGE_SIZE * num_online_nodes();
- ptr = module_alloc(size);
+ ptr = module_alloc_huge(size);
/* Test whether we can get huge pages. If not just use PAGE_SIZE
* packs.
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
GFP_KERNEL);
if (!pack)
return NULL;
- pack->ptr = module_alloc(bpf_prog_pack_size);
+ pack->ptr = module_alloc_huge(bpf_prog_pack_size);
if (!pack->ptr) {
kfree(pack);
return NULL;
@@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
list_add_tail(&pack->list, &pack_list);
- set_vm_flush_reset_perms(pack->ptr);
set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
return pack;
@@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
bpf_prog_chunk_count(), 0) == 0) {
list_del(&pack->list);
- module_memfree(pack->ptr);
+ set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+ set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+ vfree(pack->ptr);
kfree(pack);
}
out:
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-11 23:35 ` [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
@ 2022-04-12 4:20 ` Christoph Hellwig
2022-04-12 6:12 ` Song Liu
2022-04-12 17:20 ` Edgecombe, Rick P
1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-04-12 4:20 UTC (permalink / raw)
To: Song Liu
Cc: bpf, linux-mm, linux-kernel, ast, daniel, andrii, akpm,
rick.p.edgecombe, hch, imbrenda, mcgrof
On Mon, Apr 11, 2022 at 04:35:49PM -0700, Song Liu wrote:
> Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that
That is only very indirectly true now.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-12 4:20 ` Christoph Hellwig
@ 2022-04-12 6:12 ` Song Liu
0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2022-04-12 6:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: bpf, Linux-MM, open list, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Andrew Morton, rick.p.edgecombe, imbrenda,
Luis Chamberlain
On Mon, Apr 11, 2022 at 9:20 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Mon, Apr 11, 2022 at 04:35:49PM -0700, Song Liu wrote:
> > Use __vmalloc_node_range with VM_ALLOW_HUGE_VMAP for bpf_prog_pack so that
>
> That is only very indirectly true now.
Yeah, I realized I missed this part after sending it. Will fix.
Thanks,
Song
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
2022-04-11 23:18 [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
` (2 preceding siblings ...)
2022-04-11 23:18 ` [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
@ 2022-04-12 15:38 ` Luis Chamberlain
2022-04-13 0:35 ` Song Liu
3 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2022-04-12 15:38 UTC (permalink / raw)
To: Song Liu
Cc: bpf, netdev, linux-mm, linux-kernel, ast, daniel, andrii,
kernel-team, akpm, rick.p.edgecombe, hch, imbrenda
On Mon, Apr 11, 2022 at 04:18:05PM -0700, Song Liu wrote:
> Changes v1 => v2:
> 1. Add vmalloc_huge(). (Christoph Hellwig)
> 2. Add module_alloc_huge(). (Christoph Hellwig)
> 3. Add Fixes tag and Link tag. (Thorsten Leemhuis)
>
> Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and using it for bpf_prog_pack has
> caused some issues [1], as many users of vmalloc are not yet ready to
> handle huge pages. To enable a smoother transition to huge page backed
> vmalloc memory, this set replaces the VM_NO_HUGE_VMAP flag with a new
> opt-in flag, VM_ALLOW_HUGE_VMAP. More discussion about this topic can be
> found at [2].
>
> Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
> Patch 2 introduces module_alloc_huge().
> Patch 3 uses module_alloc_huge() in bpf_prog_pack.
>
> [1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
> [2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/
>
> Song Liu (3):
> vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
> module: introduce module_alloc_huge
> bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
>
> arch/Kconfig | 6 ++----
> arch/powerpc/kernel/module.c | 2 +-
> arch/s390/kvm/pv.c | 2 +-
> arch/x86/kernel/module.c | 21 +++++++++++++++++++
> include/linux/moduleloader.h | 5 +++++
> include/linux/vmalloc.h | 4 ++--
> kernel/bpf/core.c | 9 +++++----
> kernel/module.c | 8 ++++++++
Please use modules-next [0] as that has queued up changes which change
kernel/module.c quite a bit.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-11 23:35 ` [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack Song Liu
2022-04-12 4:20 ` Christoph Hellwig
@ 2022-04-12 17:20 ` Edgecombe, Rick P
2022-04-12 21:00 ` Song Liu
1 sibling, 1 reply; 12+ messages in thread
From: Edgecombe, Rick P @ 2022-04-12 17:20 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, song@kernel.org,
bpf@vger.kernel.org
Cc: daniel@iogearbox.net, andrii@kernel.org, hch@infradead.org,
imbrenda@linux.ibm.com, akpm@linux-foundation.org, ast@kernel.org,
mcgrof@kernel.org
On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
> bitmap_zero(pack->bitmap, bpf_prog_pack_size /
> BPF_PROG_CHUNK_SIZE);
> list_add_tail(&pack->list, &pack_list);
>
> - set_vm_flush_reset_perms(pack->ptr);
> set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size /
> PAGE_SIZE);
> set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size /
> PAGE_SIZE);
> return pack;
Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
It is kind of a fix for a different issue.
Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
not work with them, we should have some comments or warnings to that
effect somewhere. Someone may try to pass the flags in together.
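One possible shape for such a warning, sketched in userspace: detect the two flags being passed together and fall back to small pages. The flag values are the ones from include/linux/vmalloc.h in patch 1; the SKETCH_* names, both helpers, and the fallback policy are assumptions made for illustration only, not something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_VM_FLUSH_RESET_PERMS 0x00000100UL
#define SKETCH_VM_ALLOW_HUGE_VMAP   0x00000400UL

static_assert((SKETCH_VM_FLUSH_RESET_PERMS & SKETCH_VM_ALLOW_HUGE_VMAP) == 0,
	      "the two flags must be distinct bits");

/* True when a caller asks for both behaviours at once. */
static bool vm_flags_conflict(unsigned long vm_flags)
{
	const unsigned long both =
		SKETCH_VM_FLUSH_RESET_PERMS | SKETCH_VM_ALLOW_HUGE_VMAP;

	return (vm_flags & both) == both;
}

/*
 * Hypothetical policy: warn and drop the huge-page request, keeping the
 * permission reset. fprintf() stands in for a kernel-side warning.
 */
static unsigned long sanitize_vm_flags(unsigned long vm_flags)
{
	if (vm_flags_conflict(vm_flags)) {
		fprintf(stderr, "warning: huge vmap requested with flush/reset perms\n");
		vm_flags &= ~SKETCH_VM_ALLOW_HUGE_VMAP;
	}
	return vm_flags;
}
```

Note that the __weak module_alloc_huge() in patch 2 passes exactly this combination (VM_FLUSH_RESET_PERMS | VM_ALLOW_HUGE_VMAP), so a check along these lines would trigger there.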
> @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct
> bpf_binary_header *hdr)
> if (bitmap_find_next_zero_area(pack->bitmap,
> bpf_prog_chunk_count(), 0,
> bpf_prog_chunk_count(), 0) ==
> 0) {
> list_del(&pack->list);
> - module_memfree(pack->ptr);
> + set_memory_nx((unsigned long)pack->ptr,
> bpf_prog_pack_size / PAGE_SIZE);
> + set_memory_rw((unsigned long)pack->ptr,
> bpf_prog_pack_size / PAGE_SIZE);
> + vfree(pack->ptr);
> kfree(pack);
Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
should it call module_memfree() instead of vfree()?
Since there are bugs, simple, immediate fixes seem like the right thing
to do, but I had a couple long term focused comments on this new
feature:
It would be nice if bpf and the other module_alloc() callers could
share the same large pages. Meaning, ultimately that this whole thing
should probably live outside of bpf. BPF tracing usages might benefit
for example, and kprobes and ftrace are not too different than bpf
progs from a text allocation perspective.
I agree that the module's part is non-trivial. A while back I had tried
to do something like bpf_prog_pack() that worked for all the
module_alloc() callers. It had some modules changes to allow different
permissions to go to different allocations so they could be made to
share large pages:
https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/
I thought the existing kernel special permission allocation methods
were just too brittle and intertwined to improve without a new
interface. The hope was the new interface could wrap all the arch
intricacies instead of leaving them exposed in the cross-arch callers.
I wonder what you think of that general direction or if you have any
follow up plans for this?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-12 17:20 ` Edgecombe, Rick P
@ 2022-04-12 21:00 ` Song Liu
2022-04-13 15:51 ` Edgecombe, Rick P
0 siblings, 1 reply; 12+ messages in thread
From: Song Liu @ 2022-04-12 21:00 UTC (permalink / raw)
To: Edgecombe, Rick P
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
hch@infradead.org, imbrenda@linux.ibm.com,
akpm@linux-foundation.org, ast@kernel.org, mcgrof@kernel.org
On Tue, Apr 12, 2022 at 10:21 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Mon, 2022-04-11 at 16:35 -0700, Song Liu wrote:
> > @@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
> > bitmap_zero(pack->bitmap, bpf_prog_pack_size /
> > BPF_PROG_CHUNK_SIZE);
> > list_add_tail(&pack->list, &pack_list);
> >
> > - set_vm_flush_reset_perms(pack->ptr);
> > set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size /
> > PAGE_SIZE);
> > set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size /
> > PAGE_SIZE);
> > return pack;
>
> Dropping set_vm_flush_reset_perms() is not mentioned in the commit log.
> It is kind of a fix for a different issue.
>
> Now that x86 supports vmalloc huge pages, but VM_FLUSH_RESET_PERMS does
> not work with them, we should have some comments or warnings to that
> effect somewhere. Someone may try to pass the flags in together.
Good catch! I will add it in the next version.
>
> > @@ -970,7 +969,9 @@ static void bpf_prog_pack_free(struct
> > bpf_binary_header *hdr)
> > if (bitmap_find_next_zero_area(pack->bitmap,
> > bpf_prog_chunk_count(), 0,
> > bpf_prog_chunk_count(), 0) ==
> > 0) {
> > list_del(&pack->list);
> > - module_memfree(pack->ptr);
>
>
> > + set_memory_nx((unsigned long)pack->ptr,
> > bpf_prog_pack_size / PAGE_SIZE);
> > + set_memory_rw((unsigned long)pack->ptr,
> > bpf_prog_pack_size / PAGE_SIZE);
> > + vfree(pack->ptr);
> > kfree(pack);
>
> Now that it calls module_alloc_huge() instead of vmalloc_node_range(),
> should it call module_memfree() instead of vfree()?
Right. Let me sort that out. (Also, whether we introduce module_alloc_huge()
or not).
>
>
>
> Since there are bugs, simple, immediate fixes seem like the right thing
> to do, but I had a couple long term focused comments on this new
> feature:
>
> It would be nice if bpf and the other module_alloc() callers could
> share the same large pages. Meaning, ultimately that this whole thing
> should probably live outside of bpf. BPF tracing usages might benefit
> for example, and kprobes and ftrace are not too different than bpf
> progs from a text allocation perspective.
Agreed.
>
> I agree that the module's part is non-trivial. A while back I had tried
> to do something like bpf_prog_pack() that worked for all the
> module_alloc() callers. It had some modules changes to allow different
> permissions to go to different allocations so they could be made to
> share large pages:
>
> https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@intel.com/
>
> I thought the existing kernel special permission allocation methods
> were just too brittle and intertwined to improve without a new
> interface. The hope was the new interface could wrap all the arch
> intricacies instead of leaving them exposed in the cross-arch callers.
>
> I wonder what you think of that general direction or if you have any
> follow up plans for this?
Since I am still learning the vmalloc/module_alloc code, I think I am
not really capable of commenting on the direction. From our use
cases, we do see a performance hit due to a large number of BPF
programs fragmenting the page table. Kernel modules, OTOH, are not
too big an issue for us, as we usually build hot modules into the
kernel. That being said, we are interested in making the huge page
interface general for BPF programs and kernel modules. We can
commit resources to this effort.
Thanks,
Song
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
2022-04-12 15:38 ` [PATCH v2 bpf 0/3] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Luis Chamberlain
@ 2022-04-13 0:35 ` Song Liu
0 siblings, 0 replies; 12+ messages in thread
From: Song Liu @ 2022-04-13 0:35 UTC (permalink / raw)
To: Luis Chamberlain
Cc: bpf, Networking, Linux-MM, open list, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Kernel Team, Andrew Morton,
Edgecombe, Rick P, Christoph Hellwig, imbrenda
Hi Luis,
On Tue, Apr 12, 2022 at 8:38 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Mon, Apr 11, 2022 at 04:18:05PM -0700, Song Liu wrote:
> > Changes v1 => v2:
> > 1. Add vmalloc_huge(). (Christoph Hellwig)
> > 2. Add module_alloc_huge(). (Christoph Hellwig)
> > 3. Add Fixes tag and Link tag. (Thorsten Leemhuis)
> >
> > Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and using it for bpf_prog_pack has
> > caused some issues [1], as many users of vmalloc are not yet ready to
> > handle huge pages. To enable a smoother transition to huge page backed
> > vmalloc memory, this set replaces the VM_NO_HUGE_VMAP flag with a new
> > opt-in flag, VM_ALLOW_HUGE_VMAP. More discussion about this topic can be
> > found at [2].
> >
> > Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
> > Patch 2 introduces module_alloc_huge().
> > Patch 3 uses module_alloc_huge() in bpf_prog_pack.
> >
> > [1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
> > [2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/
> >
> > Song Liu (3):
> > vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
> > module: introduce module_alloc_huge
> > bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
> >
> > arch/Kconfig | 6 ++----
> > arch/powerpc/kernel/module.c | 2 +-
> > arch/s390/kvm/pv.c | 2 +-
> > arch/x86/kernel/module.c | 21 +++++++++++++++++++
> > include/linux/moduleloader.h | 5 +++++
> > include/linux/vmalloc.h | 4 ++--
> > kernel/bpf/core.c | 9 +++++----
> > kernel/module.c | 8 ++++++++
>
> Please use modules-next [0] as that has queued up changes which change
> kernel/module.c quite a bit.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
We are hoping to ship this set to fix some issues with 5.18, so I guess it
shouldn't go through the modules-next branch? Would this work for you?
We are adding a new API, module_alloc_huge(), so it shouldn't break
existing features.
Thanks,
Song
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 bpf 3/3] bpf: use vmalloc with VM_ALLOW_HUGE_VMAP for bpf_prog_pack
2022-04-12 21:00 ` Song Liu
@ 2022-04-13 15:51 ` Edgecombe, Rick P
0 siblings, 0 replies; 12+ messages in thread
From: Edgecombe, Rick P @ 2022-04-13 15:51 UTC (permalink / raw)
To: song@kernel.org
Cc: linux-kernel@vger.kernel.org, daniel@iogearbox.net,
bpf@vger.kernel.org, hch@infradead.org, ast@kernel.org,
rppt@kernel.org, linux-mm@kvack.org, andrii@kernel.org,
akpm@linux-foundation.org, mcgrof@kernel.org,
imbrenda@linux.ibm.com
CC Mike, who has been working on a direct map fragmentation solution. [0]
On Tue, 2022-04-12 at 14:00 -0700, Song Liu wrote:
> Since I am still learning the vmalloc/module_alloc code, I think I am
> not really capable of commenting on the direction. From our use
> cases, we do see a performance hit due to a large number of BPF
> programs fragmenting the page table. Kernel modules, OTOH, are not
> too big an issue for us, as we usually build hot modules into the
> kernel. That being said, we are interested in making the huge page
> interface general for BPF programs and kernel modules. We can
> commit resources to this effort.
That sounds great. Please feel free to loop me in if you do.
[0] https://lore.kernel.org/lkml/20220127085608.306306-1-rppt@kernel.org/
^ permalink raw reply [flat|nested] 12+ messages in thread