linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP
@ 2022-04-13 15:33 Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
  To: bpf, netdev, linux-mm, linux-kernel
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, mcgrof, Song Liu

Changes v2 => v3:
1. Use __vmalloc_huge in alloc_large_system_hash.
2. Use EXPORT_SYMBOL_GPL for new functions. (Christoph Hellwig)
3. Add more description about the issues and changes.(Christoph Hellwig,
   Rick Edgecombe).

Changes v1 => v2:
1. Add vmalloc_huge(). (Christoph Hellwig)
2. Add module_alloc_huge(). (Christoph Hellwig)
3. Add Fixes tag and Link tag. (Thorsten Leemhuis)

Enabling HAVE_ARCH_HUGE_VMALLOC on x86_64 and use it for bpf_prog_pack has
caused some issues [1], as many users of vmalloc are not yet ready to
handle huge pages. To enable a more smooth transition to use huge page
backed vmalloc memory, this set replaces VM_NO_HUGE_VMAP flag with an new
opt-in flag, VM_ALLOW_HUGE_VMAP. More discussions about this topic can be
found at [2].

Patch 1 removes VM_NO_HUGE_VMAP and adds VM_ALLOW_HUGE_VMAP.
Patch 2 uses VM_ALLOW_HUGE_VMAP in bpf_prog_pack.

[1] https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/
[2] https://lore.kernel.org/linux-mm/20220330225642.1163897-1-song@kernel.org/

Song Liu (4):
  vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  page_alloc: use __vmalloc_huge for large system hash
  module: introduce module_alloc_huge
  bpf: use module_alloc_huge for bpf_prog_pack

 arch/Kconfig                 |  6 ++----
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  2 +-
 arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
 include/linux/moduleloader.h |  5 +++++
 include/linux/vmalloc.h      |  5 +++--
 kernel/bpf/core.c            |  7 ++++---
 kernel/module.c              |  5 +++++
 mm/page_alloc.c              |  2 +-
 mm/vmalloc.c                 | 34 ++++++++++++++++++++++++++++------
 10 files changed, 71 insertions(+), 18 deletions(-)

--
2.30.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP
  2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-13 15:33 ` Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
  To: bpf, netdev, linux-mm, linux-kernel
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, mcgrof, Song Liu

Huge page backed vmalloc memory could benefit performance in many cases.
However, some users of vmalloc may not be ready to handle huge pages for
various reasons: hardware constraints, potential pages split, etc.
VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge
pages. However, it is not easy to track down all the users that require
the opt-out, as the allocation are passed different stacks and may cause
issues in different layers.

To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag,
VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages could ask
specificially.

Also, replace vmalloc_no_huge() with opt-in helpers vmalloc_huge(), and
__vmalloc_huge().

Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with
                      HAVE_ARCH_HUGE_VMAP")
Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/"
Signed-off-by: Song Liu <song@kernel.org>
---
 arch/Kconfig                 |  6 ++----
 arch/powerpc/kernel/module.c |  2 +-
 arch/s390/kvm/pv.c           |  2 +-
 include/linux/vmalloc.h      |  5 +++--
 mm/vmalloc.c                 | 34 ++++++++++++++++++++++++++++------
 5 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 29b0167c088b..31c4fdc4a4ba 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -854,10 +854,8 @@ config HAVE_ARCH_HUGE_VMAP
 
 #
 #  Archs that select this would be capable of PMD-sized vmaps (i.e.,
-#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
-#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
-#  can be used to prohibit arch-specific allocations from using hugepages to
-#  help with this (e.g., modules may require it).
+#  arch_vmap_pmd_supported() returns true). The VM_ALLOW_HUGE_VMAP flag
+#  must be used to enable allocations to use hugepages.
 #
 config HAVE_ARCH_HUGE_VMALLOC
 	depends on HAVE_ARCH_HUGE_VMAP
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 40a583e9d3c7..97a76a8619fb 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
 	 * too.
 	 */
 	return __vmalloc_node_range(size, 1, start, end, gfp, prot,
-				    VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
+				    VM_FLUSH_RESET_PERMS,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
 
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 7f7c0d6af2ce..8afede243903 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -142,7 +142,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
 	 * using large pages for the virtual memory area.
 	 * This is a hardware limitation.
 	 */
-	kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);
+	kvm->arch.pv.stor_var = vmalloc(vlen);
 	if (!kvm->arch.pv.stor_var)
 		goto out_err;
 	return 0;
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3b1df7da402d..20205c4e3b23 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -26,7 +26,7 @@ struct notifier_block;		/* in notifier.h */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
 #define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
 #define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
-#define VM_NO_HUGE_VMAP		0x00000400	/* force PAGE_SIZE pte mapping */
+#define VM_ALLOW_HUGE_VMAP	0x00000400      /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
 
 #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
 	!defined(CONFIG_KASAN_VMALLOC)
@@ -153,7 +153,8 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			const void *caller) __alloc_size(1);
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller) __alloc_size(1);
-void *vmalloc_no_huge(unsigned long size) __alloc_size(1);
+void *vmalloc_huge(unsigned long size) __alloc_size(1);
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
 
 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
 extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..1dac30c0ea41 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3106,7 +3106,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		return NULL;
 	}
 
-	if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
+	if (vmap_allow_huge && (vm_flags & VM_ALLOW_HUGE_VMAP)) {
 		unsigned long size_per_node;
 
 		/*
@@ -3273,21 +3273,43 @@ void *vmalloc(unsigned long size)
 EXPORT_SYMBOL(vmalloc);
 
 /**
- * vmalloc_no_huge - allocate virtually contiguous memory using small pages
+ * vmalloc_huge - allocate virtually contiguous memory, allow huge pages
  * @size:    allocation size
  *
- * Allocate enough non-huge pages to cover @size from the page level
+ * Allocate enough pages to cover @size from the page level
+ * allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
+ *
+ * Return: pointer to the allocated memory or %NULL on error
+ */
+void *vmalloc_huge(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				    NUMA_NO_NODE, __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(vmalloc_huge);
+
+/**
+ * __vmalloc_huge - allocate virtually contiguous memory, allow huge pages
+ * @size:      allocation size
+ * @gfp_mask:  flags for the page level allocator
+ *
+ * Allocate enough pages to cover @size from the page level
  * allocator and map them into contiguous kernel virtual space.
+ * If @size is greater than or equal to PMD_SIZE, allow using
+ * huge pages for the memory
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-void *vmalloc_no_huge(unsigned long size)
+void *__vmalloc_huge(unsigned long size, gfp_t gfp_mask)
 {
 	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-				    GFP_KERNEL, PAGE_KERNEL, VM_NO_HUGE_VMAP,
+				    gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
 				    NUMA_NO_NODE, __builtin_return_address(0));
 }
-EXPORT_SYMBOL(vmalloc_no_huge);
+EXPORT_SYMBOL_GPL(__vmalloc_huge);
 
 /**
  * vzalloc - allocate virtually contiguous memory with zero fill
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash
  2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
@ 2022-04-13 15:33 ` Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 3/4] module: introduce module_alloc_huge Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
  3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
  To: bpf, netdev, linux-mm, linux-kernel
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, mcgrof, Song Liu

Use __vmalloc_huge() in alloc_large_system_hash() so that large system
hash (>= PMD_SIZE) could benefit from huge pages. Note that __vmalloc_huge
only allocates huge pages for systems with HAVE_ARCH_HUGE_VMALLOC.

Signed-off-by: Song Liu <song@kernel.org>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e5b4488a0c5..20d38b8482c4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8919,7 +8919,7 @@ void *__init alloc_large_system_hash(const char *tablename,
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
 		} else if (get_order(size) >= MAX_ORDER || hashdist) {
-			table = __vmalloc(size, gfp_flags);
+			table = __vmalloc_huge(size, gfp_flags);
 			virt = true;
 			if (table)
 				huge = is_vm_area_hugepages(table);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 bpf 3/4] module: introduce module_alloc_huge
  2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
@ 2022-04-13 15:33 ` Song Liu
  2022-04-13 15:33 ` [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
  3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
  To: bpf, netdev, linux-mm, linux-kernel
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, mcgrof, Song Liu

Introduce module_alloc_huge, which allocates huge page backed memory in
module memory space. The primary user of this memory is bpf_prog_pack
(multiple BPF programs sharing a huge page).

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
 include/linux/moduleloader.h |  5 +++++
 kernel/module.c              |  5 +++++
 3 files changed, 31 insertions(+)

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b98ffcf4d250..63f6a16c70dc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
 	return p;
 }
 
+void *module_alloc_huge(unsigned long size)
+{
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				 MODULES_VADDR + get_module_load_offset(),
+				 MODULES_END, gfp_mask, PAGE_KERNEL,
+				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+				 NUMA_NO_NODE, __builtin_return_address(0));
+	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}
+
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d34743a88938 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
    sections.  Returns NULL on failure. */
 void *module_alloc(unsigned long size);
 
+/* Allocator used for allocating memory in module memory space. If size is
+ * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
+ */
+void *module_alloc_huge(unsigned long size);
+
 /* Free memory returned from module_alloc. */
 void module_memfree(void *module_region);
 
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788fd965..b2c6cb682a7d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
 			NUMA_NO_NODE, __builtin_return_address(0));
 }
 
+void * __weak module_alloc_huge(unsigned long size)
+{
+	return vmalloc_huge(size);
+}
+
 bool __weak module_init_section(const char *name)
 {
 	return strstarts(name, ".init");
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack
  2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
                   ` (2 preceding siblings ...)
  2022-04-13 15:33 ` [PATCH v3 bpf 3/4] module: introduce module_alloc_huge Song Liu
@ 2022-04-13 15:33 ` Song Liu
  3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2022-04-13 15:33 UTC (permalink / raw)
  To: bpf, netdev, linux-mm, linux-kernel
  Cc: ast, daniel, andrii, kernel-team, akpm, rick.p.edgecombe, hch,
	imbrenda, mcgrof, Song Liu

Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
PMD_SIZE pages. This benefits system performance by reducing iTLB miss
rate.

Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
VM_FLUSH_RESET_PERMS does not work with huge pages yet.

Signed-off-by: Song Liu <song@kernel.org>
---
 kernel/bpf/core.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..b2a634d0f842 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
 	void *ptr;
 
 	size = BPF_HPAGE_SIZE * num_online_nodes();
-	ptr = module_alloc(size);
+	ptr = module_alloc_huge(size);
 
 	/* Test whether we can get huge pages. If not just use PAGE_SIZE
 	 * packs.
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 		       GFP_KERNEL);
 	if (!pack)
 		return NULL;
-	pack->ptr = module_alloc(bpf_prog_pack_size);
+	pack->ptr = module_alloc_huge(bpf_prog_pack_size);
 	if (!pack->ptr) {
 		kfree(pack);
 		return NULL;
@@ -889,7 +889,6 @@ static struct bpf_prog_pack *alloc_new_pack(void)
 	bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE);
 	list_add_tail(&pack->list, &pack_list);
 
-	set_vm_flush_reset_perms(pack->ptr);
 	set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 	return pack;
@@ -970,6 +969,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr)
 	if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0,
 				       bpf_prog_chunk_count(), 0) == 0) {
 		list_del(&pack->list);
+		set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
+		set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE);
 		module_memfree(pack->ptr);
 		kfree(pack);
 	}
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-04-13 16:05 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-13 15:33 [PATCH v3 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 1/4] vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 2/4] page_alloc: use __vmalloc_huge for large system hash Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 3/4] module: introduce module_alloc_huge Song Liu
2022-04-13 15:33 ` [PATCH v3 bpf 4/4] bpf: use module_alloc_huge for bpf_prog_pack Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).