Re: [PATCH 00/13] mm: jit/text allocator

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH 00/13] mm: jit/text allocator
       [not found]           ` <CAPhsuW7ntn_HpVWdGK_hYVd3zsPEFToBNfmtt0m6K8SwfxJ66Q@mail.gmail.com>
@ 2023-06-08 18:41             ` Mike Rapoport
  2023-06-09 17:02               ` Song Liu
  2023-06-13 18:56               ` Kent Overstreet
  0 siblings, 2 replies; 7+ messages in thread
From: Mike Rapoport @ 2023-06-08 18:41 UTC (permalink / raw)
  To: Song Liu
  Cc: Mark Rutland, Kent Overstreet, linux-kernel, Andrew Morton,
	Catalin Marinas, Christophe Leroy, David S. Miller, Dinh Nguyen,
	Heiko Carstens, Helge Deller, Huacai Chen, Luis Chamberlain,
	Michael Ellerman, Naveen N. Rao, Palmer Dabbelt, Russell King,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
	bpf, linux-arm-kernel, linux-mips, linux-mm, linux-modules,
	linux-parisc, linux-riscv, linux-s390, linux-trace-kernel,
	linuxppc-dev, loongarch, netdev, sparclinux, x86

On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> 
> [...]
> 
> > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > parameter is just "does this allocation need to live close to kernel
> > > > > text", that's not that big of a deal.
> > > >
> > > > My thinking was that we at least need the start + end for each caller. That
> > > > might be it, tbh.
> > >
> > > Do you mean that modules will have something like
> > >
> > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > >
> > > and kprobes will have
> > >
> > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > ?
> >
> > Yes.
> 
> How about we start with two APIs:
>      jit_text_alloc(size);
>      jit_text_alloc_range(size, start, end);
> 
> AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> not quite convinced it is needed.
 
Right now arm64 and riscv override bpf and kprobes allocations to use the
entire vmalloc address space, but having the ability to allocate generated
code outside of modules area may be useful for other architectures.

Still the start + end for the callers feels backwards to me because the
callers do not define the ranges, but rather the architectures, so we still
need a way for architectures to define how they want allocate memory for
the generated code.

> > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > adding enum jit_type parameter to jit_text_alloc().
> >
> > That feels backwards to me; it centralizes a bunch of information about
> > distinct users to be able to shove that into a static array, when the callsites
> > can pass that information.
> 
> I think we only two type of users: module and everything else (ftrace, kprobe,
> bpf stuff). The key differences are:
> 
>   1. module uses text and data; while everything else only uses text.
>   2. module code is generated by the compiler, and thus has stronger
>   requirements in address ranges; everything else are generated via some
>   JIT or manual written assembly, so they are more flexible with address
>   ranges (in JIT, we can avoid using instructions that requires a specific
>   address range).
> 
> The next question is, can we have the two types of users share the same
> address ranges? If not, we can reserve the preferred range for modules,
> and let everything else use the other range. I don't see reasons to further
> separate users in the "everything else" group.
 
I agree that we can define only two types: modules and everything else and
let the architectures define if they need different ranges for these two
types, or want the same range for everything.

With only two types we can have two API calls for alloc, and a single
structure that defines the ranges etc from the architecture side rather
than spread all over.

Like something along these lines:

	struct execmem_range {
		unsigned long   start;
		unsigned long   end;
		unsigned long   fallback_start;
		unsigned long   fallback_end;
		pgprot_t        pgprot;
		unsigned int	alignment;
	};

	struct execmem_modules_range {
		enum execmem_module_flags flags;
		struct execmem_range text;
		struct execmem_range data;
	};

	struct execmem_jit_range {
		struct execmem_range text;
	};

	struct execmem_params {
		struct execmem_modules_range	modules;
		struct execmem_jit_range	jit;
	};

	struct execmem_params *execmem_arch_params(void);

	void *execmem_text_alloc(size_t size);
	void *execmem_data_alloc(size_t size);
	void execmem_free(void *ptr);

	void *jit_text_alloc(size_t size);
	void jit_free(void *ptr);

Modules or anything that must live close to the kernel image can use
execmem_*_alloc() and the callers that don't generally care about relative
addressing will use jit_text_alloc(), presuming that arch will restrict jit
range if necessary, like e.g. below for arm64 jit can be anywhere in
vmalloc and for x86 and s390 it will share the modules range. 


	struct execmem_params arm64_execmem = {
		.modules = {
			.flags = KASAN,
			.text = {
				.start = MODULES_VADDR,
				.end = MODULES_END,
				.pgprot = PAGE_KERNEL_ROX,
				.fallback_start = VMALLOC_START,
				.fallback_start = VMALLOC_END,
			},
		},
		.jit = {
			.text = {
				.start = VMALLOC_START,
				.end = VMALLOC_END,
				.pgprot = PAGE_KERNEL_ROX,
			},
		},
	};

	/* x86 and s390 */
	struct execmem_params cisc_execmem = {
		.modules = {
			.flags = KASAN,
			.text = {
				.start = MODULES_VADDR,
				.end = MODULES_END,
				.pgprot = PAGE_KERNEL_ROX,
			},
		},
		.jit_range = {},	/* impplies reusing .modules */
	};

	struct execmem_params default_execmem = {
		.modules = {
			.flags = KASAN,
			.text = {
				.start = VMALLOC_START,
				.end = VMALLOC_END,
				.pgprot = PAGE_KERNEL_EXEC,
			},
		},
	};

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 00/13] mm: jit/text allocator
  2023-06-08 18:41             ` [PATCH 00/13] mm: jit/text allocator Mike Rapoport
@ 2023-06-09 17:02               ` Song Liu
  2023-06-12 21:34                 ` Mike Rapoport
  2023-06-13 18:56               ` Kent Overstreet
  1 sibling, 1 reply; 7+ messages in thread
From: Song Liu @ 2023-06-09 17:02 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Mark Rutland, Kent Overstreet, linux-kernel, Andrew Morton,
	Catalin Marinas, Christophe Leroy, David S. Miller, Dinh Nguyen,
	Heiko Carstens, Helge Deller, Huacai Chen, Luis Chamberlain,
	Michael Ellerman, Naveen N. Rao, Palmer Dabbelt, Russell King,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
	bpf, linux-arm-kernel, linux-mips, linux-mm, linux-modules,
	linux-parisc, linux-riscv, linux-s390, linux-trace-kernel,
	linuxppc-dev, loongarch, netdev, sparclinux, x86

On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > [...]
> >
> > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > text", that's not that big of a deal.
> > > > >
> > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > might be it, tbh.
> > > >
> > > > Do you mean that modules will have something like
> > > >
> > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > >
> > > > and kprobes will have
> > > >
> > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > ?
> > >
> > > Yes.
> >
> > How about we start with two APIs:
> >      jit_text_alloc(size);
> >      jit_text_alloc_range(size, start, end);
> >
> > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > not quite convinced it is needed.
>
> Right now arm64 and riscv override bpf and kprobes allocations to use the
> entire vmalloc address space, but having the ability to allocate generated
> code outside of modules area may be useful for other architectures.
>
> Still the start + end for the callers feels backwards to me because the
> callers do not define the ranges, but rather the architectures, so we still
> need a way for architectures to define how they want allocate memory for
> the generated code.

Yeah, this makes sense.

>
> > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > adding enum jit_type parameter to jit_text_alloc().
> > >
> > > That feels backwards to me; it centralizes a bunch of information about
> > > distinct users to be able to shove that into a static array, when the callsites
> > > can pass that information.
> >
> > I think we only two type of users: module and everything else (ftrace, kprobe,
> > bpf stuff). The key differences are:
> >
> >   1. module uses text and data; while everything else only uses text.
> >   2. module code is generated by the compiler, and thus has stronger
> >   requirements in address ranges; everything else are generated via some
> >   JIT or manual written assembly, so they are more flexible with address
> >   ranges (in JIT, we can avoid using instructions that requires a specific
> >   address range).
> >
> > The next question is, can we have the two types of users share the same
> > address ranges? If not, we can reserve the preferred range for modules,
> > and let everything else use the other range. I don't see reasons to further
> > separate users in the "everything else" group.
>
> I agree that we can define only two types: modules and everything else and
> let the architectures define if they need different ranges for these two
> types, or want the same range for everything.
>
> With only two types we can have two API calls for alloc, and a single
> structure that defines the ranges etc from the architecture side rather
> than spread all over.
>
> Like something along these lines:
>
>         struct execmem_range {
>                 unsigned long   start;
>                 unsigned long   end;
>                 unsigned long   fallback_start;
>                 unsigned long   fallback_end;
>                 pgprot_t        pgprot;
>                 unsigned int    alignment;
>         };
>
>         struct execmem_modules_range {
>                 enum execmem_module_flags flags;
>                 struct execmem_range text;
>                 struct execmem_range data;
>         };
>
>         struct execmem_jit_range {
>                 struct execmem_range text;
>         };
>
>         struct execmem_params {
>                 struct execmem_modules_range    modules;
>                 struct execmem_jit_range        jit;
>         };
>
>         struct execmem_params *execmem_arch_params(void);
>
>         void *execmem_text_alloc(size_t size);
>         void *execmem_data_alloc(size_t size);
>         void execmem_free(void *ptr);

With the jit variation, maybe we can just call these
module_[text|data]_alloc()?

btw: Depending on the implementation of the allocator, we may also
need separate free()s for text and data.

>
>         void *jit_text_alloc(size_t size);
>         void jit_free(void *ptr);
>

[...]

How should we move ahead from here?

AFAICT, all these changes can be easily extended and refactored
in the future, so we don't have to make it perfect the first time.
OTOH, having the interface committed (either this set or my
module_alloc_type version) can unblock works in the binpack
allocator and the users side. Therefore, I think we can move
relatively fast here?

Thanks,
Song


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 00/13] mm: jit/text allocator
  2023-06-09 17:02               ` Song Liu
@ 2023-06-12 21:34                 ` Mike Rapoport
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2023-06-12 21:34 UTC (permalink / raw)
  To: Song Liu
  Cc: Mark Rutland, Kent Overstreet, linux-kernel, Andrew Morton,
	Catalin Marinas, Christophe Leroy, David S. Miller, Dinh Nguyen,
	Heiko Carstens, Helge Deller, Huacai Chen, Luis Chamberlain,
	Michael Ellerman, Naveen N. Rao, Palmer Dabbelt, Russell King,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
	bpf, linux-arm-kernel, linux-mips, linux-mm, linux-modules,
	linux-parisc, linux-riscv, linux-s390, linux-trace-kernel,
	linuxppc-dev, loongarch, netdev, sparclinux, x86

On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want allocate memory for
> > the generated code.
> 
> Yeah, this makes sense.
> 
> >
> > > > > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only two type of users: module and everything else (ftrace, kprobe,
> > > bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else are generated via some
> > >   JIT or manual written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that requires a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> >         struct execmem_range {
> >                 unsigned long   start;
> >                 unsigned long   end;
> >                 unsigned long   fallback_start;
> >                 unsigned long   fallback_end;
> >                 pgprot_t        pgprot;
> >                 unsigned int    alignment;
> >         };
> >
> >         struct execmem_modules_range {
> >                 enum execmem_module_flags flags;
> >                 struct execmem_range text;
> >                 struct execmem_range data;
> >         };
> >
> >         struct execmem_jit_range {
> >                 struct execmem_range text;
> >         };
> >
> >         struct execmem_params {
> >                 struct execmem_modules_range    modules;
> >                 struct execmem_jit_range        jit;
> >         };
> >
> >         struct execmem_params *execmem_arch_params(void);
> >
> >         void *execmem_text_alloc(size_t size);
> >         void *execmem_data_alloc(size_t size);
> >         void execmem_free(void *ptr);
> 
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?

I was thinking about "execmem_*_alloc()" for allocations that must be close to kernel
image, like modules, ftrace on x86 and s390 and maybe something else in the
future.

And jit_text_alloc() for allocations that can reside anywhere.

I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.
 
> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
> 
> >
> >         void *jit_text_alloc(size_t size);
> >         void jit_free(void *ptr);
> >

Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.
 
> [...]
> 
> How should we move ahead from here?
> 
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?

Once the interface and architecture abstraction is ready we can work on the
allocator and the users. We also need to update text_poking/alternatives on
architectures that would allocate executable memory as ROX. I did some
quick tests and with these patches 'modprobe xfs' takes tens time more than
before.
 
> Thanks,
> Song

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 00/13] mm: jit/text allocator
  2023-06-08 18:41             ` [PATCH 00/13] mm: jit/text allocator Mike Rapoport
  2023-06-09 17:02               ` Song Liu
@ 2023-06-13 18:56               ` Kent Overstreet
  2023-06-13 21:09                 ` Mike Rapoport
  1 sibling, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2023-06-13 18:56 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Song Liu, Mark Rutland, linux-kernel, Andrew Morton,
	Catalin Marinas, Christophe Leroy, David S. Miller, Dinh Nguyen,
	Heiko Carstens, Helge Deller, Huacai Chen, Luis Chamberlain,
	Michael Ellerman, Naveen N. Rao, Palmer Dabbelt, Russell King,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
	bpf, linux-arm-kernel, linux-mips, linux-mm, linux-modules,
	linux-parisc, linux-riscv, linux-s390, linux-trace-kernel,
	linuxppc-dev, loongarch, netdev, sparclinux, x86

On Thu, Jun 08, 2023 at 09:41:16PM +0300, Mike Rapoport wrote:
> On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > 
> > [...]
> > 
> > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > text", that's not that big of a deal.
> > > > >
> > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > might be it, tbh.
> > > >
> > > > Do you mean that modules will have something like
> > > >
> > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > >
> > > > and kprobes will have
> > > >
> > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > ?
> > >
> > > Yes.
> > 
> > How about we start with two APIs:
> >      jit_text_alloc(size);
> >      jit_text_alloc_range(size, start, end);
> > 
> > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > not quite convinced it is needed.
>  
> Right now arm64 and riscv override bpf and kprobes allocations to use the
> entire vmalloc address space, but having the ability to allocate generated
> code outside of modules area may be useful for other architectures.
> 
> Still the start + end for the callers feels backwards to me because the
> callers do not define the ranges, but rather the architectures, so we still
> need a way for architectures to define how they want allocate memory for
> the generated code.

So, the start + end just comes from the need to keep relative pointers
under a certain size. I think this could be just a flag, I see no reason
to expose actual addresses here.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 00/13] mm: jit/text allocator
  2023-06-13 18:56               ` Kent Overstreet
@ 2023-06-13 21:09                 ` Mike Rapoport
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2023-06-13 21:09 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Song Liu, Mark Rutland, linux-kernel, Andrew Morton,
	Catalin Marinas, Christophe Leroy, David S. Miller, Dinh Nguyen,
	Heiko Carstens, Helge Deller, Huacai Chen, Luis Chamberlain,
	Michael Ellerman, Naveen N. Rao, Palmer Dabbelt, Russell King,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
	bpf, linux-arm-kernel, linux-mips, linux-mm, linux-modules,
	linux-parisc, linux-riscv, linux-s390, linux-trace-kernel,
	linuxppc-dev, loongarch, netdev, sparclinux, x86

On Tue, Jun 13, 2023 at 02:56:14PM -0400, Kent Overstreet wrote:
> On Thu, Jun 08, 2023 at 09:41:16PM +0300, Mike Rapoport wrote:
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@arm.com> wrote:
> > > 
> > > [...]
> > > 
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >       jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >       jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > > 
> > > How about we start with two APIs:
> > >      jit_text_alloc(size);
> > >      jit_text_alloc_range(size, start, end);
> > > 
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >  
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> > 
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want allocate memory for
> > the generated code.
> 
> So, the start + end just comes from the need to keep relative pointers
> under a certain size. I think this could be just a flag, I see no reason
> to expose actual addresses here.

It's the other way around. The start + end comes from the need to restrict
allocation to certain range because of the relative addressing. I don't see
how a flag can help here.

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 01/13] nios2: define virtual address space for modules
       [not found] ` <20230601101257.530867-2-rppt@kernel.org>
@ 2023-06-13 22:16   ` Dinh Nguyen
  0 siblings, 0 replies; 7+ messages in thread
From: Dinh Nguyen @ 2023-06-13 22:16 UTC (permalink / raw)
  To: Mike Rapoport, linux-kernel
  Cc: Andrew Morton, Catalin Marinas, Christophe Leroy, David S. Miller,
	Heiko Carstens, Helge Deller, Huacai Chen, Kent Overstreet,
	Luis Chamberlain, Michael Ellerman, Naveen N. Rao, Palmer Dabbelt,
	Russell King, Song Liu, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Will Deacon, bpf, linux-arm-kernel, linux-mips,
	linux-mm, linux-modules, linux-parisc, linux-riscv, linux-s390,
	linux-trace-kernel, linuxppc-dev, loongarch, netdev, sparclinux,
	x86



On 6/1/23 05:12, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
> cannot reach all of vmalloc address space.
> 
> Define module space as 32MiB below the kernel base and switch nios2 to
> use vmalloc for module allocations.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
>   arch/nios2/include/asm/pgtable.h |  5 ++++-
>   arch/nios2/kernel/module.c       | 19 ++++---------------
>   2 files changed, 8 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
> index 0f5c2564e9f5..0073b289c6a4 100644
> --- a/arch/nios2/include/asm/pgtable.h
> +++ b/arch/nios2/include/asm/pgtable.h
> @@ -25,7 +25,10 @@
>   #include <asm-generic/pgtable-nopmd.h>
>   
>   #define VMALLOC_START		CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
> -#define VMALLOC_END		(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
> +#define VMALLOC_END		(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
> +
> +#define MODULES_VADDR		(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
> +#define MODULES_END		(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
>   
>   struct mm_struct;
>   
> diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
> index 76e0a42d6e36..9c97b7513853 100644
> --- a/arch/nios2/kernel/module.c
> +++ b/arch/nios2/kernel/module.c
> @@ -21,23 +21,12 @@
>   
>   #include <asm/cacheflush.h>
>   
> -/*
> - * Modules should NOT be allocated with kmalloc for (obvious) reasons.
> - * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
> - * from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
> - * addresses in 0xc0000000)
> - */
>   void *module_alloc(unsigned long size)
>   {
> -	if (size == 0)
> -		return NULL;
> -	return kmalloc(size, GFP_KERNEL);
> -}
> -
> -/* Free memory returned from module_alloc */
> -void module_memfree(void *module_region)
> -{
> -	kfree(module_region);
> +	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> +				    GFP_KERNEL, PAGE_KERNEL_EXEC,
> +				    VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
>   }
>   
>   int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,

Acked-by: Dinh Nguyen <dinguyen@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 00/13] mm: jit/text allocator
       [not found]         ` <ZH20XkD74prrdN4u@FVFF77S0Q05N>
       [not found]           ` <CAPhsuW7ntn_HpVWdGK_hYVd3zsPEFToBNfmtt0m6K8SwfxJ66Q@mail.gmail.com>
@ 2023-07-20  8:53           ` Mike Rapoport
  1 sibling, 0 replies; 7+ messages in thread
From: Mike Rapoport @ 2023-07-20  8:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kent Overstreet, linux-kernel, Andrew Morton, Catalin Marinas,
	Christophe Leroy, David S. Miller, Dinh Nguyen, Heiko Carstens,
	Helge Deller, Huacai Chen, Luis Chamberlain, Michael Ellerman,
	Naveen N. Rao, Palmer Dabbelt, Russell King, Song Liu,
	Steven Rostedt, Thomas Bogendoerfer, Thomas Gleixner, Will Deacon,
	bpf, linux-arm-kernel, linux-mips, linux-mm, linux-modules,
	linux-parisc, linux-riscv, linux-s390, linux-trace-kernel,
	linuxppc-dev, loongarch, netdev, sparclinux, x86

On Mon, Jun 05, 2023 at 11:09:34AM +0100, Mark Rutland wrote:
> On Mon, Jun 05, 2023 at 12:20:40PM +0300, Mike Rapoport wrote:
> > On Fri, Jun 02, 2023 at 10:35:09AM +0100, Mark Rutland wrote:
> > > On Thu, Jun 01, 2023 at 02:14:56PM -0400, Kent Overstreet wrote:
> > > > On Thu, Jun 01, 2023 at 05:12:03PM +0100, Mark Rutland wrote:
> > > > > For a while I have wanted to give kprobes its own allocator so that it can work
> > > > > even with CONFIG_MODULES=n, and so that it doesn't have to waste VA space in
> > > > > the modules area.
> > > > > 
> > > > > Given that, I think these should have their own allocator functions that can be
> > > > > provided independently, even if those happen to use common infrastructure.
> > > > 
> > > > How much memory can kprobes conceivably use? I think we also want to try
> > > > to push back on combinatorial new allocators, if we can.
> > > 
> > > That depends on who's using it, and how (e.g. via BPF).
> > > 
> > > To be clear, I'm not necessarily asking for entirely different allocators, but
> > > I do thinkg that we want wrappers that can at least pass distinct start+end
> > > parameters to a common allocator, and for arm64's modules code I'd expect that
> > > we'd keep the range falblack logic out of the common allcoator, and just call
> > > it twice.
> > > 
> > > > > > Several architectures override module_alloc() because of various
> > > > > > constraints where the executable memory can be located and this causes
> > > > > > additional obstacles for improvements of code allocation.
> > > > > > 
> > > > > > This set splits code allocation from modules by introducing
> > > > > > jit_text_alloc(), jit_data_alloc() and jit_free() APIs, replaces call
> > > > > > sites of module_alloc() and module_memfree() with the new APIs and
> > > > > > implements core text and related allocation in a central place.
> > > > > > 
> > > > > > Instead of architecture specific overrides for module_alloc(), the
> > > > > > architectures that require non-default behaviour for text allocation must
> > > > > > fill jit_alloc_params structure and implement jit_alloc_arch_params() that
> > > > > > returns a pointer to that structure. If an architecture does not implement
> > > > > > jit_alloc_arch_params(), the defaults compatible with the current
> > > > > > modules::module_alloc() are used.
> > > > > 
> > > > > As above, I suspect that each of the callsites should probably be using common
> > > > > infrastructure, but I don't think that a single jit_alloc_arch_params() makes
> > > > > sense, since the parameters for each case may need to be distinct.
> > > > 
> > > > I don't see how that follows. The whole point of function parameters is
> > > > that they may be different :)
> > > 
> > > What I mean is that jit_alloc_arch_params() tries to aggregate common
> > > parameters, but they aren't actually common (e.g. the actual start+end range
> > > for allocation).
> > 
> > jit_alloc_arch_params() tries to aggregate architecture constraints and
> > requirements for allocations of executable memory and this exactly what
> > the first 6 patches of this set do.
> > 
> > A while ago Thomas suggested to use a structure that parametrizes
> > architecture constraints by the memory type used in modules [1] and Song
> > implemented the infrastructure for it and x86 part [2].
> > 
> > I liked the idea of defining parameters in a single structure, but I
> > thought that approaching the problem from the arch side rather than from
> > modules perspective will be better starting point, hence these patches.
> > 
> > I don't see a fundamental reason why a single structure cannot describe
> > what is needed for different code allocation cases, be it modules, kprobes
> > or bpf. There is of course an assumption that the core allocations will be
> > the same for all the users, and it seems to me that something like 
> > 
> > * allocate physical memory if allocator caches are empty
> > * map it in vmalloc or modules address space
> > * return memory from the allocator cache to the caller
> > 
> > will work for all usecases.
> > 
> > We might need separate caches for different cases on different
> > architectures, and a way to specify what cache should be used in the
> > allocator API, but that does not contradict a single structure for arch
> > specific parameters, but only makes it more elaborate, e.g. something like
> > 
> > enum jit_type {
> > 	JIT_MODULES_TEXT,
> > 	JIT_MODULES_DATA,
> > 	JIT_KPROBES,
> > 	JIT_FTRACE,
> > 	JIT_BPF,
> > 	JIT_TYPE_MAX,
> > };
> > 
> > struct jit_alloc_params {
> > 	struct jit_range	ranges[JIT_TYPE_MAX];
> > 	/* ... */
> > };
> > 
> > > > Can you give more detail on what parameters you need? If the only extra
> > > > parameter is just "does this allocation need to live close to kernel
> > > > text", that's not that big of a deal.
> > > 
> > > My thinking was that we at least need the start + end for each caller. That
> > > might be it, tbh.
> > 
> > Do you mean that modules will have something like
> > 
> > 	jit_text_alloc(size, MODULES_START, MODULES_END);
> > 
> > and kprobes will have
> > 
> > 	jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > ?
> 
> Yes.
> 
> > It sill can be achieved with a single jit_alloc_arch_params(), just by
> > adding enum jit_type parameter to jit_text_alloc().
> 
> That feels backwards to me; it centralizes a bunch of information about
> distinct users to be able to shove that into a static array, when the callsites
> can pass that information. 
> 
> What's *actually* common after separating out the ranges? Is it just the
> permissions?

Even if for some architecture the only common thing are the permissions,
having a definition for code allocations in a single place an improvement.
The diffstat of the patches is indeed positive (even without comments), but
having a single structure that specifies how the code should be allocated
would IMHO actually reduce the maintenance burden.

And features like caching of large pages and sub-page size allocations are
surely will be easier to opt-in this way.
 
> If we want this to be able to share allocations and so on, why can't we do this
> like a kmem_cache, and have the callsite pass a pointer to the allocator data?
> That would make it easy for callsites to share an allocator or use a distinct
> one.

I've looked into doing this like a kmem_cache with call sites passing the
allocator data, and this gets really hairy. For each user we need to pass
the arch specific parameters to that user, create a cache there and only
then the cache can be used. Since we don't have hooks to setup any of the
users in the arch code, the initialization gets more complex than shoving
everything into an array.

I think that jit_alloc(type, size) is the best way to move forward to let
different users choose their ranges and potentially caches. Differentiation
by the API name will explode even now and it'll get worse if/when new users
will show up and we can't even force users to avoid using PC-relative
addressing because, e.g. RISC-V explicitly switched their BPF JIT to use
that.
 
> Thanks,
> Mark.

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-07-20  8:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230601101257.530867-1-rppt@kernel.org>
     [not found] ` <20230601101257.530867-2-rppt@kernel.org>
2023-06-13 22:16   ` [PATCH 01/13] nios2: define virtual address space for modules Dinh Nguyen
     [not found] ` <ZHjDU/mxE+cugpLj@FVFF77S0Q05N.cambridge.arm.com>
     [not found]   ` <ZHjgIH3aX9dCvVZc@moria.home.lan>
     [not found]     ` <ZHm3zUUbwqlsZBBF@FVFF77S0Q05N>
     [not found]       ` <20230605092040.GB3460@kernel.org>
     [not found]         ` <ZH20XkD74prrdN4u@FVFF77S0Q05N>
     [not found]           ` <CAPhsuW7ntn_HpVWdGK_hYVd3zsPEFToBNfmtt0m6K8SwfxJ66Q@mail.gmail.com>
2023-06-08 18:41             ` [PATCH 00/13] mm: jit/text allocator Mike Rapoport
2023-06-09 17:02               ` Song Liu
2023-06-12 21:34                 ` Mike Rapoport
2023-06-13 18:56               ` Kent Overstreet
2023-06-13 21:09                 ` Mike Rapoport
2023-07-20  8:53           ` Mike Rapoport

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).