LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
From: Bjorn Helgaas @ 2020-07-14 18:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Keith Busch, Paul Mackerras, sparclinux, Toan Le, Greg Ungerer,
	Marek Vasut, Rob Herring, Lorenzo Pieralisi, Sagi Grimberg,
	Russell King, Ley Foon Tan, Christoph Hellwig, Geert Uytterhoeven,
	Kevin Hilman, linux-pci, Jakub Kicinski, Matt Turner,
	linux-kernel-mentees, Guenter Roeck, Ray Jui, Jens Axboe,
	Ivan Kokshaysky, Shuah Khan, bjorn, Boris Ostrovsky,
	Richard Henderson, Juergen Gross, Bjorn Helgaas,
	Thomas Bogendoerfer, Scott Branden, Jingoo Han,
	Saheed O. Bolarinwa, linux-kernel@vger.kernel.org, Philipp Zabel,
	Greg Kroah-Hartman, Gustavo Pimentel, linuxppc-dev,
	David S. Miller, Heiner Kallweit
In-Reply-To: <CAK8P3a3NWSZw6678k1O2eJ6-c5GuW7484PRvEzU9MEPPrCD-yw@mail.gmail.com>

[trimmed the cc list; it's still too large but maybe arch folks care]

On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> <refactormyself@gmail.com> wrote:
> > This goal of these series is to move the definition of *all*
> > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use
> > within there.  All other tree specific definition will be left for
> > intact. Maybe they can be renamed.
> >
> > PCIBIOS* is an x86 concept as defined by the PCI spec. The
> > returned error codes of PCIBIOS* are positive values and this
> > introduces some complexities which other archs need not incur.
> 
> I think the intention is good, but I find the series in its current
> form very hard to review, in particular the way you touch some
> functions three times with trivial changes. Instead of
> 
> 1) replace PCIBIOS_SUCCESSFUL with 0
> 2) drop pointless 0-comparison
> 3) reformat whitespace
> 
> I would suggest to combine the first two steps into one patch per
> subsystem and drop the third step.

I agree.  BUT please don't just run out and post new patches to do
this.  Let's talk about Arnd's further ideas below first.

> ...
> Maybe the work can be split up differently, with a similar end
> result but fewer and easier reviewed patches. The way I'd look at
> the problem, there are three main areas that can be dealt with one
> at a time:
> 
> a) callers of the high-level config space accessors
>    pci_{write,read}_config_{byte,word,dword}, mostly in device
>    drivers.
> b) low-level implementation of the config space accessors
>     through struct pci_ops
> c) all other occurrences of these constants
> 
> Starting with a), my first question is whether any high-level
> drivers even need to care about errors from these functions. I see
> 4913 callers that ignore the return code, and 576 that actually
> check it, and almost none care about the specific error (as you
> found as well). Unless we conclude that most PCI drivers are wrong,
> could we just change the return type to 'void' and assume they never
> fail for valid arguments on a valid pci_device* ?

I really like this idea.

pci_write_config_*() has one return value, and only 100ish of 2500
callers check for errors.  It's sometimes possible for config
accessors to detect PCI errors and return failure, e.g., device was
removed or didn't respond, but most of them don't, and detecting these
errors is not really that valuable.

pci_read_config_*() is much more interesting because it returns two
things, the function return value and the value read from the PCI
device, and it's complicated to check both. 

Again it's sometimes possible for config read accessors to detect PCI
errors, but in most cases a PCI error means the accessor returns
success and the value from PCI is ~0.

Checking the function return value catches programming errors (bad
alignment, etc) but misses most of the interesting errors (device was
unplugged or reported a PCI error).

Checking the value returned from PCI is tricky because ~0 is a valid
value for some config registers, and only the driver knows for sure.
If the driver knows that ~0 is a possible value, it would have to do
something else, e.g., another config read of a register that *cannot*
be ~0, to see whether it's really an error.

I suspect that if we had a single value to look at it would be easier
to get right.  Error checking with current interface would look like
this:

  err = pci_read_config_word(dev, addr, &val);
  if (err)
    return -EINVAL;

  if (PCI_POSSIBLE_ERROR(val)) {
    /* if driver knows ~0 is invalid */
    return -EINVAL;

    /* if ~0 is potentially a valid value */
    err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
    if (err)
      return -EINVAL;

    if (PCI_POSSIBLE_ERROR(val2))
      return -EINVAL;
  }

Error checking with a possible interface that returned only a single
value could look like this:

  val = pci_config_read_word(dev, addr);
  if (PCI_POSSIBLE_ERROR(val)) {
    /* if driver knows ~0 is invalid */
    return -EINVAL;

    /* if ~0 is potentially a valid value */
    val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
    if (PCI_POSSIBLE_ERROR(val2))
      return -EINVAL;
  }

Am I understanding you correctly?

> For b), it might be nice to also change other aspects of the
> interface, e.g. passing a pci_host_bridge pointer plus bus number
> instead of a pci_bus pointer, or having the callback in the
> pci_host_bridge structure.

I like this idea a lot, too.  I think the fact that
pci_bus_read_config_word() requires a pci_bus * complicates things in
a few places.

I think it's completely separate, as you say, and we should defer it
for now because even part a) is a lot of work.  I added it to my list
of possible future projects.

Bjorn

^ permalink raw reply

* Re: [PATCH v2] powerpc/pseries: detect secure and trusted boot state of the system.
From: Mimi Zohar @ 2020-07-14 15:07 UTC (permalink / raw)
  To: Daniel Axtens, Nayna Jain, linuxppc-dev; +Cc: linux-kernel
In-Reply-To: <87y2nmtxce.fsf@dja-thinkpad.axtens.net>

On Tue, 2020-07-14 at 16:38 +1000, Daniel Axtens wrote:
> Hi Nayna,
> 
> Thanks! Would you be able to fold in some of the information from my
> reply to v1 into the changelog? Until we have public PAPR release with
> it, that information is the extent of the public documentation. It would
> be good to get it into the git log rather than just floating around in
> the mail archives!
> 
> A couple of small nits:
> 
> > +	if (enabled)
> > +		goto out;
> > +
> > +	if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> > +		if (secureboot)
> > +			enabled = (secureboot > 1) ? true : false;
> 
> Your tests double up here - you don't need both the 'if' statement and
> the 'secureboot > 1' ternary operator.
> 
> Just
> 
> +	if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> +		enabled = (secureboot > 1) ? true : false;
> 
> or even
> 
> +	if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> +		enabled = (secureboot > 1);
> 
> would work.

I haven't been following this thread, which might be the reason I'm
missing something here.  The patch description should explain why the
test is for "(secureboot > 1)", rather than a fixed number.

thanks,

Mimi

^ permalink raw reply

* [PATCH 07/13] cpufreq: powernv-cpufreq: Fix a bunch of kerneldoc related issues
From: Lee Jones @ 2020-07-14 14:50 UTC (permalink / raw)
  To: rjw, viresh.kumar
  Cc: linux-pm, linuxppc-dev, linux-kernel, Paul Mackerras, Lee Jones,
	linux-arm-kernel
In-Reply-To: <20200714145049.2496163-1-lee.jones@linaro.org>

Repair problems with formatting and missing attributes/parameters, and
demote header comments which do not meet the required standards
applicable to kerneldoc.

Fixes the following W=1 kernel build warning(s):

 drivers/cpufreq/powernv-cpufreq.c:84: warning: Function parameter or member 'last_lpstate_idx' not described in 'global_pstate_info'
 drivers/cpufreq/powernv-cpufreq.c:84: warning: Function parameter or member 'last_gpstate_idx' not described in 'global_pstate_info'
 drivers/cpufreq/powernv-cpufreq.c:84: warning: Function parameter or member 'policy' not described in 'global_pstate_info'
 drivers/cpufreq/powernv-cpufreq.c:182: warning: Function parameter or member 'i' not described in 'idx_to_pstate'
 drivers/cpufreq/powernv-cpufreq.c:201: warning: Function parameter or member 'pstate' not described in 'pstate_to_idx'
 drivers/cpufreq/powernv-cpufreq.c:670: warning: Function parameter or member 't' not described in 'gpstate_timer_handler'
 drivers/cpufreq/powernv-cpufreq.c:670: warning: Excess function parameter 'data' description in 'gpstate_timer_handler'

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/cpufreq/powernv-cpufreq.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 068cc53abe320..2e5a8b8a4abaa 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -64,13 +64,14 @@
  *				highest_lpstate_idx
  * @last_sampled_time:		Time from boot in ms when global pstates were
  *				last set
- * @last_lpstate_idx,		Last set value of local pstate and global
- * last_gpstate_idx		pstate in terms of cpufreq table index
+ * @last_lpstate_idx:		Last set value of local pstate and global
+ * @last_gpstate_idx:		pstate in terms of cpufreq table index
  * @timer:			Is used for ramping down if cpu goes idle for
  *				a long time with global pstate held high
  * @gpstate_lock:		A spinlock to maintain synchronization between
  *				routines called by the timer handler and
  *				governer's target_index calls
+ * @policy:			Associated CPUFreq policy
  */
 struct global_pstate_info {
 	int highest_lpstate_idx;
@@ -170,7 +171,7 @@ static inline u8 extract_pstate(u64 pmsr_val, unsigned int shift)
 
 /* Use following functions for conversions between pstate_id and index */
 
-/**
+/*
  * idx_to_pstate : Returns the pstate id corresponding to the
  *		   frequency in the cpufreq frequency table
  *		   powernv_freqs indexed by @i.
@@ -188,7 +189,7 @@ static inline u8 idx_to_pstate(unsigned int i)
 	return powernv_freqs[i].driver_data;
 }
 
-/**
+/*
  * pstate_to_idx : Returns the index in the cpufreq frequencytable
  *		   powernv_freqs for the frequency whose corresponding
  *		   pstate id is @pstate.
@@ -660,7 +661,7 @@ static inline void  queue_gpstate_timer(struct global_pstate_info *gpstates)
 /**
  * gpstate_timer_handler
  *
- * @data: pointer to cpufreq_policy on which timer was queued
+ * @t: Timer context used to fetch global pstate info struct
  *
  * This handler brings down the global pstate closer to the local pstate
  * according quadratic equation. Queues a new timer if it is still not equal
-- 
2.25.1


^ permalink raw reply related

* [PATCH 05/13] cpufreq/arch: powerpc: pasemi: Move prototypes to shared header
From: Lee Jones @ 2020-07-14 14:50 UTC (permalink / raw)
  To: rjw, viresh.kumar
  Cc: linux-pm, linuxppc-dev, linux-kernel, Paul Mackerras,
	Olof Johansson, Lee Jones, linux-arm-kernel
In-Reply-To: <20200714145049.2496163-1-lee.jones@linaro.org>

If function callers and providers do not share the same prototypes the
compiler complains of missing prototypes.  Fix this by moving the
already existing prototypes out to a mutually convenient location.

Fixes the following W=1 kernel build warning(s):

 drivers/cpufreq/pasemi-cpufreq.c:109:5: warning: no previous prototype for ‘check_astate’ [-Wmissing-prototypes]
 109 | int check_astate(void)
 | ^~~~~~~~~~~~
 drivers/cpufreq/pasemi-cpufreq.c:114:6: warning: no previous prototype for ‘restore_astate’ [-Wmissing-prototypes]
 114 | void restore_astate(int cpu)
 | ^~~~~~~~~~~~~~

Cc: Olof Johansson <olof@lixom.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 arch/powerpc/platforms/pasemi/pasemi.h    | 15 ------------
 arch/powerpc/platforms/pasemi/powersave.S |  2 ++
 drivers/cpufreq/pasemi-cpufreq.c          |  1 +
 include/linux/platform_data/pasemi.h      | 28 +++++++++++++++++++++++
 4 files changed, 31 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/platform_data/pasemi.h

diff --git a/arch/powerpc/platforms/pasemi/pasemi.h b/arch/powerpc/platforms/pasemi/pasemi.h
index 70b56048ed1be..528d81ef748ad 100644
--- a/arch/powerpc/platforms/pasemi/pasemi.h
+++ b/arch/powerpc/platforms/pasemi/pasemi.h
@@ -15,21 +15,6 @@ extern void __init pasemi_map_registers(void);
 extern void idle_spin(void);
 extern void idle_doze(void);
 
-/* Restore astate to last set */
-#ifdef CONFIG_PPC_PASEMI_CPUFREQ
-extern int check_astate(void);
-extern void restore_astate(int cpu);
-#else
-static inline int check_astate(void)
-{
-	/* Always return >0 so we never power save */
-	return 1;
-}
-static inline void restore_astate(int cpu)
-{
-}
-#endif
-
 extern struct pci_controller_ops pasemi_pci_controller_ops;
 
 #endif /* _PASEMI_PASEMI_H */
diff --git a/arch/powerpc/platforms/pasemi/powersave.S b/arch/powerpc/platforms/pasemi/powersave.S
index d0215d5329ca7..7747b48963286 100644
--- a/arch/powerpc/platforms/pasemi/powersave.S
+++ b/arch/powerpc/platforms/pasemi/powersave.S
@@ -5,6 +5,8 @@
  * Maintained by: Olof Johansson <olof@lixom.net>
  */
 
+#include <linux/platform_data/pasemi.h>
+
 #include <asm/processor.h>
 #include <asm/page.h>
 #include <asm/ppc_asm.h>
diff --git a/drivers/cpufreq/pasemi-cpufreq.c b/drivers/cpufreq/pasemi-cpufreq.c
index c66f566a854cb..c6bb3ecc90ef3 100644
--- a/drivers/cpufreq/pasemi-cpufreq.c
+++ b/drivers/cpufreq/pasemi-cpufreq.c
@@ -15,6 +15,7 @@
 #include <linux/timer.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
+#include <linux/platform_data/pasemi.h>
 
 #include <asm/hw_irq.h>
 #include <asm/io.h>
diff --git a/include/linux/platform_data/pasemi.h b/include/linux/platform_data/pasemi.h
new file mode 100644
index 0000000000000..3fed0687fcc9a
--- /dev/null
+++ b/include/linux/platform_data/pasemi.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Linaro Ltd.
+ *
+ * Author: Lee Jones <lee.jones@linaro.org>
+ */
+
+#ifndef _LINUX_PLATFORM_DATA_PASEMI_H
+#define _LINUX_PLATFORM_DATA_PASEMI_H
+
+/* Restore astate to last set */
+#ifdef CONFIG_PPC_PASEMI_CPUFREQ
+int check_astate(void);
+void restore_astate(int cpu);
+#else
+static inline int check_astate(void)
+{
+	/* Always return >0 so we never power save */
+	return 1;
+}
+static inline void restore_astate(int cpu)
+{
+}
+#endif
+
+#endif /* _LINUX_PLATFORM_DATA_PASEMI_H */
+
+
-- 
2.25.1


^ permalink raw reply related

* [PATCH 06/13] cpufreq: powernv-cpufreq: Functions only used in call-backs should be static
From: Lee Jones @ 2020-07-14 14:50 UTC (permalink / raw)
  To: rjw, viresh.kumar
  Cc: linux-pm, linuxppc-dev, linux-kernel, Paul Mackerras, Lee Jones,
	linux-arm-kernel
In-Reply-To: <20200714145049.2496163-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 drivers/cpufreq/powernv-cpufreq.c:669:6: warning: no previous prototype for ‘gpstate_timer_handler’ [-Wmissing-prototypes]
 drivers/cpufreq/powernv-cpufreq.c:902:6: warning: no previous prototype for ‘powernv_cpufreq_work_fn’ [-Wmissing-prototypes]

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/cpufreq/powernv-cpufreq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 8646eb197cd96..068cc53abe320 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -666,7 +666,7 @@ static inline void  queue_gpstate_timer(struct global_pstate_info *gpstates)
  * according quadratic equation. Queues a new timer if it is still not equal
  * to local pstate
  */
-void gpstate_timer_handler(struct timer_list *t)
+static void gpstate_timer_handler(struct timer_list *t)
 {
 	struct global_pstate_info *gpstates = from_timer(gpstates, t, timer);
 	struct cpufreq_policy *policy = gpstates->policy;
@@ -899,7 +899,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
-void powernv_cpufreq_work_fn(struct work_struct *work)
+static void powernv_cpufreq_work_fn(struct work_struct *work)
 {
 	struct chip *chip = container_of(work, struct chip, throttle);
 	struct cpufreq_policy *policy;
-- 
2.25.1


^ permalink raw reply related

* [PATCH -next] cpuidle/pseries: Make symbol 'pseries_idle_driver' static
From: Wei Yongjun @ 2020-07-14 14:24 UTC (permalink / raw)
  To: Hulk Robot, Rafael J. Wysocki, Daniel Lezcano, Michael Ellerman
  Cc: linuxppc-dev, Wei Yongjun, linux-pm

The sparse tool complains as follows:

drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
 symbol 'pseries_idle_driver' was not declared. Should it be static?
 
'pseries_idle_driver' is not used outside of this file, so marks
it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/cpuidle/cpuidle-pseries.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 6513ef2af66a..3e058ad2bb51 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -22,7 +22,7 @@
 #include <asm/idle.h>
 #include <asm/plpar_wrappers.h>
 
-struct cpuidle_driver pseries_idle_driver = {
+static struct cpuidle_driver pseries_idle_driver = {
 	.name             = "pseries_idle",
 	.owner            = THIS_MODULE,
 };


^ permalink raw reply related

* [PATCH -next] cpufreq: powernv: Make some symbols static
From: Wei Yongjun @ 2020-07-14 14:23 UTC (permalink / raw)
  To: Hulk Robot, Rafael J. Wysocki, Viresh Kumar, Michael Ellerman
  Cc: linuxppc-dev, Wei Yongjun, linux-pm

The sparse tool complains as follows:

drivers/cpufreq/powernv-cpufreq.c:88:1: warning:
 symbol 'pstate_revmap' was not declared. Should it be static?
drivers/cpufreq/powernv-cpufreq.c:383:18: warning:
 symbol 'cpufreq_freq_attr_cpuinfo_nominal_freq' was not declared. Should it be static?
drivers/cpufreq/powernv-cpufreq.c:669:6: warning:
 symbol 'gpstate_timer_handler' was not declared. Should it be static?
drivers/cpufreq/powernv-cpufreq.c:902:6: warning:
 symbol 'powernv_cpufreq_work_fn' was not declared. Should it be static?

Those symbols are not used outside of this file, so mark
them static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 8646eb197cd9..cf118263ec65 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -85,7 +85,7 @@ struct global_pstate_info {
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 
-DEFINE_HASHTABLE(pstate_revmap, POWERNV_MAX_PSTATES_ORDER);
+static DEFINE_HASHTABLE(pstate_revmap, POWERNV_MAX_PSTATES_ORDER);
 /**
  * struct pstate_idx_revmap_data: Entry in the hashmap pstate_revmap
  *				  indexed by a function of pstate id.
@@ -380,7 +380,7 @@ static ssize_t cpuinfo_nominal_freq_show(struct cpufreq_policy *policy,
 		powernv_freqs[powernv_pstate_info.nominal].frequency);
 }
 
-struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
+static struct freq_attr cpufreq_freq_attr_cpuinfo_nominal_freq =
 	__ATTR_RO(cpuinfo_nominal_freq);
 
 #define SCALING_BOOST_FREQS_ATTR_INDEX		2
@@ -666,7 +666,7 @@ static inline void  queue_gpstate_timer(struct global_pstate_info *gpstates)
  * according quadratic equation. Queues a new timer if it is still not equal
  * to local pstate
  */
-void gpstate_timer_handler(struct timer_list *t)
+static void gpstate_timer_handler(struct timer_list *t)
 {
 	struct global_pstate_info *gpstates = from_timer(gpstates, t, timer);
 	struct cpufreq_policy *policy = gpstates->policy;
@@ -899,7 +899,7 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
-void powernv_cpufreq_work_fn(struct work_struct *work)
+static void powernv_cpufreq_work_fn(struct work_struct *work)
 {
 	struct chip *chip = container_of(work, struct chip, throttle);
 	struct cpufreq_policy *policy;


^ permalink raw reply related

* Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Peter Zijlstra @ 2020-07-14 13:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, X86 ML, LKML, Nicholas Piggin,
	Linux-MM, Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <6D3D1346-DB1E-43EB-812A-184918CCC16A@amacapital.net>

On Tue, Jul 14, 2020 at 05:46:05AM -0700, Andy Lutomirski wrote:
> x86 has this exact problem. At least no more than 64*8 CPUs share the cache line :)

I've seen patches for a 'sparse' bitmap to solve related problems.

It's basically the same code, except it multiplies everything (size,
bit-nr) by a constant to reduce the number of active bits per line.

This sadly doesn't take topology into account, but reducing contention
is still good ofcourse.

^ permalink raw reply

* Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Andy Lutomirski @ 2020-07-14 12:46 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Arnd Bergmann, Peter Zijlstra, X86 ML, LKML, Linux-MM,
	Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <1594708054.04iuyxuyb5.astroid@bobo.none>



> On Jul 13, 2020, at 11:31 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from Nicholas Piggin's message of July 14, 2020 3:04 pm:
>> Excerpts from Andy Lutomirski's message of July 14, 2020 4:18 am:
>>> 
>>>> On Jul 13, 2020, at 9:48 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
>>>> 
>>>> Excerpts from Andy Lutomirski's message of July 14, 2020 1:59 am:
>>>>>> On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>>> 
>>>>>> On big systems, the mm refcount can become highly contented when doing
>>>>>> a lot of context switching with threaded applications (particularly
>>>>>> switching between the idle thread and an application thread).
>>>>>> 
>>>>>> Abandoning lazy tlb slows switching down quite a bit in the important
>>>>>> user->idle->user cases, so so instead implement a non-refcounted scheme
>>>>>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>>>>>> any remaining lazy ones.
>>>>>> 
>>>>>> On a 16-socket 192-core POWER8 system, a context switching benchmark
>>>>>> with as many software threads as CPUs (so each switch will go in and
>>>>>> out of idle), upstream can achieve a rate of about 1 million context
>>>>>> switches per second. After this patch it goes up to 118 million.
>>>>>> 
>>>>> 
>>>>> I read the patch a couple of times, and I have a suggestion that could
>>>>> be nonsense.  You are, effectively, using mm_cpumask() as a sort of
>>>>> refcount.  You're saying "hey, this mm has no more references, but it
>>>>> still has nonempty mm_cpumask(), so let's send an IPI and shoot down
>>>>> those references too."  I'm wondering whether you actually need the
>>>>> IPI.  What if, instead, you actually treated mm_cpumask as a refcount
>>>>> for real?  Roughly, in __mmdrop(), you would only free the page tables
>>>>> if mm_cpumask() is empty.  And, in the code that removes a CPU from
>>>>> mm_cpumask(), you would check if mm_users == 0 and, if so, check if
>>>>> you just removed the last bit from mm_cpumask and potentially free the
>>>>> mm.
>>>>> 
>>>>> Getting the locking right here could be a bit tricky -- you need to
>>>>> avoid two CPUs simultaneously exiting lazy TLB and thinking they
>>>>> should free the mm, and you also need to avoid an mm with mm_users
>>>>> hitting zero concurrently with the last remote CPU using it lazily
>>>>> exiting lazy TLB.  Perhaps this could be resolved by having mm_count
>>>>> == 1 mean "mm_cpumask() is might contain bits and, if so, it owns the
>>>>> mm" and mm_count == 0 meaning "now it's dead" and using some careful
>>>>> cmpxchg or dec_return to make sure that only one CPU frees it.
>>>>> 
>>>>> Or maybe you'd need a lock or RCU for this, but the idea would be to
>>>>> only ever take the lock after mm_users goes to zero.
>>>> 
>>>> I don't think it's nonsense, it could be a good way to avoid IPIs.
>>>> 
>>>> I haven't seen much problem here that made me too concerned about IPIs 
>>>> yet, so I think the simple patch may be good enough to start with
>>>> for powerpc. I'm looking at avoiding/reducing the IPIs by combining the
>>>> unlazying with the exit TLB flush without doing anything fancy with
>>>> ref counting, but we'll see.
>>> 
>>> I would be cautious with benchmarking here. I would expect that the
>>> nasty cases may affect power consumption more than performance — the 
>>> specific issue is IPIs hitting idle cores, and the main effects are to 
>>> slow down exit() a bit but also to kick the idle core out of idle. 
>>> Although, if the idle core is in a deep sleep, that IPI could be 
>>> *very* slow.
>> 
>> It will tend to be self-limiting to some degree (deeper idle cores
>> would tend to have less chance of IPI) but we have bigger issues on
>> powerpc with that, like broadcast IPIs to the mm cpumask for THP
>> management. Power hasn't really shown up as an issue but powerpc
>> CPUs may have their own requirements and issues there, shall we say.
>> 
>>> So I think it’s worth at least giving this a try.
>> 
>> To be clear it's not a complete solution itself. The problem is of 
>> course that mm cpumask gives you false negatives, so the bits
>> won't always clean up after themselves as CPUs switch away from their
>> lazy tlb mms.
> 
> ^^
> 
> False positives: CPU is in the mm_cpumask, but is not using the mm
> as a lazy tlb. So there can be bits left and never freed.
> 
> If you closed the false positives, you're back to a shared mm cache
> line on lazy mm context switches.

x86 has this exact problem. At least no more than 64*8 CPUs share the cache line :)

Can your share your benchmark?

> 
> Thanks,
> Nick

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 12:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
	Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
	Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
	John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
	Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
	Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
	Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, David Howells,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
	Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
	KP Singh, Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714112927.GV10769@hirez.programming.kicks-ass.net>

On Tue, Jul 14, 2020 at 01:29:27PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 14, 2020 at 11:28:27AM +0100, Will Deacon wrote:
> 
> > As Ard says, module_alloc() _is_ special, in the sense that the virtual
> > memory it allocates wants to be close to the kernel text, whereas the
> > concept of allocating executable memory is broader and doesn't have these
> > restrictions. So, while I'm in favour of having a text_alloc() interface
> > that can be used by callers which only require an executable mapping, I'd
> > much prefer for the module_alloc() code to remain for, err, modules.
> 
> So on x86 all those things (kprobes, bpf, ftrace) require that same
> closeness.
> 
> An interface like the late vmalloc_exec() will simply not work for us.
> 
> We recently talked about arm64-kprobes and how you're not doing any of
> the optimizations and fully rely on the exception return. And I see
> you're one of the few archs that has bpf_jit_alloc_exec() (also,
> shouldn't you be using PAGE_KERNEL_EXEC there?). But the BPF core seems
> to use module_alloc() as a default means of allocating text.
> 
> 
> So what should this look like? Have a text_alloc() with an argument that
> indicates where? But then I suppose we also need a means to manage PLT
> entries. Otherwise I don't exactly see how you're going to call BPF
> code, or how that BPF stuff is going to call back into its helpers.

To make this less of a havoc to arch maintainers what if:

void * __weak module_alloc(unsigned long size)
{
	if (IS_ENABLED(HAS_TEXT_ALLOC))
		return text_alloc(size);

	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
			GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
			NUMA_NO_NODE, __builtin_return_address(0));
}

Then in arch/x86/Kconfig I could put:

config HAS_TEXT_ALLOC
	def_bool y

This would scale down the patch set just to add kernel/text.c and
arch/x86/kernel/text.c, and allows gradual migration to other arch's.

/Jarkko

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 12:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
	Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714103333.GB1551@shell.armlinux.org.uk>

On Tue, Jul 14, 2020 at 11:33:33AM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Jul 14, 2020 at 01:17:22PM +0300, Ard Biesheuvel wrote:
> > On Tue, 14 Jul 2020 at 12:53, Jarkko Sakkinen
> > <jarkko.sakkinen@linux.intel.com> wrote:
> > >
> > > On Mon, Jul 13, 2020 at 10:49:48PM +0300, Ard Biesheuvel wrote:
> > > > This patch suggests that there are other reasons why conflating
> > > > allocation of module space and allocating  text pages for other uses
> > > > is a bad idea, but switching all users to text_alloc() is a step in
> > > > the wrong direction. It would be better to stop using module_alloc()
> > > > in core code except in the module loader, and have a generic
> > > > text_alloc() that can be overridden by the arch if necessary. Note
> > > > that x86  and s390 are the only architectures that use module_alloc()
> > > > in ftrace code.
> > >
> > > This series essentially does this: introduces text_alloc() and
> > > text_memfree(), which have generic implementations in kernel/text.c.
> > > Those can be overriddent by arch specific implementations.
> > >
> > > What you think should be done differently than in my patch set?
> > >
> > 
> > On arm64, module_alloc is only used by the module loader, and so
> > pulling it out and renaming it will cause unused code to be
> > incorporated into the kernel when building without module support,
> > which is the use case you claim to be addressing.
> > 
> > Module_alloc has semantics that are intimately tied to the module
> > loader, but over the years, it ended up being (ab)used by other
> > subsystems, which don't require those semantics but just need n pages
> > of vmalloc space with executable permissions.
> > 
> > So the correct approach is to make text_alloc() implement just that,
> > generically, and switch bpf etc to use it. Then, only on architectures
> > that need it, override it with an implementation that has the required
> > additional semantics.
> > 
> > Refactoring 10+ architectures like this without any regard for how
> > text_alloc() deviates from module_alloc() just creates a lot of churn
> > that others will have to clean up after you.
> 
> For 32-bit ARM, our bpf code uses "blx/bx" (or equivalent code
> sequences) rather than encoding a "bl" or "b", so BPF there doesn't
> care where the executable memory is mapped, and doesn't need any
> PLTs.  Given that, should bpf always allocate from the vmalloc()
> region to preserve the module space for modules?

Most of the allocators use __vmalloc_node_range() but arch/nios2
uses just plain kmalloc():

/*
 * Modules should NOT be allocated with kmalloc for (obvious) reasons.
 * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
 * from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
 * addresses in 0xc0000000)
 */
void *module_alloc(unsigned long size)
{
	if (size == 0)
		return NULL;
	return kmalloc(size, GFP_KERNEL);
}

Also consider arch/x86 module_alloc():

void *module_alloc(unsigned long size)
{
	void *p;

	if (PAGE_ALIGN(size) > MODULES_LEN)
		return NULL;

	p = __vmalloc_node_range(size, MODULE_ALIGN,
				    MODULES_VADDR + get_module_load_offset(),
				    MODULES_END, GFP_KERNEL,
				    PAGE_KERNEL, 0, NUMA_NO_NODE,
				    __builtin_return_address(0));
	if (p && (kasan_module_alloc(p, size) < 0)) {
		vfree(p);
		return NULL;
	}

	return p;
}

The generic version is

void * __weak module_alloc(unsigned long size)
{
	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
			GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
			NUMA_NO_NODE, __builtin_return_address(0));
}

There is quite a lot of divergence from the generic version.

However, in other arch's it's mostly just divergence in vmalloc()
parameters and not as radical as in x86.

I could probably limit the total havoc to just nios2 and x86 if there
is a set of vmalloc parameters that work for all arch's. Then there
could be kernel/text.c and re-implementations for x86 and nios2.

I'm all for having separate text_alloc() and text_memfree() if these
issues can be somehow sorted out.

/Jarkko

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 11:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
	Anders Roxell, Sven Schnelle,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
	Josh Poimboeuf, KP Singh, Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAMj1kXGV_bWehdQvxaMBTOYHXUoFjifBWNpyVy3gaWKktko1mg@mail.gmail.com>

On Tue, Jul 14, 2020 at 01:17:22PM +0300, Ard Biesheuvel wrote:
> > This series essentially does this: introduces text_alloc() and
> > text_memfree(), which have generic implementations in kernel/text.c.
> > Those can be overriddent by arch specific implementations.
> >
> > What you think should be done differently than in my patch set?
> >
> 
> On arm64, module_alloc is only used by the module loader, and so
> pulling it out and renaming it will cause unused code to be
> incorporated into the kernel when building without module support,
> which is the use case you claim to be addressing.

It certainly does not cause the full module loader to be bundle, only
the allocator.

> Module_alloc has semantics that are intimately tied to the module
> loader, but over the years, it ended up being (ab)used by other
> subsystems, which don't require those semantics but just need n pages
> of vmalloc space with executable permissions.
> 
> So the correct approach is to make text_alloc() implement just that,
> generically, and switch bpf etc to use it. Then, only on architectures
> that need it, override it with an implementation that has the required
> additional semantics.
> 
> Refactoring 10+ architectures like this without any regard for how
> text_alloc() deviates from module_alloc() just creates a lot of churn
> that others will have to clean up after you.

Using generic text_alloc() in kernel/kprobes.c would make it behave
differently in arch's that reimplement module_alloc(). That's the main
driver for my approach.

/Jarkko

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Ard Biesheuvel @ 2020-07-14 12:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, Omar Sandoval, open list:S390,
	Joe Lawrence, Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Yonghong Song, Iurii Zaikin, Andrii Nakryiko, Thomas Huth,
	Vasily Gorbik, moderated list:ARM PORT, Daniel Axtens,
	Damien Le Moal, Martin KaFai Lau, Song Liu, Paul Walmsley,
	Heiko Carstens, Alexei Starovoitov, Jarkko Sakkinen, Atish Patra,
	Will Deacon, Daniel Borkmann, Masahiro Yamada, Nayna Jain,
	Ley Foon Tan, Christian Borntraeger, Sami Tolvanen, Naveen N. Rao,
	Mao Han, Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
	Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, David Howells,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
	Anders Roxell, Sven Schnelle,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
	Josh Poimboeuf, KP Singh, Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714112927.GV10769@hirez.programming.kicks-ass.net>

On Tue, 14 Jul 2020 at 14:31, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jul 14, 2020 at 11:28:27AM +0100, Will Deacon wrote:
>
> > As Ard says, module_alloc() _is_ special, in the sense that the virtual
> > memory it allocates wants to be close to the kernel text, whereas the
> > concept of allocating executable memory is broader and doesn't have these
> > restrictions. So, while I'm in favour of having a text_alloc() interface
> > that can be used by callers which only require an executable mapping, I'd
> > much prefer for the module_alloc() code to remain for, err, modules.
>
> So on x86 all those things (kprobes, bpf, ftrace) require that same
> closeness.
>
> An interface like the late vmalloc_exec() will simply not work for us.
>

Fair enough. So for x86, implementing text_alloc() as an alias of
module_alloc() makes sense. But that is not the case in general.

> We recently talked about arm64-kprobes and how you're not doing any of
> the optimizations and fully rely on the exception return. And I see
> you're one of the few archs that has bpf_jit_alloc_exec() (also,
> shouldn't you be using PAGE_KERNEL_EXEC there?). But the BPF core seems
> to use module_alloc() as a default means of allocating text.
>

Indeed. Which means it uses up module space which may be scarce,
especially on 32-bit ARM, and gets backed by kasan shadow pages, which
only makes sense for modules (if CONFIG_KASAN=y)

>
> So what should this look like? Have a text_alloc() with an argument that
> indicates where? But then I suppose we also need a means to manage PLT
> entries. Otherwise I don't exactly see how you're going to call BPF
> code, or how that BPF stuff is going to call back into its helpers.
>

If x86 chooses to back its implementation of text_alloc() by
module_alloc(), that is absolutely fine. But arm64 has no use for
text_alloc() at all today (bpf and kprobes don't use module_alloc(),
and ftrace does not implement dynamic trampoline allocation), and in
the general case, bpf, kprobes, ftrace and the module loader all have
different requirements that deviate subtly between architectures.

So perhaps the answer is to have text_alloc() not with a 'where'
argument but with a 'why' argument. Or more simply, just have separate
alloc/free APIs for each case, with generic versions that can be
overridden by the architecture.

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Peter Zijlstra @ 2020-07-14 11:45 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Jarkko Sakkinen, Atish Patra, Will Deacon, Daniel Borkmann,
	Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
	Dmitry Vyukov, Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
	Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714103333.GB1551@shell.armlinux.org.uk>

On Tue, Jul 14, 2020 at 11:33:33AM +0100, Russell King - ARM Linux admin wrote:
> For 32-bit ARM, our bpf code uses "blx/bx" (or equivalent code
> sequences) rather than encoding a "bl" or "b", so BPF there doesn't
> care where the executable memory is mapped, and doesn't need any
> PLTs.  Given that, should bpf always allocate from the vmalloc()
> region to preserve the module space for modules?

Ah, okay, then I suspect arm64 does something similar there. Thanks!

> I'm more concerned about ftrace though, but only because I don't
> have the understanding of that subsystem to really say whether there
> are any side effects from having the allocations potentially be out
> of range of a "bl" or "b" instruction.
> 
> If ftrace jumps both to and from the allocated page using a "load
> address to register, branch to register" approach like BPF does, then
> ftrace should be safe - and again, raises the issue that maybe it
> should always come from vmalloc() space.

I think the problem with ftrace is patching multiple instruction;
because it sounds like you'd need something to load the absolute address
in a register and then jump to that. And where it's relatively easy to
replace a single instruction, replace multiple instructions gets real
tricky real quick.

Which then leads to you being stuck with that 26bit displacement, IIRC.


^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14 11:43 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, linux-kernel, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
	Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714095346.GA1551@shell.armlinux.org.uk>

On Tue, Jul 14, 2020 at 10:53:46AM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Jul 14, 2020 at 12:49:28PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Jul 13, 2020 at 07:34:10PM +0100, Russell King - ARM Linux admin wrote:
> > > On Mon, Jul 13, 2020 at 09:19:37PM +0300, Jarkko Sakkinen wrote:
> > > > Rename module_alloc() to text_alloc() and module_memfree() to
> > > > text_memfree(), and move them to kernel/text.c, which is unconditionally
> > > > compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> > > > allocate space for executable code without requiring to compile the modules
> > > > support (CONFIG_MODULES=y) in.
> > > 
> > > I'm not sure this is a good idea for 32-bit ARM.  The code you are
> > > moving for 32-bit ARM is quite specific to module use in that it also
> > > supports allocating from the vmalloc area, where the module code
> > > knows to add PLT entries.
> > > 
> > > If the other proposed users of this text_alloc() do not have the logic
> > > to add PLT entries when branches between kernel code and this
> > > allocation are not reachable by, e.g. a 26-bit signed offset for 32-bit
> > > ARM code, then this code is not suitable for that use.
> > 
> > My intention is to use this in kprobes code in the place of
> > module_alloc().  I'm not sure why moving this code out of the module
> > subsystem could possibly break anything.  Unfortunately I forgot to add
> > covere letter to my series. Sending v2 with that to explain my use case
> > for this.
> 
> Ah, so you're merely renaming module_alloc() to text_alloc() everywhere?
> It sounded from the initial patch like you were also converting other
> users to use module_alloc().

Yes, exactly. I'm making the allocators unconditionally part of the
kernel proper.

My application for this is test kernels. I never compile anything as
modules when I test a release.

Also, I could imagine that especially in small scale embedded devices,
it could be sometimes useful to be able to have tracing support w/o
module support.

/Jarkko

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Peter Zijlstra @ 2020-07-14 11:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
	Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
	Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
	John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
	Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
	Alexei Starovoitov, Jarkko Sakkinen, Atish Patra, Daniel Borkmann,
	Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, David Howells,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Sven Schnelle, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Jiri Olsa, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
	Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
	KP Singh, Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714102826.GB4756@willie-the-truck>

On Tue, Jul 14, 2020 at 11:28:27AM +0100, Will Deacon wrote:

> As Ard says, module_alloc() _is_ special, in the sense that the virtual
> memory it allocates wants to be close to the kernel text, whereas the
> concept of allocating executable memory is broader and doesn't have these
> restrictions. So, while I'm in favour of having a text_alloc() interface
> that can be used by callers which only require an executable mapping, I'd
> much prefer for the module_alloc() code to remain for, err, modules.

So on x86 all those things (kprobes, bpf, ftrace) require that same
closeness.

An interface like the late vmalloc_exec() will simply not work for us.

We recently talked about arm64-kprobes and how you're not doing any of
the optimizations and fully rely on the exception return. And I see
you're one of the few archs that has bpf_jit_alloc_exec() (also,
shouldn't you be using PAGE_KERNEL_EXEC there?). But the BPF core seems
to use module_alloc() as a default means of allocating text.


So what should this look like? Have a text_alloc() with an argument that
indicates where? But then I suppose we also need a means to manage PLT
entries. Otherwise I don't exactly see how you're going to call BPF
code, or how that BPF stuff is going to call back into its helpers.



^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Russell King - ARM Linux admin @ 2020-07-14 10:33 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Jarkko Sakkinen, Atish Patra, Will Deacon, Daniel Borkmann,
	Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
	Dmitry Vyukov, Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
	Anders Roxell, Sven Schnelle,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Mike Rapoport,
	Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
	KP Singh, Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAMj1kXGV_bWehdQvxaMBTOYHXUoFjifBWNpyVy3gaWKktko1mg@mail.gmail.com>

On Tue, Jul 14, 2020 at 01:17:22PM +0300, Ard Biesheuvel wrote:
> On Tue, 14 Jul 2020 at 12:53, Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> >
> > On Mon, Jul 13, 2020 at 10:49:48PM +0300, Ard Biesheuvel wrote:
> > > This patch suggests that there are other reasons why conflating
> > > allocation of module space and allocating  text pages for other uses
> > > is a bad idea, but switching all users to text_alloc() is a step in
> > > the wrong direction. It would be better to stop using module_alloc()
> > > in core code except in the module loader, and have a generic
> > > text_alloc() that can be overridden by the arch if necessary. Note
> > > that x86  and s390 are the only architectures that use module_alloc()
> > > in ftrace code.
> >
> > This series essentially does this: introduces text_alloc() and
> > text_memfree(), which have generic implementations in kernel/text.c.
> > Those can be overriddent by arch specific implementations.
> >
> > What you think should be done differently than in my patch set?
> >
> 
> On arm64, module_alloc is only used by the module loader, and so
> pulling it out and renaming it will cause unused code to be
> incorporated into the kernel when building without module support,
> which is the use case you claim to be addressing.
> 
> Module_alloc has semantics that are intimately tied to the module
> loader, but over the years, it ended up being (ab)used by other
> subsystems, which don't require those semantics but just need n pages
> of vmalloc space with executable permissions.
> 
> So the correct approach is to make text_alloc() implement just that,
> generically, and switch bpf etc to use it. Then, only on architectures
> that need it, override it with an implementation that has the required
> additional semantics.
> 
> Refactoring 10+ architectures like this without any regard for how
> text_alloc() deviates from module_alloc() just creates a lot of churn
> that others will have to clean up after you.

For 32-bit ARM, our bpf code uses "blx/bx" (or equivalent code
sequences) rather than encoding a "bl" or "b", so BPF there doesn't
care where the executable memory is mapped, and doesn't need any
PLTs.  Given that, should bpf always allocate from the vmalloc()
region to preserve the module space for modules?

I'm more concerned about ftrace though, but only because I don't
have the understanding of that subsystem to really say whether there
are any side effects from having the allocations potentially be out
of range of a "bl" or "b" instruction.

If ftrace jumps both to and from the allocated page using a "load
address to register, branch to register" approach like BPF does, then
ftrace should be safe - and again, raises the issue that maybe it
should always come from vmalloc() space.

So, I think we need to keep module_alloc() for allocating memory for
modules, and provide a new text_alloc() for these other cases.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Will Deacon @ 2020-07-14 10:28 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
	Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
	Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
	John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
	Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
	Alexei Starovoitov, Atish Patra, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Sami Tolvanen,
	Naveen N. Rao, Mao Han, Marco Elver, Steven Rostedt, Babu Moger,
	Borislav Petkov, Greentime Hu, Ben Dooks, Guan Xuetao,
	Thomas Bogendoerfer, open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra, David Howells,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Sven Schnelle, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Jiri Olsa, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
	Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
	KP Singh, Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714094625.1443261-2-jarkko.sakkinen@linux.intel.com>

On Tue, Jul 14, 2020 at 12:45:36PM +0300, Jarkko Sakkinen wrote:
> Rename module_alloc() to text_alloc() and module_memfree() to
> text_memfree(), and move them to kernel/text.c, which is unconditionally
> compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> allocate space for executable code without requiring to compile the modules
> support (CONFIG_MODULES=y) in.
> 
> Cc: Andi Kleen <ak@linux.intel.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

[...]

> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index 1cd1a4d0ed30..adde022f703c 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -20,48 +20,6 @@
>  #include <asm/insn.h>
>  #include <asm/sections.h>
>  
> -void *module_alloc(unsigned long size)
> -{
> -	u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
> -	gfp_t gfp_mask = GFP_KERNEL;
> -	void *p;
> -
> -	/* Silence the initial allocation */
> -	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> -		gfp_mask |= __GFP_NOWARN;
> -
> -	if (IS_ENABLED(CONFIG_KASAN))
> -		/* don't exceed the static module region - see below */
> -		module_alloc_end = MODULES_END;
> -
> -	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> -				module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
> -				NUMA_NO_NODE, __builtin_return_address(0));
> -
> -	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> -	    !IS_ENABLED(CONFIG_KASAN))
> -		/*
> -		 * KASAN can only deal with module allocations being served
> -		 * from the reserved module region, since the remainder of
> -		 * the vmalloc region is already backed by zero shadow pages,
> -		 * and punching holes into it is non-trivial. Since the module
> -		 * region is not randomized when KASAN is enabled, it is even
> -		 * less likely that the module region gets exhausted, so we
> -		 * can simply omit this fallback in that case.
> -		 */
> -		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> -				module_alloc_base + SZ_2G, GFP_KERNEL,
> -				PAGE_KERNEL, 0, NUMA_NO_NODE,
> -				__builtin_return_address(0));
> -
> -	if (p && (kasan_module_alloc(p, size) < 0)) {
> -		vfree(p);
> -		return NULL;
> -	}
> -
> -	return p;
> -}
> -
>  enum aarch64_reloc_op {
>  	RELOC_OP_NONE,
>  	RELOC_OP_ABS,
> diff --git a/arch/arm64/kernel/text.c b/arch/arm64/kernel/text.c
> new file mode 100644
> index 000000000000..64fc7e2d85df
> --- /dev/null
> +++ b/arch/arm64/kernel/text.c
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AArch64 loadable module support.
> + *
> + * Copyright (C) 2012 ARM Limited
> + *
> + * Author: Will Deacon <will.deacon@arm.com>
> + */
> +#include <linux/kasan.h>
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void *text_alloc(unsigned long size)
> +{
> +	u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
> +	gfp_t gfp_mask = GFP_KERNEL;
> +	void *p;
> +
> +	/* Silence the initial allocation */
> +	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> +		gfp_mask |= __GFP_NOWARN;
> +
> +	if (IS_ENABLED(CONFIG_KASAN))
> +		/* don't exceed the static module region - see below */
> +		module_alloc_end = MODULES_END;
> +
> +	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> +				module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
> +				NUMA_NO_NODE, __builtin_return_address(0));
> +
> +	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> +	    !IS_ENABLED(CONFIG_KASAN))
> +		/*
> +		 * KASAN can only deal with module allocations being served
> +		 * from the reserved module region, since the remainder of
> +		 * the vmalloc region is already backed by zero shadow pages,
> +		 * and punching holes into it is non-trivial. Since the module
> +		 * region is not randomized when KASAN is enabled, it is even
> +		 * less likely that the module region gets exhausted, so we
> +		 * can simply omit this fallback in that case.
> +		 */
> +		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> +				module_alloc_base + SZ_2G, GFP_KERNEL,
> +				PAGE_KERNEL, 0, NUMA_NO_NODE,
> +				__builtin_return_address(0));
> +
> +	if (p && (kasan_module_alloc(p, size) < 0)) {
> +		vfree(p);
> +		return NULL;
> +	}
> +
> +	return p;
> +}

I'm not especially keen on this approach.

As Ard says, module_alloc() _is_ special, in the sense that the virtual
memory it allocates wants to be close to the kernel text, whereas the
concept of allocating executable memory is broader and doesn't have these
restrictions. So, while I'm in favour of having a text_alloc() interface
that can be used by callers which only require an executable mapping, I'd
much prefer for the module_alloc() code to remain for, err, modules.

Thanks,

Will

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Ard Biesheuvel @ 2020-07-14 10:17 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
	Anders Roxell, Sven Schnelle,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
	Josh Poimboeuf, KP Singh, Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714095243.GB1442951@linux.intel.com>

On Tue, 14 Jul 2020 at 12:53, Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
>
> On Mon, Jul 13, 2020 at 10:49:48PM +0300, Ard Biesheuvel wrote:
> > This patch suggests that there are other reasons why conflating
> > allocation of module space and allocating  text pages for other uses
> > is a bad idea, but switching all users to text_alloc() is a step in
> > the wrong direction. It would be better to stop using module_alloc()
> > in core code except in the module loader, and have a generic
> > text_alloc() that can be overridden by the arch if necessary. Note
> > that x86  and s390 are the only architectures that use module_alloc()
> > in ftrace code.
>
> This series essentially does this: introduces text_alloc() and
> text_memfree(), which have generic implementations in kernel/text.c.
> Those can be overriddent by arch specific implementations.
>
> What you think should be done differently than in my patch set?
>

On arm64, module_alloc is only used by the module loader, and so
pulling it out and renaming it will cause unused code to be
incorporated into the kernel when building without module support,
which is the use case you claim to be addressing.

Module_alloc has semantics that are intimately tied to the module
loader, but over the years, it ended up being (ab)used by other
subsystems, which don't require those semantics but just need n pages
of vmalloc space with executable permissions.

So the correct approach is to make text_alloc() implement just that,
generically, and switch bpf etc to use it. Then, only on architectures
that need it, override it with an implementation that has the required
additional semantics.

Refactoring 10+ architectures like this without any regard for how
text_alloc() deviates from module_alloc() just creates a lot of churn
that others will have to clean up after you.

^ permalink raw reply

* Re: [PATCH 06/20] Documentation: gpu/komeda-kms: eliminate duplicated word
From: Daniel Vetter @ 2020-07-14 10:10 UTC (permalink / raw)
  To: james qian wang (Arm Technology China)
  Cc: kvm, linux-doc, David Airlie, kgdb-bugreport, linux-fpga,
	Liviu Dudau, dri-devel, Douglas Anderson, Paul Cercueil, keyrings,
	Paul Mackerras, linux-i2c, Pavel Machek, Srinivas Pandruvada,
	Mihail Atanassov, linux-leds, linux-s390, Daniel Thompson,
	linux-scsi, Jonathan Corbet, Masahiro Yamada, Matthew Wilcox,
	Halil Pasic, Jarkko Sakkinen, Mali DP Maintainers, linux-input,
	Derek Kiernan, linux-mips, Dragan Cvetic, Wu Hao, Tony Krowiak,
	linux-kbuild, James E.J. Bottomley, Jiri Kosina, Hannes Reinecke,
	linux-block, Thomas Bogendoerfer, Jacek Anaszewski, linux-mm,
	Dan Williams, nd, Andrew Morton, Mimi Zohar, Jens Axboe,
	Michal Marek, Martin K. Petersen, Pierre Morel, Randy Dunlap,
	linux-kernel, Wolfram Sang, Daniel Vetter, Jason Wessel,
	Paolo Bonzini, linux-integrity, linuxppc-dev, Mike Rapoport,
	Dan Murphy
In-Reply-To: <20200708050821.GA1121718@jamwan02-TSP300>

This and next patch merged to drm-misc-next, thanks.

On Wed, Jul 08, 2020 at 01:08:21PM +0800, james qian wang (Arm Technology China) wrote:
> Hi Randy
> 
> On Tue, Jul 07, 2020 at 11:04:00AM -0700, Randy Dunlap wrote:
> > Drop the doubled word "and".
> > 
> > Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: linux-doc@vger.kernel.org
> > Cc: James (Qian) Wang <james.qian.wang@arm.com>
> > Cc: Liviu Dudau <liviu.dudau@arm.com>
> > Cc: Mihail Atanassov <mihail.atanassov@arm.com>
> > Cc: Mali DP Maintainers <malidp@foss.arm.com>
> > ---
> >  Documentation/gpu/komeda-kms.rst |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > --- linux-next-20200701.orig/Documentation/gpu/komeda-kms.rst
> > +++ linux-next-20200701/Documentation/gpu/komeda-kms.rst
> > @@ -41,7 +41,7 @@ Compositor blends multiple layers or pix
> >  frame. its output frame can be fed into post image processor for showing it on
> >  the monitor or fed into wb_layer and written to memory at the same time.
> >  user can also insert a scaler between compositor and wb_layer to down scale
> > -the display frame first and and then write to memory.
> > +the display frame first and then write to memory.
> 
> Thank you for the patch.
> 
> Reviewed-by: James Qian Wang <james.qian.wang@arm.com>

James, for simple patches like this just go ahead and merge them. You're
the maintainer for this, just slapping an r-b onto a patch and no
indiciation whether you will pick it up only confuses people and increases
the risk that patches get lost.

So either pick up right away, or state clearly that you will pick it up
later, or that you expect someone else to merge this.

Thanks, Daniel
> 
> >  Writeback Layer (wb_layer)
> >  --------------------------

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Russell King - ARM Linux admin @ 2020-07-14  9:53 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, linux-kernel, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
	Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714094928.GA1442951@linux.intel.com>

On Tue, Jul 14, 2020 at 12:49:28PM +0300, Jarkko Sakkinen wrote:
> On Mon, Jul 13, 2020 at 07:34:10PM +0100, Russell King - ARM Linux admin wrote:
> > On Mon, Jul 13, 2020 at 09:19:37PM +0300, Jarkko Sakkinen wrote:
> > > Rename module_alloc() to text_alloc() and module_memfree() to
> > > text_memfree(), and move them to kernel/text.c, which is unconditionally
> > > compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> > > allocate space for executable code without requiring to compile the modules
> > > support (CONFIG_MODULES=y) in.
> > 
> > I'm not sure this is a good idea for 32-bit ARM.  The code you are
> > moving for 32-bit ARM is quite specific to module use in that it also
> > supports allocating from the vmalloc area, where the module code
> > knows to add PLT entries.
> > 
> > If the other proposed users of this text_alloc() do not have the logic
> > to add PLT entries when branches between kernel code and this
> > allocation are not reachable by, e.g. a 26-bit signed offset for 32-bit
> > ARM code, then this code is not suitable for that use.
> 
> My intention is to use this in kprobes code in the place of
> module_alloc().  I'm not sure why moving this code out of the module
> subsystem could possibly break anything.  Unfortunately I forgot to add
> covere letter to my series. Sending v2 with that to explain my use case
> for this.

Ah, so you're merely renaming module_alloc() to text_alloc() everywhere?
It sounded from the initial patch like you were also converting other
users to use module_alloc().

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14  9:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, Linux Kernel Mailing List, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, open list:RISC-V ARCHITECTURE, Vincenzo Frascino,
	Anders Roxell, Sven Schnelle,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, Ingo Molnar, Albert Ou, Paul E. McKenney,
	Josh Poimboeuf, KP Singh, Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <CAMj1kXGhZYxjZTP+_PGdBy4hZgdeeTNUkuaE_eQKwB4pPAYNXA@mail.gmail.com>

On Mon, Jul 13, 2020 at 10:49:48PM +0300, Ard Biesheuvel wrote:
> This patch suggests that there are other reasons why conflating
> allocation of module space and allocating  text pages for other uses
> is a bad idea, but switching all users to text_alloc() is a step in
> the wrong direction. It would be better to stop using module_alloc()
> in core code except in the module loader, and have a generic
> text_alloc() that can be overridden by the arch if necessary. Note
> that x86  and s390 are the only architectures that use module_alloc()
> in ftrace code.

This series essentially does this: introduces text_alloc() and
text_memfree(), which have generic implementations in kernel/text.c.
Those can be overriddent by arch specific implementations.

What you think should be done differently than in my patch set?

/Jarkko

^ permalink raw reply

* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Russell King - ARM Linux admin @ 2020-07-14  9:50 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
	Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
	Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
	John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
	Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
	Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
	Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra, David Howells,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
	Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200714094625.1443261-2-jarkko.sakkinen@linux.intel.com>

On Tue, Jul 14, 2020 at 12:45:36PM +0300, Jarkko Sakkinen wrote:
> Rename module_alloc() to text_alloc() and module_memfree() to
> text_memfree(), and move them to kernel/text.c, which is unconditionally
> compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> allocate space for executable code without requiring to compile the modules
> support (CONFIG_MODULES=y) in.

As I said in response to your first post of this, which seems to have
gone unacknowledged, I don't think this is a good idea.

So, NAK on the whole concept this without some discussion on the 32-bit
ARM issues.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14  9:49 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Yonghong Song, Thomas Gleixner, Jiri Kosina,
	Anup Patel, linux-kernel, Philipp Rudo, Torsten Duwe,
	Masami Hiramatsu, Andrew Morton, Mark Rutland,
	James E.J. Bottomley, Vincent Chen, open list:S390, Joe Lawrence,
	Helge Deller, John Fastabend, Anil S Keshavamurthy,
	Andrey Ryabinin, Iurii Zaikin, Andrii Nakryiko, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Peter Oberparleiter, Sean Christopherson, Martin KaFai Lau,
	Song Liu, Paul Walmsley, Heiko Carstens, Alexei Starovoitov,
	Atish Patra, Will Deacon, Daniel Borkmann, Masahiro Yamada,
	Nayna Jain, Ley Foon Tan, Christian Borntraeger, Dmitry Vyukov,
	Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
	Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
	Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra,
	open list:SPARC + UltraSPARC (sparc/sparc64), Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
	Albert Ou, Paul E. McKenney, Josh Poimboeuf, KP Singh,
	Gerald Schaefer, Nick Hu,
	open list:BPF JIT for MIPS (32-BIT AND 64-BIT), open list:MIPS,
	Sergey Senozhatsky, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20200713183410.GY1551@shell.armlinux.org.uk>

On Mon, Jul 13, 2020 at 07:34:10PM +0100, Russell King - ARM Linux admin wrote:
> On Mon, Jul 13, 2020 at 09:19:37PM +0300, Jarkko Sakkinen wrote:
> > Rename module_alloc() to text_alloc() and module_memfree() to
> > text_memfree(), and move them to kernel/text.c, which is unconditionally
> > compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> > allocate space for executable code without requiring to compile the modules
> > support (CONFIG_MODULES=y) in.
> 
> I'm not sure this is a good idea for 32-bit ARM.  The code you are
> moving for 32-bit ARM is quite specific to module use in that it also
> supports allocating from the vmalloc area, where the module code
> knows to add PLT entries.
> 
> If the other proposed users of this text_alloc() do not have the logic
> to add PLT entries when branches between kernel code and this
> allocation are not reachable by, e.g. a 26-bit signed offset for 32-bit
> ARM code, then this code is not suitable for that use.

My intention is to use this in kprobes code in the place of
module_alloc().  I'm not sure why moving this code out of the module
subsystem could possibly break anything.  Unfortunately I forgot to add
covere letter to my series. Sending v2 with that to explain my use case
for this.

/Jarkko

^ permalink raw reply

* [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Jarkko Sakkinen @ 2020-07-14  9:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
	Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
	Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
	Philipp Rudo, Torsten Duwe, Masami Hiramatsu, Andrew Morton,
	Mark Rutland, James E.J. Bottomley, Vincent Chen, Omar Sandoval,
	open list:S390, Joe Lawrence, Helge Deller, John Fastabend,
	Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
	Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
	moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
	Martin KaFai Lau, Song Liu, Paul Walmsley, Heiko Carstens,
	Alexei Starovoitov, Jarkko Sakkinen, Atish Patra, Will Deacon,
	Daniel Borkmann, Masahiro Yamada, Nayna Jain, Ley Foon Tan,
	Christian Borntraeger, Sami Tolvanen, Naveen N. Rao, Mao Han,
	Marco Elver, Steven Rostedt, Babu Moger, Borislav Petkov,
	Greentime Hu, Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
	open list:PARISC ARCHITECTURE, Jessica Yu,
	open list:BPF JIT for MIPS 32-BIT AND 64-BIT, David S. Miller,
	Thiago Jung Bauermann, Peter Zijlstra, David Howells,
	open list:SPARC + UltraSPARC sparc/sparc64, Sandipan Das,
	H. Peter Anvin, Amit Daniel Kachhap, Tiezhu Yang, Miroslav Benes,
	Jiri Olsa, Ard Biesheuvel, Vincenzo Frascino, Anders Roxell,
	Sven Schnelle, maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT,
	Russell King, open list:RISC-V ARCHITECTURE, Mike Rapoport,
	Ingo Molnar, Albert Ou, Paul E. McKenney, Josh Poimboeuf,
	KP Singh, Dmitry Vyukov, Nick Hu,
	open list:BPF JIT for MIPS 32-BIT AND 64-BIT, open list:MIPS,
	Palmer Dabbelt, open list:LINUX FOR POWERPC 32-BIT AND 64-BIT
In-Reply-To: <20200714094625.1443261-1-jarkko.sakkinen@linux.intel.com>

Rename module_alloc() to text_alloc() and module_memfree() to
text_memfree(), and move them to kernel/text.c, which is unconditionally
compiled to the kernel proper. This allows kprobes, ftrace and bpf to
allocate space for executable code without requiring to compile the modules
support (CONFIG_MODULES=y) in.

Cc: Andi Kleen <ak@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/arm/kernel/Makefile         |  3 +-
 arch/arm/kernel/module.c         | 21 -----------
 arch/arm/kernel/text.c           | 33 ++++++++++++++++++
 arch/arm64/kernel/Makefile       |  2 +-
 arch/arm64/kernel/module.c       | 42 ----------------------
 arch/arm64/kernel/text.c         | 54 ++++++++++++++++++++++++++++
 arch/mips/kernel/Makefile        |  2 +-
 arch/mips/kernel/module.c        |  9 -----
 arch/mips/kernel/text.c          | 19 ++++++++++
 arch/mips/net/bpf_jit.c          |  4 +--
 arch/nds32/kernel/Makefile       |  2 +-
 arch/nds32/kernel/module.c       |  7 ----
 arch/nds32/kernel/text.c         | 12 +++++++
 arch/nios2/kernel/Makefile       |  1 +
 arch/nios2/kernel/module.c       | 19 ----------
 arch/nios2/kernel/text.c         | 34 ++++++++++++++++++
 arch/parisc/kernel/Makefile      |  2 +-
 arch/parisc/kernel/module.c      | 11 ------
 arch/parisc/kernel/text.c        | 22 ++++++++++++
 arch/powerpc/net/bpf_jit_comp.c  |  4 +--
 arch/riscv/kernel/Makefile       |  1 +
 arch/riscv/kernel/module.c       | 12 -------
 arch/riscv/kernel/text.c         | 20 +++++++++++
 arch/s390/kernel/Makefile        |  2 +-
 arch/s390/kernel/ftrace.c        |  2 +-
 arch/s390/kernel/module.c        | 16 ---------
 arch/s390/kernel/text.c          | 23 ++++++++++++
 arch/sparc/kernel/Makefile       |  1 +
 arch/sparc/kernel/module.c       | 30 ----------------
 arch/sparc/kernel/text.c         | 39 +++++++++++++++++++++
 arch/sparc/net/bpf_jit_comp_32.c |  6 ++--
 arch/unicore32/kernel/Makefile   |  1 +
 arch/unicore32/kernel/module.c   |  7 ----
 arch/unicore32/kernel/text.c     | 18 ++++++++++
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/ftrace.c         |  4 +--
 arch/x86/kernel/kprobes/core.c   |  4 +--
 arch/x86/kernel/module.c         | 49 --------------------------
 arch/x86/kernel/text.c           | 60 ++++++++++++++++++++++++++++++++
 include/linux/moduleloader.h     |  4 +--
 kernel/Makefile                  |  2 +-
 kernel/bpf/core.c                |  4 +--
 kernel/kprobes.c                 |  4 +--
 kernel/module.c                  | 37 ++++++--------------
 kernel/text.c                    | 25 +++++++++++++
 45 files changed, 400 insertions(+), 275 deletions(-)
 create mode 100644 arch/arm/kernel/text.c
 create mode 100644 arch/arm64/kernel/text.c
 create mode 100644 arch/mips/kernel/text.c
 create mode 100644 arch/nds32/kernel/text.c
 create mode 100644 arch/nios2/kernel/text.c
 create mode 100644 arch/parisc/kernel/text.c
 create mode 100644 arch/riscv/kernel/text.c
 create mode 100644 arch/s390/kernel/text.c
 create mode 100644 arch/sparc/kernel/text.c
 create mode 100644 arch/unicore32/kernel/text.c
 create mode 100644 arch/x86/kernel/text.c
 create mode 100644 kernel/text.c

diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 89e5d864e923..69bfacfd60ef 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -19,7 +19,8 @@ CFLAGS_REMOVE_return_address.o = -pg
 obj-y		:= elf.o entry-common.o irq.o opcodes.o \
 		   process.o ptrace.o reboot.o \
 		   setup.o signal.o sigreturn_codes.o \
-		   stacktrace.o sys_arm.o time.o traps.o
+		   stacktrace.o sys_arm.o time.o traps.o \
+		   text.o
 
 ifneq ($(CONFIG_ARM_UNWIND),y)
 obj-$(CONFIG_FRAME_POINTER)	+= return_address.o
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index e15444b25ca0..13e3442a6b9f 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -33,27 +33,6 @@
 #define MODULES_VADDR	(((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
 #endif
 
-#ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
-{
-	gfp_t gfp_mask = GFP_KERNEL;
-	void *p;
-
-	/* Silence the initial allocation */
-	if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-		gfp_mask |= __GFP_NOWARN;
-
-	p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-				gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				__builtin_return_address(0));
-	if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-		return p;
-	return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-				GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				__builtin_return_address(0));
-}
-#endif
-
 bool module_init_section(const char *name)
 {
 	return strstarts(name, ".init") ||
diff --git a/arch/arm/kernel/text.c b/arch/arm/kernel/text.c
new file mode 100644
index 000000000000..600143fb909d
--- /dev/null
+++ b/arch/arm/kernel/text.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  linux/arch/arm/kernel/module.c
+ *
+ *  Copyright (C) 2002 Russell King.
+ *  Modified for nommu by Hyok S. Choi
+ *
+ * Module allocation method suggested by Andi Kleen.
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+#ifdef CONFIG_MMU
+void *text_alloc(unsigned long size)
+{
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	/* Silence the initial allocation */
+	if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
+		gfp_mask |= __GFP_NOWARN;
+
+	p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+				gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
+	if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
+		return p;
+	return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
+				GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
+}
+#endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a561cbb91d4d..7765a45d71b5 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -19,7 +19,7 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
 			   return_address.o cpuinfo.o cpu_errata.o		\
 			   cpufeature.o alternative.o cacheinfo.o		\
 			   smp.o smp_spin_table.o topology.o smccc-call.o	\
-			   syscall.o
+			   syscall.o text.o
 
 targets			+= efi-entry.o
 
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 1cd1a4d0ed30..adde022f703c 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -20,48 +20,6 @@
 #include <asm/insn.h>
 #include <asm/sections.h>
 
-void *module_alloc(unsigned long size)
-{
-	u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
-	gfp_t gfp_mask = GFP_KERNEL;
-	void *p;
-
-	/* Silence the initial allocation */
-	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
-		gfp_mask |= __GFP_NOWARN;
-
-	if (IS_ENABLED(CONFIG_KASAN))
-		/* don't exceed the static module region - see below */
-		module_alloc_end = MODULES_END;
-
-	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-				module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
-				NUMA_NO_NODE, __builtin_return_address(0));
-
-	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
-	    !IS_ENABLED(CONFIG_KASAN))
-		/*
-		 * KASAN can only deal with module allocations being served
-		 * from the reserved module region, since the remainder of
-		 * the vmalloc region is already backed by zero shadow pages,
-		 * and punching holes into it is non-trivial. Since the module
-		 * region is not randomized when KASAN is enabled, it is even
-		 * less likely that the module region gets exhausted, so we
-		 * can simply omit this fallback in that case.
-		 */
-		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-				module_alloc_base + SZ_2G, GFP_KERNEL,
-				PAGE_KERNEL, 0, NUMA_NO_NODE,
-				__builtin_return_address(0));
-
-	if (p && (kasan_module_alloc(p, size) < 0)) {
-		vfree(p);
-		return NULL;
-	}
-
-	return p;
-}
-
 enum aarch64_reloc_op {
 	RELOC_OP_NONE,
 	RELOC_OP_ABS,
diff --git a/arch/arm64/kernel/text.c b/arch/arm64/kernel/text.c
new file mode 100644
index 000000000000..64fc7e2d85df
--- /dev/null
+++ b/arch/arm64/kernel/text.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AArch64 loadable module support.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ */
+#include <linux/kasan.h>
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void *text_alloc(unsigned long size)
+{
+	u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	/* Silence the initial allocation */
+	if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
+		gfp_mask |= __GFP_NOWARN;
+
+	if (IS_ENABLED(CONFIG_KASAN))
+		/* don't exceed the static module region - see below */
+		module_alloc_end = MODULES_END;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
+				module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
+				NUMA_NO_NODE, __builtin_return_address(0));
+
+	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
+	    !IS_ENABLED(CONFIG_KASAN))
+		/*
+		 * KASAN can only deal with module allocations being served
+		 * from the reserved module region, since the remainder of
+		 * the vmalloc region is already backed by zero shadow pages,
+		 * and punching holes into it is non-trivial. Since the module
+		 * region is not randomized when KASAN is enabled, it is even
+		 * less likely that the module region gets exhausted, so we
+		 * can simply omit this fallback in that case.
+		 */
+		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
+				module_alloc_base + SZ_2G, GFP_KERNEL,
+				PAGE_KERNEL, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
+
+	if (p && (kasan_module_alloc(p, size) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index 8c7a043295ed..37ebf9a7f259 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -8,7 +8,7 @@ extra-y		:= head.o vmlinux.lds
 obj-y		+= cmpxchg.o cpu-probe.o branch.o elf.o entry.o genex.o idle.o irq.o \
 		   process.o prom.o ptrace.o reset.o setup.o signal.o \
 		   syscall.o time.o topology.o traps.o unaligned.o watch.o \
-		   vdso.o cacheinfo.o
+		   vdso.o cacheinfo.o text.o
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_ftrace.o = -pg
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 3c0c3d1260c1..f5ac4ebc4bad 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -31,15 +31,6 @@ struct mips_hi16 {
 static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
-#ifdef MODULE_START
-void *module_alloc(unsigned long size)
-{
-	return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
-				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-				__builtin_return_address(0));
-}
-#endif
-
 static int apply_r_mips_none(struct module *me, u32 *location,
 			     u32 base, Elf_Addr v, bool rela)
 {
diff --git a/arch/mips/kernel/text.c b/arch/mips/kernel/text.c
new file mode 100644
index 000000000000..55ca87a421c3
--- /dev/null
+++ b/arch/mips/kernel/text.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *
+ *  Copyright (C) 2001 Rusty Russell.
+ *  Copyright (C) 2003, 2004 Ralf Baechle (ralf@linux-mips.org)
+ *  Copyright (C) 2005 Thiemo Seufer
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+#ifdef MODULE_START
+void *text_alloc(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
+				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
+}
+#endif
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 0af88622c619..2b03f7128809 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -1233,7 +1233,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
 	build_epilogue(&ctx);
 
 	alloc_size = 4 * ctx.idx;
-	ctx.target = module_alloc(alloc_size);
+	ctx.target = text_alloc(alloc_size);
 	if (ctx.target == NULL)
 		goto out;
 
@@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
 void bpf_jit_free(struct bpf_prog *fp)
 {
 	if (fp->jited)
-		module_memfree(fp->bpf_func);
+		text_memfree(fp->bpf_func);
 
 	bpf_prog_unlock_free(fp);
 }
diff --git a/arch/nds32/kernel/Makefile b/arch/nds32/kernel/Makefile
index 394df3f6442c..fc15778c59d1 100644
--- a/arch/nds32/kernel/Makefile
+++ b/arch/nds32/kernel/Makefile
@@ -10,7 +10,7 @@ AFLAGS_head.o		:= -DTEXTADDR=$(TEXTADDR)
 obj-y			:= ex-entry.o ex-exit.o ex-scall.o irq.o \
 			process.o ptrace.o setup.o signal.o \
 			sys_nds32.o time.o traps.o cacheinfo.o \
-			dma.o syscall_table.o vdso.o
+			dma.o syscall_table.o vdso.o text.o
 
 obj-$(CONFIG_MODULES)		+= nds32_ksyms.o module.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
diff --git a/arch/nds32/kernel/module.c b/arch/nds32/kernel/module.c
index 3897fd14a21d..3d23a12ed535 100644
--- a/arch/nds32/kernel/module.c
+++ b/arch/nds32/kernel/module.c
@@ -7,13 +7,6 @@
 #include <linux/moduleloader.h>
 #include <linux/pgtable.h>
 
-void *module_alloc(unsigned long size)
-{
-	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-				    GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-				    __builtin_return_address(0));
-}
-
 void module_free(struct module *module, void *region)
 {
 	vfree(region);
diff --git a/arch/nds32/kernel/text.c b/arch/nds32/kernel/text.c
new file mode 100644
index 000000000000..6e86eff9aaf0
--- /dev/null
+++ b/arch/nds32/kernel/text.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2017 Andes Technology Corporation
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void *text_alloc(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+				    GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
+				    __builtin_return_address(0));
+}
diff --git a/arch/nios2/kernel/Makefile b/arch/nios2/kernel/Makefile
index 0b645e1e3158..5476fc749f37 100644
--- a/arch/nios2/kernel/Makefile
+++ b/arch/nios2/kernel/Makefile
@@ -18,6 +18,7 @@ obj-y	+= setup.o
 obj-y	+= signal.o
 obj-y	+= sys_nios2.o
 obj-y	+= syscall_table.o
+obj-y	+= text.o
 obj-y	+= time.o
 obj-y	+= traps.o
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..20a0faf64e38 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,25 +21,6 @@
 
 #include <asm/cacheflush.h>
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
- * addresses in 0xc0000000)
- */
-void *module_alloc(unsigned long size)
-{
-	if (size == 0)
-		return NULL;
-	return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-	kfree(module_region);
-}
-
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
 			unsigned int symindex, unsigned int relsec,
 			struct module *mod)
diff --git a/arch/nios2/kernel/text.c b/arch/nios2/kernel/text.c
new file mode 100644
index 000000000000..af424174442f
--- /dev/null
+++ b/arch/nios2/kernel/text.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Kernel module support for Nios II.
+ *
+ * Copyright (C) 2004 Microtronix Datacom Ltd.
+ *   Written by Wentao Xu <xuwentao@microtronix.com>
+ * Copyright (C) 2001, 2003 Rusty Russell
+ *
+ * This file is subject to the terms and conditions of the GNU General
+ * Public License.  See the file COPYING in the main directory of this
+ * archive for more details.
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+/*
+ * Modules should NOT be allocated with kmalloc for (obvious) reasons.
+ * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
+ * from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
+ * addresses in 0xc0000000)
+ */
+void *text_alloc(unsigned long size)
+{
+	if (size == 0)
+		return NULL;
+	return kmalloc(size, GFP_KERNEL);
+}
+
+/* Free memory returned from module_alloc */
+void text_memfree(void *module_region)
+{
+	kfree(module_region);
+}
diff --git a/arch/parisc/kernel/Makefile b/arch/parisc/kernel/Makefile
index 068d90950d93..f71f7ffdae2e 100644
--- a/arch/parisc/kernel/Makefile
+++ b/arch/parisc/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y	     	:= cache.o pacache.o setup.o pdt.o traps.o time.o irq.o \
 		   ptrace.o hardware.o inventory.o drivers.o alternative.o \
 		   signal.o hpmc.o real2.o parisc_ksyms.o unaligned.o \
 		   process.o processor.o pdc_cons.o pdc_chassis.o unwind.o \
-		   patch.o
+		   patch.o text.o
 
 ifdef CONFIG_FUNCTION_TRACER
 # Do not profile debug and lowlevel utilities
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index 7df140545b22..c81e63e2549b 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -192,17 +192,6 @@ static inline int reassemble_22(int as22)
 		((as22 & 0x0003ff) << 3));
 }
 
-void *module_alloc(unsigned long size)
-{
-	/* using RWX means less protection for modules, but it's
-	 * easier than trying to map the text, data, init_text and
-	 * init_data correctly */
-	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-				    GFP_KERNEL,
-				    PAGE_KERNEL_RWX, 0, NUMA_NO_NODE,
-				    __builtin_return_address(0));
-}
-
 #ifndef CONFIG_64BIT
 static inline unsigned long count_gots(const Elf_Rela *rela, unsigned long n)
 {
diff --git a/arch/parisc/kernel/text.c b/arch/parisc/kernel/text.c
new file mode 100644
index 000000000000..9ff503084191
--- /dev/null
+++ b/arch/parisc/kernel/text.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *    Linux/PA-RISC Project
+ *    Copyright (C) 2003 Randolph Chung <tausq at debian . org>
+ *    Copyright (C) 2008 Helge Deller <deller@gmx.de>
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void *text_alloc(unsigned long size)
+{
+	/*
+	 * Using RWX means less protection for modules, but it's
+	 * easier than trying to map the text, data, init_text and
+	 * init_data correctly.
+	 */
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL,
+				    PAGE_KERNEL_RWX, 0, NUMA_NO_NODE,
+				    __builtin_return_address(0));
+}
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 0acc9d5fb19e..ba1cef7a812d 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -634,7 +634,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
 
 	proglen = cgctx.idx * 4;
 	alloclen = proglen + FUNCTION_DESCR_SIZE;
-	image = module_alloc(alloclen);
+	image = text_alloc(alloclen);
 	if (!image)
 		goto out;
 
@@ -678,7 +678,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
 void bpf_jit_free(struct bpf_prog *fp)
 {
 	if (fp->jited)
-		module_memfree(fp->bpf_func);
+		text_memfree(fp->bpf_func);
 
 	bpf_prog_unlock_free(fp);
 }
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index b355cf485671..d0b30f286ce6 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -29,6 +29,7 @@ obj-y	+= riscv_ksyms.o
 obj-y	+= stacktrace.o
 obj-y	+= cacheinfo.o
 obj-y	+= patch.o
+obj-y	+= text.o
 obj-$(CONFIG_MMU) += vdso.o vdso/
 
 obj-$(CONFIG_RISCV_M_MODE)	+= clint.o traps_misaligned.o
diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 7191342c54da..f6aa66431c9e 100644
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -390,15 +390,3 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 
 	return 0;
 }
-
-#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
-#define VMALLOC_MODULE_START \
-	 max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
-void *module_alloc(unsigned long size)
-{
-	return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
-				    VMALLOC_END, GFP_KERNEL,
-				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				    __builtin_return_address(0));
-}
-#endif
diff --git a/arch/riscv/kernel/text.c b/arch/riscv/kernel/text.c
new file mode 100644
index 000000000000..201608a25641
--- /dev/null
+++ b/arch/riscv/kernel/text.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *
+ *  Copyright (C) 2017 Zihao Yu
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
+#define VMALLOC_MODULE_START \
+	 max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
+void *text_alloc(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
+				    VMALLOC_END, GFP_KERNEL,
+				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				    __builtin_return_address(0));
+}
+#endif
diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
index a8f136943deb..9f00c320b938 100644
--- a/arch/s390/kernel/Makefile
+++ b/arch/s390/kernel/Makefile
@@ -40,7 +40,7 @@ obj-y	+= sysinfo.o lgr.o os_info.o machine_kexec.o pgm_check.o
 obj-y	+= runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
 obj-y	+= entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
 obj-y	+= nospec-branch.o ipl_vmparm.o machine_kexec_reloc.o unwind_bc.o
-obj-y	+= smp.o
+obj-y	+= smp.o text.o
 
 extra-y				+= head64.o vmlinux.lds
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index b388e87a08bf..a752b1442846 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -134,7 +134,7 @@ static int __init ftrace_plt_init(void)
 {
 	unsigned int *ip;
 
-	ftrace_plt = (unsigned long) module_alloc(PAGE_SIZE);
+	ftrace_plt = (unsigned long) text_alloc(PAGE_SIZE);
 	if (!ftrace_plt)
 		panic("cannot allocate ftrace plt\n");
 	ip = (unsigned int *) ftrace_plt;
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 4055f1c49814..087cb5951de6 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -32,22 +32,6 @@
 
 #define PLT_ENTRY_SIZE 20
 
-void *module_alloc(unsigned long size)
-{
-	void *p;
-
-	if (PAGE_ALIGN(size) > MODULES_LEN)
-		return NULL;
-	p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
-				 GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				 __builtin_return_address(0));
-	if (p && (kasan_module_alloc(p, size) < 0)) {
-		vfree(p);
-		return NULL;
-	}
-	return p;
-}
-
 void module_arch_freeing_init(struct module *mod)
 {
 	if (is_livepatch_module(mod) &&
diff --git a/arch/s390/kernel/text.c b/arch/s390/kernel/text.c
new file mode 100644
index 000000000000..63aaa1ab727b
--- /dev/null
+++ b/arch/s390/kernel/text.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ *  Kernel module help for s390.
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void *text_alloc(unsigned long size)
+{
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+	p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
+				 GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				 __builtin_return_address(0));
+	if (p && (kasan_module_alloc(p, size) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+	return p;
+}
diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
index 97c0e19263d1..e025f9e1db4a 100644
--- a/arch/sparc/kernel/Makefile
+++ b/arch/sparc/kernel/Makefile
@@ -52,6 +52,7 @@ obj-y                   += prom_common.o
 obj-y                   += prom_$(BITS).o
 obj-y                   += of_device_common.o
 obj-y                   += of_device_$(BITS).o
+obj-y			+= text.o
 obj-$(CONFIG_SPARC64)   += prom_irqtrans.o
 
 obj-$(CONFIG_SPARC32)   += leon_kernel.o
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index df39580f398d..f2babc69f189 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -21,36 +21,6 @@
 
 #include "entry.h"
 
-#ifdef CONFIG_SPARC64
-
-#include <linux/jump_label.h>
-
-static void *module_map(unsigned long size)
-{
-	if (PAGE_ALIGN(size) > MODULES_LEN)
-		return NULL;
-	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-				__builtin_return_address(0));
-}
-#else
-static void *module_map(unsigned long size)
-{
-	return vmalloc(size);
-}
-#endif /* CONFIG_SPARC64 */
-
-void *module_alloc(unsigned long size)
-{
-	void *ret;
-
-	ret = module_map(size);
-	if (ret)
-		memset(ret, 0, size);
-
-	return ret;
-}
-
 /* Make generic code ignore STT_REGISTER dummy undefined symbols.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
 			      Elf_Shdr *sechdrs,
diff --git a/arch/sparc/kernel/text.c b/arch/sparc/kernel/text.c
new file mode 100644
index 000000000000..d16663f2c6ba
--- /dev/null
+++ b/arch/sparc/kernel/text.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Kernel module help for sparc64.
+ *
+ * Copyright (C) 2001 Rusty Russell.
+ * Copyright (C) 2002 David S. Miller.
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+#ifdef CONFIG_SPARC64
+
+#include <linux/jump_label.h>
+
+static void *module_map(unsigned long size)
+{
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+				GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
+}
+#else
+static void *module_map(unsigned long size)
+{
+	return vmalloc(size);
+}
+#endif /* CONFIG_SPARC64 */
+
+void *text_alloc(unsigned long size)
+{
+	void *ret;
+
+	ret = module_map(size);
+	if (ret)
+		memset(ret, 0, size);
+
+	return ret;
+}
diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
index c8eabb973b86..d9dd513b27b2 100644
--- a/arch/sparc/net/bpf_jit_comp_32.c
+++ b/arch/sparc/net/bpf_jit_comp_32.c
@@ -713,7 +713,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
 				if (unlikely(proglen + ilen > oldproglen)) {
 					pr_err("bpb_jit_compile fatal error\n");
 					kfree(addrs);
-					module_memfree(image);
+					text_memfree(image);
 					return;
 				}
 				memcpy(image + proglen, temp, ilen);
@@ -736,7 +736,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
 			break;
 		}
 		if (proglen == oldproglen) {
-			image = module_alloc(proglen);
+			image = text_alloc(proglen);
 			if (!image)
 				goto out;
 		}
@@ -758,7 +758,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
 void bpf_jit_free(struct bpf_prog *fp)
 {
 	if (fp->jited)
-		module_memfree(fp->bpf_func);
+		text_memfree(fp->bpf_func);
 
 	bpf_prog_unlock_free(fp);
 }
diff --git a/arch/unicore32/kernel/Makefile b/arch/unicore32/kernel/Makefile
index 2f79aa56735b..96eb8cfc8b1e 100644
--- a/arch/unicore32/kernel/Makefile
+++ b/arch/unicore32/kernel/Makefile
@@ -6,6 +6,7 @@
 # Object file lists.
 obj-y				:= dma.o elf.o entry.o process.o ptrace.o
 obj-y				+= setup.o signal.o sys.o stacktrace.o traps.o
+obj-y				+= text.o
 
 obj-$(CONFIG_MODULES)		+= ksyms.o module.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
diff --git a/arch/unicore32/kernel/module.c b/arch/unicore32/kernel/module.c
index 67c89ef2d6ee..e1e703c02379 100644
--- a/arch/unicore32/kernel/module.c
+++ b/arch/unicore32/kernel/module.c
@@ -18,13 +18,6 @@
 
 #include <asm/sections.h>
 
-void *module_alloc(unsigned long size)
-{
-	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-				GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				__builtin_return_address(0));
-}
-
 int
 apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned int symindex,
 	       unsigned int relindex, struct module *module)
diff --git a/arch/unicore32/kernel/text.c b/arch/unicore32/kernel/text.c
new file mode 100644
index 000000000000..b94aac824bb8
--- /dev/null
+++ b/arch/unicore32/kernel/text.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * linux/arch/unicore32/kernel/module.c
+ *
+ * Code specific to PKUnity SoC and UniCore ISA
+ *
+ * Copyright (C) 2001-2010 GUAN Xue-tao
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void *text_alloc(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+				GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
+}
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261db2391..2878e4b753a0 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -68,6 +68,7 @@ obj-y			+= tsc.o tsc_msr.o io_delay.o rtc.o
 obj-y			+= pci-iommu_table.o
 obj-y			+= resource.o
 obj-y			+= irqflags.o
+obj-y			+= text.o
 
 obj-y				+= process.o
 obj-y				+= fpu/
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 51504566b3a6..f76703ee96f2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -265,11 +265,11 @@ int __init ftrace_dyn_arch_init(void)
 /* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
-	return module_alloc(size);
+	return text_alloc(size);
 }
 static inline void tramp_free(void *tramp)
 {
-	module_memfree(tramp);
+	text_memfree(tramp);
 }
 #else
 /* Trampolines can only be created if modules are supported */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index ada39ddbc922..e9ac7d3c658e 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -423,7 +423,7 @@ void *alloc_insn_page(void)
 {
 	void *page;
 
-	page = module_alloc(PAGE_SIZE);
+	page = text_alloc(PAGE_SIZE);
 	if (!page)
 		return NULL;
 
@@ -446,7 +446,7 @@ void *alloc_insn_page(void)
 /* Recover page to RW mode before releasing it */
 void free_insn_page(void *page)
 {
-	module_memfree(page);
+	text_memfree(page);
 }
 
 static int arch_copy_kprobe(struct kprobe *p)
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 34b153cbd4ac..261df078f127 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -36,55 +36,6 @@ do {							\
 } while (0)
 #endif
 
-#ifdef CONFIG_RANDOMIZE_BASE
-static unsigned long module_load_offset;
-
-/* Mutex protects the module_load_offset. */
-static DEFINE_MUTEX(module_kaslr_mutex);
-
-static unsigned long int get_module_load_offset(void)
-{
-	if (kaslr_enabled()) {
-		mutex_lock(&module_kaslr_mutex);
-		/*
-		 * Calculate the module_load_offset the first time this
-		 * code is called. Once calculated it stays the same until
-		 * reboot.
-		 */
-		if (module_load_offset == 0)
-			module_load_offset =
-				(get_random_int() % 1024 + 1) * PAGE_SIZE;
-		mutex_unlock(&module_kaslr_mutex);
-	}
-	return module_load_offset;
-}
-#else
-static unsigned long int get_module_load_offset(void)
-{
-	return 0;
-}
-#endif
-
-void *module_alloc(unsigned long size)
-{
-	void *p;
-
-	if (PAGE_ALIGN(size) > MODULES_LEN)
-		return NULL;
-
-	p = __vmalloc_node_range(size, MODULE_ALIGN,
-				    MODULES_VADDR + get_module_load_offset(),
-				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL, 0, NUMA_NO_NODE,
-				    __builtin_return_address(0));
-	if (p && (kasan_module_alloc(p, size) < 0)) {
-		vfree(p);
-		return NULL;
-	}
-
-	return p;
-}
-
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,
diff --git a/arch/x86/kernel/text.c b/arch/x86/kernel/text.c
new file mode 100644
index 000000000000..724ab2d93ac5
--- /dev/null
+++ b/arch/x86/kernel/text.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *  Kernel module help for x86.
+ *  Copyright (C) 2001 Rusty Russell.
+ */
+#include <linux/kasan.h>
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/random.h>
+#include <linux/vmalloc.h>
+#include <asm/setup.h>
+
+#ifdef CONFIG_RANDOMIZE_BASE
+static unsigned long module_load_offset;
+
+/* Mutex protects the module_load_offset. */
+static DEFINE_MUTEX(module_kaslr_mutex);
+
+static unsigned long get_module_load_offset(void)
+{
+	if (kaslr_enabled()) {
+		mutex_lock(&module_kaslr_mutex);
+		/*
+		 * Calculate the module_load_offset the first time this
+		 * code is called. Once calculated it stays the same until
+		 * reboot.
+		 */
+		if (module_load_offset == 0)
+			module_load_offset =
+				(get_random_int() % 1024 + 1) * PAGE_SIZE;
+		mutex_unlock(&module_kaslr_mutex);
+	}
+	return module_load_offset;
+}
+#else
+static unsigned long get_module_load_offset(void)
+{
+	return 0;
+}
+#endif
+
+void *text_alloc(unsigned long size)
+{
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				    MODULES_VADDR + get_module_load_offset(),
+				    MODULES_END, GFP_KERNEL,
+				    PAGE_KERNEL, 0, NUMA_NO_NODE,
+				    __builtin_return_address(0));
+	if (p && (kasan_module_alloc(p, size) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 4fa67a8b2265..4e8b9ba431ee 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -24,10 +24,10 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
 
 /* Allocator used for allocating struct module, core sections and init
    sections.  Returns NULL on failure. */
-void *module_alloc(unsigned long size);
+void *text_alloc(unsigned long size);
 
 /* Free memory returned from module_alloc. */
-void module_memfree(void *module_region);
+void text_memfree(void *module_region);
 
 /* Determines if the section name is an init section (that is only used during
  * module loading).
diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..9e88e81f68ef 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y     = fork.o exec_domain.o panic.o \
 	    extable.o params.o \
 	    kthread.o sys_ni.o nsproxy.o \
 	    notifier.o ksysfs.o cred.o reboot.o \
-	    async.o range.o smpboot.o ucount.o
+	    async.o range.o smpboot.o ucount.o text.o
 
 obj-$(CONFIG_MODULES) += kmod.o
 obj-$(CONFIG_MULTIUSER) += groups.o
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..febd55019a8a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -840,12 +840,12 @@ static void bpf_jit_uncharge_modmem(u32 pages)
 
 void *__weak bpf_jit_alloc_exec(unsigned long size)
 {
-	return module_alloc(size);
+	return text_alloc(size);
 }
 
 void __weak bpf_jit_free_exec(void *addr)
 {
-	module_memfree(addr);
+	text_memfree(addr);
 }
 
 struct bpf_binary_header *
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 4a904cc56d68..d1c354ec89de 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -111,12 +111,12 @@ enum kprobe_slot_state {
 
 void __weak *alloc_insn_page(void)
 {
-	return module_alloc(PAGE_SIZE);
+	return text_alloc(PAGE_SIZE);
 }
 
 void __weak free_insn_page(void *page)
 {
-	module_memfree(page);
+	text_memfree(page);
 }
 
 struct kprobe_insn_cache kprobe_insn_slots = {
diff --git a/kernel/module.c b/kernel/module.c
index bee1c25ca5c5..bdb3773f3668 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2141,16 +2141,6 @@ static void free_module_elf(struct module *mod)
 }
 #endif /* CONFIG_LIVEPATCH */
 
-void __weak module_memfree(void *module_region)
-{
-	/*
-	 * This memory may be RO, and freeing RO memory in an interrupt is not
-	 * supported by vmalloc.
-	 */
-	WARN_ON(in_interrupt());
-	vfree(module_region);
-}
-
 void __weak module_arch_cleanup(struct module *mod)
 {
 }
@@ -2200,7 +2190,7 @@ static void free_module(struct module *mod)
 
 	/* This may be empty, but that's OK */
 	module_arch_freeing_init(mod);
-	module_memfree(mod->init_layout.base);
+	text_memfree(mod->init_layout.base);
 	kfree(mod->args);
 	percpu_modfree(mod);
 
@@ -2208,7 +2198,7 @@ static void free_module(struct module *mod)
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
-	module_memfree(mod->core_layout.base);
+	text_memfree(mod->core_layout.base);
 }
 
 void *__symbol_get(const char *symbol)
@@ -2781,13 +2771,6 @@ static void dynamic_debug_remove(struct module *mod, struct _ddebug *debug)
 		ddebug_remove_module(mod->name);
 }
 
-void * __weak module_alloc(unsigned long size)
-{
-	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-			GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
-			NUMA_NO_NODE, __builtin_return_address(0));
-}
-
 bool __weak module_init_section(const char *name)
 {
 	return strstarts(name, ".init");
@@ -3246,7 +3229,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	void *ptr;
 
 	/* Do the allocs. */
-	ptr = module_alloc(mod->core_layout.size);
+	ptr = text_alloc(mod->core_layout.size);
 	/*
 	 * The pointer to this block is stored in the module structure
 	 * which is inside the block. Just mark it as not being a
@@ -3260,7 +3243,7 @@ static int move_module(struct module *mod, struct load_info *info)
 	mod->core_layout.base = ptr;
 
 	if (mod->init_layout.size) {
-		ptr = module_alloc(mod->init_layout.size);
+		ptr = text_alloc(mod->init_layout.size);
 		/*
 		 * The pointer to this block is stored in the module structure
 		 * which is inside the block. This block doesn't need to be
@@ -3269,7 +3252,7 @@ static int move_module(struct module *mod, struct load_info *info)
 		 */
 		kmemleak_ignore(ptr);
 		if (!ptr) {
-			module_memfree(mod->core_layout.base);
+			text_memfree(mod->core_layout.base);
 			return -ENOMEM;
 		}
 		memset(ptr, 0, mod->init_layout.size);
@@ -3452,8 +3435,8 @@ static void module_deallocate(struct module *mod, struct load_info *info)
 {
 	percpu_modfree(mod);
 	module_arch_freeing_init(mod);
-	module_memfree(mod->init_layout.base);
-	module_memfree(mod->core_layout.base);
+	text_memfree(mod->init_layout.base);
+	text_memfree(mod->core_layout.base);
 }
 
 int __weak module_finalize(const Elf_Ehdr *hdr,
@@ -3527,7 +3510,7 @@ static void do_free_init(struct work_struct *w)
 
 	llist_for_each_safe(pos, n, list) {
 		initfree = container_of(pos, struct mod_initfree, node);
-		module_memfree(initfree->module_init);
+		text_memfree(initfree->module_init);
 		kfree(initfree);
 	}
 }
@@ -3626,10 +3609,10 @@ static noinline int do_init_module(struct module *mod)
 	 * We want to free module_init, but be aware that kallsyms may be
 	 * walking this with preempt disabled.  In all the failure paths, we
 	 * call synchronize_rcu(), but we don't want to slow down the success
-	 * path. module_memfree() cannot be called in an interrupt, so do the
+	 * path. text_memfree() cannot be called in an interrupt, so do the
 	 * work and call synchronize_rcu() in a work queue.
 	 *
-	 * Note that module_alloc() on most architectures creates W+X page
+	 * Note that text_alloc() on most architectures creates W+X page
 	 * mappings which won't be cleaned up until do_free_init() runs.  Any
 	 * code such as mark_rodata_ro() which depends on those mappings to
 	 * be cleaned up needs to sync with the queued work - ie
diff --git a/kernel/text.c b/kernel/text.c
new file mode 100644
index 000000000000..9a12c508ded5
--- /dev/null
+++ b/kernel/text.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *  Copyright (C) 2002 Richard Henderson
+ *  Copyright (C) 2001 Rusty Russell, 2002, 2010 Rusty Russell IBM.
+ */
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void __weak text_memfree(void *module_region)
+{
+	/*
+	 * This memory may be RO, and freeing RO memory in an interrupt is not
+	 * supported by vmalloc.
+	 */
+	WARN_ON(in_interrupt());
+	vfree(module_region);
+}
+
+void * __weak text_alloc(unsigned long size)
+{
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+			GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
+			NUMA_NO_NODE, __builtin_return_address(0));
+}
-- 
2.25.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox