* [powerpc:next-test] BUILD SUCCESS 0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9
From: kernel test robot @ 2020-07-17 1:58 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 0fbd1eb4df96e1cbd039e0b95fdf62cf65a7faf9 papr/scm: Add bad memory ranges to nvdimm bad ranges
elapsed time: 789m
configs tested: 74
configs skipped: 1
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm allyesconfig
arm allmodconfig
arm allnoconfig
arm64 allyesconfig
arm64 defconfig
arm64 allmodconfig
arm64 allnoconfig
i386 allyesconfig
i386 defconfig
i386 debian-10.3
i386 allnoconfig
ia64 allmodconfig
ia64 defconfig
ia64 allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k allnoconfig
m68k sun3_defconfig
m68k defconfig
m68k allyesconfig
nds32 defconfig
nds32 allnoconfig
csky allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
nios2 defconfig
nios2 allyesconfig
openrisc defconfig
c6x allyesconfig
c6x allnoconfig
openrisc allyesconfig
xtensa allyesconfig
h8300 allyesconfig
h8300 allmodconfig
xtensa defconfig
arc defconfig
arc allyesconfig
sh allmodconfig
sh allnoconfig
microblaze allnoconfig
mips allyesconfig
mips allnoconfig
mips allmodconfig
parisc allnoconfig
parisc defconfig
parisc allyesconfig
parisc allmodconfig
powerpc defconfig
powerpc allyesconfig
powerpc rhel-kconfig
powerpc allmodconfig
powerpc allnoconfig
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
s390 allyesconfig
s390 allnoconfig
s390 allmodconfig
s390 defconfig
sparc allyesconfig
sparc defconfig
sparc64 defconfig
sparc64 allnoconfig
sparc64 allyesconfig
sparc64 allmodconfig
x86_64 rhel
x86_64 lkp
x86_64 fedora-25
x86_64 rhel-7.6-kselftests
x86_64 rhel-8.3
x86_64 kexec
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* [powerpc:merge] BUILD SUCCESS 3a60e5fbdc3520d429d7cd6affed5a8daf120c6b
From: kernel test robot @ 2020-07-17 1:58 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 3a60e5fbdc3520d429d7cd6affed5a8daf120c6b Automatic merge of 'master', 'next' and 'fixes' (2020-07-16 22:34)
elapsed time: 791m
configs tested: 80
configs skipped: 1
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm allyesconfig
arm allmodconfig
arm allnoconfig
arm64 allyesconfig
arm64 defconfig
arm64 allmodconfig
arm64 allnoconfig
i386 allnoconfig
i386 allyesconfig
i386 defconfig
i386 debian-10.3
ia64 allmodconfig
ia64 defconfig
ia64 allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k allnoconfig
m68k sun3_defconfig
m68k defconfig
m68k allyesconfig
nds32 defconfig
nds32 allnoconfig
csky allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
nios2 defconfig
nios2 allyesconfig
openrisc defconfig
c6x allyesconfig
c6x allnoconfig
openrisc allyesconfig
xtensa allyesconfig
h8300 allyesconfig
h8300 allmodconfig
xtensa defconfig
arc defconfig
arc allyesconfig
sh allmodconfig
sh allnoconfig
microblaze allnoconfig
mips allyesconfig
mips allnoconfig
mips allmodconfig
parisc allnoconfig
parisc defconfig
parisc allyesconfig
parisc allmodconfig
powerpc defconfig
powerpc allyesconfig
powerpc rhel-kconfig
powerpc allmodconfig
powerpc allnoconfig
i386 randconfig-a016-20200716
i386 randconfig-a011-20200716
i386 randconfig-a015-20200716
i386 randconfig-a012-20200716
i386 randconfig-a013-20200716
i386 randconfig-a014-20200716
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
s390 allyesconfig
s390 allnoconfig
s390 allmodconfig
s390 defconfig
sparc allyesconfig
sparc defconfig
sparc64 defconfig
sparc64 allnoconfig
sparc64 allyesconfig
sparc64 allmodconfig
x86_64 rhel-7.6-kselftests
x86_64 rhel-8.3
x86_64 kexec
x86_64 rhel
x86_64 lkp
x86_64 fedora-25
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* Re: [PATCH net-next] ibmvnic: Increase driver logging
From: Stephen Hemminger @ 2020-07-17 0:26 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Thomas Falcon, drt, Michal Suchánek, linuxppc-dev,
David Miller
In-Reply-To: <20200716132200.37934905@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
On Thu, 16 Jul 2020 13:22:00 -0700
Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 16 Jul 2020 18:07:37 +0200 Michal Suchánek wrote:
> > On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote:
> > > On 7/15/20 8:29 PM, David Miller wrote:
> > > > From: Jakub Kicinski <kuba@kernel.org>
> > > > Date: Wed, 15 Jul 2020 17:06:32 -0700
> > > >
> > > > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:
> > > > > > free_netdev(netdev);
> > > > > > dev_set_drvdata(&dev->dev, NULL);
> > > > > > + netdev_info(netdev, "VNIC client device has been successfully removed.\n");
> > > > > A step too far, perhaps.
> > > > >
> > > > > In general this patch looks a little questionable IMHO, this amount of
> > > > > logging output is not commonly seen in drivers. All the the info
> > > > > messages are just static text, not even carrying any extra information.
> > > > > In an era of ftrace, and bpftrace, do we really need this?
> > > > Agreed, this is too much. This is debugging, and thus suitable for tracing
> > > > facilities, at best.
> > >
> > > Thanks for your feedback. I see now that I was overly aggressive with this
> > > patch to be sure, but it would help with narrowing down problems at a first
> > > glance, should they arise. The driver in its current state logs very little
> > > of what is it doing without the use of additional debugging or tracing
> > > facilities. Would it be worth it to pursue a less aggressive version or
> > > would that be dead on arrival? What are acceptable driver operations to log
> > > at this level?
>
> Sadly it's much more of an art than hard science. Most networking
> drivers will print identifying information when they probe the device
> and then only about major config changes or when link comes up or goes
> down. And obviously when anything unexpected, like an error happens,
> that's key.
>
> You seem to be adding start / end information for each driver init /
> deinit stage. I'd say try to focus on the actual errors you're trying
> to catch.
>
> > Also would it be advisable to add the messages as pr_dbg to be enabled on demand?
>
> I personally have had a pretty poor experience with pr_debug() because
> CONFIG_DYNAMIC_DEBUG is not always enabled. Since you're just printing
> static text there shouldn't be much difference between pr_debug and
> ftrace and/or bpftrace, honestly.
>
> Again, slightly hard to advise not knowing what you're trying to catch.
Linux drivers in general are far too noisy.
In production it is not uncommon to set kernel to suppress all info messages.
^ permalink raw reply
* Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
From: Nicholas Piggin @ 2020-07-17 0:00 UTC (permalink / raw)
To: Mathieu Desnoyers, paulmck, Alan Stern
Cc: linux-arch, Arnd Bergmann, Peter Zijlstra, x86, linux-kernel,
linux-mm, Andy Lutomirski, linuxppc-dev
In-Reply-To: <595582123.17106.1594925921537.JavaMail.zimbra@efficios.com>
Excerpts from Mathieu Desnoyers's message of July 17, 2020 4:58 am:
> ----- On Jul 16, 2020, at 12:03 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
>
>> ----- On Jul 16, 2020, at 11:46 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>>> ----- On Jul 16, 2020, at 12:42 AM, Nicholas Piggin npiggin@gmail.com wrote:
>>>> I should be more complete here, especially since I was complaining
>>>> about unclear barrier comment :)
>>>>
>>>>
>>>> CPU0 CPU1
>>>> a. user stuff 1. user stuff
>>>> b. membarrier() 2. enter kernel
>>>> c. smp_mb() 3. smp_mb__after_spinlock(); // in __schedule
>>>> d. read rq->curr 4. rq->curr switched to kthread
>>>> e. is kthread, skip IPI 5. switch_to kthread
>>>> f. return to user 6. rq->curr switched to user thread
>>>> g. user stuff 7. switch_to user thread
>>>> 8. exit kernel
>>>> 9. more user stuff
>>>>
>>>> What you're really ordering is a, g vs 1, 9 right?
>>>>
>>>> In other words, 9 must see a if it sees g, g must see 1 if it saw 9,
>>>> etc.
>>>>
>>>> Userspace does not care where the barriers are exactly or what kernel
>>>> memory accesses might be being ordered by them, so long as there is a
>>>> mb somewhere between a and g, and 1 and 9. Right?
>>>
>>> This is correct.
>>
>> Actually, sorry, the above is not quite right. It's been a while
>> since I looked into the details of membarrier.
>>
>> The smp_mb() at the beginning of membarrier() needs to be paired with a
>> smp_mb() _after_ rq->curr is switched back to the user thread, so the
>> memory barrier is between store to rq->curr and following user-space
>> accesses.
>>
>> The smp_mb() at the end of membarrier() needs to be paired with the
>> smp_mb__after_spinlock() at the beginning of schedule, which is
>> between accesses to userspace memory and switching rq->curr to kthread.
>>
>> As to *why* this ordering is needed, I'd have to dig through additional
>> scenarios from https://lwn.net/Articles/573436/. Or maybe Paul remembers ?
>
> Thinking further about this, I'm beginning to consider that maybe we have been
> overly cautious by requiring memory barriers before and after store to rq->curr.
>
> If CPU0 observes a CPU1's rq->curr->mm which differs from its own process (current)
> while running the membarrier system call, it necessarily means that CPU1 had
> to issue smp_mb__after_spinlock when entering the scheduler, between any user-space
> loads/stores and update of rq->curr.
>
> Requiring a memory barrier between update of rq->curr (back to current process's
> thread) and following user-space memory accesses does not seem to guarantee
> anything more than what the initial barrier at the beginning of __schedule already
> provides, because the guarantees are only about accesses to user-space memory.
>
> Therefore, with the memory barrier at the beginning of __schedule, just observing that
> CPU1's rq->curr differs from current should guarantee that a memory barrier was issued
> between any sequentially consistent instructions belonging to the current process on
> CPU1.
>
> Or am I missing/misremembering an important point here ?
I might have mislead you.
CPU0 CPU1
r1=y x=1
membarrier() y=1
r2=x
membarrier provides if r1==1 then r2==1 (right?)
CPU0
r1=y
membarrier()
smp_mb();
t = cpu_rq(1)->curr;
if (t->mm == mm)
IPI(CPU1);
smp_mb()
r2=x
vs
CPU1
...
__schedule()
smp_mb__after_spinlock()
rq->curr = kthread
...
__schedule()
smp_mb__after_spinlock()
rq->curr = user thread
exit kernel
x=1
y=1
Now these last 3 stores are not ordered, so CPU0 might see y==1 but
rq->curr == kthread, right? Then it will skip the IPI and stores to x
and y will not be ordered.
So we do need a mb after rq->curr store when mm is switching.
I believe for the global membarrier PF_KTHREAD optimisation, we also
need a barrier when switching from a kernel thread to user, for the
same reason.
So I think I was wrong to say the barrier is not necessary.
I haven't quite worked out why two mb()s are required in membarrier(),
but at least that's less of a performance concern.
Thanks,
Nick
^ permalink raw reply
* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Tulio Magno Quites Machado Filho @ 2020-07-16 23:18 UTC (permalink / raw)
To: Christophe Leroy, Michael Ellerman
Cc: linux-arch, nathanl, arnd, linux-kernel, Paul Mackerras,
Christophe Leroy, luto, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <20200715204725.Horde.5GZvsEv4ZkdzFHL76HZiFg8@messagerie.si.c-s.fr>
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Michael Ellerman <mpe@ellerman.id.au> a écrit :
>
>> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>>> Prepare for switching VDSO to generic C implementation in following
>>> patch. Here, we:
>>> - Modify __get_datapage() to take an offset
>>> - Prepare the helpers to call the C VDSO functions
>>> - Prepare the required callbacks for the C VDSO functions
>>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
>>> - Add the C trampolines to the generic C VDSO functions
>>>
>>> powerpc is a bit special for VDSO as well as system calls in the
>>> way that it requires setting CR SO bit which cannot be done in C.
>>> Therefore, entry/exit needs to be performed in ASM.
>>>
>>> Implementing __arch_get_vdso_data() would clobber the link register,
>>> requiring the caller to save it. As the ASM calling function already
>>> has to set a stack frame and saves the link register before calling
>>> the C vdso function, retriving the vdso data pointer there is lighter.
>> ...
>>
>>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> new file mode 100644
>>> index 000000000000..4452897f9bd8
>>> --- /dev/null
>>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> @@ -0,0 +1,175 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H
>>> +#define __ASM_VDSO_GETTIMEOFDAY_H
>>> +
>>> +#include <asm/ptrace.h>
>>> +
>>> +#ifdef __ASSEMBLY__
>>> +
>>> +.macro cvdso_call funct
>>> + .cfi_startproc
>>> + PPC_STLU r1, -STACK_FRAME_OVERHEAD(r1)
>>> + mflr r0
>>> + .cfi_register lr, r0
>>> + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
>>
>> This doesn't work for me on ppc64(le) with glibc.
>>
>> glibc doesn't create a stack frame before making the VDSO call, so the
>> store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
>> leading to an infinite loop.
>
> Where should it be saved if it can't be saved in the standard location ?
As Michael pointed out, userspace doesn't treat the VDSO as a normal function
call. In order to keep compatibility with existent software, LR would need to
be saved on another stack frame.
--
Tulio Magno
^ permalink raw reply
* Re: [PATCH 1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'
From: Mark Brown @ 2020-07-16 23:57 UTC (permalink / raw)
To: Lee Jones, lgirdwood
Cc: alsa-devel, Timur Tabi, Xiubo Li, Fabio Estevam, linux-kernel,
Nicolin Chen, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200715094447.3170843-1-lee.jones@linaro.org>
On Wed, 15 Jul 2020 10:44:47 +0100, Lee Jones wrote:
>
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'
commit: 1b58214113481616b74ee4d196e5b1cb683758ee
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply
* Re: [PATCH v2 0/3] ASoC: fsl-asoc-card: Support hp and mic detection
From: Mark Brown @ 2020-07-16 23:57 UTC (permalink / raw)
To: alsa-devel, Xiubo.Lee, robh+dt, Shengjiu Wang, festevam, timur,
nicoleotsuka, devicetree, tiwai, perex, lgirdwood
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1594822179-1849-1-git-send-email-shengjiu.wang@nxp.com>
On Wed, 15 Jul 2020 22:09:36 +0800, Shengjiu Wang wrote:
> Support hp and mic detection.
> Add a parameter for asoc_simple_init_jack.
>
> Shengjiu Wang (3):
> ASoC: simple-card-utils: Support configure pin_name for
> asoc_simple_init_jack
> ASoC: bindings: fsl-asoc-card: Support hp-det-gpio and mic-det-gpio
> ASoC: fsl-asoc-card: Support Headphone and Microphone Jack detection
>
> [...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[1/3] ASoC: simple-card-utils: Support configure pin_name for asoc_simple_init_jack
commit: 764aafdb985b182bce0c91503e9233cb97a2f0d7
[2/3] ASoC: bindings: fsl-asoc-card: Support hp-det-gpio and mic-det-gpio
commit: c3c058aba4032a0f88a2f203472d7b5076a926b4
[3/3] ASoC: fsl-asoc-card: Support Headphone and Microphone Jack detection
commit: 3b171194493c5f7b2aa9b76deb402a8e98ab510f
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply
* Re: [PATCH v2 1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'
From: Mark Brown @ 2020-07-16 23:56 UTC (permalink / raw)
To: Lee Jones, lgirdwood
Cc: alsa-devel, Timur Tabi, Xiubo Li, Fabio Estevam, linux-kernel,
Nicolin Chen, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200715150009.407442-1-lee.jones@linaro.org>
On Wed, 15 Jul 2020 16:00:09 +0100, Lee Jones wrote:
>
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[1/1] ASoC: fsl: fsl-asoc-card: Trivial: Fix misspelling of 'exists'
commit: 1b58214113481616b74ee4d196e5b1cb683758ee
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
^ permalink raw reply
* Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
From: Nicholas Piggin @ 2020-07-16 23:26 UTC (permalink / raw)
To: peterz
Cc: linux-arch, Arnd Bergmann, x86, linux-kernel, Andy Lutomirski,
linux-mm, Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <20200716110038.GA119549@hirez.programming.kicks-ass.net>
Excerpts from peterz@infradead.org's message of July 16, 2020 9:00 pm:
> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>> > On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>> >> > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
>> >> But I’m wondering if all this deferred sync stuff is wrong. In the
>> >> brave new world of io_uring and such, perhaps kernel access matter
>> >> too. Heck, even:
>> >
>> > IIRC the membarrier SYNC_CORE use-case is about user-space
>> > self-modifying code.
>> >
>> > Userspace re-uses a text address and needs to SYNC_CORE before it can be
>> > sure the old text is forgotten. Nothing the kernel does matters there.
>> >
>> > I suppose the manpage could be more clear there.
>>
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>>
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
>
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.
Well, both but Andy in this case was wondering about kernel writes
vs user.
>
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion. All we care about is that all other threads (or CPUs
> for GLOBAL) observe an smp_mb() before it returns.
>
> Any serialization against whatever those other threads/CPUs are running
> at the instant of the syscall is external to the syscall, we make no
> gauarantees about that. That is, we can fundamentally not say what
> another CPU is executing concurrently. Nor should we want to.
>
> So if you feel that your membarrier() ought to serialize against remote
> execution, you need to arrange a quiecent state on the remote side
> yourself.
>
> Now, normally membarrier() is used to implement userspace RCU like
> things, and there all that matters is that the remote CPUs observe the
> beginngin of the new grace-period, ie counter flip, and we observe their
> read-side critical sections, or smething like that, it's been a while
> since I looked at all that.
>
> It's always been the case that concurrent syscalls could change user
> memory, io_uring doesn't change that, it just makes it even less well
> defined when that would happen. If you want to serialize against that,
> you need to arrange that externally.
membarrier does replace barrier instructions on remote CPUs, which do
order accesses performed by the kernel on the user address space. So
membarrier should too I guess.
Normal process context accesses like read(2) will do so because they
don't get filtered out from IPIs, but kernel threads using the mm may
not.
Thanks,
Nick
^ permalink raw reply
* Question about NUMA distance calculation in powerpc/mm/numa.c
From: Daniel Henrique Barboza @ 2020-07-16 23:13 UTC (permalink / raw)
To: linuxppc-dev
Hello,
I didn't find an explanation about the 'double the distance' logic in
'git log' or anywhere in the kernel docs:
(arch/powerpc/mm/numa.c, __node_distance()):
for (i = 0; i < distance_ref_points_depth; i++) {
if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
break;
/* Double the distance for each NUMA level */
distance *= 2;
}
For reference, the commit that added it:
commit 41eab6f88f24124df89e38067b3766b7bef06ddb
Author: Anton Blanchard <anton@samba.org>
Date: Sun May 16 20:22:31 2010 +0000
powerpc/numa: Use form 1 affinity to setup node distance
Is there a technical reason for the distance being calculated as the double
for each NUMA level?
The reason I'm asking is because of the QEMU/Libvirt capability to define NUMA
node distances in the VMs. For x86, an user is capable of setting any distance
values to the NUMA topology due to how ACPI SLIT works.
The user, of course, wants the pseries guest to behave the same way. The best
we can do for now is document why this will not happen. I'll document the
limitations imposed by the design itself (how ibm,associativity-reference-points
is capped to MAX_DISTANCE_REF_POINTS and so on). I also would like to document
that the pseries kernel will double the distance for each NUMA level, and for
that it would be nice to provide an actual reason for that to happen, if
there is any.
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE
From: Stephen Rothwell @ 2020-07-16 23:09 UTC (permalink / raw)
To: Qian Cai
Cc: aneesh.kumar, linux-kernel, npiggin, Bharata B Rao, linux-next,
linuxppc-dev
In-Reply-To: <20200716172713.GA4565@lca.pw>
[-- Attachment #1: Type: text/plain, Size: 308 bytes --]
Hi all,
On Thu, 16 Jul 2020 13:27:14 -0400 Qian Cai <cai@lca.pw> wrote:
>
> Reverting the whole series fixed random memory corruptions during boot on
> POWER9 PowerNV systems below.
I will revert those commits from linux-next today as well (they revert
cleanly).
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH] powerpc/64: Fix an out of date comment about MMIO ordering
From: Benjamin Herrenschmidt @ 2020-07-16 22:38 UTC (permalink / raw)
To: Palmer Dabbelt, Will Deacon
Cc: kernel-team, bigeasy, Palmer Dabbelt, linux-kernel, npiggin,
paulus, jniethe5, tglx, msuchanek, linuxppc-dev
In-Reply-To: <20200716193820.1141936-1-palmer@dabbelt.com>
On Thu, 2020-07-16 at 12:38 -0700, Palmer Dabbelt wrote:
> From: Palmer Dabbelt <palmerdabbelt@google.com>
>
> This primitive has been renamed, but because it was spelled incorrectly in the
> first place it must have escaped the fixup patch. As far as I can tell this
> logic is still correct: smp_mb__after_spinlock() uses the default smp_mb()
> implementation, which is "sync" rather than "hwsync" but those are the same
> (though I'm not that familiar with PowerPC).
Typo ? That must be me ... :)
Looks fine. Yes, sync and hwsync are the same (by opposition to lwsync
which is lighter weight and doesn't order cache inhibited).
Cheers,
Ben.
> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
> ---
> arch/powerpc/kernel/entry_64.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index b3c9f15089b6..7b38b4daca93 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -357,7 +357,7 @@ _GLOBAL(_switch)
> * kernel/sched/core.c).
> *
> * Uncacheable stores in the case of involuntary preemption must
> - * be taken care of. The smp_mb__before_spin_lock() in __schedule()
> + * be taken care of. The smp_mb__after_spinlock() in __schedule()
> * is implemented as hwsync on powerpc, which orders MMIO too. So
> * long as there is an hwsync in the context switch path, it will
> * be executed on the source CPU after the task has performed
^ permalink raw reply
* Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper
From: Christophe Leroy @ 2020-07-16 16:49 UTC (permalink / raw)
To: Jarkko Sakkinen
Cc: Catalin Marinas, Kefeng Wang, Paul Mackerras, Zong Li, Andi Kleen,
Paul Burton, Vincent Whitchurch, Petr Mladek, Brian Gerst,
Andy Lutomirski, Thomas Gleixner, Jiri Kosina, Anup Patel,
linux-kernel, Philipp Rudo, Torsten Duwe, Masami Hiramatsu,
Andrew Morton, Mark Rutland, James E.J. Bottomley, Vincent Chen,
Omar Sandoval, open list:S390, Joe Lawrence, Helge Deller,
John Fastabend, Anil S Keshavamurthy, Yonghong Song, Iurii Zaikin,
Andrii Nakryiko, Thomas Huth, Vasily Gorbik,
moderated list:ARM PORT, Daniel Axtens, Damien Le Moal,
Martin KaFai Lau, Song Liu, Josh Poimboeuf, Heiko Carstens,
Alexei Starovoitov, Atish Patra, Will Deacon, Daniel Borkmann,
Masahiro Yamada, Nayna Jain, Ley Foon Tan, Christian Borntraeger,
Sami Tolvanen, Naveen N. Rao, Mao Han, Marco Elver,
Steven Rostedt, Babu Moger, Borislav Petkov, Greentime Hu,
Ben Dooks, Guan Xuetao, Thomas Bogendoerfer,
open list:PARISC ARCHITECTURE, Jessica Yu,
open list:BPF JIT for MIPS 32-BIT AND 64-BIT, David S. Miller,
Thiago Jung Bauermann, Peter Zijlstra, David Howells,
Amit Daniel Kachhap, Sandipan Das, H. Peter Anvin,
open list:SPARC + UltraSPARC sparc/sparc64, Tiezhu Yang,
Miroslav Benes, Sven Schnelle, Ard Biesheuvel, Vincenzo Frascino,
Anders Roxell, Jiri Olsa,
maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT, Russell King,
open list:RISC-V ARCHITECTURE, Mike Rapoport, Ingo Molnar,
Albert Ou, Paul E. McKenney, Paul Walmsley, KP Singh,
Dmitry Vyukov, Nick Hu,
open list:BPF JIT for MIPS 32-BIT AND 64-BIT, open list:MIPS,
Palmer Dabbelt, open list:LINUX FOR POWERPC 32-BIT AND 64-BIT
In-Reply-To: <20200714094625.1443261-2-jarkko.sakkinen@linux.intel.com>
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> a écrit :
> Rename module_alloc() to text_alloc() and module_memfree() to
> text_memfree(), and move them to kernel/text.c, which is unconditionally
> compiled to the kernel proper. This allows kprobes, ftrace and bpf to
> allocate space for executable code without requiring to compile the modules
> support (CONFIG_MODULES=y) in.
You are not changing enough in powerpc to have this work.
On powerpc 32 bits (6xx), when STRICT_KERNEL_RWX is selected, the
vmalloc space is set to NX (no exec) at segment level (ie by 256Mbytes
zone) unless CONFIG_MODULES is selected.
Christophe
>
> Cc: Andi Kleen <ak@linux.intel.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> ---
> arch/arm/kernel/Makefile | 3 +-
> arch/arm/kernel/module.c | 21 -----------
> arch/arm/kernel/text.c | 33 ++++++++++++++++++
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/module.c | 42 ----------------------
> arch/arm64/kernel/text.c | 54 ++++++++++++++++++++++++++++
> arch/mips/kernel/Makefile | 2 +-
> arch/mips/kernel/module.c | 9 -----
> arch/mips/kernel/text.c | 19 ++++++++++
> arch/mips/net/bpf_jit.c | 4 +--
> arch/nds32/kernel/Makefile | 2 +-
> arch/nds32/kernel/module.c | 7 ----
> arch/nds32/kernel/text.c | 12 +++++++
> arch/nios2/kernel/Makefile | 1 +
> arch/nios2/kernel/module.c | 19 ----------
> arch/nios2/kernel/text.c | 34 ++++++++++++++++++
> arch/parisc/kernel/Makefile | 2 +-
> arch/parisc/kernel/module.c | 11 ------
> arch/parisc/kernel/text.c | 22 ++++++++++++
> arch/powerpc/net/bpf_jit_comp.c | 4 +--
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/module.c | 12 -------
> arch/riscv/kernel/text.c | 20 +++++++++++
> arch/s390/kernel/Makefile | 2 +-
> arch/s390/kernel/ftrace.c | 2 +-
> arch/s390/kernel/module.c | 16 ---------
> arch/s390/kernel/text.c | 23 ++++++++++++
> arch/sparc/kernel/Makefile | 1 +
> arch/sparc/kernel/module.c | 30 ----------------
> arch/sparc/kernel/text.c | 39 +++++++++++++++++++++
> arch/sparc/net/bpf_jit_comp_32.c | 6 ++--
> arch/unicore32/kernel/Makefile | 1 +
> arch/unicore32/kernel/module.c | 7 ----
> arch/unicore32/kernel/text.c | 18 ++++++++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/ftrace.c | 4 +--
> arch/x86/kernel/kprobes/core.c | 4 +--
> arch/x86/kernel/module.c | 49 --------------------------
> arch/x86/kernel/text.c | 60 ++++++++++++++++++++++++++++++++
> include/linux/moduleloader.h | 4 +--
> kernel/Makefile | 2 +-
> kernel/bpf/core.c | 4 +--
> kernel/kprobes.c | 4 +--
> kernel/module.c | 37 ++++++--------------
> kernel/text.c | 25 +++++++++++++
> 45 files changed, 400 insertions(+), 275 deletions(-)
> create mode 100644 arch/arm/kernel/text.c
> create mode 100644 arch/arm64/kernel/text.c
> create mode 100644 arch/mips/kernel/text.c
> create mode 100644 arch/nds32/kernel/text.c
> create mode 100644 arch/nios2/kernel/text.c
> create mode 100644 arch/parisc/kernel/text.c
> create mode 100644 arch/riscv/kernel/text.c
> create mode 100644 arch/s390/kernel/text.c
> create mode 100644 arch/sparc/kernel/text.c
> create mode 100644 arch/unicore32/kernel/text.c
> create mode 100644 arch/x86/kernel/text.c
> create mode 100644 kernel/text.c
>
> diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
> index 89e5d864e923..69bfacfd60ef 100644
> --- a/arch/arm/kernel/Makefile
> +++ b/arch/arm/kernel/Makefile
> @@ -19,7 +19,8 @@ CFLAGS_REMOVE_return_address.o = -pg
> obj-y := elf.o entry-common.o irq.o opcodes.o \
> process.o ptrace.o reboot.o \
> setup.o signal.o sigreturn_codes.o \
> - stacktrace.o sys_arm.o time.o traps.o
> + stacktrace.o sys_arm.o time.o traps.o \
> + text.o
>
> ifneq ($(CONFIG_ARM_UNWIND),y)
> obj-$(CONFIG_FRAME_POINTER) += return_address.o
> diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
> index e15444b25ca0..13e3442a6b9f 100644
> --- a/arch/arm/kernel/module.c
> +++ b/arch/arm/kernel/module.c
> @@ -33,27 +33,6 @@
> #define MODULES_VADDR (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
> #endif
>
> -#ifdef CONFIG_MMU
> -void *module_alloc(unsigned long size)
> -{
> - gfp_t gfp_mask = GFP_KERNEL;
> - void *p;
> -
> - /* Silence the initial allocation */
> - if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
> - gfp_mask |= __GFP_NOWARN;
> -
> - p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> - if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
> - return p;
> - return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -#endif
> -
> bool module_init_section(const char *name)
> {
> return strstarts(name, ".init") ||
> diff --git a/arch/arm/kernel/text.c b/arch/arm/kernel/text.c
> new file mode 100644
> index 000000000000..600143fb909d
> --- /dev/null
> +++ b/arch/arm/kernel/text.c
> @@ -0,0 +1,33 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * linux/arch/arm/kernel/module.c
> + *
> + * Copyright (C) 2002 Russell King.
> + * Modified for nommu by Hyok S. Choi
> + *
> + * Module allocation method suggested by Andi Kleen.
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +#ifdef CONFIG_MMU
> +void *text_alloc(unsigned long size)
> +{
> + gfp_t gfp_mask = GFP_KERNEL;
> + void *p;
> +
> + /* Silence the initial allocation */
> + if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
> + gfp_mask |= __GFP_NOWARN;
> +
> + p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> + gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> + if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
> + return p;
> + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> +#endif
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index a561cbb91d4d..7765a45d71b5 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -19,7 +19,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
> return_address.o cpuinfo.o cpu_errata.o \
> cpufeature.o alternative.o cacheinfo.o \
> smp.o smp_spin_table.o topology.o smccc-call.o \
> - syscall.o
> + syscall.o text.o
>
> targets += efi-entry.o
>
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index 1cd1a4d0ed30..adde022f703c 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -20,48 +20,6 @@
> #include <asm/insn.h>
> #include <asm/sections.h>
>
> -void *module_alloc(unsigned long size)
> -{
> - u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
> - gfp_t gfp_mask = GFP_KERNEL;
> - void *p;
> -
> - /* Silence the initial allocation */
> - if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> - gfp_mask |= __GFP_NOWARN;
> -
> - if (IS_ENABLED(CONFIG_KASAN))
> - /* don't exceed the static module region - see below */
> - module_alloc_end = MODULES_END;
> -
> - p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> - module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
> - NUMA_NO_NODE, __builtin_return_address(0));
> -
> - if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> - !IS_ENABLED(CONFIG_KASAN))
> - /*
> - * KASAN can only deal with module allocations being served
> - * from the reserved module region, since the remainder of
> - * the vmalloc region is already backed by zero shadow pages,
> - * and punching holes into it is non-trivial. Since the module
> - * region is not randomized when KASAN is enabled, it is even
> - * less likely that the module region gets exhausted, so we
> - * can simply omit this fallback in that case.
> - */
> - p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> - module_alloc_base + SZ_2G, GFP_KERNEL,
> - PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -
> - if (p && (kasan_module_alloc(p, size) < 0)) {
> - vfree(p);
> - return NULL;
> - }
> -
> - return p;
> -}
> -
> enum aarch64_reloc_op {
> RELOC_OP_NONE,
> RELOC_OP_ABS,
> diff --git a/arch/arm64/kernel/text.c b/arch/arm64/kernel/text.c
> new file mode 100644
> index 000000000000..64fc7e2d85df
> --- /dev/null
> +++ b/arch/arm64/kernel/text.c
> @@ -0,0 +1,54 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * AArch64 loadable module support.
> + *
> + * Copyright (C) 2012 ARM Limited
> + *
> + * Author: Will Deacon <will.deacon@arm.com>
> + */
> +#include <linux/kasan.h>
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void *text_alloc(unsigned long size)
> +{
> + u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
> + gfp_t gfp_mask = GFP_KERNEL;
> + void *p;
> +
> + /* Silence the initial allocation */
> + if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> + gfp_mask |= __GFP_NOWARN;
> +
> + if (IS_ENABLED(CONFIG_KASAN))
> + /* don't exceed the static module region - see below */
> + module_alloc_end = MODULES_END;
> +
> + p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> + module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
> + NUMA_NO_NODE, __builtin_return_address(0));
> +
> + if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> + !IS_ENABLED(CONFIG_KASAN))
> + /*
> + * KASAN can only deal with module allocations being served
> + * from the reserved module region, since the remainder of
> + * the vmalloc region is already backed by zero shadow pages,
> + * and punching holes into it is non-trivial. Since the module
> + * region is not randomized when KASAN is enabled, it is even
> + * less likely that the module region gets exhausted, so we
> + * can simply omit this fallback in that case.
> + */
> + p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> + module_alloc_base + SZ_2G, GFP_KERNEL,
> + PAGE_KERNEL, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +
> + if (p && (kasan_module_alloc(p, size) < 0)) {
> + vfree(p);
> + return NULL;
> + }
> +
> + return p;
> +}
> diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
> index 8c7a043295ed..37ebf9a7f259 100644
> --- a/arch/mips/kernel/Makefile
> +++ b/arch/mips/kernel/Makefile
> @@ -8,7 +8,7 @@ extra-y := head.o vmlinux.lds
> obj-y += cmpxchg.o cpu-probe.o branch.o elf.o entry.o genex.o
> idle.o irq.o \
> process.o prom.o ptrace.o reset.o setup.o signal.o \
> syscall.o time.o topology.o traps.o unaligned.o watch.o \
> - vdso.o cacheinfo.o
> + vdso.o cacheinfo.o text.o
>
> ifdef CONFIG_FUNCTION_TRACER
> CFLAGS_REMOVE_ftrace.o = -pg
> diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
> index 3c0c3d1260c1..f5ac4ebc4bad 100644
> --- a/arch/mips/kernel/module.c
> +++ b/arch/mips/kernel/module.c
> @@ -31,15 +31,6 @@ struct mips_hi16 {
> static LIST_HEAD(dbe_list);
> static DEFINE_SPINLOCK(dbe_lock);
>
> -#ifdef MODULE_START
> -void *module_alloc(unsigned long size)
> -{
> - return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -#endif
> -
> static int apply_r_mips_none(struct module *me, u32 *location,
> u32 base, Elf_Addr v, bool rela)
> {
> diff --git a/arch/mips/kernel/text.c b/arch/mips/kernel/text.c
> new file mode 100644
> index 000000000000..55ca87a421c3
> --- /dev/null
> +++ b/arch/mips/kernel/text.c
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + *
> + * Copyright (C) 2001 Rusty Russell.
> + * Copyright (C) 2003, 2004 Ralf Baechle (ralf@linux-mips.org)
> + * Copyright (C) 2005 Thiemo Seufer
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +#ifdef MODULE_START
> +void *text_alloc(unsigned long size)
> +{
> + return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
> + GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> +#endif
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 0af88622c619..2b03f7128809 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1233,7 +1233,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> build_epilogue(&ctx);
>
> alloc_size = 4 * ctx.idx;
> - ctx.target = module_alloc(alloc_size);
> + ctx.target = text_alloc(alloc_size);
> if (ctx.target == NULL)
> goto out;
>
> @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited)
> - module_memfree(fp->bpf_func);
> + text_memfree(fp->bpf_func);
>
> bpf_prog_unlock_free(fp);
> }
> diff --git a/arch/nds32/kernel/Makefile b/arch/nds32/kernel/Makefile
> index 394df3f6442c..fc15778c59d1 100644
> --- a/arch/nds32/kernel/Makefile
> +++ b/arch/nds32/kernel/Makefile
> @@ -10,7 +10,7 @@ AFLAGS_head.o := -DTEXTADDR=$(TEXTADDR)
> obj-y := ex-entry.o ex-exit.o ex-scall.o irq.o \
> process.o ptrace.o setup.o signal.o \
> sys_nds32.o time.o traps.o cacheinfo.o \
> - dma.o syscall_table.o vdso.o
> + dma.o syscall_table.o vdso.o text.o
>
> obj-$(CONFIG_MODULES) += nds32_ksyms.o module.o
> obj-$(CONFIG_STACKTRACE) += stacktrace.o
> diff --git a/arch/nds32/kernel/module.c b/arch/nds32/kernel/module.c
> index 3897fd14a21d..3d23a12ed535 100644
> --- a/arch/nds32/kernel/module.c
> +++ b/arch/nds32/kernel/module.c
> @@ -7,13 +7,6 @@
> #include <linux/moduleloader.h>
> #include <linux/pgtable.h>
>
> -void *module_alloc(unsigned long size)
> -{
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -
> void module_free(struct module *module, void *region)
> {
> vfree(region);
> diff --git a/arch/nds32/kernel/text.c b/arch/nds32/kernel/text.c
> new file mode 100644
> index 000000000000..6e86eff9aaf0
> --- /dev/null
> +++ b/arch/nds32/kernel/text.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2005-2017 Andes Technology Corporation
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void *text_alloc(unsigned long size)
> +{
> + return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> + GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> diff --git a/arch/nios2/kernel/Makefile b/arch/nios2/kernel/Makefile
> index 0b645e1e3158..5476fc749f37 100644
> --- a/arch/nios2/kernel/Makefile
> +++ b/arch/nios2/kernel/Makefile
> @@ -18,6 +18,7 @@ obj-y += setup.o
> obj-y += signal.o
> obj-y += sys_nios2.o
> obj-y += syscall_table.o
> +obj-y += text.o
> obj-y += time.o
> obj-y += traps.o
>
> diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
> index 76e0a42d6e36..20a0faf64e38 100644
> --- a/arch/nios2/kernel/module.c
> +++ b/arch/nios2/kernel/module.c
> @@ -21,25 +21,6 @@
>
> #include <asm/cacheflush.h>
>
> -/*
> - * Modules should NOT be allocated with kmalloc for (obvious) reasons.
> - * But we do it for now to avoid relocation issues. CALL26/PCREL26
> cannot reach
> - * from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
> - * addresses in 0xc0000000)
> - */
> -void *module_alloc(unsigned long size)
> -{
> - if (size == 0)
> - return NULL;
> - return kmalloc(size, GFP_KERNEL);
> -}
> -
> -/* Free memory returned from module_alloc */
> -void module_memfree(void *module_region)
> -{
> - kfree(module_region);
> -}
> -
> int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
> unsigned int symindex, unsigned int relsec,
> struct module *mod)
> diff --git a/arch/nios2/kernel/text.c b/arch/nios2/kernel/text.c
> new file mode 100644
> index 000000000000..af424174442f
> --- /dev/null
> +++ b/arch/nios2/kernel/text.c
> @@ -0,0 +1,34 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Kernel module support for Nios II.
> + *
> + * Copyright (C) 2004 Microtronix Datacom Ltd.
> + * Written by Wentao Xu <xuwentao@microtronix.com>
> + * Copyright (C) 2001, 2003 Rusty Russell
> + *
> + * This file is subject to the terms and conditions of the GNU General
> + * Public License. See the file COPYING in the main directory of this
> + * archive for more details.
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +/*
> + * Modules should NOT be allocated with kmalloc for (obvious) reasons.
> + * But we do it for now to avoid relocation issues. CALL26/PCREL26
> cannot reach
> + * from 0x80000000 (vmalloc area) to 0xc00000000 (kernel) (kmalloc returns
> + * addresses in 0xc0000000)
> + */
> +void *text_alloc(unsigned long size)
> +{
> + if (size == 0)
> + return NULL;
> + return kmalloc(size, GFP_KERNEL);
> +}
> +
> +/* Free memory returned from module_alloc */
> +void text_memfree(void *module_region)
> +{
> + kfree(module_region);
> +}
> diff --git a/arch/parisc/kernel/Makefile b/arch/parisc/kernel/Makefile
> index 068d90950d93..f71f7ffdae2e 100644
> --- a/arch/parisc/kernel/Makefile
> +++ b/arch/parisc/kernel/Makefile
> @@ -10,7 +10,7 @@ obj-y := cache.o pacache.o setup.o pdt.o
> traps.o time.o irq.o \
> ptrace.o hardware.o inventory.o drivers.o alternative.o \
> signal.o hpmc.o real2.o parisc_ksyms.o unaligned.o \
> process.o processor.o pdc_cons.o pdc_chassis.o unwind.o \
> - patch.o
> + patch.o text.o
>
> ifdef CONFIG_FUNCTION_TRACER
> # Do not profile debug and lowlevel utilities
> diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
> index 7df140545b22..c81e63e2549b 100644
> --- a/arch/parisc/kernel/module.c
> +++ b/arch/parisc/kernel/module.c
> @@ -192,17 +192,6 @@ static inline int reassemble_22(int as22)
> ((as22 & 0x0003ff) << 3));
> }
>
> -void *module_alloc(unsigned long size)
> -{
> - /* using RWX means less protection for modules, but it's
> - * easier than trying to map the text, data, init_text and
> - * init_data correctly */
> - return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> - GFP_KERNEL,
> - PAGE_KERNEL_RWX, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -
> #ifndef CONFIG_64BIT
> static inline unsigned long count_gots(const Elf_Rela *rela,
> unsigned long n)
> {
> diff --git a/arch/parisc/kernel/text.c b/arch/parisc/kernel/text.c
> new file mode 100644
> index 000000000000..9ff503084191
> --- /dev/null
> +++ b/arch/parisc/kernel/text.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Linux/PA-RISC Project
> + * Copyright (C) 2003 Randolph Chung <tausq at debian . org>
> + * Copyright (C) 2008 Helge Deller <deller@gmx.de>
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void *text_alloc(unsigned long size)
> +{
> + /*
> + * Using RWX means less protection for modules, but it's
> + * easier than trying to map the text, data, init_text and
> + * init_data correctly.
> + */
> + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> + GFP_KERNEL,
> + PAGE_KERNEL_RWX, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> diff --git a/arch/powerpc/net/bpf_jit_comp.c
> b/arch/powerpc/net/bpf_jit_comp.c
> index 0acc9d5fb19e..ba1cef7a812d 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -634,7 +634,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
>
> proglen = cgctx.idx * 4;
> alloclen = proglen + FUNCTION_DESCR_SIZE;
> - image = module_alloc(alloclen);
> + image = text_alloc(alloclen);
> if (!image)
> goto out;
>
> @@ -678,7 +678,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited)
> - module_memfree(fp->bpf_func);
> + text_memfree(fp->bpf_func);
>
> bpf_prog_unlock_free(fp);
> }
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index b355cf485671..d0b30f286ce6 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -29,6 +29,7 @@ obj-y += riscv_ksyms.o
> obj-y += stacktrace.o
> obj-y += cacheinfo.o
> obj-y += patch.o
> +obj-y += text.o
> obj-$(CONFIG_MMU) += vdso.o vdso/
>
> obj-$(CONFIG_RISCV_M_MODE) += clint.o traps_misaligned.o
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index 7191342c54da..f6aa66431c9e 100644
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -390,15 +390,3 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const
> char *strtab,
>
> return 0;
> }
> -
> -#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> -#define VMALLOC_MODULE_START \
> - max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> -void *module_alloc(unsigned long size)
> -{
> - return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> - VMALLOC_END, GFP_KERNEL,
> - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -#endif
> diff --git a/arch/riscv/kernel/text.c b/arch/riscv/kernel/text.c
> new file mode 100644
> index 000000000000..201608a25641
> --- /dev/null
> +++ b/arch/riscv/kernel/text.c
> @@ -0,0 +1,20 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + *
> + * Copyright (C) 2017 Zihao Yu
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> +#define VMALLOC_MODULE_START \
> + max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> +void *text_alloc(unsigned long size)
> +{
> + return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> + VMALLOC_END, GFP_KERNEL,
> + PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> +#endif
> diff --git a/arch/s390/kernel/Makefile b/arch/s390/kernel/Makefile
> index a8f136943deb..9f00c320b938 100644
> --- a/arch/s390/kernel/Makefile
> +++ b/arch/s390/kernel/Makefile
> @@ -40,7 +40,7 @@ obj-y += sysinfo.o lgr.o os_info.o machine_kexec.o
> pgm_check.o
> obj-y += runtime_instr.o cache.o fpu.o dumpstack.o guarded_storage.o sthyi.o
> obj-y += entry.o reipl.o relocate_kernel.o kdebugfs.o alternative.o
> obj-y += nospec-branch.o ipl_vmparm.o machine_kexec_reloc.o unwind_bc.o
> -obj-y += smp.o
> +obj-y += smp.o text.o
>
> extra-y += head64.o vmlinux.lds
>
> diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
> index b388e87a08bf..a752b1442846 100644
> --- a/arch/s390/kernel/ftrace.c
> +++ b/arch/s390/kernel/ftrace.c
> @@ -134,7 +134,7 @@ static int __init ftrace_plt_init(void)
> {
> unsigned int *ip;
>
> - ftrace_plt = (unsigned long) module_alloc(PAGE_SIZE);
> + ftrace_plt = (unsigned long) text_alloc(PAGE_SIZE);
> if (!ftrace_plt)
> panic("cannot allocate ftrace plt\n");
> ip = (unsigned int *) ftrace_plt;
> diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
> index 4055f1c49814..087cb5951de6 100644
> --- a/arch/s390/kernel/module.c
> +++ b/arch/s390/kernel/module.c
> @@ -32,22 +32,6 @@
>
> #define PLT_ENTRY_SIZE 20
>
> -void *module_alloc(unsigned long size)
> -{
> - void *p;
> -
> - if (PAGE_ALIGN(size) > MODULES_LEN)
> - return NULL;
> - p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> - if (p && (kasan_module_alloc(p, size) < 0)) {
> - vfree(p);
> - return NULL;
> - }
> - return p;
> -}
> -
> void module_arch_freeing_init(struct module *mod)
> {
> if (is_livepatch_module(mod) &&
> diff --git a/arch/s390/kernel/text.c b/arch/s390/kernel/text.c
> new file mode 100644
> index 000000000000..63aaa1ab727b
> --- /dev/null
> +++ b/arch/s390/kernel/text.c
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Kernel module help for s390.
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void *text_alloc(unsigned long size)
> +{
> + void *p;
> +
> + if (PAGE_ALIGN(size) > MODULES_LEN)
> + return NULL;
> + p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> + if (p && (kasan_module_alloc(p, size) < 0)) {
> + vfree(p);
> + return NULL;
> + }
> + return p;
> +}
> diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
> index 97c0e19263d1..e025f9e1db4a 100644
> --- a/arch/sparc/kernel/Makefile
> +++ b/arch/sparc/kernel/Makefile
> @@ -52,6 +52,7 @@ obj-y += prom_common.o
> obj-y += prom_$(BITS).o
> obj-y += of_device_common.o
> obj-y += of_device_$(BITS).o
> +obj-y += text.o
> obj-$(CONFIG_SPARC64) += prom_irqtrans.o
>
> obj-$(CONFIG_SPARC32) += leon_kernel.o
> diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
> index df39580f398d..f2babc69f189 100644
> --- a/arch/sparc/kernel/module.c
> +++ b/arch/sparc/kernel/module.c
> @@ -21,36 +21,6 @@
>
> #include "entry.h"
>
> -#ifdef CONFIG_SPARC64
> -
> -#include <linux/jump_label.h>
> -
> -static void *module_map(unsigned long size)
> -{
> - if (PAGE_ALIGN(size) > MODULES_LEN)
> - return NULL;
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -#else
> -static void *module_map(unsigned long size)
> -{
> - return vmalloc(size);
> -}
> -#endif /* CONFIG_SPARC64 */
> -
> -void *module_alloc(unsigned long size)
> -{
> - void *ret;
> -
> - ret = module_map(size);
> - if (ret)
> - memset(ret, 0, size);
> -
> - return ret;
> -}
> -
> /* Make generic code ignore STT_REGISTER dummy undefined symbols. */
> int module_frob_arch_sections(Elf_Ehdr *hdr,
> Elf_Shdr *sechdrs,
> diff --git a/arch/sparc/kernel/text.c b/arch/sparc/kernel/text.c
> new file mode 100644
> index 000000000000..d16663f2c6ba
> --- /dev/null
> +++ b/arch/sparc/kernel/text.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Kernel module help for sparc64.
> + *
> + * Copyright (C) 2001 Rusty Russell.
> + * Copyright (C) 2002 David S. Miller.
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +#ifdef CONFIG_SPARC64
> +
> +#include <linux/jump_label.h>
> +
> +static void *module_map(unsigned long size)
> +{
> + if (PAGE_ALIGN(size) > MODULES_LEN)
> + return NULL;
> + return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> + GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> +#else
> +static void *module_map(unsigned long size)
> +{
> + return vmalloc(size);
> +}
> +#endif /* CONFIG_SPARC64 */
> +
> +void *text_alloc(unsigned long size)
> +{
> + void *ret;
> +
> + ret = module_map(size);
> + if (ret)
> + memset(ret, 0, size);
> +
> + return ret;
> +}
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c
> b/arch/sparc/net/bpf_jit_comp_32.c
> index c8eabb973b86..d9dd513b27b2 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -713,7 +713,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> if (unlikely(proglen + ilen > oldproglen)) {
> pr_err("bpb_jit_compile fatal error\n");
> kfree(addrs);
> - module_memfree(image);
> + text_memfree(image);
> return;
> }
> memcpy(image + proglen, temp, ilen);
> @@ -736,7 +736,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> break;
> }
> if (proglen == oldproglen) {
> - image = module_alloc(proglen);
> + image = text_alloc(proglen);
> if (!image)
> goto out;
> }
> @@ -758,7 +758,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited)
> - module_memfree(fp->bpf_func);
> + text_memfree(fp->bpf_func);
>
> bpf_prog_unlock_free(fp);
> }
> diff --git a/arch/unicore32/kernel/Makefile b/arch/unicore32/kernel/Makefile
> index 2f79aa56735b..96eb8cfc8b1e 100644
> --- a/arch/unicore32/kernel/Makefile
> +++ b/arch/unicore32/kernel/Makefile
> @@ -6,6 +6,7 @@
> # Object file lists.
> obj-y := dma.o elf.o entry.o process.o ptrace.o
> obj-y += setup.o signal.o sys.o stacktrace.o traps.o
> +obj-y += text.o
>
> obj-$(CONFIG_MODULES) += ksyms.o module.o
> obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> diff --git a/arch/unicore32/kernel/module.c b/arch/unicore32/kernel/module.c
> index 67c89ef2d6ee..e1e703c02379 100644
> --- a/arch/unicore32/kernel/module.c
> +++ b/arch/unicore32/kernel/module.c
> @@ -18,13 +18,6 @@
>
> #include <asm/sections.h>
>
> -void *module_alloc(unsigned long size)
> -{
> - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> -}
> -
> int
> apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, unsigned
> int symindex,
> unsigned int relindex, struct module *module)
> diff --git a/arch/unicore32/kernel/text.c b/arch/unicore32/kernel/text.c
> new file mode 100644
> index 000000000000..b94aac824bb8
> --- /dev/null
> +++ b/arch/unicore32/kernel/text.c
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * linux/arch/unicore32/kernel/module.c
> + *
> + * Code specific to PKUnity SoC and UniCore ISA
> + *
> + * Copyright (C) 2001-2010 GUAN Xue-tao
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void *text_alloc(unsigned long size)
> +{
> + return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> +}
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index e77261db2391..2878e4b753a0 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -68,6 +68,7 @@ obj-y += tsc.o tsc_msr.o io_delay.o rtc.o
> obj-y += pci-iommu_table.o
> obj-y += resource.o
> obj-y += irqflags.o
> +obj-y += text.o
>
> obj-y += process.o
> obj-y += fpu/
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 51504566b3a6..f76703ee96f2 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -265,11 +265,11 @@ int __init ftrace_dyn_arch_init(void)
> /* Module allocation simplifies allocating memory for code */
> static inline void *alloc_tramp(unsigned long size)
> {
> - return module_alloc(size);
> + return text_alloc(size);
> }
> static inline void tramp_free(void *tramp)
> {
> - module_memfree(tramp);
> + text_memfree(tramp);
> }
> #else
> /* Trampolines can only be created if modules are supported */
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index ada39ddbc922..e9ac7d3c658e 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -423,7 +423,7 @@ void *alloc_insn_page(void)
> {
> void *page;
>
> - page = module_alloc(PAGE_SIZE);
> + page = text_alloc(PAGE_SIZE);
> if (!page)
> return NULL;
>
> @@ -446,7 +446,7 @@ void *alloc_insn_page(void)
> /* Recover page to RW mode before releasing it */
> void free_insn_page(void *page)
> {
> - module_memfree(page);
> + text_memfree(page);
> }
>
> static int arch_copy_kprobe(struct kprobe *p)
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index 34b153cbd4ac..261df078f127 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -36,55 +36,6 @@ do { \
> } while (0)
> #endif
>
> -#ifdef CONFIG_RANDOMIZE_BASE
> -static unsigned long module_load_offset;
> -
> -/* Mutex protects the module_load_offset. */
> -static DEFINE_MUTEX(module_kaslr_mutex);
> -
> -static unsigned long int get_module_load_offset(void)
> -{
> - if (kaslr_enabled()) {
> - mutex_lock(&module_kaslr_mutex);
> - /*
> - * Calculate the module_load_offset the first time this
> - * code is called. Once calculated it stays the same until
> - * reboot.
> - */
> - if (module_load_offset == 0)
> - module_load_offset =
> - (get_random_int() % 1024 + 1) * PAGE_SIZE;
> - mutex_unlock(&module_kaslr_mutex);
> - }
> - return module_load_offset;
> -}
> -#else
> -static unsigned long int get_module_load_offset(void)
> -{
> - return 0;
> -}
> -#endif
> -
> -void *module_alloc(unsigned long size)
> -{
> - void *p;
> -
> - if (PAGE_ALIGN(size) > MODULES_LEN)
> - return NULL;
> -
> - p = __vmalloc_node_range(size, MODULE_ALIGN,
> - MODULES_VADDR + get_module_load_offset(),
> - MODULES_END, GFP_KERNEL,
> - PAGE_KERNEL, 0, NUMA_NO_NODE,
> - __builtin_return_address(0));
> - if (p && (kasan_module_alloc(p, size) < 0)) {
> - vfree(p);
> - return NULL;
> - }
> -
> - return p;
> -}
> -
> #ifdef CONFIG_X86_32
> int apply_relocate(Elf32_Shdr *sechdrs,
> const char *strtab,
> diff --git a/arch/x86/kernel/text.c b/arch/x86/kernel/text.c
> new file mode 100644
> index 000000000000..724ab2d93ac5
> --- /dev/null
> +++ b/arch/x86/kernel/text.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Kernel module help for x86.
> + * Copyright (C) 2001 Rusty Russell.
> + */
> +#include <linux/kasan.h>
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/random.h>
> +#include <linux/vmalloc.h>
> +#include <asm/setup.h>
> +
> +#ifdef CONFIG_RANDOMIZE_BASE
> +static unsigned long module_load_offset;
> +
> +/* Mutex protects the module_load_offset. */
> +static DEFINE_MUTEX(module_kaslr_mutex);
> +
> +static unsigned long get_module_load_offset(void)
> +{
> + if (kaslr_enabled()) {
> + mutex_lock(&module_kaslr_mutex);
> + /*
> + * Calculate the module_load_offset the first time this
> + * code is called. Once calculated it stays the same until
> + * reboot.
> + */
> + if (module_load_offset == 0)
> + module_load_offset =
> + (get_random_int() % 1024 + 1) * PAGE_SIZE;
> + mutex_unlock(&module_kaslr_mutex);
> + }
> + return module_load_offset;
> +}
> +#else
> +static unsigned long get_module_load_offset(void)
> +{
> + return 0;
> +}
> +#endif
> +
> +void *text_alloc(unsigned long size)
> +{
> + void *p;
> +
> + if (PAGE_ALIGN(size) > MODULES_LEN)
> + return NULL;
> +
> + p = __vmalloc_node_range(size, MODULE_ALIGN,
> + MODULES_VADDR + get_module_load_offset(),
> + MODULES_END, GFP_KERNEL,
> + PAGE_KERNEL, 0, NUMA_NO_NODE,
> + __builtin_return_address(0));
> + if (p && (kasan_module_alloc(p, size) < 0)) {
> + vfree(p);
> + return NULL;
> + }
> +
> + return p;
> +}
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 4fa67a8b2265..4e8b9ba431ee 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -24,10 +24,10 @@ unsigned int arch_mod_section_prepend(struct
> module *mod, unsigned int section);
>
> /* Allocator used for allocating struct module, core sections and init
> sections. Returns NULL on failure. */
> -void *module_alloc(unsigned long size);
> +void *text_alloc(unsigned long size);
>
> /* Free memory returned from module_alloc. */
> -void module_memfree(void *module_region);
> +void text_memfree(void *module_region);
>
> /* Determines if the section name is an init section (that is only
> used during
> * module loading).
> diff --git a/kernel/Makefile b/kernel/Makefile
> index f3218bc5ec69..9e88e81f68ef 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -10,7 +10,7 @@ obj-y = fork.o exec_domain.o panic.o \
> extable.o params.o \
> kthread.o sys_ni.o nsproxy.o \
> notifier.o ksysfs.o cred.o reboot.o \
> - async.o range.o smpboot.o ucount.o
> + async.o range.o smpboot.o ucount.o text.o
>
> obj-$(CONFIG_MODULES) += kmod.o
> obj-$(CONFIG_MULTIUSER) += groups.o
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 9df4cc9a2907..febd55019a8a 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -840,12 +840,12 @@ static void bpf_jit_uncharge_modmem(u32 pages)
>
> void *__weak bpf_jit_alloc_exec(unsigned long size)
> {
> - return module_alloc(size);
> + return text_alloc(size);
> }
>
> void __weak bpf_jit_free_exec(void *addr)
> {
> - module_memfree(addr);
> + text_memfree(addr);
> }
>
> struct bpf_binary_header *
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 4a904cc56d68..d1c354ec89de 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -111,12 +111,12 @@ enum kprobe_slot_state {
>
> void __weak *alloc_insn_page(void)
> {
> - return module_alloc(PAGE_SIZE);
> + return text_alloc(PAGE_SIZE);
> }
>
> void __weak free_insn_page(void *page)
> {
> - module_memfree(page);
> + text_memfree(page);
> }
>
> struct kprobe_insn_cache kprobe_insn_slots = {
> diff --git a/kernel/module.c b/kernel/module.c
> index bee1c25ca5c5..bdb3773f3668 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2141,16 +2141,6 @@ static void free_module_elf(struct module *mod)
> }
> #endif /* CONFIG_LIVEPATCH */
>
> -void __weak module_memfree(void *module_region)
> -{
> - /*
> - * This memory may be RO, and freeing RO memory in an interrupt is not
> - * supported by vmalloc.
> - */
> - WARN_ON(in_interrupt());
> - vfree(module_region);
> -}
> -
> void __weak module_arch_cleanup(struct module *mod)
> {
> }
> @@ -2200,7 +2190,7 @@ static void free_module(struct module *mod)
>
> /* This may be empty, but that's OK */
> module_arch_freeing_init(mod);
> - module_memfree(mod->init_layout.base);
> + text_memfree(mod->init_layout.base);
> kfree(mod->args);
> percpu_modfree(mod);
>
> @@ -2208,7 +2198,7 @@ static void free_module(struct module *mod)
> lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
>
> /* Finally, free the core (containing the module structure) */
> - module_memfree(mod->core_layout.base);
> + text_memfree(mod->core_layout.base);
> }
>
> void *__symbol_get(const char *symbol)
> @@ -2781,13 +2771,6 @@ static void dynamic_debug_remove(struct
> module *mod, struct _ddebug *debug)
> ddebug_remove_module(mod->name);
> }
>
> -void * __weak module_alloc(unsigned long size)
> -{
> - return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> - GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
> - NUMA_NO_NODE, __builtin_return_address(0));
> -}
> -
> bool __weak module_init_section(const char *name)
> {
> return strstarts(name, ".init");
> @@ -3246,7 +3229,7 @@ static int move_module(struct module *mod,
> struct load_info *info)
> void *ptr;
>
> /* Do the allocs. */
> - ptr = module_alloc(mod->core_layout.size);
> + ptr = text_alloc(mod->core_layout.size);
> /*
> * The pointer to this block is stored in the module structure
> * which is inside the block. Just mark it as not being a
> @@ -3260,7 +3243,7 @@ static int move_module(struct module *mod,
> struct load_info *info)
> mod->core_layout.base = ptr;
>
> if (mod->init_layout.size) {
> - ptr = module_alloc(mod->init_layout.size);
> + ptr = text_alloc(mod->init_layout.size);
> /*
> * The pointer to this block is stored in the module structure
> * which is inside the block. This block doesn't need to be
> @@ -3269,7 +3252,7 @@ static int move_module(struct module *mod,
> struct load_info *info)
> */
> kmemleak_ignore(ptr);
> if (!ptr) {
> - module_memfree(mod->core_layout.base);
> + text_memfree(mod->core_layout.base);
> return -ENOMEM;
> }
> memset(ptr, 0, mod->init_layout.size);
> @@ -3452,8 +3435,8 @@ static void module_deallocate(struct module
> *mod, struct load_info *info)
> {
> percpu_modfree(mod);
> module_arch_freeing_init(mod);
> - module_memfree(mod->init_layout.base);
> - module_memfree(mod->core_layout.base);
> + text_memfree(mod->init_layout.base);
> + text_memfree(mod->core_layout.base);
> }
>
> int __weak module_finalize(const Elf_Ehdr *hdr,
> @@ -3527,7 +3510,7 @@ static void do_free_init(struct work_struct *w)
>
> llist_for_each_safe(pos, n, list) {
> initfree = container_of(pos, struct mod_initfree, node);
> - module_memfree(initfree->module_init);
> + text_memfree(initfree->module_init);
> kfree(initfree);
> }
> }
> @@ -3626,10 +3609,10 @@ static noinline int do_init_module(struct
> module *mod)
> * We want to free module_init, but be aware that kallsyms may be
> * walking this with preempt disabled. In all the failure paths, we
> * call synchronize_rcu(), but we don't want to slow down the success
> - * path. module_memfree() cannot be called in an interrupt, so do the
> + * path. text_memfree() cannot be called in an interrupt, so do the
> * work and call synchronize_rcu() in a work queue.
> *
> - * Note that module_alloc() on most architectures creates W+X page
> + * Note that text_alloc() on most architectures creates W+X page
> * mappings which won't be cleaned up until do_free_init() runs. Any
> * code such as mark_rodata_ro() which depends on those mappings to
> * be cleaned up needs to sync with the queued work - ie
> diff --git a/kernel/text.c b/kernel/text.c
> new file mode 100644
> index 000000000000..9a12c508ded5
> --- /dev/null
> +++ b/kernel/text.c
> @@ -0,0 +1,25 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2002 Richard Henderson
> + * Copyright (C) 2001 Rusty Russell, 2002, 2010 Rusty Russell IBM.
> + */
> +#include <linux/mm.h>
> +#include <linux/moduleloader.h>
> +#include <linux/vmalloc.h>
> +
> +void __weak text_memfree(void *module_region)
> +{
> + /*
> + * This memory may be RO, and freeing RO memory in an interrupt is not
> + * supported by vmalloc.
> + */
> + WARN_ON(in_interrupt());
> + vfree(module_region);
> +}
> +
> +void * __weak text_alloc(unsigned long size)
> +{
> + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
> + NUMA_NO_NODE, __builtin_return_address(0));
> +}
> --
> 2.25.1
^ permalink raw reply
* Re: [PATCH v3 07/12] ppc64/kexec_file: add support to relocate purgatory
From: Thiago Jung Bauermann @ 2020-07-16 22:12 UTC (permalink / raw)
To: Hari Bathini
Cc: kernel test robot, Pingfan Liu, Petr Tesarik, Nayna Jain,
Kexec-ml, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
Sourabh Jain, Andrew Morton, Dave Young, Vivek Goyal,
Eric Biederman
In-Reply-To: <6f64a18a-352e-fdec-c902-45aefc31cc0a@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 16/07/20 5:50 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>>> So, add support to relocate purgatory in kexec_file_load system call
>>> by setting up TOC pointer and applying RELA relocations as needed.
>>
>> If we do want to use a C purgatory, Michael Ellerman had suggested
>> building it as a Position Independent Executable, which greatly reduces
>> the number and types of relocations that are needed. See patches 4 and 9
>> here:
>>
>> https://lore.kernel.org/linuxppc-dev/1478748449-3894-1-git-send-email-bauerman@linux.vnet.ibm.com/
>>
>> In the series above I hadn't converted x86 to PIE. If I had done that,
>> possibly Dave Young's opinion would have been different. :-)
>>
>> If that's still not desirable, he suggested in that discussion lifting
>> some code from x86 to generic code, which I implemented and would
>> simplify this patch as well:
>>
>> https://lore.kernel.org/linuxppc-dev/5009580.5GxAkTrMYA@morokweng/
>>
>
> Agreed. But I prefer to work on PIE and/or moving common relocation_add code
> for x86 & s390 to generic code later when I try to build on these purgatory
> changes. So, a separate series later to rework purgatory with the things you
> mentioned above sounds ok?
Sounds ok to me. Let's see what the maintainers think, then.
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 09/12] ppc64/kexec_file: setup backup region for kdump kernel
From: Thiago Jung Bauermann @ 2020-07-16 22:06 UTC (permalink / raw)
To: Hari Bathini
Cc: kernel test robot, Pingfan Liu, Petr Tesarik, Nayna Jain,
Kexec-ml, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
Sourabh Jain, Andrew Morton, Dave Young, Vivek Goyal,
Eric Biederman
In-Reply-To: <bea19627-c6b7-5d59-e194-03038bb4d9f6@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 16/07/20 7:08 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>>> @@ -968,7 +1040,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>>>
>>> /*
>>> * Restrict memory usage for kdump kernel by setting up
>>> - * usable memory ranges.
>>> + * usable memory ranges and memory reserve map.
>>> */
>>> if (image->type == KEXEC_TYPE_CRASH) {
>>> ret = get_usable_memory_ranges(&umem);
>>> @@ -980,6 +1052,24 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>>> pr_err("Error setting up usable-memory property for kdump kernel\n");
>>> goto out;
>>> }
>>> +
>>> + ret = fdt_add_mem_rsv(fdt, BACKUP_SRC_START + BACKUP_SRC_SIZE,
>>> + crashk_res.start - BACKUP_SRC_SIZE);
>>
>> I believe this answers my question from the other email about how the
>> crashkernel is prevented from stomping in the crashed kernel's memory,
>> right? I needed to think for a bit to understand what the above
>> reservation was protecting. I think it's worth adding a comment.
>
> Right. The reason to add it in the first place is, prom presses the panic button if
> it can't find low memory. Marking it reserved seems to keep it quiet though. so..
>
> Will add comment mentioning that..
Ah, makes sense. Thanks for the explanation.
>>> +void purgatory(void)
>>> +{
>>> + void *dest, *src;
>>> +
>>> + src = (void *)BACKUP_SRC_START;
>>> + if (backup_start) {
>>> + dest = (void *)backup_start;
>>> + __memcpy(dest, src, BACKUP_SRC_SIZE);
>>> + }
>>> +}
>>
>> In general I'm in favor of using C code over assembly, but having to
>> bring in that relocation support just for the above makes me wonder if
>> it's worth it in this case.
>
> I am planning to build on purgatory later with "I'm in purgatory" print support
> for pseries at least and also, sha256 digest check.
Ok. In that case, my preference would be to convert both the powerpc and
x86 purgatories to PIE since this greatly reduces the types of
relocations that are emitted, but better ask Dave Young what he thinks
before going down that route.
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
From: Thiago Jung Bauermann @ 2020-07-16 22:03 UTC (permalink / raw)
To: Hari Bathini
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <baa29ea9-7698-a7e8-e5a4-c9f842e1fcc8@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 16/07/20 4:22 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>
> <snip>
>
>>> +/**
>>> + * get_node_path - Get the full path of the given node.
>>> + * @dn: Node.
>>> + * @path: Updated with the full path of the node.
>>> + *
>>> + * Returns nothing.
>>> + */
>>> +static void get_node_path(struct device_node *dn, char *path)
>>> +{
>>> + if (!dn)
>>> + return;
>>> +
>>> + get_node_path(dn->parent, path);
>>
>> Is it ok to do recursion in the kernel? In this case I believe it's not
>> problematic since the maximum call depth will be the maximum depth of a
>> device tree node which shouldn't be too much. Also, there are no local
>> variables in this function. But I thought it was worth mentioning.
>
> You are right. We are better off avoiding the recursion here. Will
> change it to an iterative version instead.
Ok.
>>> + * each representing a memory range.
>>> + */
>>> + ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
>>> +
>>> + for (i = 0; i < ranges; i++) {
>>> + base = of_read_number(prop, n_mem_addr_cells);
>>> + prop += n_mem_addr_cells;
>>> + end = base + of_read_number(prop, n_mem_size_cells) - 1;
>
> prop is not used after the above.
>
>> You need to `prop += n_mem_size_cells` here.
>
> But yeah, adding it would make it look complete in some sense..
Isn't it used in the next iteration of the loop?
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 05/12] powerpc/drmem: make lmb walk a bit more flexible
From: Thiago Jung Bauermann @ 2020-07-16 22:01 UTC (permalink / raw)
To: Hari Bathini
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <30e8f02a-f009-70a5-01e9-dec9eff213b1@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 15/07/20 9:20 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>>> @@ -534,7 +537,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned long node,
>>> #ifdef CONFIG_PPC_PSERIES
>>> if (depth == 1 &&
>>> strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
>>> - walk_drmem_lmbs_early(node, early_init_drmem_lmb);
>>> + walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb);
>>
>> walk_drmem_lmbs_early() can now fail. Should this failure be propagated
>> as a return value of early_init_dt_scan_memory_ppc()?
>
>>
>>> return 0;
>>> }
>>> #endif
>> <snip>
>>
>>> @@ -787,7 +790,7 @@ static int __init parse_numa_properties(void)
>>> */
>>> memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>>> if (memory) {
>>> - walk_drmem_lmbs(memory, numa_setup_drmem_lmb);
>>> + walk_drmem_lmbs(memory, NULL, numa_setup_drmem_lmb);
>>
>> Similarly here. Now that this call can fail, should
>> parse_numa_properties() handle or propagate the failure?
>
> They would still not fail unless the callbacks early_init_drmem_lmb() & numa_setup_drmem_lmb()
> are updated to have failure scenarios. Also, these call sites always ignored failure scenarios
> even before walk_drmem_lmbs() was introduced. So, I prefer to keep them the way they are?
Ok, makes sense. In this case:
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 04/12] ppc64/kexec_file: avoid stomping memory used by special regions
From: Thiago Jung Bauermann @ 2020-07-16 21:59 UTC (permalink / raw)
To: Hari Bathini
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <0582476e-415e-3f60-2bb2-6199d0340156@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 15/07/20 8:09 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>
> <snip>
>
>>> +/**
>>> + * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
>>> + * in the memory regions between buf_min & buf_max
>>> + * for the buffer. If found, sets kbuf->mem.
>>> + * @kbuf: Buffer contents and memory parameters.
>>> + * @buf_min: Minimum address for the buffer.
>>> + * @buf_max: Maximum address for the buffer.
>>> + *
>>> + * Returns 0 on success, negative errno on error.
>>> + */
>>> +static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
>>> + u64 buf_min, u64 buf_max)
>>> +{
>>> + int ret = -EADDRNOTAVAIL;
>>> + phys_addr_t start, end;
>>> + u64 i;
>>> +
>>> + for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE,
>>> + MEMBLOCK_NONE, &start, &end, NULL) {
>>> + if (start > buf_max)
>>> + continue;
>>> +
>>> + /* Memory hole not found */
>>> + if (end < buf_min)
>>> + break;
>>> +
>>> + /* Adjust memory region based on the given range */
>>> + if (start < buf_min)
>>> + start = buf_min;
>>> + if (end > buf_max)
>>> + end = buf_max;
>>> +
>>> + start = ALIGN(start, kbuf->buf_align);
>>> + if (start < end && (end - start + 1) >= kbuf->memsz) {
>>
>> This is why I dislike using start and end to express address ranges:
>>
>> While struct resource seems to use the [address, end] convention, my
>
> struct crash_mem also uses [address, end] convention.
> This off-by-one error did not cause any issues as the hole start and size we try to find
> are at least page aligned.
>
> Nonetheless, I think fixing 'end' early in the loop with "end -= 1" would ensure
> correctness while continuing to use the same convention for structs crash_mem & resource.
Sounds good.
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 10/12] ppc64/kexec_file: prepare elfcore header for crashing kernel
From: Thiago Jung Bauermann @ 2020-07-16 21:57 UTC (permalink / raw)
To: Hari Bathini
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <929db6fe-b221-a514-8ea1-93227f8d47b0@linux.ibm.com>
Hari Bathini <hbathini@linux.ibm.com> writes:
> On 16/07/20 7:52 am, Thiago Jung Bauermann wrote:
>>
>> Hari Bathini <hbathini@linux.ibm.com> writes:
>>
>>> /**
>>> + * get_crash_memory_ranges - Get crash memory ranges. This list includes
>>> + * first/crashing kernel's memory regions that
>>> + * would be exported via an elfcore.
>>> + * @mem_ranges: Range list to add the memory ranges to.
>>> + *
>>> + * Returns 0 on success, negative errno on error.
>>> + */
>>> +static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>>> +{
>>> + struct memblock_region *reg;
>>> + struct crash_mem *tmem;
>>> + int ret;
>>> +
>>> + for_each_memblock(memory, reg) {
>>> + u64 base, size;
>>> +
>>> + base = (u64)reg->base;
>>> + size = (u64)reg->size;
>>> +
>>> + /* Skip backup memory region, which needs a separate entry */
>>> + if (base == BACKUP_SRC_START) {
>>> + if (size > BACKUP_SRC_SIZE) {
>>> + base = BACKUP_SRC_END + 1;
>>> + size -= BACKUP_SRC_SIZE;
>>> + } else
>>> + continue;
>>> + }
>>> +
>>> + ret = add_mem_range(mem_ranges, base, size);
>>> + if (ret)
>>> + goto out;
>>> +
>>> + /* Try merging adjacent ranges before reallocation attempt */
>>> + if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
>>> + sort_memory_ranges(*mem_ranges, true);
>>> + }
>>> +
>>> + /* Reallocate memory ranges if there is no space to split ranges */
>>> + tmem = *mem_ranges;
>>> + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> + tmem = realloc_mem_ranges(mem_ranges);
>>> + if (!tmem)
>>> + goto out;
>>> + }
>>> +
>>> + /* Exclude crashkernel region */
>>> + ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
>>> + if (ret)
>>> + goto out;
>>> +
>>> + ret = add_rtas_mem_range(mem_ranges);
>>> + if (ret)
>>> + goto out;
>>> +
>>> + ret = add_opal_mem_range(mem_ranges);
>>> + if (ret)
>>> + goto out;
>>
>> Maybe I'm confused, but don't you add the RTAS and OPAL regions as
>> usable memory for the crashkernel? In that case they shouldn't show up
>> in the core file.
>
> kexec-tools does the same thing. I am not endorsing it but I was trying to stay
> in parity to avoid breaking any userspace tools/commands. But as you rightly
> pointed, this is NOT right. The right thing to do, to get the rtas/opal data at
> the time of crash, is to have a backup region for them just like we have for
> the first 64K memory. I was hoping to do that later.
>
> Will check how userspace tools respond to dropping these regions. If that makes
> the tools unhappy, will retain the regions with a FIXME. Sorry about the confusion.
No problem, thanks for the clarification.
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v3 07/12] ppc64/kexec_file: add support to relocate purgatory
From: Hari Bathini @ 2020-07-16 21:11 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: kernel test robot, Pingfan Liu, Petr Tesarik, Nayna Jain,
Kexec-ml, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
Sourabh Jain, Andrew Morton, Dave Young, Vivek Goyal,
Eric Biederman
In-Reply-To: <871rlc9upc.fsf@morokweng.localdomain>
On 16/07/20 5:50 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
>> Right now purgatory implementation is only minimal. But if purgatory
>> code is to be enhanced to copy memory to the backup region and verify
>
> Can't the memcpy be done in asm? We have arch/powerpc/lib/memcpy_64.S
> for example, perhaps it could be linked in with the purgatory?
I wanted to avoid touching common code to make it work for purgatory
for now.
>
>> sha256 digest, relocations may have to be applied to the purgatory.
>
> Do we want to do the sha256 verification? My original patch series for
> kexec_file_load() had a purgatory in C from kexec-tools which did the
> sha256 verification but Michael Ellerman thought it was unnecessary and
> decided to use the simpler purgatory in asm from kexec-lite.
kexec_file_load could as well be used without IMA or secureboot. With sha256 digest
calculated anyway, verifying it would make sense to accommodate that case as well.
>
>> So, add support to relocate purgatory in kexec_file_load system call
>> by setting up TOC pointer and applying RELA relocations as needed.
>
> If we do want to use a C purgatory, Michael Ellerman had suggested
> building it as a Position Independent Executable, which greatly reduces
> the number and types of relocations that are needed. See patches 4 and 9
> here:
>
> https://lore.kernel.org/linuxppc-dev/1478748449-3894-1-git-send-email-bauerman@linux.vnet.ibm.com/
>
> In the series above I hadn't converted x86 to PIE. If I had done that,
> possibly Dave Young's opinion would have been different. :-)
>
> If that's still not desirable, he suggested in that discussion lifting
> some code from x86 to generic code, which I implemented and would
> simplify this patch as well:
>
> https://lore.kernel.org/linuxppc-dev/5009580.5GxAkTrMYA@morokweng/
>
Agreed. But I prefer to work on PIE and/or moving common relocation_add code
for x86 & s390 to generic code later when I try to build on these purgatory
changes. So, a separate series later to rework purgatory with the things you
mentioned above sounds ok?
Thanks
Hari
^ permalink raw reply
* Re: [PATCH v3 09/12] ppc64/kexec_file: setup backup region for kdump kernel
From: Hari Bathini @ 2020-07-16 21:10 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: kernel test robot, Pingfan Liu, Petr Tesarik, Nayna Jain,
Kexec-ml, Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev,
Sourabh Jain, Andrew Morton, Dave Young, Vivek Goyal,
Eric Biederman
In-Reply-To: <87y2nk8cjq.fsf@morokweng.localdomain>
On 16/07/20 7:08 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
>> @@ -968,7 +1040,7 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>>
>> /*
>> * Restrict memory usage for kdump kernel by setting up
>> - * usable memory ranges.
>> + * usable memory ranges and memory reserve map.
>> */
>> if (image->type == KEXEC_TYPE_CRASH) {
>> ret = get_usable_memory_ranges(&umem);
>> @@ -980,6 +1052,24 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
>> pr_err("Error setting up usable-memory property for kdump kernel\n");
>> goto out;
>> }
>> +
>> + ret = fdt_add_mem_rsv(fdt, BACKUP_SRC_START + BACKUP_SRC_SIZE,
>> + crashk_res.start - BACKUP_SRC_SIZE);
>
> I believe this answers my question from the other email about how the
> crashkernel is prevented from stomping in the crashed kernel's memory,
> right? I needed to think for a bit to understand what the above
> reservation was protecting. I think it's worth adding a comment.
Right. The reason to add it in the first place is, prom presses the panic button if
it can't find low memory. Marking it reserved seems to keep it quiet though. so..
Will add comment mentioning that..
>> +void purgatory(void)
>> +{
>> + void *dest, *src;
>> +
>> + src = (void *)BACKUP_SRC_START;
>> + if (backup_start) {
>> + dest = (void *)backup_start;
>> + __memcpy(dest, src, BACKUP_SRC_SIZE);
>> + }
>> +}
>
> In general I'm in favor of using C code over assembly, but having to
> bring in that relocation support just for the above makes me wonder if
> it's worth it in this case.
I am planning to build on purgatory later with "I'm in purgatory" print support
for pseries at least and also, sha256 digest check.
Thanks
Hari
^ permalink raw reply
* Re: [PATCH v3 06/12] ppc64/kexec_file: restrict memory usage of kdump kernel
From: Hari Bathini @ 2020-07-16 21:10 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <87365s9ysj.fsf@morokweng.localdomain>
On 16/07/20 4:22 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
<snip>
>> +/**
>> + * get_node_path - Get the full path of the given node.
>> + * @dn: Node.
>> + * @path: Updated with the full path of the node.
>> + *
>> + * Returns nothing.
>> + */
>> +static void get_node_path(struct device_node *dn, char *path)
>> +{
>> + if (!dn)
>> + return;
>> +
>> + get_node_path(dn->parent, path);
>
> Is it ok to do recursion in the kernel? In this case I believe it's not
> problematic since the maximum call depth will be the maximum depth of a
> device tree node which shouldn't be too much. Also, there are no local
> variables in this function. But I thought it was worth mentioning.
You are right. We are better off avoiding the recursion here. Will
change it to an iterative version instead.
>> + * each representing a memory range.
>> + */
>> + ranges = (len >> 2) / (n_mem_addr_cells + n_mem_size_cells);
>> +
>> + for (i = 0; i < ranges; i++) {
>> + base = of_read_number(prop, n_mem_addr_cells);
>> + prop += n_mem_addr_cells;
>> + end = base + of_read_number(prop, n_mem_size_cells) - 1;
prop is not used after the above.
> You need to `prop += n_mem_size_cells` here.
But yeah, adding it would make it look complete in some sense..
Thanks
Hari
^ permalink raw reply
* Re: [PATCH v3 05/12] powerpc/drmem: make lmb walk a bit more flexible
From: Hari Bathini @ 2020-07-16 21:09 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <871rld8mic.fsf@morokweng.localdomain>
On 15/07/20 9:20 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
>> @@ -534,7 +537,7 @@ static int __init early_init_dt_scan_memory_ppc(unsigned long node,
>> #ifdef CONFIG_PPC_PSERIES
>> if (depth == 1 &&
>> strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
>> - walk_drmem_lmbs_early(node, early_init_drmem_lmb);
>> + walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb);
>
> walk_drmem_lmbs_early() can now fail. Should this failure be propagated
> as a return value of early_init_dt_scan_memory_ppc()?
>
>> return 0;
>> }
>> #endif
> <snip>
>
>> @@ -787,7 +790,7 @@ static int __init parse_numa_properties(void)
>> */
>> memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>> if (memory) {
>> - walk_drmem_lmbs(memory, numa_setup_drmem_lmb);
>> + walk_drmem_lmbs(memory, NULL, numa_setup_drmem_lmb);
>
> Similarly here. Now that this call can fail, should
> parse_numa_properties() handle or propagate the failure?
They would still not fail unless the callbacks early_init_drmem_lmb() & numa_setup_drmem_lmb()
are updated to have failure scenarios. Also, these call sites always ignored failure scenarios
even before walk_drmem_lmbs() was introduced. So, I prefer to keep them the way they are?
Thanks
Hari
^ permalink raw reply
* Re: [PATCH v3 04/12] ppc64/kexec_file: avoid stomping memory used by special regions
From: Hari Bathini @ 2020-07-16 21:09 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <87365t8pse.fsf@morokweng.localdomain>
On 15/07/20 8:09 am, Thiago Jung Bauermann wrote:
>
> Hari Bathini <hbathini@linux.ibm.com> writes:
>
<snip>
>> +/**
>> + * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
>> + * in the memory regions between buf_min & buf_max
>> + * for the buffer. If found, sets kbuf->mem.
>> + * @kbuf: Buffer contents and memory parameters.
>> + * @buf_min: Minimum address for the buffer.
>> + * @buf_max: Maximum address for the buffer.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
>> + u64 buf_min, u64 buf_max)
>> +{
>> + int ret = -EADDRNOTAVAIL;
>> + phys_addr_t start, end;
>> + u64 i;
>> +
>> + for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE,
>> + MEMBLOCK_NONE, &start, &end, NULL) {
>> + if (start > buf_max)
>> + continue;
>> +
>> + /* Memory hole not found */
>> + if (end < buf_min)
>> + break;
>> +
>> + /* Adjust memory region based on the given range */
>> + if (start < buf_min)
>> + start = buf_min;
>> + if (end > buf_max)
>> + end = buf_max;
>> +
>> + start = ALIGN(start, kbuf->buf_align);
>> + if (start < end && (end - start + 1) >= kbuf->memsz) {
>
> This is why I dislike using start and end to express address ranges:
>
> While struct resource seems to use the [address, end] convention, my
struct crash_mem also uses [address, end] convention.
This off-by-one error did not cause any issues as the hole start and size we try to find
are at least page aligned.
Nonetheless, I think fixing 'end' early in the loop with "end -= 1" would ensure
correctness while continuing to use the same convention for structs crash_mem & resource.
Thanks
Hari
^ permalink raw reply
* Re: [PATCH v3 03/12] powerpc/kexec_file: add helper functions for getting memory ranges
From: Hari Bathini @ 2020-07-16 21:08 UTC (permalink / raw)
To: Thiago Jung Bauermann
Cc: Pingfan Liu, Petr Tesarik, Nayna Jain, Kexec-ml,
Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
Andrew Morton, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <874kq98xo4.fsf@morokweng.localdomain>
On 15/07/20 5:19 am, Thiago Jung Bauermann wrote:
>
<snip>
> <snip>
>
>> +/**
>> + * get_mem_rngs_size - Get the allocated size of mrngs based on
>> + * max_nr_ranges and chunk size.
>> + * @mrngs: Memory ranges.
>> + *
>> + * Returns the maximum no. of ranges.
>
> This isn't correct. It returns the maximum size of @mrngs.
True. Will update..
> <snip>
>
>> +/**
>> + * add_tce_mem_ranges - Adds tce-table range to the given memory ranges list.
>> + * @mem_ranges: Range list to add the memory range(s) to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +int add_tce_mem_ranges(struct crash_mem **mem_ranges)
>> +{
>> + struct device_node *dn;
>> + int ret;
>> +
>> + for_each_node_by_type(dn, "pci") {
>> + u64 base;
>> + u32 size;
>> +
>> + ret = of_property_read_u64(dn, "linux,tce-base", &base);
>> + ret |= of_property_read_u32(dn, "linux,tce-size", &size);
>> + if (!ret)
>
> Shouldn't the condition be `ret` instead of `!ret`?
Oops! Will fix it.
>> +/**
>> + * sort_memory_ranges - Sorts the given memory ranges list.
>> + * @mem_ranges: Range list to sort.
>> + * @merge: If true, merge the list after sorting.
>> + *
>> + * Returns nothing.
>> + */
>> +void sort_memory_ranges(struct crash_mem *mrngs, bool merge)
>> +{
>> + struct crash_mem_range *rngs;
>> + struct crash_mem_range rng;
>> + int i, j, idx;
>> +
>> + if (!mrngs)
>> + return;
>> +
>> + /* Sort the ranges in-place */
>> + rngs = &mrngs->ranges[0];
>> + for (i = 0; i < mrngs->nr_ranges; i++) {
>> + idx = i;
>> + for (j = (i + 1); j < mrngs->nr_ranges; j++) {
>> + if (rngs[idx].start > rngs[j].start)
>> + idx = j;
>> + }
>> + if (idx != i) {
>> + rng = rngs[idx];
>> + rngs[idx] = rngs[i];
>> + rngs[i] = rng;
>> + }
>> + }
>
> Would it work using sort() from lib/sort.c here?
Yeah. I think we could reuse it with a simple compare callback. Will do that.
Thanks
Hari
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox