* [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
@ 2026-06-13 8:41 haoran.jiang
2026-06-13 10:43 ` Huacai Chen
From: haoran.jiang @ 2026-06-13 8:41 UTC
To: loongarch
Cc: linux-kernel, chenhuacai, kernel, akpm, jbohac, kees, yangtiezhu,
Haoran Jiang
From: Haoran Jiang <jianghaoran@kylinos.cn>
Enable STRICT_MODULE_RWX to enforce strict memory permissions
on modules, making the code region non-writable, the data region
non-executable, and the read-only data region both non-writable
and non-executable. Temporarily modify code section read/write
permissions via the set_memory() API.
Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
---
v2:
Change the method of modifying page table permissions from patch_map to set_memory() API.
v3:
Modify commit description.
v4:
Take text_mutex in the larch_insn_write() call path, and
enable CONFIG_STRICT_MODULE_RWX by default.
UB test on the 3C6000 server shows no significant performance impact.
Before patch:
========================================================================
BYTE UNIX Benchmarks (Version 5.1.6)
System: localhost.localdomain: GNU/Linux
OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
Machine: loongarch64 (loongarch64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
21:19:51 up 1 min, 2 users, load average: 0.71, 0.38, 0.14; runlevel 2026-06-10
------------------------------------------------------------------------
128 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables      35205725.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    4244.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                              6717.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks      1213873.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks         350740.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      3275103.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                            1981993.9 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 55287.7 lps   (10.0 s, 7 samples)
Process Creation                              9056.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  6736.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                  2109.8 lpm   (60.0 s, 2 samples)
System Call Overhead                       1549110.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values          BASELINE        RESULT     INDEX
Dhrystone 2 using register variables    116700.0    35205725.4    3016.8
Double-Precision Whetstone                  55.0        4244.9     771.8
Execl Throughput                            43.0        6717.7    1562.3
File Copy 1024 bufsize 2000 maxblocks     3960.0     1213873.8    3065.3
File Copy 256 bufsize 500 maxblocks       1655.0      350740.5    2119.3
File Copy 4096 bufsize 8000 maxblocks     5800.0     3275103.0    5646.7
Pipe Throughput                          12440.0     1981993.9    1593.2
Pipe-based Context Switching              4000.0       55287.7     138.2
Process Creation                           126.0        9056.8     718.8
Shell Scripts (1 concurrent)                42.4        6736.5    1588.8
Shell Scripts (8 concurrent)                 6.0        2109.8    3516.4
System Call Overhead                     15000.0     1549110.9    1032.7
                                                                ========
System Benchmarks Index Score                                     1492.2
------------------------------------------------------------------------
128 CPUs in system; running 128 parallel copies of tests
Dhrystone 2 using register variables    2901925470.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                  503614.9 MWIPS (10.1 s, 7 samples)
Execl Throughput                             34080.1 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       309291.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          75115.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      1018101.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                          180702003.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching               2426596.4 lps   (10.0 s, 7 samples)
Process Creation                             37282.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 50813.7 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                  5835.8 lpm   (60.4 s, 2 samples)
System Call Overhead                       9039181.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values          BASELINE        RESULT     INDEX
Dhrystone 2 using register variables    116700.0  2901925470.7  248665.4
Double-Precision Whetstone                  55.0      503614.9   91566.3
Execl Throughput                            43.0       34080.1    7925.6
File Copy 1024 bufsize 2000 maxblocks     3960.0      309291.4     781.0
File Copy 256 bufsize 500 maxblocks       1655.0       75115.3     453.9
File Copy 4096 bufsize 8000 maxblocks     5800.0     1018101.3    1755.3
Pipe Throughput                          12440.0   180702003.1  145258.8
Pipe-based Context Switching              4000.0     2426596.4    6066.5
Process Creation                           126.0       37282.5    2958.9
Shell Scripts (1 concurrent)                42.4       50813.7   11984.4
Shell Scripts (8 concurrent)                 6.0        5835.8    9726.4
System Call Overhead                     15000.0     9039181.1    6026.1
                                                                ========
System Benchmarks Index Score                                     8765.2
After patch:
128 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables      35438193.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    4245.7 MWIPS (10.0 s, 7 samples)
Execl Throughput                              5293.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks      1233323.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks         355264.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      3333631.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                            1979613.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 55675.2 lps   (10.0 s, 7 samples)
Process Creation                              8528.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  6870.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                  2115.5 lpm   (60.0 s, 2 samples)
System Call Overhead                       1546959.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values          BASELINE        RESULT     INDEX
Dhrystone 2 using register variables    116700.0    35438193.7    3036.7
Double-Precision Whetstone                  55.0        4245.7     772.0
Execl Throughput                            43.0        5293.7    1231.1
File Copy 1024 bufsize 2000 maxblocks     3960.0     1233323.4    3114.5
File Copy 256 bufsize 500 maxblocks       1655.0      355264.5    2146.6
File Copy 4096 bufsize 8000 maxblocks     5800.0     3333631.6    5747.6
Pipe Throughput                          12440.0     1979613.2    1591.3
Pipe-based Context Switching              4000.0       55675.2     139.2
Process Creation                           126.0        8528.1     676.8
Shell Scripts (1 concurrent)                42.4        6870.0    1620.3
Shell Scripts (8 concurrent)                 6.0        2115.5    3525.8
System Call Overhead                     15000.0     1546959.4    1031.3
                                                                ========
System Benchmarks Index Score                                     1465.3
------------------------------------------------------------------------
128 CPUs in system; running 128 parallel copies of tests
Dhrystone 2 using register variables    2903340286.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                  504137.7 MWIPS (10.1 s, 7 samples)
Execl Throughput                             34332.8 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       311391.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          72503.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      1000861.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                          179382076.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching               2415716.6 lps   (10.0 s, 7 samples)
Process Creation                             36873.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 51464.1 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                  5976.3 lpm   (60.4 s, 2 samples)
System Call Overhead                       9182389.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values          BASELINE        RESULT     INDEX
Dhrystone 2 using register variables    116700.0  2903340286.5  248786.7
Double-Precision Whetstone                  55.0      504137.7   91661.4
Execl Throughput                            43.0       34332.8    7984.4
File Copy 1024 bufsize 2000 maxblocks     3960.0      311391.2     786.3
File Copy 256 bufsize 500 maxblocks       1655.0       72503.3     438.1
File Copy 4096 bufsize 8000 maxblocks     5800.0     1000861.7    1725.6
Pipe Throughput                          12440.0   179382076.6  144197.8
Pipe-based Context Switching              4000.0     2415716.6    6039.3
Process Creation                           126.0       36873.1    2926.4
Shell Scripts (1 concurrent)                42.4       51464.1   12137.8
Shell Scripts (8 concurrent)                 6.0        5976.3    9960.6
System Call Overhead                     15000.0     9182389.5    6121.6
                                                                ========
System Benchmarks Index Score                                     8759.8
---
 arch/loongarch/Kconfig             |  1 +
 arch/loongarch/kernel/ftrace_dyn.c |  7 ++++++-
 arch/loongarch/kernel/inst.c       | 25 +++++++++++++++++++++----
 arch/loongarch/kernel/jump_label.c |  3 +++
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 606597da46b8..c751d714c287 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -27,6 +27,7 @@ config LOONGARCH
 	select ARCH_HAS_PTE_SPECIAL if 64BIT
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SET_DIRECT_MAP
+	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN
 	select ARCH_HAS_VDSO_ARCH_DATA
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index d5d81d74034c..598dc6434cc4 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -8,6 +8,7 @@
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
+#include <linux/memory.h>

 #include <asm/inst.h>
 #include <asm/module.h>
@@ -24,8 +25,12 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new, bool validate)
 			return -EINVAL;
 	}

-	if (larch_insn_patch_text((void *)pc, new))
+	mutex_lock(&text_mutex);
+	if (larch_insn_patch_text((void *)pc, new)) {
+		mutex_unlock(&text_mutex);
 		return -EPERM;
+	}
+	mutex_unlock(&text_mutex);

 	return 0;
 }
diff --git a/arch/loongarch/kernel/inst.c b/arch/loongarch/kernel/inst.c
index 0b9228b7c13a..3de94d465c3c 100644
--- a/arch/loongarch/kernel/inst.c
+++ b/arch/loongarch/kernel/inst.c
@@ -6,12 +6,11 @@
 #include <linux/uaccess.h>
 #include <linux/set_memory.h>
 #include <linux/stop_machine.h>
+#include <linux/memory.h>

 #include <asm/cacheflush.h>
 #include <asm/inst.h>

-static DEFINE_RAW_SPINLOCK(patch_lock);
-
 void simu_pc(struct pt_regs *regs, union loongarch_instruction insn)
 {
 	unsigned long pc = regs->csr_era;
@@ -207,14 +206,32 @@ int larch_insn_read(void *addr, u32 *insnp)
 int larch_insn_write(void *addr, u32 insn)
 {
 	int ret;
+	int err = 0;
+	size_t start;
 	unsigned long flags = 0;

 	if ((unsigned long)addr & 3)
 		return -EINVAL;

-	raw_spin_lock_irqsave(&patch_lock, flags);
+	start = round_down((size_t)addr, PAGE_SIZE);
+
+	lockdep_assert_held(&text_mutex);
+
+	err = set_memory_rw(start, 1);
+	if (err) {
+		pr_info("%s: set_memory_rw() failed\n", __func__);
+		return err;
+	}
+
+	local_irq_save(flags);
 	ret = copy_to_kernel_nofault(addr, &insn, LOONGARCH_INSN_SIZE);
-	raw_spin_unlock_irqrestore(&patch_lock, flags);
+	local_irq_restore(flags);
+
+	err = set_memory_rox(start, 1);
+	if (err) {
+		pr_info("%s: set_memory_rox() failed\n", __func__);
+		return err;
+	}

 	return ret;
 }
diff --git a/arch/loongarch/kernel/jump_label.c b/arch/loongarch/kernel/jump_label.c
index 24a3f4d8540c..e6bb040fe4c5 100644
--- a/arch/loongarch/kernel/jump_label.c
+++ b/arch/loongarch/kernel/jump_label.c
@@ -6,6 +6,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/jump_label.h>
+#include <linux/memory.h>
 #include <asm/cacheflush.h>
 #include <asm/inst.h>

@@ -19,7 +20,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_t
 	else
 		insn = larch_insn_gen_nop();

+	mutex_lock(&text_mutex);
 	larch_insn_write(addr, insn);
+	mutex_unlock(&text_mutex);

 	return true;
 }
--
2.43.0

* Re: [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
2026-06-13 8:41 [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions haoran.jiang
@ 2026-06-13 10:43 ` Huacai Chen
From: Huacai Chen @ 2026-06-13 10:43 UTC
To: haoran.jiang
Cc: loongarch, linux-kernel, kernel, akpm, jbohac, kees, yangtiezhu,
Haoran Jiang

Hi, Haoran,

On Sat, Jun 13, 2026 at 4:42 PM <haoran.jiang@linux.dev> wrote:
>
> From: Haoran Jiang <jianghaoran@kylinos.cn>
>
> [...]
>
> UB test on the 3C6000 server shows no significant performance impact.
>
> [...]

I think your tests are incomplete. A performance test is only a very
basic test; you should make sure all dynamic code modification
mechanisms are correct. At least these should be verified one by one:
jump label, kgdb, bpf, ftrace, kprobes, uprobes...

You have modified ftrace, but not with the best method, which I
suggested in V3.

You completely ignored my suggestion about kprobes in V3.

For KGDB, you should use text_mutex to protect copy_to_kernel_nofault().

For uprobes, I have no idea; maybe @Tiezhu can give some suggestions.

Huacai