* [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
From: haoran.jiang @ 2026-06-13  8:41 UTC (permalink / raw)
  To: loongarch
  Cc: linux-kernel, chenhuacai, kernel, akpm, jbohac, kees, yangtiezhu,
	Haoran Jiang

From: Haoran Jiang <jianghaoran@kylinos.cn>

Enable STRICT_MODULE_RWX to enforce strict memory permissions
on modules, making the code region non-writable, the data region
non-executable, and the read-only data region both non-writable
and non-executable. When patching instructions, temporarily change
the code section's read/write permissions via the set_memory() API.

Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
---
v2:
Change the method of modifying page table permissions from patch_map to set_memory() API.

v3:
Modify commit description.

v4:
Add text_mutex locking in the larch_insn_write() call path and
enable CONFIG_STRICT_MODULE_RWX by default.

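UnixBench (UB) tests on a 3C6000 server show no significant change in the
overall index score: 1492.2 before vs. 1465.3 after with 1 copy, and
8765.2 vs. 8759.8 with 128 copies.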

Before patch:

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.6)

   System: localhost.localdomain: GNU/Linux
   OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
   Machine: loongarch64 (loongarch64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   21:19:51 up 1 min,  2 users,  load average: 0.71, 0.38, 0.14; runlevel 2026-06-10

------------------------------------------------------------------------
128 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       35205725.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4244.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                               6717.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1213873.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          350740.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3275103.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1981993.9 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  55287.7 lps   (10.0 s, 7 samples)
Process Creation                               9056.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   6736.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2109.8 lpm   (60.0 s, 2 samples)
System Call Overhead                        1549110.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   35205725.4   3016.8
Double-Precision Whetstone                       55.0       4244.9    771.8
Execl Throughput                                 43.0       6717.7   1562.3
File Copy 1024 bufsize 2000 maxblocks          3960.0    1213873.8   3065.3
File Copy 256 bufsize 500 maxblocks            1655.0     350740.5   2119.3
File Copy 4096 bufsize 8000 maxblocks          5800.0    3275103.0   5646.7
Pipe Throughput                               12440.0    1981993.9   1593.2
Pipe-based Context Switching                   4000.0      55287.7    138.2
Process Creation                                126.0       9056.8    718.8
Shell Scripts (1 concurrent)                     42.4       6736.5   1588.8
Shell Scripts (8 concurrent)                      6.0       2109.8   3516.4
System Call Overhead                          15000.0    1549110.9   1032.7
                                                                   ========
System Benchmarks Index Score                                        1492.2

------------------------------------------------------------------------
128 CPUs in system; running 128 parallel copies of tests

Dhrystone 2 using register variables     2901925470.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   503614.9 MWIPS (10.1 s, 7 samples)
Execl Throughput                              34080.1 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        309291.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           75115.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1018101.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                           180702003.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                2426596.4 lps   (10.0 s, 7 samples)
Process Creation                              37282.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  50813.7 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                   5835.8 lpm   (60.4 s, 2 samples)
System Call Overhead                        9039181.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0 2901925470.7 248665.4
Double-Precision Whetstone                       55.0     503614.9  91566.3
Execl Throughput                                 43.0      34080.1   7925.6
File Copy 1024 bufsize 2000 maxblocks          3960.0     309291.4    781.0
File Copy 256 bufsize 500 maxblocks            1655.0      75115.3    453.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    1018101.3   1755.3
Pipe Throughput                               12440.0  180702003.1 145258.8
Pipe-based Context Switching                   4000.0    2426596.4   6066.5
Process Creation                                126.0      37282.5   2958.9
Shell Scripts (1 concurrent)                     42.4      50813.7  11984.4
Shell Scripts (8 concurrent)                      6.0       5835.8   9726.4
System Call Overhead                          15000.0    9039181.1   6026.1
                                                                   ========
System Benchmarks Index Score                                        8765.2


After patch:

128 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       35438193.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4245.7 MWIPS (10.0 s, 7 samples)
Execl Throughput                               5293.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1233323.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          355264.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3333631.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1979613.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  55675.2 lps   (10.0 s, 7 samples)
Process Creation                               8528.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   6870.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2115.5 lpm   (60.0 s, 2 samples)
System Call Overhead                        1546959.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   35438193.7   3036.7
Double-Precision Whetstone                       55.0       4245.7    772.0
Execl Throughput                                 43.0       5293.7   1231.1
File Copy 1024 bufsize 2000 maxblocks          3960.0    1233323.4   3114.5
File Copy 256 bufsize 500 maxblocks            1655.0     355264.5   2146.6
File Copy 4096 bufsize 8000 maxblocks          5800.0    3333631.6   5747.6
Pipe Throughput                               12440.0    1979613.2   1591.3
Pipe-based Context Switching                   4000.0      55675.2    139.2
Process Creation                                126.0       8528.1    676.8
Shell Scripts (1 concurrent)                     42.4       6870.0   1620.3
Shell Scripts (8 concurrent)                      6.0       2115.5   3525.8
System Call Overhead                          15000.0    1546959.4   1031.3
                                                                   ========
System Benchmarks Index Score                                        1465.3

------------------------------------------------------------------------
128 CPUs in system; running 128 parallel copies of tests

Dhrystone 2 using register variables     2903340286.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   504137.7 MWIPS (10.1 s, 7 samples)
Execl Throughput                              34332.8 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        311391.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           72503.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1000861.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                           179382076.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                2415716.6 lps   (10.0 s, 7 samples)
Process Creation                              36873.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  51464.1 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                   5976.3 lpm   (60.4 s, 2 samples)
System Call Overhead                        9182389.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0 2903340286.5 248786.7
Double-Precision Whetstone                       55.0     504137.7  91661.4
Execl Throughput                                 43.0      34332.8   7984.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     311391.2    786.3
File Copy 256 bufsize 500 maxblocks            1655.0      72503.3    438.1
File Copy 4096 bufsize 8000 maxblocks          5800.0    1000861.7   1725.6
Pipe Throughput                               12440.0  179382076.6 144197.8
Pipe-based Context Switching                   4000.0    2415716.6   6039.3
Process Creation                                126.0      36873.1   2926.4
Shell Scripts (1 concurrent)                     42.4      51464.1  12137.8
Shell Scripts (8 concurrent)                      6.0       5976.3   9960.6
System Call Overhead                          15000.0    9182389.5   6121.6
                                                                   ========
System Benchmarks Index Score                                        8759.8
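
The before/after tables can also be compared row by row. The following is a
minimal sketch, assuming (as the INDEX columns above suggest) that each index
is the result divided by its baseline and scaled by 10. The helper names
ub_index() and pct_change() are hypothetical, and the rows are copied from the
single-copy tables above.

```c
#include <stdio.h>

/* Index as it appears in the INDEX column: result / baseline * 10. */
static double ub_index(double baseline, double result)
{
	return result / baseline * 10.0;
}

/* Relative change from the "before" value to the "after" value, in percent. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

struct ub_row {
	const char *name;
	double before;
	double after;
};

/* Values copied from the 1-copy tables in this mail. */
static const struct ub_row single_copy[] = {
	{ "Execl Throughput",              6717.7, 5293.7 },
	{ "Process Creation",              9056.8, 8528.1 },
	{ "System Benchmarks Index Score", 1492.2, 1465.3 },
};

/* Print one line per row with the relative change. */
void print_changes(void)
{
	size_t i;

	for (i = 0; i < sizeof(single_copy) / sizeof(single_copy[0]); i++)
		printf("%-32s %+6.2f%%\n", single_copy[i].name,
		       pct_change(single_copy[i].before, single_copy[i].after));
}
```

Calling print_changes() lists the per-test deltas next to the overall score,
which makes it easier to see which individual tests move more than the
aggregate does.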

---
 arch/loongarch/Kconfig             |  1 +
 arch/loongarch/kernel/ftrace_dyn.c |  7 ++++++-
 arch/loongarch/kernel/inst.c       | 25 +++++++++++++++++++++----
 arch/loongarch/kernel/jump_label.c |  3 +++
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 606597da46b8..c751d714c287 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -27,6 +27,7 @@ config LOONGARCH
 	select ARCH_HAS_PTE_SPECIAL if 64BIT
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SET_DIRECT_MAP
+	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN
 	select ARCH_HAS_VDSO_ARCH_DATA
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index d5d81d74034c..598dc6434cc4 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -8,6 +8,7 @@
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
+#include <linux/memory.h>
 
 #include <asm/inst.h>
 #include <asm/module.h>
@@ -24,8 +25,12 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new, bool validate)
 			return -EINVAL;
 	}
 
-	if (larch_insn_patch_text((void *)pc, new))
+	mutex_lock(&text_mutex);
+	if (larch_insn_patch_text((void *)pc, new)) {
+		mutex_unlock(&text_mutex);
 		return -EPERM;
+	}
+	mutex_unlock(&text_mutex);
 
 	return 0;
 }
diff --git a/arch/loongarch/kernel/inst.c b/arch/loongarch/kernel/inst.c
index 0b9228b7c13a..3de94d465c3c 100644
--- a/arch/loongarch/kernel/inst.c
+++ b/arch/loongarch/kernel/inst.c
@@ -6,12 +6,11 @@
 #include <linux/uaccess.h>
 #include <linux/set_memory.h>
 #include <linux/stop_machine.h>
+#include <linux/memory.h>
 
 #include <asm/cacheflush.h>
 #include <asm/inst.h>
 
-static DEFINE_RAW_SPINLOCK(patch_lock);
-
 void simu_pc(struct pt_regs *regs, union loongarch_instruction insn)
 {
 	unsigned long pc = regs->csr_era;
@@ -207,14 +206,32 @@ int larch_insn_read(void *addr, u32 *insnp)
 int larch_insn_write(void *addr, u32 insn)
 {
 	int ret;
+	int err = 0;
+	size_t start;
 	unsigned long flags = 0;
 
 	if ((unsigned long)addr & 3)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&patch_lock, flags);
+	start = round_down((size_t)addr, PAGE_SIZE);
+
+	lockdep_assert_held(&text_mutex);
+
+	err = set_memory_rw(start, 1);
+	if (err) {
+		pr_info("%s: set_memory_rw() failed\n", __func__);
+		return err;
+	}
+
+	local_irq_save(flags);
 	ret = copy_to_kernel_nofault(addr, &insn, LOONGARCH_INSN_SIZE);
-	raw_spin_unlock_irqrestore(&patch_lock, flags);
+	local_irq_restore(flags);
+
+	err = set_memory_rox(start, 1);
+	if (err) {
+		pr_info("%s: set_memory_rox() failed\n", __func__);
+		return err;
+	}
 
 	return ret;
 }
diff --git a/arch/loongarch/kernel/jump_label.c b/arch/loongarch/kernel/jump_label.c
index 24a3f4d8540c..e6bb040fe4c5 100644
--- a/arch/loongarch/kernel/jump_label.c
+++ b/arch/loongarch/kernel/jump_label.c
@@ -6,6 +6,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/jump_label.h>
+#include <linux/memory.h>
 #include <asm/cacheflush.h>
 #include <asm/inst.h>
 
@@ -19,7 +20,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_t
 	else
 		insn = larch_insn_gen_nop();
 
+	mutex_lock(&text_mutex);
 	larch_insn_write(addr, insn);
+	mutex_unlock(&text_mutex);
 
 	return true;
 }
-- 
2.43.0
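
The write path in the inst.c hunk above follows a "make writable, store,
restore" pattern. The following is a rough userspace analogue, not kernel
code: mprotect() stands in for set_memory_rw()/set_memory_rox(), the page is
restored to read-only rather than read+exec, and patch_word() and patch_demo()
are hypothetical names. It also assumes callers serialise calls themselves,
which is the role text_mutex plays in the patch.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Make the page containing addr writable, store one 32-bit word, then
 * drop write permission again. Callers are assumed to hold whatever lock
 * serialises patching (text_mutex in the patch).
 */
static int patch_word(void *addr, uint32_t insn)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page - 1);

	if ((uintptr_t)addr & 3)
		return -EINVAL;	/* same alignment check as the patch */

	if (mprotect((void *)start, page, PROT_READ | PROT_WRITE))
		return -errno;

	memcpy(addr, &insn, sizeof(insn));

	/* Restore the protection before reporting success. */
	if (mprotect((void *)start, page, PROT_READ))
		return -errno;

	return 0;
}

/* Hypothetical self-check: returns 0 if a read-only page was patched. */
static int patch_demo(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	uint32_t *text = mmap(NULL, page, PROT_READ,
			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret;

	if (text == MAP_FAILED)
		return -errno;

	ret = patch_word(&text[4], 0x03400000u);	/* example 32-bit value */
	if (!ret && text[4] != 0x03400000u)
		ret = -EIO;

	munmap(text, page);
	return ret;
}
```

In this sketch a failure to restore the protection is reported to the caller,
and the store itself cannot fail once the page is writable; the kernel version
has to handle a failing copy_to_kernel_nofault() as well.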



* Re: [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
From: Huacai Chen @ 2026-06-13 10:43 UTC (permalink / raw)
  To: haoran.jiang
  Cc: loongarch, linux-kernel, kernel, akpm, jbohac, kees, yangtiezhu,
	Haoran Jiang

Hi, Haoran,

On Sat, Jun 13, 2026 at 4:42 PM <haoran.jiang@linux.dev> wrote:
>
> From: Haoran Jiang <jianghaoran@kylinos.cn>
>
> Enable STRICT_MODULE_RWX to enforce strict memory permissions
> on modules, making the code region non-writable, the data region
> non-executable, and the read-only data region both non-writable
> and non-executable. Temporarily modify code section read/write
> permissions via set_memory() API.
>
> Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
> ---
> v2:
> Change the method of modifying page table permissions from patch_map to set_memory() API.
>
> v3:
> Modify commit description.
>
> v4:
> Add text_mutex lock in the larch_insn_write call path and
> CONFIG_STRICT_MODULE_RWX is enabled by default.
>
> UB test on the 3C6000 server shows no significant performance impact.
>
> Before patch:
>
> ========================================================================
>    BYTE UNIX Benchmarks (Version 5.1.6)
>
>    System: localhost.localdomain: GNU/Linux
>    OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
>    Machine: loongarch64 (loongarch64)
>    Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
>    21:19:51 up 1 min,  2 users,  load average: 0.71, 0.38, 0.14; runlevel 2026-06-10
>
> ------------------------------------------------------------------------
> 128 CPUs in system; running 1 parallel copy of tests
>
> Dhrystone 2 using register variables       35205725.4 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                     4244.9 MWIPS (10.0 s, 7 samples)
> Execl Throughput                               6717.7 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1213873.8 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          350740.5 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       3275103.0 KBps  (30.0 s, 2 samples)
> Pipe Throughput                             1981993.9 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                  55287.7 lps   (10.0 s, 7 samples)
> Process Creation                               9056.8 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                   6736.5 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   2109.8 lpm   (60.0 s, 2 samples)
> System Call Overhead                        1549110.9 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0   35205725.4   3016.8
> Double-Precision Whetstone                       55.0       4244.9    771.8
> Execl Throughput                                 43.0       6717.7   1562.3
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1213873.8   3065.3
> File Copy 256 bufsize 500 maxblocks            1655.0     350740.5   2119.3
> File Copy 4096 bufsize 8000 maxblocks          5800.0    3275103.0   5646.7
> Pipe Throughput                               12440.0    1981993.9   1593.2
> Pipe-based Context Switching                   4000.0      55287.7    138.2
> Process Creation                                126.0       9056.8    718.8
> Shell Scripts (1 concurrent)                     42.4       6736.5   1588.8
> Shell Scripts (8 concurrent)                      6.0       2109.8   3516.4
> System Call Overhead                          15000.0    1549110.9   1032.7
>                                                                    ========
> System Benchmarks Index Score                                        1492.2
>
> ------------------------------------------------------------------------
> 128 CPUs in system; running 128 parallel copies of tests
>
> Dhrystone 2 using register variables     2901925470.7 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                   503614.9 MWIPS (10.1 s, 7 samples)
> Execl Throughput                              34080.1 lps   (29.9 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks        309291.4 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks           75115.3 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       1018101.3 KBps  (30.0 s, 2 samples)
> Pipe Throughput                           180702003.1 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                2426596.4 lps   (10.0 s, 7 samples)
> Process Creation                              37282.5 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  50813.7 lpm   (60.1 s, 2 samples)
> Shell Scripts (8 concurrent)                   5835.8 lpm   (60.4 s, 2 samples)
> System Call Overhead                        9039181.1 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0 2901925470.7 248665.4
> Double-Precision Whetstone                       55.0     503614.9  91566.3
> Execl Throughput                                 43.0      34080.1   7925.6
> File Copy 1024 bufsize 2000 maxblocks          3960.0     309291.4    781.0
> File Copy 256 bufsize 500 maxblocks            1655.0      75115.3    453.9
> File Copy 4096 bufsize 8000 maxblocks          5800.0    1018101.3   1755.3
> Pipe Throughput                               12440.0  180702003.1 145258.8
> Pipe-based Context Switching                   4000.0    2426596.4   6066.5
> Process Creation                                126.0      37282.5   2958.9
> Shell Scripts (1 concurrent)                     42.4      50813.7  11984.4
> Shell Scripts (8 concurrent)                      6.0       5835.8   9726.4
> System Call Overhead                          15000.0    9039181.1   6026.1
>                                                                    ========
> System Benchmarks Index Score                                        8765.2
>
>
> After patch:
>
> 128 CPUs in system; running 1 parallel copy of tests
>
> Dhrystone 2 using register variables       35438193.7 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                     4245.7 MWIPS (10.0 s, 7 samples)
> Execl Throughput                               5293.7 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1233323.4 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          355264.5 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       3333631.6 KBps  (30.0 s, 2 samples)
> Pipe Throughput                             1979613.2 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                  55675.2 lps   (10.0 s, 7 samples)
> Process Creation                               8528.1 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                   6870.0 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   2115.5 lpm   (60.0 s, 2 samples)
> System Call Overhead                        1546959.4 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0   35438193.7   3036.7
> Double-Precision Whetstone                       55.0       4245.7    772.0
> Execl Throughput                                 43.0       5293.7   1231.1
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1233323.4   3114.5
> File Copy 256 bufsize 500 maxblocks            1655.0     355264.5   2146.6
> File Copy 4096 bufsize 8000 maxblocks          5800.0    3333631.6   5747.6
> Pipe Throughput                               12440.0    1979613.2   1591.3
> Pipe-based Context Switching                   4000.0      55675.2    139.2
> Process Creation                                126.0       8528.1    676.8
> Shell Scripts (1 concurrent)                     42.4       6870.0   1620.3
> Shell Scripts (8 concurrent)                      6.0       2115.5   3525.8
> System Call Overhead                          15000.0    1546959.4   1031.3
>                                                                    ========
> System Benchmarks Index Score                                        1465.3
>
> ------------------------------------------------------------------------
> 128 CPUs in system; running 128 parallel copies of tests
>
> Dhrystone 2 using register variables     2903340286.5 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                   504137.7 MWIPS (10.1 s, 7 samples)
> Execl Throughput                              34332.8 lps   (29.9 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks        311391.2 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks           72503.3 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       1000861.7 KBps  (30.0 s, 2 samples)
> Pipe Throughput                           179382076.6 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                2415716.6 lps   (10.0 s, 7 samples)
> Process Creation                              36873.1 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  51464.1 lpm   (60.1 s, 2 samples)
> Shell Scripts (8 concurrent)                   5976.3 lpm   (60.4 s, 2 samples)
> System Call Overhead                        9182389.5 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0 2903340286.5 248786.7
> Double-Precision Whetstone                       55.0     504137.7  91661.4
> Execl Throughput                                 43.0      34332.8   7984.4
> File Copy 1024 bufsize 2000 maxblocks          3960.0     311391.2    786.3
> File Copy 256 bufsize 500 maxblocks            1655.0      72503.3    438.1
> File Copy 4096 bufsize 8000 maxblocks          5800.0    1000861.7   1725.6
> Pipe Throughput                               12440.0  179382076.6 144197.8
> Pipe-based Context Switching                   4000.0    2415716.6   6039.3
> Process Creation                                126.0      36873.1   2926.4
> Shell Scripts (1 concurrent)                     42.4      51464.1  12137.8
> Shell Scripts (8 concurrent)                      6.0       5976.3   9960.6
> System Call Overhead                          15000.0    9182389.5   6121.6
>                                                                    ========
> System Benchmarks Index Score                                        8759.8
I think your tests are incomplete. A performance test is only a very
basic test; you should also make sure that all dynamic code modification
mechanisms still work correctly.

At least these should be verified one by one: jump label, kgdb, bpf,
ftrace, kprobes, uprobes, ...

You have modified ftrace, but not with the best method, which I suggested in V3.

You completely ignored my suggestion about kprobes in V3.

For KGDB, you should use text_mutex to protect copy_to_kernel_nofault().

For uprobes, I have no idea; maybe @Tiezhu can give some suggestions.

Huacai

