* [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
From: haoran.jiang @ 2026-06-13  8:41 UTC (permalink / raw)
  To: loongarch
  Cc: linux-kernel, chenhuacai, kernel, akpm, jbohac, kees, yangtiezhu,
	Haoran Jiang

From: Haoran Jiang <jianghaoran@kylinos.cn>

Enable STRICT_MODULE_RWX to enforce strict memory permissions
on modules, making the code region non-writable, the data region
non-executable, and the read-only data region both non-writable
and non-executable. When code has to be patched, temporarily
change the permissions of the affected code page via the
set_memory() API.

Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
---
v2:
Change the method of modifying page table permissions from patch_map to set_memory() API.

v3:
Modify commit description.

v4:
Take text_mutex in the larch_insn_write() call path and enable
CONFIG_STRICT_MODULE_RWX by default.

UnixBench (UB) tests on a 3C6000 server show no significant change in
the overall index score (1492.2 -> 1465.3 with 1 copy, 8765.2 -> 8759.8
with 128 copies).

Before patch:

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.6)

   System: localhost.localdomain: GNU/Linux
   OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
   Machine: loongarch64 (loongarch64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   21:19:51 up 1 min,  2 users,  load average: 0.71, 0.38, 0.14; runlevel 2026-06-10

------------------------------------------------------------------------
128 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       35205725.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4244.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                               6717.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1213873.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          350740.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3275103.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1981993.9 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  55287.7 lps   (10.0 s, 7 samples)
Process Creation                               9056.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   6736.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2109.8 lpm   (60.0 s, 2 samples)
System Call Overhead                        1549110.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   35205725.4   3016.8
Double-Precision Whetstone                       55.0       4244.9    771.8
Execl Throughput                                 43.0       6717.7   1562.3
File Copy 1024 bufsize 2000 maxblocks          3960.0    1213873.8   3065.3
File Copy 256 bufsize 500 maxblocks            1655.0     350740.5   2119.3
File Copy 4096 bufsize 8000 maxblocks          5800.0    3275103.0   5646.7
Pipe Throughput                               12440.0    1981993.9   1593.2
Pipe-based Context Switching                   4000.0      55287.7    138.2
Process Creation                                126.0       9056.8    718.8
Shell Scripts (1 concurrent)                     42.4       6736.5   1588.8
Shell Scripts (8 concurrent)                      6.0       2109.8   3516.4
System Call Overhead                          15000.0    1549110.9   1032.7
                                                                   ========
System Benchmarks Index Score                                        1492.2

------------------------------------------------------------------------
128 CPUs in system; running 128 parallel copies of tests

Dhrystone 2 using register variables     2901925470.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   503614.9 MWIPS (10.1 s, 7 samples)
Execl Throughput                              34080.1 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        309291.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           75115.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1018101.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                           180702003.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                2426596.4 lps   (10.0 s, 7 samples)
Process Creation                              37282.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  50813.7 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                   5835.8 lpm   (60.4 s, 2 samples)
System Call Overhead                        9039181.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0 2901925470.7 248665.4
Double-Precision Whetstone                       55.0     503614.9  91566.3
Execl Throughput                                 43.0      34080.1   7925.6
File Copy 1024 bufsize 2000 maxblocks          3960.0     309291.4    781.0
File Copy 256 bufsize 500 maxblocks            1655.0      75115.3    453.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    1018101.3   1755.3
Pipe Throughput                               12440.0  180702003.1 145258.8
Pipe-based Context Switching                   4000.0    2426596.4   6066.5
Process Creation                                126.0      37282.5   2958.9
Shell Scripts (1 concurrent)                     42.4      50813.7  11984.4
Shell Scripts (8 concurrent)                      6.0       5835.8   9726.4
System Call Overhead                          15000.0    9039181.1   6026.1
                                                                   ========
System Benchmarks Index Score                                        8765.2


After patch:

128 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       35438193.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4245.7 MWIPS (10.0 s, 7 samples)
Execl Throughput                               5293.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1233323.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          355264.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3333631.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1979613.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  55675.2 lps   (10.0 s, 7 samples)
Process Creation                               8528.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   6870.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2115.5 lpm   (60.0 s, 2 samples)
System Call Overhead                        1546959.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   35438193.7   3036.7
Double-Precision Whetstone                       55.0       4245.7    772.0
Execl Throughput                                 43.0       5293.7   1231.1
File Copy 1024 bufsize 2000 maxblocks          3960.0    1233323.4   3114.5
File Copy 256 bufsize 500 maxblocks            1655.0     355264.5   2146.6
File Copy 4096 bufsize 8000 maxblocks          5800.0    3333631.6   5747.6
Pipe Throughput                               12440.0    1979613.2   1591.3
Pipe-based Context Switching                   4000.0      55675.2    139.2
Process Creation                                126.0       8528.1    676.8
Shell Scripts (1 concurrent)                     42.4       6870.0   1620.3
Shell Scripts (8 concurrent)                      6.0       2115.5   3525.8
System Call Overhead                          15000.0    1546959.4   1031.3
                                                                   ========
System Benchmarks Index Score                                        1465.3

------------------------------------------------------------------------
128 CPUs in system; running 128 parallel copies of tests

Dhrystone 2 using register variables     2903340286.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   504137.7 MWIPS (10.1 s, 7 samples)
Execl Throughput                              34332.8 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        311391.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           72503.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1000861.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                           179382076.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                2415716.6 lps   (10.0 s, 7 samples)
Process Creation                              36873.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  51464.1 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                   5976.3 lpm   (60.4 s, 2 samples)
System Call Overhead                        9182389.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0 2903340286.5 248786.7
Double-Precision Whetstone                       55.0     504137.7  91661.4
Execl Throughput                                 43.0      34332.8   7984.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     311391.2    786.3
File Copy 256 bufsize 500 maxblocks            1655.0      72503.3    438.1
File Copy 4096 bufsize 8000 maxblocks          5800.0    1000861.7   1725.6
Pipe Throughput                               12440.0  179382076.6 144197.8
Pipe-based Context Switching                   4000.0    2415716.6   6039.3
Process Creation                                126.0      36873.1   2926.4
Shell Scripts (1 concurrent)                     42.4      51464.1  12137.8
Shell Scripts (8 concurrent)                      6.0       5976.3   9960.6
System Call Overhead                          15000.0    9182389.5   6121.6
                                                                   ========
System Benchmarks Index Score                                        8759.8

---
 arch/loongarch/Kconfig             |  1 +
 arch/loongarch/kernel/ftrace_dyn.c |  7 ++++++-
 arch/loongarch/kernel/inst.c       | 25 +++++++++++++++++++++----
 arch/loongarch/kernel/jump_label.c |  3 +++
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 606597da46b8..c751d714c287 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -27,6 +27,7 @@ config LOONGARCH
 	select ARCH_HAS_PTE_SPECIAL if 64BIT
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SET_DIRECT_MAP
+	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN
 	select ARCH_HAS_VDSO_ARCH_DATA
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index d5d81d74034c..598dc6434cc4 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -8,6 +8,7 @@
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
 #include <linux/uaccess.h>
+#include <linux/memory.h>
 
 #include <asm/inst.h>
 #include <asm/module.h>
@@ -24,8 +25,12 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new, bool validate)
 			return -EINVAL;
 	}
 
-	if (larch_insn_patch_text((void *)pc, new))
+	mutex_lock(&text_mutex);
+	if (larch_insn_patch_text((void *)pc, new)) {
+		mutex_unlock(&text_mutex);
 		return -EPERM;
+	}
+	mutex_unlock(&text_mutex);
 
 	return 0;
 }
diff --git a/arch/loongarch/kernel/inst.c b/arch/loongarch/kernel/inst.c
index 0b9228b7c13a..3de94d465c3c 100644
--- a/arch/loongarch/kernel/inst.c
+++ b/arch/loongarch/kernel/inst.c
@@ -6,12 +6,11 @@
 #include <linux/uaccess.h>
 #include <linux/set_memory.h>
 #include <linux/stop_machine.h>
+#include <linux/memory.h>
 
 #include <asm/cacheflush.h>
 #include <asm/inst.h>
 
-static DEFINE_RAW_SPINLOCK(patch_lock);
-
 void simu_pc(struct pt_regs *regs, union loongarch_instruction insn)
 {
 	unsigned long pc = regs->csr_era;
@@ -207,14 +206,32 @@ int larch_insn_read(void *addr, u32 *insnp)
 int larch_insn_write(void *addr, u32 insn)
 {
 	int ret;
+	int err = 0;
+	size_t start;
 	unsigned long flags = 0;
 
 	if ((unsigned long)addr & 3)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&patch_lock, flags);
+	start = round_down((size_t)addr, PAGE_SIZE);
+
+	lockdep_assert_held(&text_mutex);
+
+	err = set_memory_rw(start, 1);
+	if (err) {
+		pr_info("%s: set_memory_rw() failed\n", __func__);
+		return err;
+	}
+
+	local_irq_save(flags);
 	ret = copy_to_kernel_nofault(addr, &insn, LOONGARCH_INSN_SIZE);
-	raw_spin_unlock_irqrestore(&patch_lock, flags);
+	local_irq_restore(flags);
+
+	err = set_memory_rox(start, 1);
+	if (err) {
+		pr_info("%s: set_memory_rox() failed\n", __func__);
+		return err;
+	}
 
 	return ret;
 }
diff --git a/arch/loongarch/kernel/jump_label.c b/arch/loongarch/kernel/jump_label.c
index 24a3f4d8540c..e6bb040fe4c5 100644
--- a/arch/loongarch/kernel/jump_label.c
+++ b/arch/loongarch/kernel/jump_label.c
@@ -6,6 +6,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/jump_label.h>
+#include <linux/memory.h>
 #include <asm/cacheflush.h>
 #include <asm/inst.h>
 
@@ -19,7 +20,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_t
 	else
 		insn = larch_insn_gen_nop();
 
+	mutex_lock(&text_mutex);
 	larch_insn_write(addr, insn);
+	mutex_unlock(&text_mutex);
 
 	return true;
 }
-- 
2.43.0


* Re: [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
From: Huacai Chen @ 2026-06-13 10:43 UTC (permalink / raw)
  To: haoran.jiang
  Cc: loongarch, linux-kernel, kernel, akpm, jbohac, kees, yangtiezhu,
	Haoran Jiang

Hi, Haoran,

On Sat, Jun 13, 2026 at 4:42 PM <haoran.jiang@linux.dev> wrote:
>
> From: Haoran Jiang <jianghaoran@kylinos.cn>
>
> Enable STRICT_MODULE_RWX to enforce strict memory permissions
> on modules,making the code region non-writable, the data region
> non-executable, and the read-only data region both non-writable
> and non-executable.Temporarily modify code section read/write
> permissions via set_memory() API.
>
> Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
> ---
> v2:
> Change the method of modifying page table permissions from patch_map to set_memory() API.
>
> v3:
> Modify commit description.
>
> v4:
> Add text_mutex lock in the larch_insn_write call path and
> CONFIG_STRICT_MODULE_RWX is enabled by default.
>
> UB test on the 3C6000 server shows no significant performance impact.
>
> Before patch:
>
> ========================================================================
>    BYTE UNIX Benchmarks (Version 5.1.6)
>
>    System: localhost.localdomain: GNU/Linux
>    OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
>    Machine: loongarch64 (loongarch64)
>    Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
>    21:19:51 up 1 min,  2 users,  load average: 0.71, 0.38, 0.14; runlevel 2026-06-10
>
> ------------------------------------------------------------------------
> 128 CPUs in system; running 1 parallel copy of tests
>
> Dhrystone 2 using register variables       35205725.4 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                     4244.9 MWIPS (10.0 s, 7 samples)
> Execl Throughput                               6717.7 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1213873.8 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          350740.5 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       3275103.0 KBps  (30.0 s, 2 samples)
> Pipe Throughput                             1981993.9 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                  55287.7 lps   (10.0 s, 7 samples)
> Process Creation                               9056.8 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                   6736.5 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   2109.8 lpm   (60.0 s, 2 samples)
> System Call Overhead                        1549110.9 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0   35205725.4   3016.8
> Double-Precision Whetstone                       55.0       4244.9    771.8
> Execl Throughput                                 43.0       6717.7   1562.3
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1213873.8   3065.3
> File Copy 256 bufsize 500 maxblocks            1655.0     350740.5   2119.3
> File Copy 4096 bufsize 8000 maxblocks          5800.0    3275103.0   5646.7
> Pipe Throughput                               12440.0    1981993.9   1593.2
> Pipe-based Context Switching                   4000.0      55287.7    138.2
> Process Creation                                126.0       9056.8    718.8
> Shell Scripts (1 concurrent)                     42.4       6736.5   1588.8
> Shell Scripts (8 concurrent)                      6.0       2109.8   3516.4
> System Call Overhead                          15000.0    1549110.9   1032.7
>                                                                    ========
> System Benchmarks Index Score                                        1492.2
>
> ------------------------------------------------------------------------
> 128 CPUs in system; running 128 parallel copies of tests
>
> Dhrystone 2 using register variables     2901925470.7 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                   503614.9 MWIPS (10.1 s, 7 samples)
> Execl Throughput                              34080.1 lps   (29.9 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks        309291.4 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks           75115.3 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       1018101.3 KBps  (30.0 s, 2 samples)
> Pipe Throughput                           180702003.1 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                2426596.4 lps   (10.0 s, 7 samples)
> Process Creation                              37282.5 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  50813.7 lpm   (60.1 s, 2 samples)
> Shell Scripts (8 concurrent)                   5835.8 lpm   (60.4 s, 2 samples)
> System Call Overhead                        9039181.1 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0 2901925470.7 248665.4
> Double-Precision Whetstone                       55.0     503614.9  91566.3
> Execl Throughput                                 43.0      34080.1   7925.6
> File Copy 1024 bufsize 2000 maxblocks          3960.0     309291.4    781.0
> File Copy 256 bufsize 500 maxblocks            1655.0      75115.3    453.9
> File Copy 4096 bufsize 8000 maxblocks          5800.0    1018101.3   1755.3
> Pipe Throughput                               12440.0  180702003.1 145258.8
> Pipe-based Context Switching                   4000.0    2426596.4   6066.5
> Process Creation                                126.0      37282.5   2958.9
> Shell Scripts (1 concurrent)                     42.4      50813.7  11984.4
> Shell Scripts (8 concurrent)                      6.0       5835.8   9726.4
> System Call Overhead                          15000.0    9039181.1   6026.1
>                                                                    ========
> System Benchmarks Index Score                                        8765.2
>
>
> After patch:
>
> 128 CPUs in system; running 1 parallel copy of tests
>
> Dhrystone 2 using register variables       35438193.7 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                     4245.7 MWIPS (10.0 s, 7 samples)
> Execl Throughput                               5293.7 lps   (30.0 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks       1233323.4 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks          355264.5 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       3333631.6 KBps  (30.0 s, 2 samples)
> Pipe Throughput                             1979613.2 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                  55675.2 lps   (10.0 s, 7 samples)
> Process Creation                               8528.1 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                   6870.0 lpm   (60.0 s, 2 samples)
> Shell Scripts (8 concurrent)                   2115.5 lpm   (60.0 s, 2 samples)
> System Call Overhead                        1546959.4 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0   35438193.7   3036.7
> Double-Precision Whetstone                       55.0       4245.7    772.0
> Execl Throughput                                 43.0       5293.7   1231.1
> File Copy 1024 bufsize 2000 maxblocks          3960.0    1233323.4   3114.5
> File Copy 256 bufsize 500 maxblocks            1655.0     355264.5   2146.6
> File Copy 4096 bufsize 8000 maxblocks          5800.0    3333631.6   5747.6
> Pipe Throughput                               12440.0    1979613.2   1591.3
> Pipe-based Context Switching                   4000.0      55675.2    139.2
> Process Creation                                126.0       8528.1    676.8
> Shell Scripts (1 concurrent)                     42.4       6870.0   1620.3
> Shell Scripts (8 concurrent)                      6.0       2115.5   3525.8
> System Call Overhead                          15000.0    1546959.4   1031.3
>                                                                    ========
> System Benchmarks Index Score                                        1465.3
>
> ------------------------------------------------------------------------
> 128 CPUs in system; running 128 parallel copies of tests
>
> Dhrystone 2 using register variables     2903340286.5 lps   (10.0 s, 7 samples)
> Double-Precision Whetstone                   504137.7 MWIPS (10.1 s, 7 samples)
> Execl Throughput                              34332.8 lps   (29.9 s, 2 samples)
> File Copy 1024 bufsize 2000 maxblocks        311391.2 KBps  (30.0 s, 2 samples)
> File Copy 256 bufsize 500 maxblocks           72503.3 KBps  (30.0 s, 2 samples)
> File Copy 4096 bufsize 8000 maxblocks       1000861.7 KBps  (30.0 s, 2 samples)
> Pipe Throughput                           179382076.6 lps   (10.0 s, 7 samples)
> Pipe-based Context Switching                2415716.6 lps   (10.0 s, 7 samples)
> Process Creation                              36873.1 lps   (30.0 s, 2 samples)
> Shell Scripts (1 concurrent)                  51464.1 lpm   (60.1 s, 2 samples)
> Shell Scripts (8 concurrent)                   5976.3 lpm   (60.4 s, 2 samples)
> System Call Overhead                        9182389.5 lps   (10.0 s, 7 samples)
>
> System Benchmarks Index Values               BASELINE       RESULT    INDEX
> Dhrystone 2 using register variables         116700.0 2903340286.5 248786.7
> Double-Precision Whetstone                       55.0     504137.7  91661.4
> Execl Throughput                                 43.0      34332.8   7984.4
> File Copy 1024 bufsize 2000 maxblocks          3960.0     311391.2    786.3
> File Copy 256 bufsize 500 maxblocks            1655.0      72503.3    438.1
> File Copy 4096 bufsize 8000 maxblocks          5800.0    1000861.7   1725.6
> Pipe Throughput                               12440.0  179382076.6 144197.8
> Pipe-based Context Switching                   4000.0    2415716.6   6039.3
> Process Creation                                126.0      36873.1   2926.4
> Shell Scripts (1 concurrent)                     42.4      51464.1  12137.8
> Shell Scripts (8 concurrent)                      6.0       5976.3   9960.6
> System Call Overhead                          15000.0    9182389.5   6121.6
>                                                                    ========
> System Benchmarks Index Score                                        8759.8
I think your tests are incomplete. A performance test is only a very
basic test; you should make sure that all dynamic code modification
mechanisms work correctly.

At least these should be verified one by one: jump label, kgdb, bpf,
ftrace, kprobes, uprobes, ...

You have modified ftrace, but not with the best method that I
suggested in V3.

You completely ignored my suggestion about kprobes in V3.

For KGDB, you should use text_mutex to protect copy_to_kernel_nofault().

For uprobes, I have no idea; maybe @Tiezhu can give some suggestions.

Huacai

>
> ---
>  arch/loongarch/Kconfig             |  1 +
>  arch/loongarch/kernel/ftrace_dyn.c |  7 ++++++-
>  arch/loongarch/kernel/inst.c       | 25 +++++++++++++++++++++----
>  arch/loongarch/kernel/jump_label.c |  3 +++
>  4 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 606597da46b8..c751d714c287 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -27,6 +27,7 @@ config LOONGARCH
>         select ARCH_HAS_PTE_SPECIAL if 64BIT
>         select ARCH_HAS_SET_MEMORY
>         select ARCH_HAS_SET_DIRECT_MAP
> +       select ARCH_HAS_STRICT_MODULE_RWX
>         select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>         select ARCH_HAS_UBSAN
>         select ARCH_HAS_VDSO_ARCH_DATA
> diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> index d5d81d74034c..598dc6434cc4 100644
> --- a/arch/loongarch/kernel/ftrace_dyn.c
> +++ b/arch/loongarch/kernel/ftrace_dyn.c
> @@ -8,6 +8,7 @@
>  #include <linux/ftrace.h>
>  #include <linux/kprobes.h>
>  #include <linux/uaccess.h>
> +#include <linux/memory.h>
>
>  #include <asm/inst.h>
>  #include <asm/module.h>
> @@ -24,8 +25,12 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new, bool validate)
>                         return -EINVAL;
>         }
>
> -       if (larch_insn_patch_text((void *)pc, new))
> +       mutex_lock(&text_mutex);
> +       if (larch_insn_patch_text((void *)pc, new)) {
> +               mutex_unlock(&text_mutex);
>                 return -EPERM;
> +       }
> +       mutex_unlock(&text_mutex);
>
>         return 0;
>  }
> diff --git a/arch/loongarch/kernel/inst.c b/arch/loongarch/kernel/inst.c
> index 0b9228b7c13a..3de94d465c3c 100644
> --- a/arch/loongarch/kernel/inst.c
> +++ b/arch/loongarch/kernel/inst.c
> @@ -6,12 +6,11 @@
>  #include <linux/uaccess.h>
>  #include <linux/set_memory.h>
>  #include <linux/stop_machine.h>
> +#include <linux/memory.h>
>
>  #include <asm/cacheflush.h>
>  #include <asm/inst.h>
>
> -static DEFINE_RAW_SPINLOCK(patch_lock);
> -
>  void simu_pc(struct pt_regs *regs, union loongarch_instruction insn)
>  {
>         unsigned long pc = regs->csr_era;
> @@ -207,14 +206,32 @@ int larch_insn_read(void *addr, u32 *insnp)
>  int larch_insn_write(void *addr, u32 insn)
>  {
>         int ret;
> +       int err = 0;
> +       size_t start;
>         unsigned long flags = 0;
>
>         if ((unsigned long)addr & 3)
>                 return -EINVAL;
>
> -       raw_spin_lock_irqsave(&patch_lock, flags);
> +       start = round_down((size_t)addr, PAGE_SIZE);
> +
> +       lockdep_assert_held(&text_mutex);
> +
> +       err = set_memory_rw(start, 1);
> +       if (err) {
> +               pr_info("%s: set_memory_rw() failed\n", __func__);
> +               return err;
> +       }
> +
> +       local_irq_save(flags);
>         ret = copy_to_kernel_nofault(addr, &insn, LOONGARCH_INSN_SIZE);
> -       raw_spin_unlock_irqrestore(&patch_lock, flags);
> +       local_irq_restore(flags);
> +
> +       err = set_memory_rox(start, 1);
> +       if (err) {
> +               pr_info("%s: set_memory_rox() failed\n", __func__);
> +               return err;
> +       }
>
>         return ret;
>  }
> diff --git a/arch/loongarch/kernel/jump_label.c b/arch/loongarch/kernel/jump_label.c
> index 24a3f4d8540c..e6bb040fe4c5 100644
> --- a/arch/loongarch/kernel/jump_label.c
> +++ b/arch/loongarch/kernel/jump_label.c
> @@ -6,6 +6,7 @@
>   */
>  #include <linux/kernel.h>
>  #include <linux/jump_label.h>
> +#include <linux/memory.h>
>  #include <asm/cacheflush.h>
>  #include <asm/inst.h>
>
> @@ -19,7 +20,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_t
>         else
>                 insn = larch_insn_gen_nop();
>
> +       mutex_lock(&text_mutex);
>         larch_insn_write(addr, insn);
> +       mutex_unlock(&text_mutex);
>
>         return true;
>  }
> --
> 2.43.0
>
>

* Re: [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
From: haoran.jiang @ 2026-06-14 10:24 UTC (permalink / raw)
  To: Huacai Chen
  Cc: loongarch, linux-kernel, kernel, akpm, jbohac, kees, yangtiezhu,
	Haoran Jiang

On June 13, 2026 at 18:43, "Huacai Chen" <chenhuacai@kernel.org> wrote:


> 
> Hi, Haoran,
> 
> On Sat, Jun 13, 2026 at 4:42 PM <haoran.jiang@linux.dev> wrote:
> 
> > 
> > From: Haoran Jiang <jianghaoran@kylinos.cn>
> > 
> >  Enable STRICT_MODULE_RWX to enforce strict memory permissions
> >  on modules,making the code region non-writable, the data region
> >  non-executable, and the read-only data region both non-writable
> >  and non-executable.Temporarily modify code section read/write
> >  permissions via set_memory() API.
> > 
> >  Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
> >  ---
> >  v2:
> >  Change the method of modifying page table permissions from patch_map to set_memory() API.
> > 
> >  v3:
> >  Modify commit description.
> > 
> >  v4:
> >  Add text_mutex lock in the larch_insn_write call path and
> >  CONFIG_STRICT_MODULE_RWX is enabled by default.
> > 
> >  UB test on the 3C6000 server shows no significant performance impact.
> > 
> >  Before patch:
> > 
> >  ========================================================================
> >  BYTE UNIX Benchmarks (Version 5.1.6)
> > 
> >  System: localhost.localdomain: GNU/Linux
> >  OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
> >  Machine: loongarch64 (loongarch64)
> >  Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
> >  21:19:51 up 1 min, 2 users, load average: 0.71, 0.38, 0.14; runlevel 2026-06-10
> > 
> >  ------------------------------------------------------------------------
> >  128 CPUs in system; running 1 parallel copy of tests
> > 
> > Dhrystone 2 using register variables       35205725.4 lps   (10.0 s, 7 samples)
> > Double-Precision Whetstone                     4244.9 MWIPS (10.0 s, 7 samples)
> > Execl Throughput                               6717.7 lps   (30.0 s, 2 samples)
> > File Copy 1024 bufsize 2000 maxblocks       1213873.8 KBps  (30.0 s, 2 samples)
> > File Copy 256 bufsize 500 maxblocks          350740.5 KBps  (30.0 s, 2 samples)
> > File Copy 4096 bufsize 8000 maxblocks       3275103.0 KBps  (30.0 s, 2 samples)
> > Pipe Throughput                             1981993.9 lps   (10.0 s, 7 samples)
> > Pipe-based Context Switching                  55287.7 lps   (10.0 s, 7 samples)
> > Process Creation                               9056.8 lps   (30.0 s, 2 samples)
> > Shell Scripts (1 concurrent)                   6736.5 lpm   (60.0 s, 2 samples)
> > Shell Scripts (8 concurrent)                   2109.8 lpm   (60.0 s, 2 samples)
> > System Call Overhead                        1549110.9 lps   (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values BASELINE RESULT INDEX
> >  Dhrystone 2 using register variables 116700.0 35205725.4 3016.8
> >  Double-Precision Whetstone 55.0 4244.9 771.8
> >  Execl Throughput 43.0 6717.7 1562.3
> >  File Copy 1024 bufsize 2000 maxblocks 3960.0 1213873.8 3065.3
> >  File Copy 256 bufsize 500 maxblocks 1655.0 350740.5 2119.3
> >  File Copy 4096 bufsize 8000 maxblocks 5800.0 3275103.0 5646.7
> >  Pipe Throughput 12440.0 1981993.9 1593.2
> >  Pipe-based Context Switching 4000.0 55287.7 138.2
> >  Process Creation 126.0 9056.8 718.8
> >  Shell Scripts (1 concurrent) 42.4 6736.5 1588.8
> >  Shell Scripts (8 concurrent) 6.0 2109.8 3516.4
> >  System Call Overhead 15000.0 1549110.9 1032.7
> >  ========
> >  System Benchmarks Index Score 1492.2
> > 
> >  ------------------------------------------------------------------------
> >  128 CPUs in system; running 128 parallel copies of tests
> > 
> >  Dhrystone 2 using register variables     2901925470.7 lps   (10.0 s, 7 samples)
> >  Double-Precision Whetstone                   503614.9 MWIPS (10.1 s, 7 samples)
> >  Execl Throughput                              34080.1 lps   (29.9 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks        309291.4 KBps  (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks           75115.3 KBps  (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks       1018101.3 KBps  (30.0 s, 2 samples)
> >  Pipe Throughput                           180702003.1 lps   (10.0 s, 7 samples)
> >  Pipe-based Context Switching                2426596.4 lps   (10.0 s, 7 samples)
> >  Process Creation                              37282.5 lps   (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent)                  50813.7 lpm   (60.1 s, 2 samples)
> >  Shell Scripts (8 concurrent)                   5835.8 lpm   (60.4 s, 2 samples)
> >  System Call Overhead                        9039181.1 lps   (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values               BASELINE       RESULT    INDEX
> >  Dhrystone 2 using register variables         116700.0 2901925470.7 248665.4
> >  Double-Precision Whetstone                       55.0     503614.9  91566.3
> >  Execl Throughput                                 43.0      34080.1   7925.6
> >  File Copy 1024 bufsize 2000 maxblocks          3960.0     309291.4    781.0
> >  File Copy 256 bufsize 500 maxblocks            1655.0      75115.3    453.9
> >  File Copy 4096 bufsize 8000 maxblocks          5800.0    1018101.3   1755.3
> >  Pipe Throughput                               12440.0  180702003.1 145258.8
> >  Pipe-based Context Switching                   4000.0    2426596.4   6066.5
> >  Process Creation                                126.0      37282.5   2958.9
> >  Shell Scripts (1 concurrent)                     42.4      50813.7  11984.4
> >  Shell Scripts (8 concurrent)                      6.0       5835.8   9726.4
> >  System Call Overhead                          15000.0    9039181.1   6026.1
> >                                                                     ========
> >  System Benchmarks Index Score                                        8765.2
> > 
> >  After patch:
> > 
> >  128 CPUs in system; running 1 parallel copy of tests
> > 
> >  Dhrystone 2 using register variables       35438193.7 lps   (10.0 s, 7 samples)
> >  Double-Precision Whetstone                     4245.7 MWIPS (10.0 s, 7 samples)
> >  Execl Throughput                               5293.7 lps   (30.0 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks       1233323.4 KBps  (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks          355264.5 KBps  (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks       3333631.6 KBps  (30.0 s, 2 samples)
> >  Pipe Throughput                             1979613.2 lps   (10.0 s, 7 samples)
> >  Pipe-based Context Switching                  55675.2 lps   (10.0 s, 7 samples)
> >  Process Creation                               8528.1 lps   (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent)                   6870.0 lpm   (60.0 s, 2 samples)
> >  Shell Scripts (8 concurrent)                   2115.5 lpm   (60.0 s, 2 samples)
> >  System Call Overhead                        1546959.4 lps   (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values               BASELINE       RESULT    INDEX
> >  Dhrystone 2 using register variables         116700.0   35438193.7   3036.7
> >  Double-Precision Whetstone                       55.0       4245.7    772.0
> >  Execl Throughput                                 43.0       5293.7   1231.1
> >  File Copy 1024 bufsize 2000 maxblocks          3960.0    1233323.4   3114.5
> >  File Copy 256 bufsize 500 maxblocks            1655.0     355264.5   2146.6
> >  File Copy 4096 bufsize 8000 maxblocks          5800.0    3333631.6   5747.6
> >  Pipe Throughput                               12440.0    1979613.2   1591.3
> >  Pipe-based Context Switching                   4000.0      55675.2    139.2
> >  Process Creation                                126.0       8528.1    676.8
> >  Shell Scripts (1 concurrent)                     42.4       6870.0   1620.3
> >  Shell Scripts (8 concurrent)                      6.0       2115.5   3525.8
> >  System Call Overhead                          15000.0    1546959.4   1031.3
> >                                                                     ========
> >  System Benchmarks Index Score                                        1465.3
> > 
> >  ------------------------------------------------------------------------
> >  128 CPUs in system; running 128 parallel copies of tests
> > 
> >  Dhrystone 2 using register variables     2903340286.5 lps   (10.0 s, 7 samples)
> >  Double-Precision Whetstone                   504137.7 MWIPS (10.1 s, 7 samples)
> >  Execl Throughput                              34332.8 lps   (29.9 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks        311391.2 KBps  (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks           72503.3 KBps  (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks       1000861.7 KBps  (30.0 s, 2 samples)
> >  Pipe Throughput                           179382076.6 lps   (10.0 s, 7 samples)
> >  Pipe-based Context Switching                2415716.6 lps   (10.0 s, 7 samples)
> >  Process Creation                              36873.1 lps   (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent)                  51464.1 lpm   (60.1 s, 2 samples)
> >  Shell Scripts (8 concurrent)                   5976.3 lpm   (60.4 s, 2 samples)
> >  System Call Overhead                        9182389.5 lps   (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values               BASELINE       RESULT    INDEX
> >  Dhrystone 2 using register variables         116700.0 2903340286.5 248786.7
> >  Double-Precision Whetstone                       55.0     504137.7  91661.4
> >  Execl Throughput                                 43.0      34332.8   7984.4
> >  File Copy 1024 bufsize 2000 maxblocks          3960.0     311391.2    786.3
> >  File Copy 256 bufsize 500 maxblocks            1655.0      72503.3    438.1
> >  File Copy 4096 bufsize 8000 maxblocks          5800.0    1000861.7   1725.6
> >  Pipe Throughput                               12440.0  179382076.6 144197.8
> >  Pipe-based Context Switching                   4000.0    2415716.6   6039.3
> >  Process Creation                                126.0      36873.1   2926.4
> >  Shell Scripts (1 concurrent)                     42.4      51464.1  12137.8
> >  Shell Scripts (8 concurrent)                      6.0       5976.3   9960.6
> >  System Call Overhead                          15000.0    9182389.5   6121.6
> >                                                                     ========
> >  System Benchmarks Index Score                                        8759.8
> > 
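For readers checking the tables above: each INDEX value is consistent with RESULT divided by BASELINE and scaled by 10, and the before/after runs can be compared through the relative change in the final Index Score. The C sketch below is illustrative only; the helper names ub_index() and pct_change() are hypothetical and belong neither to UnixBench nor to this patch.

```c
#include <assert.h>

/*
 * Index of one benchmark row, assuming index = RESULT / BASELINE * 10,
 * which matches the rows quoted in the tables above.
 */
static double ub_index(double baseline, double result)
{
	assert(baseline > 0.0);
	return result / baseline * 10.0;
}

/* Relative change from `before` to `after`, expressed in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

With the quoted figures, ub_index(116700.0, 35205725.4) gives roughly 3016.8, matching the Dhrystone row. pct_change() gives about -1.8% for the single-copy Index Score (1492.2 to 1465.3) and about -0.06% for the 128-copy Index Score (8765.2 to 8759.8).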
> I think your tests are incomplete. A performance test is only a very
> basic check; you should make sure all dynamic code modification
> mechanisms work correctly.
> 
> At least these should be verified one by one: jump label, kgdb, bpf,
> ftrace, kprobes, uprobes...
> 
> You have modified ftrace, but not with the best method, which I suggested in V3.
> 
> You completely ignored my suggestion about kprobes in V3.
> 
> For KGDB, you should use text_mutex to protect copy_to_kernel_nofault().
> 
> For uprobes, I have no idea; maybe @Tiezhu can give some suggestions.
> 
> Huacai
> 
For fentry:

ftrace_arch_code_modify_prepare() and ftrace_arch_code_modify_post_process() are not in the call path of ftrace_init_nop(),
so the text_mutex lock was added in ftrace_modify_code(). However, I overlooked that ftrace_init_nop() also takes the text_mutex lock (e.g., on RISC-V).
I am fixing this.

For kprobes:

text_mutex is already held in the caller arm_kprobe(), which then calls arch_arm_kprobe() -> larch_insn_text_copy(). So the kprobes path is properly protected.

For kgdb:

When KGDB enters do_single_step(), it has already stopped execution on all other CPUs, so only the current CPU is running KGDB logic. Do we need to worry about concurrency in this case? Also, sleepable locks are not allowed in exception context.

For uprobes:

I will check the call path again.

For test:

I have already done some tests using BPF tools that utilize kprobes and fentry. I will run more tests using the selftests suite.

Thanks!
> > 
> > ---
> >  arch/loongarch/Kconfig | 1 +
> >  arch/loongarch/kernel/ftrace_dyn.c | 7 ++++++-
> >  arch/loongarch/kernel/inst.c | 25 +++++++++++++++++++++----
> >  arch/loongarch/kernel/jump_label.c | 3 +++
> >  4 files changed, 31 insertions(+), 5 deletions(-)
> > 
> >  diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >  index 606597da46b8..c751d714c287 100644
> >  --- a/arch/loongarch/Kconfig
> >  +++ b/arch/loongarch/Kconfig
> >  @@ -27,6 +27,7 @@ config LOONGARCH
> >   	select ARCH_HAS_PTE_SPECIAL if 64BIT
> >   	select ARCH_HAS_SET_MEMORY
> >   	select ARCH_HAS_SET_DIRECT_MAP
> >  +	select ARCH_HAS_STRICT_MODULE_RWX
> >   	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> >   	select ARCH_HAS_UBSAN
> >   	select ARCH_HAS_VDSO_ARCH_DATA
> >  diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> >  index d5d81d74034c..598dc6434cc4 100644
> >  --- a/arch/loongarch/kernel/ftrace_dyn.c
> >  +++ b/arch/loongarch/kernel/ftrace_dyn.c
> >  @@ -8,6 +8,7 @@
> >   #include <linux/ftrace.h>
> >   #include <linux/kprobes.h>
> >   #include <linux/uaccess.h>
> >  +#include <linux/memory.h>
> > 
> >   #include <asm/inst.h>
> >   #include <asm/module.h>
> >  @@ -24,8 +25,12 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new, bool validate)
> >   			return -EINVAL;
> >   	}
> > 
> >  -	if (larch_insn_patch_text((void *)pc, new))
> >  +	mutex_lock(&text_mutex);
> >  +	if (larch_insn_patch_text((void *)pc, new)) {
> >  +		mutex_unlock(&text_mutex);
> >   		return -EPERM;
> >  +	}
> >  +	mutex_unlock(&text_mutex);
> > 
> >   	return 0;
> >   }
> >  diff --git a/arch/loongarch/kernel/inst.c b/arch/loongarch/kernel/inst.c
> >  index 0b9228b7c13a..3de94d465c3c 100644
> >  --- a/arch/loongarch/kernel/inst.c
> >  +++ b/arch/loongarch/kernel/inst.c
> >  @@ -6,12 +6,11 @@
> >   #include <linux/uaccess.h>
> >   #include <linux/set_memory.h>
> >   #include <linux/stop_machine.h>
> >  +#include <linux/memory.h>
> > 
> >   #include <asm/cacheflush.h>
> >   #include <asm/inst.h>
> > 
> >  -static DEFINE_RAW_SPINLOCK(patch_lock);
> >  -
> >   void simu_pc(struct pt_regs *regs, union loongarch_instruction insn)
> >   {
> >   	unsigned long pc = regs->csr_era;
> >  @@ -207,14 +206,32 @@ int larch_insn_read(void *addr, u32 *insnp)
> >   int larch_insn_write(void *addr, u32 insn)
> >   {
> >   	int ret;
> >  +	int err = 0;
> >  +	size_t start;
> >   	unsigned long flags = 0;
> > 
> >   	if ((unsigned long)addr & 3)
> >   		return -EINVAL;
> > 
> >  -	raw_spin_lock_irqsave(&patch_lock, flags);
> >  +	start = round_down((size_t)addr, PAGE_SIZE);
> >  +
> >  +	lockdep_assert_held(&text_mutex);
> >  +
> >  +	err = set_memory_rw(start, 1);
> >  +	if (err) {
> >  +		pr_info("%s: set_memory_rw() failed\n", __func__);
> >  +		return err;
> >  +	}
> >  +
> >  +	local_irq_save(flags);
> >   	ret = copy_to_kernel_nofault(addr, &insn, LOONGARCH_INSN_SIZE);
> >  -	raw_spin_unlock_irqrestore(&patch_lock, flags);
> >  +	local_irq_restore(flags);
> >  +
> >  +	err = set_memory_rox(start, 1);
> >  +	if (err) {
> >  +		pr_info("%s: set_memory_rox() failed\n", __func__);
> >  +		return err;
> >  +	}
> > 
> >   	return ret;
> >   }
> >  diff --git a/arch/loongarch/kernel/jump_label.c b/arch/loongarch/kernel/jump_label.c
> >  index 24a3f4d8540c..e6bb040fe4c5 100644
> >  --- a/arch/loongarch/kernel/jump_label.c
> >  +++ b/arch/loongarch/kernel/jump_label.c
> >  @@ -6,6 +6,7 @@
> >    */
> >   #include <linux/kernel.h>
> >   #include <linux/jump_label.h>
> >  +#include <linux/memory.h>
> >   #include <asm/cacheflush.h>
> >   #include <asm/inst.h>
> > 
> >  @@ -19,7 +20,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_t
> >   	else
> >   		insn = larch_insn_gen_nop();
> > 
> >  +	mutex_lock(&text_mutex);
> >   	larch_insn_write(addr, insn);
> >  +	mutex_unlock(&text_mutex);
> > 
> >   	return true;
> >   }
> >  --
> >  2.43.0
> >
>


* Re: [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
  2026-06-14 10:24   ` haoran.jiang
@ 2026-06-14 12:10     ` Huacai Chen
  0 siblings, 0 replies; 4+ messages in thread
From: Huacai Chen @ 2026-06-14 12:10 UTC (permalink / raw)
  To: haoran.jiang
  Cc: loongarch, linux-kernel, kernel, akpm, jbohac, kees, yangtiezhu,
	Haoran Jiang

On Sun, Jun 14, 2026 at 6:24 PM <haoran.jiang@linux.dev> wrote:
>
> On June 13, 2026 at 18:43, "Huacai Chen" <chenhuacai@kernel.org> wrote:
>
>
> >
> > Hi, Haoran,
> >
> > On Sat, Jun 13, 2026 at 4:42 PM <haoran.jiang@linux.dev> wrote:
> >
> > >
> > > From: Haoran Jiang <jianghaoran@kylinos.cn>
> > >
> > >  Enable STRICT_MODULE_RWX to enforce strict memory permissions
> > >  on modules, making the code region non-writable, the data region
> > >  non-executable, and the read-only data region both non-writable
> > >  and non-executable. Temporarily modify code section read/write
> > >  permissions via the set_memory() API.
> > >
> > >  Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
> > >  ---
> > >  v2:
> > >  Change the method of modifying page table permissions from patch_map to set_memory() API.
> > >
> > >  v3:
> > >  Modify commit description.
> > >
> > >  v4:
> > >  Add the text_mutex lock in the larch_insn_write() call path and
> > >  enable CONFIG_STRICT_MODULE_RWX by default.
> > >
> > >  UB test on the 3C6000 server shows no significant performance impact.
> > >
> > I think your tests are incomplete. A performance test is only a very
> > basic check; you should make sure all dynamic code modification
> > mechanisms work correctly.
> >
> > At least these should be verified one by one: jump label, kgdb, bpf,
> > ftrace, kprobes, uprobes...
> >
> > You have modified ftrace, but not with the best method, which I suggested in V3.
> >
> > You completely ignored my suggestion about kprobes in V3.
> >
> > For KGDB, you should use text_mutex to protect copy_to_kernel_nofault().
> >
> > For uprobes, I have no idea; maybe @Tiezhu can give some suggestions.
> >
> > Huacai
> >
>  For fentry:
>
>  ftrace_arch_code_modify_prepare() and ftrace_arch_code_modify_post_process() are not in the call path of ftrace_init_nop(),
>  so the text_mutex lock was added in ftrace_modify_code(). However, I overlooked that ftrace_init_nop() also takes the text_mutex lock (e.g., on RISC-V).
>  I am fixing this.
No, RISC-V is right: use ftrace_arch_code_modify_prepare() and
ftrace_arch_code_modify_post_process() to protect all the common cases,
and take text_mutex directly only for the one exception, ftrace_init_nop().
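
As an illustration of this locking discipline, here is a minimal userspace model, assuming a plain flag in place of the kernel's text_mutex. Every model_* name is hypothetical. It only sketches the idea that the common path is bracketed by prepare/post_process while the init-nop path locks by itself; it is not the LoongArch or RISC-V implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for text_mutex (assumption: a flag, not a real mutex). */
static bool text_mutex_held;

static void model_lock(void)
{
	assert(!text_mutex_held);
	text_mutex_held = true;
}

static void model_unlock(void)
{
	assert(text_mutex_held);
	text_mutex_held = false;
}

/* Patching primitive: refuses to write unless the caller holds the lock. */
static int model_patch_text(uint32_t *pc, uint32_t insn)
{
	if (!text_mutex_held)
		return -1;
	*pc = insn;
	return 0;
}

/* Common case: the caller brackets a batch of updates with these two. */
static void model_code_modify_prepare(void)
{
	model_lock();
}

static void model_code_modify_post_process(void)
{
	model_unlock();
}

/* Runs between prepare and post_process, so it takes no lock itself. */
static int model_modify_code(uint32_t *pc, uint32_t insn)
{
	return model_patch_text(pc, insn);
}

/* The exception: not bracketed by prepare/post_process, so it locks here. */
static int model_init_nop(uint32_t *pc, uint32_t nop)
{
	int ret;

	model_lock();
	ret = model_patch_text(pc, nop);
	model_unlock();
	return ret;
}
```

In this model, calling model_modify_code() outside a prepare/post_process pair fails, while model_init_nop() succeeds on its own because it acquires the lock internally.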

>
>  For kprobes:
>
>  text_mutex is already held in the caller arm_kprobe(), which then calls arch_arm_kprobe() -> larch_insn_text_copy(). So the kprobes path is properly protected.
No, I suspect you didn't see my link; we are talking about
arch_prepare_ss_slot(). Although the framework appears to hold
text_mutex already, you don't use the larch_insn_patch_text() API there,
so who handles the RWX permissions?

>
>  For kgdb:
>
>  When KGDB enters do_single_step(), it has already stopped execution on all other CPUs, so only the current CPU is running KGDB logic. Do we need to worry about concurrency in this case? Also, sleepable locks are not allowed in exception context.
Same as above: if you use copy_to_kernel_nofault() directly, who
handles the RWX permissions?
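
The RWX question can be illustrated in userspace, with mprotect() standing in for set_memory_rw()/set_memory_rox(). This is only a sketch under stated assumptions: alloc_ro_page() and patch_word() are hypothetical helpers, and the page is kept read-only rather than read-only plus executable to keep the example portable. A store to such a page that skips the permission flip would fault, which is the situation a direct copy would run into once the text is no longer writable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate one page, store a word in it, then drop write permission. */
static uint32_t *alloc_ro_page(uint32_t first_word)
{
	long page = sysconf(_SC_PAGESIZE);
	uint32_t *p;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	p[0] = first_word;
	if (mprotect(p, (size_t)page, PROT_READ) != 0) {
		munmap(p, (size_t)page);
		return NULL;
	}
	return p;
}

/* Make the page writable, patch one aligned word, restore read-only. */
static int patch_word(uint32_t *addr, uint32_t insn)
{
	uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
	void *start = (void *)((uintptr_t)addr & ~(page - 1));

	if ((uintptr_t)addr & 3)
		return -1;	/* reject unaligned targets */
	if (mprotect(start, page, PROT_READ | PROT_WRITE) != 0)
		return -1;
	memcpy(addr, &insn, sizeof(insn));
	if (mprotect(start, page, PROT_READ) != 0)
		return -1;
	return 0;
}
```

Every writer has to go through a helper like patch_word(); a caller that writes to the page directly gets no permission handling at all.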


Huacai

>
>  For uprobes:
>
>  I will check the call path again.
>
>  For test:
>
>  I have already done some tests using BPF tools that utilize kprobes and fentry. I will run more tests using the selftests suite.
>
> Thanks!

Thread overview: 4+ messages
2026-06-13  8:41 [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions haoran.jiang
2026-06-13 10:43 ` Huacai Chen
2026-06-14 10:24   ` haoran.jiang
2026-06-14 12:10     ` Huacai Chen
