From: haoran.jiang@linux.dev
To: "Huacai Chen" <chenhuacai@kernel.org>
Cc: loongarch@lists.linux.dev, linux-kernel@vger.kernel.org,
	kernel@xen0n.name, akpm@linux-foundation.org, jbohac@suse.cz,
	kees@kernel.org, yangtiezhu@loongson.cn,
	"Haoran Jiang" <jianghaoran@kylinos.cn>
Subject: Re: [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions
Date: Sun, 14 Jun 2026 10:24:37 +0000
Message-ID: <771c9cbf977e0f724055b2880739afdfb3e7d8a4@linux.dev>
In-Reply-To: <CAAhV-H5tzMMopjvgLU0wzB8B6hO9QrbkxavD=0e6Sn4Oi4q3vQ@mail.gmail.com>

On June 13, 2026 at 18:43, "Huacai Chen" <chenhuacai@kernel.org> wrote:


> 
> Hi, Haoran,
> 
> On Sat, Jun 13, 2026 at 4:42 PM <haoran.jiang@linux.dev> wrote:
> 
> > 
> > From: Haoran Jiang <jianghaoran@kylinos.cn>
> > 
> >  Enable STRICT_MODULE_RWX to enforce strict memory permissions
> >  on modules, making the code region non-writable, the data region
> >  non-executable, and the read-only data region both non-writable
> >  and non-executable. Temporarily modify code section read/write
> >  permissions via set_memory() API.
> > 
> >  Signed-off-by: Haoran Jiang <jianghaoran@kylinos.cn>
> >  ---
> >  v2:
> >  Change the method of modifying page table permissions from patch_map to set_memory() API.
> > 
> >  v3:
> >  Modify commit description.
> > 
> >  v4:
> >  Add text_mutex lock in the larch_insn_write call path and
> >  CONFIG_STRICT_MODULE_RWX is enabled by default.
> > 
> >  UB test on the 3C6000 server shows no significant performance impact.
> > 
> >  Before patch:
> > 
> >  ========================================================================
> >  BYTE UNIX Benchmarks (Version 5.1.6)
> > 
> >  System: localhost.localdomain: GNU/Linux
> >  OS: GNU/Linux -- 7.1.0-rc6 -- #1 SMP PREEMPT Wed Jun 10 21:07:41 CST 2026
> >  Machine: loongarch64 (loongarch64)
> >  Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
> >  21:19:51 up 1 min, 2 users, load average: 0.71, 0.38, 0.14; runlevel 2026-06-10
> > 
> >  ------------------------------------------------------------------------
> >  128 CPUs in system; running 1 parallel copy of tests
> > 
> >  Dhrystone 2 using register variables 35205725.4 lps (10.0 s, 7 samples)
> >  Double-Precision Whetstone 4244.9 MWIPS (10.0 s, 7 samples)
> >  Execl Throughput 6717.7 lps (30.0 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks 1213873.8 KBps (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks 350740.5 KBps (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks 3275103.0 KBps (30.0 s, 2 samples)
> >  Pipe Throughput 1981993.9 lps (10.0 s, 7 samples)
> >  Pipe-based Context Switching 55287.7 lps (10.0 s, 7 samples)
> >  Process Creation 9056.8 lps (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent) 6736.5 lpm (60.0 s, 2 samples)
> >  Shell Scripts (8 concurrent) 2109.8 lpm (60.0 s, 2 samples)
> >  System Call Overhead 1549110.9 lps (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values BASELINE RESULT INDEX
> >  Dhrystone 2 using register variables 116700.0 35205725.4 3016.8
> >  Double-Precision Whetstone 55.0 4244.9 771.8
> >  Execl Throughput 43.0 6717.7 1562.3
> >  File Copy 1024 bufsize 2000 maxblocks 3960.0 1213873.8 3065.3
> >  File Copy 256 bufsize 500 maxblocks 1655.0 350740.5 2119.3
> >  File Copy 4096 bufsize 8000 maxblocks 5800.0 3275103.0 5646.7
> >  Pipe Throughput 12440.0 1981993.9 1593.2
> >  Pipe-based Context Switching 4000.0 55287.7 138.2
> >  Process Creation 126.0 9056.8 718.8
> >  Shell Scripts (1 concurrent) 42.4 6736.5 1588.8
> >  Shell Scripts (8 concurrent) 6.0 2109.8 3516.4
> >  System Call Overhead 15000.0 1549110.9 1032.7
> >  ========
> >  System Benchmarks Index Score 1492.2
> > 
> >  ------------------------------------------------------------------------
> >  128 CPUs in system; running 128 parallel copies of tests
> > 
> >  Dhrystone 2 using register variables 2901925470.7 lps (10.0 s, 7 samples)
> >  Double-Precision Whetstone 503614.9 MWIPS (10.1 s, 7 samples)
> >  Execl Throughput 34080.1 lps (29.9 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks 309291.4 KBps (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks 75115.3 KBps (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks 1018101.3 KBps (30.0 s, 2 samples)
> >  Pipe Throughput 180702003.1 lps (10.0 s, 7 samples)
> >  Pipe-based Context Switching 2426596.4 lps (10.0 s, 7 samples)
> >  Process Creation 37282.5 lps (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent) 50813.7 lpm (60.1 s, 2 samples)
> >  Shell Scripts (8 concurrent) 5835.8 lpm (60.4 s, 2 samples)
> >  System Call Overhead 9039181.1 lps (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values BASELINE RESULT INDEX
> >  Dhrystone 2 using register variables 116700.0 2901925470.7 248665.4
> >  Double-Precision Whetstone 55.0 503614.9 91566.3
> >  Execl Throughput 43.0 34080.1 7925.6
> >  File Copy 1024 bufsize 2000 maxblocks 3960.0 309291.4 781.0
> >  File Copy 256 bufsize 500 maxblocks 1655.0 75115.3 453.9
> >  File Copy 4096 bufsize 8000 maxblocks 5800.0 1018101.3 1755.3
> >  Pipe Throughput 12440.0 180702003.1 145258.8
> >  Pipe-based Context Switching 4000.0 2426596.4 6066.5
> >  Process Creation 126.0 37282.5 2958.9
> >  Shell Scripts (1 concurrent) 42.4 50813.7 11984.4
> >  Shell Scripts (8 concurrent) 6.0 5835.8 9726.4
> >  System Call Overhead 15000.0 9039181.1 6026.1
> >  ========
> >  System Benchmarks Index Score 8765.2
> > 
> >  After patch:
> > 
> >  128 CPUs in system; running 1 parallel copy of tests
> > 
> >  Dhrystone 2 using register variables 35438193.7 lps (10.0 s, 7 samples)
> >  Double-Precision Whetstone 4245.7 MWIPS (10.0 s, 7 samples)
> >  Execl Throughput 5293.7 lps (30.0 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks 1233323.4 KBps (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks 355264.5 KBps (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks 3333631.6 KBps (30.0 s, 2 samples)
> >  Pipe Throughput 1979613.2 lps (10.0 s, 7 samples)
> >  Pipe-based Context Switching 55675.2 lps (10.0 s, 7 samples)
> >  Process Creation 8528.1 lps (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent) 6870.0 lpm (60.0 s, 2 samples)
> >  Shell Scripts (8 concurrent) 2115.5 lpm (60.0 s, 2 samples)
> >  System Call Overhead 1546959.4 lps (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values BASELINE RESULT INDEX
> >  Dhrystone 2 using register variables 116700.0 35438193.7 3036.7
> >  Double-Precision Whetstone 55.0 4245.7 772.0
> >  Execl Throughput 43.0 5293.7 1231.1
> >  File Copy 1024 bufsize 2000 maxblocks 3960.0 1233323.4 3114.5
> >  File Copy 256 bufsize 500 maxblocks 1655.0 355264.5 2146.6
> >  File Copy 4096 bufsize 8000 maxblocks 5800.0 3333631.6 5747.6
> >  Pipe Throughput 12440.0 1979613.2 1591.3
> >  Pipe-based Context Switching 4000.0 55675.2 139.2
> >  Process Creation 126.0 8528.1 676.8
> >  Shell Scripts (1 concurrent) 42.4 6870.0 1620.3
> >  Shell Scripts (8 concurrent) 6.0 2115.5 3525.8
> >  System Call Overhead 15000.0 1546959.4 1031.3
> >  ========
> >  System Benchmarks Index Score 1465.3
> > 
> >  ------------------------------------------------------------------------
> >  128 CPUs in system; running 128 parallel copies of tests
> > 
> >  Dhrystone 2 using register variables 2903340286.5 lps (10.0 s, 7 samples)
> >  Double-Precision Whetstone 504137.7 MWIPS (10.1 s, 7 samples)
> >  Execl Throughput 34332.8 lps (29.9 s, 2 samples)
> >  File Copy 1024 bufsize 2000 maxblocks 311391.2 KBps (30.0 s, 2 samples)
> >  File Copy 256 bufsize 500 maxblocks 72503.3 KBps (30.0 s, 2 samples)
> >  File Copy 4096 bufsize 8000 maxblocks 1000861.7 KBps (30.0 s, 2 samples)
> >  Pipe Throughput 179382076.6 lps (10.0 s, 7 samples)
> >  Pipe-based Context Switching 2415716.6 lps (10.0 s, 7 samples)
> >  Process Creation 36873.1 lps (30.0 s, 2 samples)
> >  Shell Scripts (1 concurrent) 51464.1 lpm (60.1 s, 2 samples)
> >  Shell Scripts (8 concurrent) 5976.3 lpm (60.4 s, 2 samples)
> >  System Call Overhead 9182389.5 lps (10.0 s, 7 samples)
> > 
> >  System Benchmarks Index Values BASELINE RESULT INDEX
> >  Dhrystone 2 using register variables 116700.0 2903340286.5 248786.7
> >  Double-Precision Whetstone 55.0 504137.7 91661.4
> >  Execl Throughput 43.0 34332.8 7984.4
> >  File Copy 1024 bufsize 2000 maxblocks 3960.0 311391.2 786.3
> >  File Copy 256 bufsize 500 maxblocks 1655.0 72503.3 438.1
> >  File Copy 4096 bufsize 8000 maxblocks 5800.0 1000861.7 1725.6
> >  Pipe Throughput 12440.0 179382076.6 144197.8
> >  Pipe-based Context Switching 4000.0 2415716.6 6039.3
> >  Process Creation 126.0 36873.1 2926.4
> >  Shell Scripts (1 concurrent) 42.4 51464.1 12137.8
> >  Shell Scripts (8 concurrent) 6.0 5976.3 9960.6
> >  System Call Overhead 15000.0 9182389.5 6121.6
> >  ========
> >  System Benchmarks Index Score 8759.8
> > 
> I think your tests are incomplete. Performance test is just a very
> basic test, you should make sure all dynamic code modification
> mechanisms are correct.
> 
> At least these should be verified one by one: jump label, kgdb, bpf,
> ftrace, kprobes, uprobes....
> 
> You have modified ftrace, but not the best method that I have suggested in V3.
> 
> You completely ignore my suggestion about kprobes in V3.
> 
> For KGDB, you should use text_mutex to protect copy_to_kernel_nofault().
> 
> For uprobes, I have no idea, maybe @Tiezhu can give some suggestions.
> 
> Huacai
> 
 For fentry:

 ftrace_arch_code_modify_prepare() and ftrace_arch_code_modify_post_process() are not in the call path of ftrace_init_nop(),
 so I added the text_mutex lock in ftrace_modify_code(). However, I overlooked that ftrace_init_nop() can also hold text_mutex itself (e.g., as on RISC-V).
 I am fixing this.
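
 As a user-space illustration only (not kernel code), the C sketch below models the general hazard behind this kind of fix: a helper that takes a non-recursive lock fails when its caller already holds that lock, whereas a low-level writer that only checks for the lock works with any caller that locks first. All toy_* names, the toy lock, and the instruction value are hypothetical stand-ins; the real code uses text_mutex and lockdep_assert_held().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy non-recursive lock: a second acquire is reported instead of hanging. */
struct toy_mutex { bool held; };

static struct toy_mutex text_lock;	/* hypothetical stand-in for text_mutex */
static uint32_t patched_insn;		/* stand-in for one instruction slot */

static int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -EDEADLK;	/* a real mutex would deadlock here */
	m->held = true;
	return 0;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

/* Low-level writer: never locks, only checks that the caller did. */
static int toy_insn_write(uint32_t insn)
{
	if (!text_lock.held)
		return -EPERM;
	patched_insn = insn;
	return 0;
}

/* Helper that takes the lock itself around the write. */
static int toy_modify_code_locked(uint32_t insn)
{
	int ret = toy_lock(&text_lock);

	if (ret)
		return ret;
	ret = toy_insn_write(insn);
	toy_unlock(&text_lock);
	return ret;
}

/* Caller that holds the lock and then calls the locking helper. */
static int toy_init_nop_nested(uint32_t insn)
{
	int ret = toy_lock(&text_lock);

	if (ret)
		return ret;
	ret = toy_modify_code_locked(insn);	/* nested acquire fails */
	toy_unlock(&text_lock);
	return ret;
}

/* Same caller, but it calls the non-locking writer directly. */
static int toy_init_nop_fixed(uint32_t insn)
{
	int ret = toy_lock(&text_lock);

	if (ret)
		return ret;
	ret = toy_insn_write(insn);
	toy_unlock(&text_lock);
	return ret;
}
```

 In this model, toy_init_nop_nested() returns -EDEADLK because the lock is requested twice on one call path, while toy_init_nop_fixed() succeeds because the lock is taken at exactly one level.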

 For kprobes:

 text_mutex is already held by the caller arm_kprobe(), which then calls arch_arm_kprobe() -> larch_insn_text_copy(). So the kprobes path is already protected.

 For kgdb:

 When KGDB enters do_single_step(), execution on all other CPUs has already been stopped, and only the current CPU runs the KGDB logic. Do we need to worry about concurrency in this case? Also, sleepable locks are not allowed in exception context.

 For uprobes:

 I will check the call path again.

 For test:

 I have already run some tests with BPF tools that use kprobes and fentry. I will run more tests with the selftests suite.

Thanks!
> > 
> > ---
> >  arch/loongarch/Kconfig | 1 +
> >  arch/loongarch/kernel/ftrace_dyn.c | 7 ++++++-
> >  arch/loongarch/kernel/inst.c | 25 +++++++++++++++++++++----
> >  arch/loongarch/kernel/jump_label.c | 3 +++
> >  4 files changed, 31 insertions(+), 5 deletions(-)
> > 
> >  diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> >  index 606597da46b8..c751d714c287 100644
> >  --- a/arch/loongarch/Kconfig
> >  +++ b/arch/loongarch/Kconfig
> >  @@ -27,6 +27,7 @@ config LOONGARCH
> >  select ARCH_HAS_PTE_SPECIAL if 64BIT
> >  select ARCH_HAS_SET_MEMORY
> >  select ARCH_HAS_SET_DIRECT_MAP
> >  + select ARCH_HAS_STRICT_MODULE_RWX
> >  select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> >  select ARCH_HAS_UBSAN
> >  select ARCH_HAS_VDSO_ARCH_DATA
> >  diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
> >  index d5d81d74034c..598dc6434cc4 100644
> >  --- a/arch/loongarch/kernel/ftrace_dyn.c
> >  +++ b/arch/loongarch/kernel/ftrace_dyn.c
> >  @@ -8,6 +8,7 @@
> >  #include <linux/ftrace.h>
> >  #include <linux/kprobes.h>
> >  #include <linux/uaccess.h>
> >  +#include <linux/memory.h>
> > 
> >  #include <asm/inst.h>
> >  #include <asm/module.h>
> >  @@ -24,8 +25,12 @@ static int ftrace_modify_code(unsigned long pc, u32 old, u32 new, bool validate)
> >  return -EINVAL;
> >  }
> > 
> >  - if (larch_insn_patch_text((void *)pc, new))
> >  + mutex_lock(&text_mutex);
> >  + if (larch_insn_patch_text((void *)pc, new)) {
> >  + mutex_unlock(&text_mutex);
> >  return -EPERM;
> >  + }
> >  + mutex_unlock(&text_mutex);
> > 
> >  return 0;
> >  }
> >  diff --git a/arch/loongarch/kernel/inst.c b/arch/loongarch/kernel/inst.c
> >  index 0b9228b7c13a..3de94d465c3c 100644
> >  --- a/arch/loongarch/kernel/inst.c
> >  +++ b/arch/loongarch/kernel/inst.c
> >  @@ -6,12 +6,11 @@
> >  #include <linux/uaccess.h>
> >  #include <linux/set_memory.h>
> >  #include <linux/stop_machine.h>
> >  +#include <linux/memory.h>
> > 
> >  #include <asm/cacheflush.h>
> >  #include <asm/inst.h>
> > 
> >  -static DEFINE_RAW_SPINLOCK(patch_lock);
> >  -
> >  void simu_pc(struct pt_regs *regs, union loongarch_instruction insn)
> >  {
> >  unsigned long pc = regs->csr_era;
> >  @@ -207,14 +206,32 @@ int larch_insn_read(void *addr, u32 *insnp)
> >  int larch_insn_write(void *addr, u32 insn)
> >  {
> >  int ret;
> >  + int err = 0;
> >  + size_t start;
> >  unsigned long flags = 0;
> > 
> >  if ((unsigned long)addr & 3)
> >  return -EINVAL;
> > 
> >  - raw_spin_lock_irqsave(&patch_lock, flags);
> >  + start = round_down((size_t)addr, PAGE_SIZE);
> >  +
> >  + lockdep_assert_held(&text_mutex);
> >  +
> >  + err = set_memory_rw(start, 1);
> >  + if (err) {
> >  + pr_info("%s: set_memory_rw() failed\n", __func__);
> >  + return err;
> >  + }
> >  +
> >  + local_irq_save(flags);
> >  ret = copy_to_kernel_nofault(addr, &insn, LOONGARCH_INSN_SIZE);
> >  - raw_spin_unlock_irqrestore(&patch_lock, flags);
> >  + local_irq_restore(flags);
> >  +
> >  + err = set_memory_rox(start, 1);
> >  + if (err) {
> >  + pr_info("%s: set_memory_rox() failed\n", __func__);
> >  + return err;
> >  + }
> > 
> >  return ret;
> >  }
> >  diff --git a/arch/loongarch/kernel/jump_label.c b/arch/loongarch/kernel/jump_label.c
> >  index 24a3f4d8540c..e6bb040fe4c5 100644
> >  --- a/arch/loongarch/kernel/jump_label.c
> >  +++ b/arch/loongarch/kernel/jump_label.c
> >  @@ -6,6 +6,7 @@
> >  */
> >  #include <linux/kernel.h>
> >  #include <linux/jump_label.h>
> >  +#include <linux/memory.h>
> >  #include <asm/cacheflush.h>
> >  #include <asm/inst.h>
> > 
> >  @@ -19,7 +20,9 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_t
> >  else
> >  insn = larch_insn_gen_nop();
> > 
> >  + mutex_lock(&text_mutex);
> >  larch_insn_write(addr, insn);
> >  + mutex_unlock(&text_mutex);
> > 
> >  return true;
> >  }
> >  --
> >  2.43.0
> >
>

Thread overview: 4+ messages
2026-06-13  8:41 [PATCH v4] LoongArch: Enable STRICT_MODULE_RWX for stricter modules memory permissions haoran.jiang
2026-06-13 10:43 ` Huacai Chen
2026-06-14 10:24   ` haoran.jiang [this message]
2026-06-14 12:10     ` Huacai Chen
