* [PATCH 0/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock
@ 2024-09-25 14:31 ran xiaokai
2024-09-25 14:31 ` [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name() ran xiaokai
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: ran xiaokai @ 2024-09-25 14:31 UTC (permalink / raw)
To: elver, tglx, dvyukov; +Cc: kasan-dev, linux-kernel, Ran Xiaokai
From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
In a preempt-RT kernel, most of the irq handlers have been
converted to the threaded mode, except those which have the
IRQF_NO_THREAD flag set. The hrtimer IRQ is such an example.
So a KCSAN report could be triggered from a hard-irq context, and this
will trigger the "sleeping function called from invalid context" bug.
[ C1] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ C1] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[ C1] preempt_count: 10002, expected: 0
[ C1] RCU nest depth: 0, expected: 0
[ C1] no locks held by swapper/1/0.
[ C1] irq event stamp: 156674
[ C1] hardirqs last enabled at (156673): [<ffffffff81130bd9>] do_idle+0x1f9/0x240
[ C1] hardirqs last disabled at (156674): [<ffffffff82254f84>] sysvec_apic_timer_interrupt+0x14/0xc0
[ C1] softirqs last enabled at (0): [<ffffffff81099f47>] copy_process+0xfc7/0x4b60
[ C1] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ C1] Preemption disabled at:
[ C1] [<ffffffff814a3e2a>] paint_ptr+0x2a/0x90
[ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0+ #3
[ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ C1] Call Trace:
[ C1] <IRQ>
[ C1] dump_stack_lvl+0x7e/0xc0
[ C1] dump_stack+0x1d/0x30
[ C1] __might_resched+0x1a2/0x270
[ C1] rt_spin_lock+0x68/0x170
[ C1] ? kcsan_skip_report_debugfs+0x43/0xe0
[ C1] kcsan_skip_report_debugfs+0x43/0xe0
[ C1] ? hrtimer_next_event_without+0x110/0x110
[ C1] print_report+0xb5/0x590
[ C1] kcsan_report_known_origin+0x1b1/0x1d0
[ C1] kcsan_setup_watchpoint+0x348/0x650
[ C1] __tsan_unaligned_write1+0x16d/0x1d0
[ C1] hrtimer_interrupt+0x3d6/0x430
[ C1] __sysvec_apic_timer_interrupt+0xe8/0x3a0
[ C1] sysvec_apic_timer_interrupt+0x97/0xc0
[ C1] </IRQ>
We cannot fix this by simply converting the report_filterlist_lock
to a raw_spinlock_t, because of the allocation in the
insert_report_filterlist() path:
raw_spin_lock_irqsave(&report_filterlist_lock, flags);
  krealloc
    __do_kmalloc_node
      slab_alloc_node
        __slab_alloc
          local_lock_irqsave(&s->cpu_slab->lock, flags)
local_lock_t is now a spinlock_t, which is sleepable in a preempt-RT
kernel, so kmalloc() and similar functions cannot be called with
a raw_spinlock_t held.
Instead, convert the list to RCU to fix this.
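The RCU conversion proposed above follows the classic copy-update-publish pattern: readers dereference the list pointer without taking any sleeping lock, while writers build a new copy under a mutex and then swap the pointer. A rough userspace analogue is sketched below — C11 atomics stand in for rcu_dereference()/rcu_assign_pointer(), a pthread mutex for the writer-side mutex, and there is no deferred reclamation; all names here are illustrative, not the kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kcsan report_filterlist. */
struct filterlist {
	unsigned long *addrs;
	size_t size;
	int used;
};

static _Atomic(struct filterlist *) flist;	/* published list pointer */
static pthread_mutex_t flist_mutex = PTHREAD_MUTEX_INITIALIZER; /* serializes writers */

/* Reader path: no sleeping locks, analogous to rcu_read_lock()+rcu_dereference(). */
static int list_contains(unsigned long addr)
{
	struct filterlist *list = atomic_load_explicit(&flist, memory_order_acquire);

	if (!list)
		return 0;
	for (int i = 0; i < list->used; i++)
		if (list->addrs[i] == addr)
			return 1;
	return 0;
}

/* Writer path: allocate and copy outside any raw lock, then publish. */
static int list_insert(unsigned long addr)
{
	pthread_mutex_lock(&flist_mutex);
	struct filterlist *old = atomic_load_explicit(&flist, memory_order_relaxed);
	size_t new_size = old ? old->size * 2 : 8;

	struct filterlist *new = malloc(sizeof(*new));
	if (!new)
		goto fail;
	new->addrs = malloc(new_size * sizeof(unsigned long));
	if (!new->addrs) {
		free(new);
		goto fail;
	}

	new->size = new_size;
	new->used = old ? old->used : 0;
	if (old)
		memcpy(new->addrs, old->addrs, old->used * sizeof(unsigned long));
	new->addrs[new->used++] = addr;

	/* Publish: analogous to rcu_assign_pointer(). */
	atomic_store_explicit(&flist, new, memory_order_release);
	/* Kernel code must wait for synchronize_rcu() (or use kfree_rcu())
	 * before freeing; this immediate free is only safe single-threaded. */
	if (old) {
		free(old->addrs);
		free(old);
	}
	pthread_mutex_unlock(&flist_mutex);
	return 0;
fail:
	pthread_mutex_unlock(&flist_mutex);
	return -1;
}
```

The key property this models is that readers never block — which is exactly what the hard-irq KCSAN report path needs — at the cost of copying the list on every update.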
Ran Xiaokai (4):
kcsan, debugfs: Remove redundant call of kallsyms_lookup_name()
kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a
value
kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock
kcsan, debugfs: avoid updating white/blacklist with the same value
kernel/kcsan/debugfs.c | 192 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 125 insertions(+), 67 deletions(-)
--
2.15.2
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name()
  2024-09-25 14:31 [PATCH 0/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
@ 2024-09-25 14:31 ` ran xiaokai
  2024-10-01 13:57   ` Marco Elver
  2024-09-25 14:31 ` [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value ran xiaokai
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: ran xiaokai @ 2024-09-25 14:31 UTC (permalink / raw)
  To: elver, tglx, dvyukov; +Cc: kasan-dev, linux-kernel, Ran Xiaokai

From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

There is no need to repeatedly call kallsyms_lookup_name(), we can
reuse the return value of this function.

Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 kernel/kcsan/debugfs.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
index 53b21ae30e00..ed483987869e 100644
--- a/kernel/kcsan/debugfs.c
+++ b/kernel/kcsan/debugfs.c
@@ -181,8 +181,7 @@ static ssize_t insert_report_filterlist(const char *func)
 	}
 
 	/* Note: deduplicating should be done in userspace. */
-	report_filterlist.addrs[report_filterlist.used++] =
-		kallsyms_lookup_name(func);
+	report_filterlist.addrs[report_filterlist.used++] = addr;
 	report_filterlist.sorted = false;
 
 out:
--
2.15.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name()
  2024-09-25 14:31 ` [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name() ran xiaokai
@ 2024-10-01 13:57   ` Marco Elver
  0 siblings, 0 replies; 11+ messages in thread
From: Marco Elver @ 2024-10-01 13:57 UTC (permalink / raw)
  To: ran xiaokai; +Cc: tglx, dvyukov, kasan-dev, linux-kernel, Ran Xiaokai

On Wed, 25 Sept 2024 at 16:32, ran xiaokai <ranxiaokai627@163.com> wrote:
>
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> There is no need to repeatedly call kallsyms_lookup_name, we can
> reuse the return value of this function.
>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>

Reviewed-by: Marco Elver <elver@google.com>

> ---
> kernel/kcsan/debugfs.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
> index 53b21ae30e00..ed483987869e 100644
> --- a/kernel/kcsan/debugfs.c
> +++ b/kernel/kcsan/debugfs.c
> @@ -181,8 +181,7 @@ static ssize_t insert_report_filterlist(const char *func)
> }
>
> /* Note: deduplicating should be done in userspace. */
> - report_filterlist.addrs[report_filterlist.used++] =
> - kallsyms_lookup_name(func);
> + report_filterlist.addrs[report_filterlist.used++] = addr;
> report_filterlist.sorted = false;
>
> out:
> --
> 2.15.2
>

^ permalink raw reply	[flat|nested] 11+ messages in thread
* [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value
  2024-09-25 14:31 [PATCH 0/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
  2024-09-25 14:31 ` [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name() ran xiaokai
@ 2024-09-25 14:31 ` ran xiaokai
  2024-09-26  5:58   ` kernel test robot
  2024-10-01 13:56   ` Marco Elver
  2024-09-25 14:31 ` [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
  2024-09-25 14:31 ` [PATCH 4/4] kcsan, debugfs: avoid updating white/blacklist with the same value ran xiaokai
  3 siblings, 2 replies; 11+ messages in thread
From: ran xiaokai @ 2024-09-25 14:31 UTC (permalink / raw)
  To: elver, tglx, dvyukov; +Cc: kasan-dev, linux-kernel, Ran Xiaokai

From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

This is a preparation patch: when converted to an RCU lock,
set_report_filterlist_whitelist() may fail due to memory allocation,
so refactor it to return a value, so that the error codes can be
passed to userspace.

Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 kernel/kcsan/debugfs.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
index ed483987869e..30547507f497 100644
--- a/kernel/kcsan/debugfs.c
+++ b/kernel/kcsan/debugfs.c
@@ -131,13 +131,14 @@ bool kcsan_skip_report_debugfs(unsigned long func_addr)
 	return ret;
 }
 
-static void set_report_filterlist_whitelist(bool whitelist)
+static ssize_t set_report_filterlist_whitelist(bool whitelist)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&report_filterlist_lock, flags);
 	report_filterlist.whitelist = whitelist;
 	spin_unlock_irqrestore(&report_filterlist_lock, flags);
+	return 0;
 }
 
 /* Returns 0 on success, error-code otherwise. */
@@ -225,6 +226,7 @@ debugfs_write(struct file *file, const char __user *buf, size_t count, loff_t *o
 	char kbuf[KSYM_NAME_LEN];
 	char *arg;
 	const size_t read_len = min(count, sizeof(kbuf) - 1);
+	ssize_t ret;
 
 	if (copy_from_user(kbuf, buf, read_len))
 		return -EFAULT;
@@ -242,19 +244,19 @@ debugfs_write(struct file *file, const char __user *buf, size_t count, loff_t *o
 			return -EINVAL;
 		microbenchmark(iters);
 	} else if (!strcmp(arg, "whitelist")) {
-		set_report_filterlist_whitelist(true);
+		ret = set_report_filterlist_whitelist(true);
 	} else if (!strcmp(arg, "blacklist")) {
-		set_report_filterlist_whitelist(false);
+		ret = set_report_filterlist_whitelist(false);
 	} else if (arg[0] == '!') {
-		ssize_t ret = insert_report_filterlist(&arg[1]);
-
-		if (ret < 0)
-			return ret;
+		ret = insert_report_filterlist(&arg[1]);
 	} else {
 		return -EINVAL;
 	}
 
-	return count;
+	if (ret < 0)
+		return ret;
+	else
+		return count;
 }
 
 static const struct file_operations debugfs_ops =
--
2.15.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value 2024-09-25 14:31 ` [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value ran xiaokai @ 2024-09-26 5:58 ` kernel test robot 2024-10-01 13:56 ` Marco Elver 1 sibling, 0 replies; 11+ messages in thread From: kernel test robot @ 2024-09-26 5:58 UTC (permalink / raw) To: ran xiaokai, elver, tglx, dvyukov Cc: llvm, oe-kbuild-all, kasan-dev, linux-kernel, Ran Xiaokai Hi ran, kernel test robot noticed the following build warnings: [auto build test WARNING on linus/master] [also build test WARNING on next-20240925] [cannot apply to v6.11] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/ran-xiaokai/kcsan-debugfs-Remove-redundant-call-of-kallsyms_lookup_name/20240925-231034 base: linus/master patch link: https://lore.kernel.org/r/20240925143154.2322926-3-ranxiaokai627%40163.com patch subject: [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240926/202409261331.9NyGRPt2-lkp@intel.com/config) compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240926/202409261331.9NyGRPt2-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202409261331.9NyGRPt2-lkp@intel.com/ All warnings (new ones prefixed by >>): >> kernel/kcsan/debugfs.c:243:7: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] 243 | if (kstrtoul(&arg[strlen("microbench=")], 0, &iters)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/kcsan/debugfs.c:256:6: note: uninitialized use occurs here 256 | if (ret < 0) | ^~~ kernel/kcsan/debugfs.c:243:3: note: remove the 'if' if its condition is always true 243 | if (kstrtoul(&arg[strlen("microbench=")], 0, &iters)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 244 | return -EINVAL; >> kernel/kcsan/debugfs.c:238:13: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 238 | } else if (!strcmp(arg, "off")) { | ^~~~~~~~~~~~~~~~~~~ kernel/kcsan/debugfs.c:256:6: note: uninitialized use occurs here 256 | if (ret < 0) | ^~~ kernel/kcsan/debugfs.c:238:9: note: remove the 'if' if its condition is always false 238 | } else if (!strcmp(arg, "off")) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~ 239 | WRITE_ONCE(kcsan_enabled, false); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 240 | } else if (str_has_prefix(arg, "microbench=")) { | ~~~~~~ kernel/kcsan/debugfs.c:236:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] 236 | if (!strcmp(arg, "on")) { | ^~~~~~~~~~~~~~~~~~ kernel/kcsan/debugfs.c:256:6: note: uninitialized use occurs here 256 | if (ret < 0) | ^~~ kernel/kcsan/debugfs.c:236:2: note: remove the 'if' if its condition is always false 236 | if (!strcmp(arg, "on")) { | ^~~~~~~~~~~~~~~~~~~~~~~~~ 237 | WRITE_ONCE(kcsan_enabled, true); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 238 | } else if (!strcmp(arg, "off")) { | ~~~~~~ kernel/kcsan/debugfs.c:229:13: note: initialize 
the variable 'ret' to silence this warning 229 | ssize_t ret; | ^ | = 0 3 warnings generated. vim +243 kernel/kcsan/debugfs.c dfd402a4c4baae Marco Elver 2019-11-14 222 5cbaefe9743bf1 Ingo Molnar 2019-11-20 223 static ssize_t 5cbaefe9743bf1 Ingo Molnar 2019-11-20 224 debugfs_write(struct file *file, const char __user *buf, size_t count, loff_t *off) dfd402a4c4baae Marco Elver 2019-11-14 225 { dfd402a4c4baae Marco Elver 2019-11-14 226 char kbuf[KSYM_NAME_LEN]; dfd402a4c4baae Marco Elver 2019-11-14 227 char *arg; 43d631bf06ec96 Thorsten Blum 2024-06-24 228 const size_t read_len = min(count, sizeof(kbuf) - 1); 52313281c8b7ca Ran Xiaokai 2024-09-25 229 ssize_t ret; dfd402a4c4baae Marco Elver 2019-11-14 230 dfd402a4c4baae Marco Elver 2019-11-14 231 if (copy_from_user(kbuf, buf, read_len)) dfd402a4c4baae Marco Elver 2019-11-14 232 return -EFAULT; dfd402a4c4baae Marco Elver 2019-11-14 233 kbuf[read_len] = '\0'; dfd402a4c4baae Marco Elver 2019-11-14 234 arg = strstrip(kbuf); dfd402a4c4baae Marco Elver 2019-11-14 235 dfd402a4c4baae Marco Elver 2019-11-14 236 if (!strcmp(arg, "on")) { dfd402a4c4baae Marco Elver 2019-11-14 237 WRITE_ONCE(kcsan_enabled, true); dfd402a4c4baae Marco Elver 2019-11-14 @238 } else if (!strcmp(arg, "off")) { dfd402a4c4baae Marco Elver 2019-11-14 239 WRITE_ONCE(kcsan_enabled, false); a4e74fa5f0d3e2 Marco Elver 2020-07-31 240 } else if (str_has_prefix(arg, "microbench=")) { dfd402a4c4baae Marco Elver 2019-11-14 241 unsigned long iters; dfd402a4c4baae Marco Elver 2019-11-14 242 a4e74fa5f0d3e2 Marco Elver 2020-07-31 @243 if (kstrtoul(&arg[strlen("microbench=")], 0, &iters)) dfd402a4c4baae Marco Elver 2019-11-14 244 return -EINVAL; dfd402a4c4baae Marco Elver 2019-11-14 245 microbenchmark(iters); dfd402a4c4baae Marco Elver 2019-11-14 246 } else if (!strcmp(arg, "whitelist")) { 52313281c8b7ca Ran Xiaokai 2024-09-25 247 ret = set_report_filterlist_whitelist(true); dfd402a4c4baae Marco Elver 2019-11-14 248 } else if (!strcmp(arg, "blacklist")) { 
52313281c8b7ca Ran Xiaokai 2024-09-25 249 ret = set_report_filterlist_whitelist(false); dfd402a4c4baae Marco Elver 2019-11-14 250 } else if (arg[0] == '!') { 52313281c8b7ca Ran Xiaokai 2024-09-25 251 ret = insert_report_filterlist(&arg[1]); dfd402a4c4baae Marco Elver 2019-11-14 252 } else { dfd402a4c4baae Marco Elver 2019-11-14 253 return -EINVAL; dfd402a4c4baae Marco Elver 2019-11-14 254 } dfd402a4c4baae Marco Elver 2019-11-14 255 52313281c8b7ca Ran Xiaokai 2024-09-25 256 if (ret < 0) 52313281c8b7ca Ran Xiaokai 2024-09-25 257 return ret; 52313281c8b7ca Ran Xiaokai 2024-09-25 258 else dfd402a4c4baae Marco Elver 2019-11-14 259 return count; dfd402a4c4baae Marco Elver 2019-11-14 260 } dfd402a4c4baae Marco Elver 2019-11-14 261 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value 2024-09-25 14:31 ` [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value ran xiaokai 2024-09-26 5:58 ` kernel test robot @ 2024-10-01 13:56 ` Marco Elver 1 sibling, 0 replies; 11+ messages in thread From: Marco Elver @ 2024-10-01 13:56 UTC (permalink / raw) To: ran xiaokai; +Cc: tglx, dvyukov, kasan-dev, linux-kernel, Ran Xiaokai On Wed, 25 Sept 2024 at 16:32, ran xiaokai <ranxiaokai627@163.com> wrote: > > From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > This is a preparation patch, when converted to rcu lock, > set_report_filterlist_whitelist() may fail due to memory alloction, > refactor it to return a value, so the error codes can be > passed to the userspace. > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> > --- > kernel/kcsan/debugfs.c | 18 ++++++++++-------- > 1 file changed, 10 insertions(+), 8 deletions(-) > > diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c > index ed483987869e..30547507f497 100644 > --- a/kernel/kcsan/debugfs.c > +++ b/kernel/kcsan/debugfs.c > @@ -131,13 +131,14 @@ bool kcsan_skip_report_debugfs(unsigned long func_addr) > return ret; > } > > -static void set_report_filterlist_whitelist(bool whitelist) > +static ssize_t set_report_filterlist_whitelist(bool whitelist) > { > unsigned long flags; > > spin_lock_irqsave(&report_filterlist_lock, flags); > report_filterlist.whitelist = whitelist; > spin_unlock_irqrestore(&report_filterlist_lock, flags); > + return 0; > } > > /* Returns 0 on success, error-code otherwise. */ > @@ -225,6 +226,7 @@ debugfs_write(struct file *file, const char __user *buf, size_t count, loff_t *o > char kbuf[KSYM_NAME_LEN]; > char *arg; > const size_t read_len = min(count, sizeof(kbuf) - 1); > + ssize_t ret; This may be uninitialized depending on the branch taken below. 
> if (copy_from_user(kbuf, buf, read_len)) > return -EFAULT; > @@ -242,19 +244,19 @@ debugfs_write(struct file *file, const char __user *buf, size_t count, loff_t *o > return -EINVAL; > microbenchmark(iters); > } else if (!strcmp(arg, "whitelist")) { > - set_report_filterlist_whitelist(true); > + ret = set_report_filterlist_whitelist(true); > } else if (!strcmp(arg, "blacklist")) { > - set_report_filterlist_whitelist(false); > + ret = set_report_filterlist_whitelist(false); > } else if (arg[0] == '!') { > - ssize_t ret = insert_report_filterlist(&arg[1]); > - > - if (ret < 0) > - return ret; > + ret = insert_report_filterlist(&arg[1]); > } else { > return -EINVAL; > } > > - return count; > + if (ret < 0) > + return ret; > + else > + return count; > } > > static const struct file_operations debugfs_ops = > -- > 2.15.2 > ^ permalink raw reply [flat|nested] 11+ messages in thread
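The robot warning and Marco's review comment above point at the same defect in patch 2: `ret` is only assigned on some branches of the dispatch, yet the common exit reads it. A minimal sketch of one possible fix (a hypothetical userspace model of the dispatch, not the posted patch: `ret` is initialized at declaration so the branches that cannot fail fall through cleanly to `return count`):

```c
#include <assert.h>
#include <string.h>

/* Simplified model of debugfs_write()'s command dispatch: every branch must
 * leave ret well-defined before the shared "return ret < 0 ? ret : count"
 * exit. Error codes mimic -ENOENT (-2) and -EINVAL (-22). */
static long debugfs_write_model(const char *arg, long count)
{
	long ret = 0;	/* the fix: initialize, so "on"/"off" branches are covered */

	if (!strcmp(arg, "on") || !strcmp(arg, "off")) {
		/* would toggle kcsan_enabled; nothing can fail here */
	} else if (!strcmp(arg, "whitelist") || !strcmp(arg, "blacklist")) {
		ret = 0;	/* set_report_filterlist_whitelist() analogue */
	} else if (arg[0] == '!') {
		/* insert_report_filterlist() analogue: fail on empty symbol */
		ret = (arg[1] == '\0') ? -2 : 0;
	} else {
		return -22;
	}

	return ret < 0 ? ret : count;
}
```

Without the initializer, the "on"/"off"/"microbench" branches reach the final comparison with an indeterminate `ret`, which is exactly what clang's -Wsometimes-uninitialized flagged.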
* [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock
  2024-09-25 14:31 [PATCH 0/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
  2024-09-25 14:31 ` [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name() ran xiaokai
  2024-09-25 14:31 ` [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value ran xiaokai
@ 2024-09-25 14:31 ` ran xiaokai
  2024-10-01 14:13   ` Marco Elver
  2024-09-25 14:31 ` [PATCH 4/4] kcsan, debugfs: avoid updating white/blacklist with the same value ran xiaokai
  3 siblings, 1 reply; 11+ messages in thread
From: ran xiaokai @ 2024-09-25 14:31 UTC (permalink / raw)
  To: elver, tglx, dvyukov; +Cc: kasan-dev, linux-kernel, Ran Xiaokai

From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

In a preempt-RT kernel, most of the irq handlers have been
converted to the threaded mode, except those which have the
IRQF_NO_THREAD flag set. The hrtimer IRQ is such an example.
So a KCSAN report could be triggered from a hard-irq context, and this
will trigger the "sleeping function called from invalid context" bug.

[ C1] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ C1] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[ C1] preempt_count: 10002, expected: 0
[ C1] RCU nest depth: 0, expected: 0
[ C1] no locks held by swapper/1/0.
[ C1] irq event stamp: 156674
[ C1] hardirqs last enabled at (156673): [<ffffffff81130bd9>] do_idle+0x1f9/0x240
[ C1] hardirqs last disabled at (156674): [<ffffffff82254f84>] sysvec_apic_timer_interrupt+0x14/0xc0
[ C1] softirqs last enabled at (0): [<ffffffff81099f47>] copy_process+0xfc7/0x4b60
[ C1] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ C1] Preemption disabled at:
[ C1] [<ffffffff814a3e2a>] paint_ptr+0x2a/0x90
[ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0+ #3
[ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ C1] Call Trace:
[ C1] <IRQ>
[ C1] dump_stack_lvl+0x7e/0xc0
[ C1] dump_stack+0x1d/0x30
[ C1] __might_resched+0x1a2/0x270
[ C1] rt_spin_lock+0x68/0x170
[ C1] ? kcsan_skip_report_debugfs+0x43/0xe0
[ C1] kcsan_skip_report_debugfs+0x43/0xe0
[ C1] ? hrtimer_next_event_without+0x110/0x110
[ C1] print_report+0xb5/0x590
[ C1] kcsan_report_known_origin+0x1b1/0x1d0
[ C1] kcsan_setup_watchpoint+0x348/0x650
[ C1] __tsan_unaligned_write1+0x16d/0x1d0
[ C1] hrtimer_interrupt+0x3d6/0x430
[ C1] __sysvec_apic_timer_interrupt+0xe8/0x3a0
[ C1] sysvec_apic_timer_interrupt+0x97/0xc0
[ C1] </IRQ>

We cannot fix this by simply converting the report_filterlist_lock
to a raw_spinlock_t, because of the allocation in the
insert_report_filterlist() path:

raw_spin_lock_irqsave(&report_filterlist_lock, flags);
  krealloc
    __do_kmalloc_node
      slab_alloc_node
        __slab_alloc
          local_lock_irqsave(&s->cpu_slab->lock, flags)

local_lock_t is now a spinlock_t, which is sleepable in a preempt-RT
kernel, so kmalloc() and similar functions cannot be called with
a raw_spinlock_t held.

Instead, convert the list to RCU to fix this.
Also introduce a mutex to serialize user-space write operations.
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> --- kernel/kcsan/debugfs.c | 172 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 113 insertions(+), 59 deletions(-) diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c index 30547507f497..d5e624c37125 100644 --- a/kernel/kcsan/debugfs.c +++ b/kernel/kcsan/debugfs.c @@ -40,20 +40,17 @@ static_assert(ARRAY_SIZE(counter_names) == KCSAN_COUNTER_COUNT); * Addresses for filtering functions from reporting. This list can be used as a * whitelist or blacklist. */ -static struct { +struct report_filterlist { unsigned long *addrs; /* array of addresses */ size_t size; /* current size */ int used; /* number of elements used */ bool sorted; /* if elements are sorted */ bool whitelist; /* if list is a blacklist or whitelist */ -} report_filterlist = { - .addrs = NULL, - .size = 8, /* small initial size */ - .used = 0, - .sorted = false, - .whitelist = false, /* default is blacklist */ + struct rcu_head rcu; }; -static DEFINE_SPINLOCK(report_filterlist_lock); + +static DEFINE_MUTEX(rp_flist_mutex); +static struct report_filterlist __rcu *rp_flist; /* * The microbenchmark allows benchmarking KCSAN core runtime only. To run @@ -103,98 +100,141 @@ static int cmp_filterlist_addrs(const void *rhs, const void *lhs) bool kcsan_skip_report_debugfs(unsigned long func_addr) { unsigned long symbolsize, offset; - unsigned long flags; bool ret = false; + struct report_filterlist *list; if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset)) return false; func_addr -= offset; /* Get function start */ - spin_lock_irqsave(&report_filterlist_lock, flags); - if (report_filterlist.used == 0) + rcu_read_lock(); + list = rcu_dereference(rp_flist); + + if (!list) + goto out; + + if (list->used == 0) goto out; /* Sort array if it is unsorted, and then do a binary search. 
*/ - if (!report_filterlist.sorted) { - sort(report_filterlist.addrs, report_filterlist.used, + if (!list->sorted) { + sort(list->addrs, list->used, sizeof(unsigned long), cmp_filterlist_addrs, NULL); - report_filterlist.sorted = true; + list->sorted = true; } - ret = !!bsearch(&func_addr, report_filterlist.addrs, - report_filterlist.used, sizeof(unsigned long), + ret = !!bsearch(&func_addr, list->addrs, + list->used, sizeof(unsigned long), cmp_filterlist_addrs); - if (report_filterlist.whitelist) + if (list->whitelist) ret = !ret; out: - spin_unlock_irqrestore(&report_filterlist_lock, flags); + rcu_read_unlock(); return ret; } static ssize_t set_report_filterlist_whitelist(bool whitelist) { - unsigned long flags; + struct report_filterlist *new_list, *old_list; + ssize_t ret = 0; - spin_lock_irqsave(&report_filterlist_lock, flags); - report_filterlist.whitelist = whitelist; - spin_unlock_irqrestore(&report_filterlist_lock, flags); - return 0; + mutex_lock(&rp_flist_mutex); + old_list = rcu_dereference_protected(rp_flist, + lockdep_is_held(&rp_flist_mutex)); + + new_list = kzalloc(sizeof(*new_list), GFP_KERNEL); + if (!new_list) { + ret = -ENOMEM; + goto out; + } + + memcpy(new_list, old_list, sizeof(struct report_filterlist)); + new_list->whitelist = whitelist; + + rcu_assign_pointer(rp_flist, new_list); + synchronize_rcu(); + kfree(old_list); + +out: + mutex_unlock(&rp_flist_mutex); + return ret; } /* Returns 0 on success, error-code otherwise. 
*/ static ssize_t insert_report_filterlist(const char *func) { - unsigned long flags; unsigned long addr = kallsyms_lookup_name(func); ssize_t ret = 0; + struct report_filterlist *new_list, *old_list; + unsigned long *new_addrs; if (!addr) { pr_err("could not find function: '%s'\n", func); return -ENOENT; } - spin_lock_irqsave(&report_filterlist_lock, flags); + new_list = kzalloc(sizeof(*new_list), GFP_KERNEL); + if (!new_list) + return -ENOMEM; + + mutex_lock(&rp_flist_mutex); + old_list = rcu_dereference_protected(rp_flist, + lockdep_is_held(&rp_flist_mutex)); + memcpy(new_list, old_list, sizeof(struct report_filterlist)); - if (report_filterlist.addrs == NULL) { + if (new_list->addrs == NULL) { /* initial allocation */ - report_filterlist.addrs = - kmalloc_array(report_filterlist.size, - sizeof(unsigned long), GFP_ATOMIC); - if (report_filterlist.addrs == NULL) { - ret = -ENOMEM; - goto out; - } - } else if (report_filterlist.used == report_filterlist.size) { + new_list->addrs = + kmalloc_array(new_list->size, + sizeof(unsigned long), GFP_KERNEL); + if (new_list->addrs == NULL) + goto out_free; + } else if (new_list->used == new_list->size) { /* resize filterlist */ - size_t new_size = report_filterlist.size * 2; - unsigned long *new_addrs = - krealloc(report_filterlist.addrs, - new_size * sizeof(unsigned long), GFP_ATOMIC); - - if (new_addrs == NULL) { - /* leave filterlist itself untouched */ - ret = -ENOMEM; - goto out; - } - - report_filterlist.size = new_size; - report_filterlist.addrs = new_addrs; + size_t new_size = new_list->size * 2; + + new_addrs = kmalloc_array(new_size, + sizeof(unsigned long), GFP_KERNEL); + if (new_addrs == NULL) + goto out_free; + + memcpy(new_addrs, old_list->addrs, + old_list->size * sizeof(unsigned long)); + new_list->size = new_size; + new_list->addrs = new_addrs; + } else { + new_addrs = kmalloc_array(new_list->size, + sizeof(unsigned long), GFP_KERNEL); + if (new_addrs == NULL) + goto out_free; + + memcpy(new_addrs, 
old_list->addrs, + old_list->size * sizeof(unsigned long)); + new_list->addrs = new_addrs; } /* Note: deduplicating should be done in userspace. */ - report_filterlist.addrs[report_filterlist.used++] = addr; - report_filterlist.sorted = false; - + new_list->addrs[new_list->used++] = addr; + new_list->sorted = false; + + rcu_assign_pointer(rp_flist, new_list); + synchronize_rcu(); + kfree(old_list->addrs); + kfree(old_list); + goto out; + +out_free: + ret = -ENOMEM; + kfree(new_list); out: - spin_unlock_irqrestore(&report_filterlist_lock, flags); - + mutex_unlock(&rp_flist_mutex); return ret; } static int show_info(struct seq_file *file, void *v) { int i; - unsigned long flags; + struct report_filterlist *list; /* show stats */ seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled)); @@ -203,14 +243,15 @@ static int show_info(struct seq_file *file, void *v) atomic_long_read(&kcsan_counters[i])); } + rcu_read_lock(); + list = rcu_dereference(rp_flist); /* show filter functions, and filter type */ - spin_lock_irqsave(&report_filterlist_lock, flags); seq_printf(file, "\n%s functions: %s\n", - report_filterlist.whitelist ? "whitelisted" : "blacklisted", - report_filterlist.used == 0 ? "none" : ""); - for (i = 0; i < report_filterlist.used; ++i) - seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]); - spin_unlock_irqrestore(&report_filterlist_lock, flags); + list->whitelist ? "whitelisted" : "blacklisted", + list->used == 0 ? 
"none" : ""); + for (i = 0; i < list->used; ++i) + seq_printf(file, " %ps\n", (void *)list->addrs[i]); + rcu_read_unlock(); return 0; } @@ -269,6 +310,19 @@ static const struct file_operations debugfs_ops = static int __init kcsan_debugfs_init(void) { + struct report_filterlist *list; + + list = kzalloc(sizeof(*list), GFP_KERNEL); + if (!list) + return -ENOMEM; + + list->addrs = NULL; + list->size = 8; /* small initial size */ + list->used = 0; + list->sorted = false; + list->whitelist = false; /* default is blacklist */ + rcu_assign_pointer(rp_flist, list); + debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops); return 0; } -- 2.15.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock 2024-09-25 14:31 ` [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai @ 2024-10-01 14:13 ` Marco Elver 2024-10-08 6:26 ` Ran Xiaokai 0 siblings, 1 reply; 11+ messages in thread From: Marco Elver @ 2024-10-01 14:13 UTC (permalink / raw) To: ran xiaokai; +Cc: tglx, dvyukov, kasan-dev, linux-kernel, Ran Xiaokai On Wed, Sep 25, 2024 at 02:31PM +0000, ran xiaokai wrote: > From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > In a preempt-RT kernel, most of the irq handlers have been > converted to the threaded mode except those which have the > IRQF_NO_THREAD flag set. The hrtimer IRQ is such an example. > So kcsan report could be triggered from a HARD-irq context, this will > trigger the "sleeping function called from invalid context" bug. > > [ C1] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 > [ C1] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 > [ C1] preempt_count: 10002, expected: 0 > [ C1] RCU nest depth: 0, expected: 0 > [ C1] no locks held by swapper/1/0. > [ C1] irq event stamp: 156674 > [ C1] hardirqs last enabled at (156673): [<ffffffff81130bd9>] do_idle+0x1f9/0x240 > [ C1] hardirqs last disabled at (156674): [<ffffffff82254f84>] sysvec_apic_timer_interrupt+0x14/0xc0 > [ C1] softirqs last enabled at (0): [<ffffffff81099f47>] copy_process+0xfc7/0x4b60 > [ C1] softirqs last disabled at (0): [<0000000000000000>] 0x0 > [ C1] Preemption disabled at: > [ C1] [<ffffffff814a3e2a>] paint_ptr+0x2a/0x90 > [ C1] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0+ #3 > [ C1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 > [ C1] Call Trace: > [ C1] <IRQ> > [ C1] dump_stack_lvl+0x7e/0xc0 > [ C1] dump_stack+0x1d/0x30 > [ C1] __might_resched+0x1a2/0x270 > [ C1] rt_spin_lock+0x68/0x170 > [ C1] ? 
kcsan_skip_report_debugfs+0x43/0xe0 > [ C1] kcsan_skip_report_debugfs+0x43/0xe0 > [ C1] ? hrtimer_next_event_without+0x110/0x110 > [ C1] print_report+0xb5/0x590 > [ C1] kcsan_report_known_origin+0x1b1/0x1d0 > [ C1] kcsan_setup_watchpoint+0x348/0x650 > [ C1] __tsan_unaligned_write1+0x16d/0x1d0 > [ C1] hrtimer_interrupt+0x3d6/0x430 > [ C1] __sysvec_apic_timer_interrupt+0xe8/0x3a0 > [ C1] sysvec_apic_timer_interrupt+0x97/0xc0 > [ C1] </IRQ> > > To fix this, we can not simply convert the report_filterlist_lock > to a raw_spinlock_t. In the insert_report_filterlist() path: > > raw_spin_lock_irqsave(&report_filterlist_lock, flags); > krealloc > __do_kmalloc_node > slab_alloc_node > __slab_alloc > local_lock_irqsave(&s->cpu_slab->lock, flags) > > local_lock_t is now a spinlock_t which is sleepable in preempt-RT > kernel, so kmalloc() and similar functions can not be called with > a raw_spinlock_t lock held. > > Instead, we can convert it to rcu lock to fix this. > Aso introduce a mutex to serialize user-space write operations. > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> [...] > - spin_lock_irqsave(&report_filterlist_lock, flags); > - if (report_filterlist.used == 0) > + rcu_read_lock(); > + list = rcu_dereference(rp_flist); > + > + if (!list) > + goto out; > + > + if (list->used == 0) > goto out; > > /* Sort array if it is unsorted, and then do a binary search. */ > - if (!report_filterlist.sorted) { > - sort(report_filterlist.addrs, report_filterlist.used, > + if (!list->sorted) { > + sort(list->addrs, list->used, > sizeof(unsigned long), cmp_filterlist_addrs, NULL); > - report_filterlist.sorted = true; > + list->sorted = true; > } This used to be under the report_filterlist_lock, but now there's no protection against this happening concurrently. Sure, at the moment, this is not a problem, because this function is only called under the report_lock which serializes it. Is that intended? 
> - ret = !!bsearch(&func_addr, report_filterlist.addrs, > - report_filterlist.used, sizeof(unsigned long), > + ret = !!bsearch(&func_addr, list->addrs, > + list->used, sizeof(unsigned long), > cmp_filterlist_addrs); > - if (report_filterlist.whitelist) > + if (list->whitelist) > ret = !ret; [...] > + > + memcpy(new_list, old_list, sizeof(struct report_filterlist)); > + new_list->whitelist = whitelist; > + > + rcu_assign_pointer(rp_flist, new_list); > + synchronize_rcu(); > + kfree(old_list); Why not kfree_rcu()? > +out: > + mutex_unlock(&rp_flist_mutex); > + return ret; > } [...] > + } else { > + new_addrs = kmalloc_array(new_list->size, > + sizeof(unsigned long), GFP_KERNEL); > + if (new_addrs == NULL) > + goto out_free; > + > + memcpy(new_addrs, old_list->addrs, > + old_list->size * sizeof(unsigned long)); > + new_list->addrs = new_addrs; > } Wait, for every insertion it ends up copying the list now? That's very wasteful. In general, this solution seems overly complex, esp. the part where it ends up copying the whole list on _every_ insertion. If the whole point is to avoid kmalloc() under the lock, we can do something much simpler. Please test the patch below - it's much simpler, and in the common case I expect it to rarely throw away the preemptive allocation done outside the critical section because concurrent insertions by the user should be rarely done. Thanks, -- Marco ------ >8 ------ From: Marco Elver <elver@google.com> Date: Tue, 1 Oct 2024 16:00:45 +0200 Subject: [PATCH] kcsan: turn report_filterlist_lock into a raw_spinlock <tbd... 
please test> Reported-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Signed-off-by: Marco Elver <elver@google.com> --- kernel/kcsan/debugfs.c | 76 +++++++++++++++++++++--------------------- 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c index 1d1d1b0e4248..5ffb6cc5298b 100644 --- a/kernel/kcsan/debugfs.c +++ b/kernel/kcsan/debugfs.c @@ -46,14 +46,8 @@ static struct { int used; /* number of elements used */ bool sorted; /* if elements are sorted */ bool whitelist; /* if list is a blacklist or whitelist */ -} report_filterlist = { - .addrs = NULL, - .size = 8, /* small initial size */ - .used = 0, - .sorted = false, - .whitelist = false, /* default is blacklist */ -}; -static DEFINE_SPINLOCK(report_filterlist_lock); +} report_filterlist; +static DEFINE_RAW_SPINLOCK(report_filterlist_lock); /* * The microbenchmark allows benchmarking KCSAN core runtime only. To run @@ -110,7 +104,7 @@ bool kcsan_skip_report_debugfs(unsigned long func_addr) return false; func_addr -= offset; /* Get function start */ - spin_lock_irqsave(&report_filterlist_lock, flags); + raw_spin_lock_irqsave(&report_filterlist_lock, flags); if (report_filterlist.used == 0) goto out; @@ -127,7 +121,7 @@ bool kcsan_skip_report_debugfs(unsigned long func_addr) ret = !ret; out: - spin_unlock_irqrestore(&report_filterlist_lock, flags); + raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); return ret; } @@ -135,9 +129,9 @@ static void set_report_filterlist_whitelist(bool whitelist) { unsigned long flags; - spin_lock_irqsave(&report_filterlist_lock, flags); + raw_spin_lock_irqsave(&report_filterlist_lock, flags); report_filterlist.whitelist = whitelist; - spin_unlock_irqrestore(&report_filterlist_lock, flags); + raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); } /* Returns 0 on success, error-code otherwise. 
*/ @@ -145,6 +139,9 @@ static ssize_t insert_report_filterlist(const char *func) { unsigned long flags; unsigned long addr = kallsyms_lookup_name(func); + unsigned long *delay_free = NULL; + unsigned long *new_addrs = NULL; + size_t new_size = 0; ssize_t ret = 0; if (!addr) { @@ -152,32 +149,33 @@ static ssize_t insert_report_filterlist(const char *func) return -ENOENT; } - spin_lock_irqsave(&report_filterlist_lock, flags); +retry_alloc: + /* + * Check if we need an allocation, and re-validate under the lock. Since + * the report_filterlist_lock is a raw, cannot allocate under the lock. + */ + if (data_race(report_filterlist.used == report_filterlist.size)) { + new_size = (report_filterlist.size ?: 4) * 2; + delay_free = new_addrs = kmalloc_array(new_size, sizeof(unsigned long), GFP_KERNEL); + if (!new_addrs) + return -ENOMEM; + } - if (report_filterlist.addrs == NULL) { - /* initial allocation */ - report_filterlist.addrs = - kmalloc_array(report_filterlist.size, - sizeof(unsigned long), GFP_ATOMIC); - if (report_filterlist.addrs == NULL) { - ret = -ENOMEM; - goto out; - } - } else if (report_filterlist.used == report_filterlist.size) { - /* resize filterlist */ - size_t new_size = report_filterlist.size * 2; - unsigned long *new_addrs = - krealloc(report_filterlist.addrs, - new_size * sizeof(unsigned long), GFP_ATOMIC); - - if (new_addrs == NULL) { - /* leave filterlist itself untouched */ - ret = -ENOMEM; - goto out; + raw_spin_lock_irqsave(&report_filterlist_lock, flags); + if (report_filterlist.used == report_filterlist.size) { + /* Check we pre-allocated enough, and retry if not. 
*/ + if (report_filterlist.used >= new_size) { + raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); + kfree(new_addrs); /* kfree(NULL) is safe */ + delay_free = new_addrs = NULL; + goto retry_alloc; } + if (report_filterlist.used) + memcpy(new_addrs, report_filterlist.addrs, report_filterlist.used * sizeof(unsigned long)); + delay_free = report_filterlist.addrs; /* free the old list */ + report_filterlist.addrs = new_addrs; /* switch to the new list */ report_filterlist.size = new_size; - report_filterlist.addrs = new_addrs; } /* Note: deduplicating should be done in userspace. */ @@ -185,8 +183,10 @@ static ssize_t insert_report_filterlist(const char *func) kallsyms_lookup_name(func); report_filterlist.sorted = false; -out: - spin_unlock_irqrestore(&report_filterlist_lock, flags); + raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); + + if (delay_free) + kfree(delay_free); return ret; } @@ -204,13 +204,13 @@ static int show_info(struct seq_file *file, void *v) } /* show filter functions, and filter type */ - spin_lock_irqsave(&report_filterlist_lock, flags); + raw_spin_lock_irqsave(&report_filterlist_lock, flags); seq_printf(file, "\n%s functions: %s\n", report_filterlist.whitelist ? "whitelisted" : "blacklisted", report_filterlist.used == 0 ? "none" : ""); for (i = 0; i < report_filterlist.used; ++i) seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]); - spin_unlock_irqrestore(&report_filterlist_lock, flags); + raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); return 0; } -- 2.46.1.824.gd892dcdcdd-goog ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock 2024-10-01 14:13 ` Marco Elver @ 2024-10-08 6:26 ` Ran Xiaokai 0 siblings, 0 replies; 11+ messages in thread From: Ran Xiaokai @ 2024-10-08 6:26 UTC (permalink / raw) To: elver; +Cc: dvyukov, kasan-dev, linux-kernel, ran.xiaokai, ranxiaokai627, tglx >> - spin_lock_irqsave(&report_filterlist_lock, flags); >> - if (report_filterlist.used == 0) >> + rcu_read_lock(); >> + list = rcu_dereference(rp_flist); >> + >> + if (!list) >> + goto out; >> + >> + if (list->used == 0) >> goto out; >> >> /* Sort array if it is unsorted, and then do a binary search. */ >> - if (!report_filterlist.sorted) { >> - sort(report_filterlist.addrs, report_filterlist.used, >> + if (!list->sorted) { >> + sort(list->addrs, list->used, >> sizeof(unsigned long), cmp_filterlist_addrs, NULL); >> - report_filterlist.sorted = true; >> + list->sorted = true; >> } > >This used to be under the report_filterlist_lock, but now there's no >protection against this happening concurrently. > >Sure, at the moment, this is not a problem, because this function is >only called under the report_lock which serializes it. Is that intended? > >> - ret = !!bsearch(&func_addr, report_filterlist.addrs, >> - report_filterlist.used, sizeof(unsigned long), >> + ret = !!bsearch(&func_addr, list->addrs, >> + list->used, sizeof(unsigned long), >> cmp_filterlist_addrs); >> - if (report_filterlist.whitelist) >> + if (list->whitelist) >> ret = !ret; >[...] >> + >> + memcpy(new_list, old_list, sizeof(struct report_filterlist)); >> + new_list->whitelist = whitelist; >> + >> + rcu_assign_pointer(rp_flist, new_list); >> + synchronize_rcu(); >> + kfree(old_list); > >Why not kfree_rcu()? > >> +out: >> + mutex_unlock(&rp_flist_mutex); >> + return ret; >> } >[...] 
>> + } else {
>> + new_addrs = kmalloc_array(new_list->size,
>> + sizeof(unsigned long), GFP_KERNEL);
>> + if (new_addrs == NULL)
>> + goto out_free;
>> +
>> + memcpy(new_addrs, old_list->addrs,
>> + old_list->size * sizeof(unsigned long));
>> + new_list->addrs = new_addrs;
>> }
>
>Wait, for every insertion it ends up copying the list now? That's very
>wasteful.
>
>In general, this solution seems overly complex, esp. the part where it
>ends up copying the whole list on _every_ insertion.
>
>If the whole point is to avoid kmalloc() under the lock, we can do
>something much simpler.
>
>Please test the patch below - it's much simpler, and in the common case
>I expect it to rarely throw away the preemptive allocation done outside
>the critical section because concurrent insertions by the user should be
>rarely done.

I have tested this; it works. Yes, this patch is much simpler.
Another consideration for converting the spinlock to an RCU lock was
that it would reduce the irq latency when kcsan_skip_report_debugfs()
is called from hard-irq context, but as you said, insertions by the
user should not be a frequent operation, so this should not be a
problem.

>Thanks,
>-- Marco
>
>------ >8 ------
>
>From: Marco Elver <elver@google.com>
>Date: Tue, 1 Oct 2024 16:00:45 +0200
>Subject: [PATCH] kcsan: turn report_filterlist_lock into a raw_spinlock
>
><tbd... 
please test> > >Reported-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> >Signed-off-by: Marco Elver <elver@google.com> >--- > kernel/kcsan/debugfs.c | 76 +++++++++++++++++++++--------------------- > 1 file changed, 38 insertions(+), 38 deletions(-) > >diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c >index 1d1d1b0e4248..5ffb6cc5298b 100644 >--- a/kernel/kcsan/debugfs.c >+++ b/kernel/kcsan/debugfs.c >@@ -46,14 +46,8 @@ static struct { > int used; /* number of elements used */ > bool sorted; /* if elements are sorted */ > bool whitelist; /* if list is a blacklist or whitelist */ >-} report_filterlist = { >- .addrs = NULL, >- .size = 8, /* small initial size */ >- .used = 0, >- .sorted = false, >- .whitelist = false, /* default is blacklist */ >-}; >-static DEFINE_SPINLOCK(report_filterlist_lock); >+} report_filterlist; >+static DEFINE_RAW_SPINLOCK(report_filterlist_lock); > > /* > * The microbenchmark allows benchmarking KCSAN core runtime only. To run >@@ -110,7 +104,7 @@ bool kcsan_skip_report_debugfs(unsigned long func_addr) > return false; > func_addr -= offset; /* Get function start */ > >- spin_lock_irqsave(&report_filterlist_lock, flags); >+ raw_spin_lock_irqsave(&report_filterlist_lock, flags); > if (report_filterlist.used == 0) > goto out; > >@@ -127,7 +121,7 @@ bool kcsan_skip_report_debugfs(unsigned long func_addr) > ret = !ret; > > out: >- spin_unlock_irqrestore(&report_filterlist_lock, flags); >+ raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); > return ret; > } > >@@ -135,9 +129,9 @@ static void set_report_filterlist_whitelist(bool whitelist) > { > unsigned long flags; > >- spin_lock_irqsave(&report_filterlist_lock, flags); >+ raw_spin_lock_irqsave(&report_filterlist_lock, flags); > report_filterlist.whitelist = whitelist; >- spin_unlock_irqrestore(&report_filterlist_lock, flags); >+ raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); > } > > /* Returns 0 on success, error-code otherwise. 
*/ >@@ -145,6 +139,9 @@ static ssize_t insert_report_filterlist(const char *func) > { > unsigned long flags; > unsigned long addr = kallsyms_lookup_name(func); >+ unsigned long *delay_free = NULL; >+ unsigned long *new_addrs = NULL; >+ size_t new_size = 0; > ssize_t ret = 0; > > if (!addr) { >@@ -152,32 +149,33 @@ static ssize_t insert_report_filterlist(const char *func) > return -ENOENT; > } > >- spin_lock_irqsave(&report_filterlist_lock, flags); >+retry_alloc: >+ /* >+ * Check if we need an allocation, and re-validate under the lock. Since >+ * the report_filterlist_lock is a raw, cannot allocate under the lock. >+ */ >+ if (data_race(report_filterlist.used == report_filterlist.size)) { >+ new_size = (report_filterlist.size ?: 4) * 2; >+ delay_free = new_addrs = kmalloc_array(new_size, sizeof(unsigned long), GFP_KERNEL); >+ if (!new_addrs) >+ return -ENOMEM; >+ } > >- if (report_filterlist.addrs == NULL) { >- /* initial allocation */ >- report_filterlist.addrs = >- kmalloc_array(report_filterlist.size, >- sizeof(unsigned long), GFP_ATOMIC); >- if (report_filterlist.addrs == NULL) { >- ret = -ENOMEM; >- goto out; >- } >- } else if (report_filterlist.used == report_filterlist.size) { >- /* resize filterlist */ >- size_t new_size = report_filterlist.size * 2; >- unsigned long *new_addrs = >- krealloc(report_filterlist.addrs, >- new_size * sizeof(unsigned long), GFP_ATOMIC); >- >- if (new_addrs == NULL) { >- /* leave filterlist itself untouched */ >- ret = -ENOMEM; >- goto out; >+ raw_spin_lock_irqsave(&report_filterlist_lock, flags); >+ if (report_filterlist.used == report_filterlist.size) { >+ /* Check we pre-allocated enough, and retry if not. 
*/ >+ if (report_filterlist.used >= new_size) { >+ raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); >+ kfree(new_addrs); /* kfree(NULL) is safe */ >+ delay_free = new_addrs = NULL; >+ goto retry_alloc; > } > >+ if (report_filterlist.used) >+ memcpy(new_addrs, report_filterlist.addrs, report_filterlist.used * sizeof(unsigned long)); >+ delay_free = report_filterlist.addrs; /* free the old list */ >+ report_filterlist.addrs = new_addrs; /* switch to the new list */ > report_filterlist.size = new_size; >- report_filterlist.addrs = new_addrs; > } > > /* Note: deduplicating should be done in userspace. */ >@@ -185,8 +183,10 @@ static ssize_t insert_report_filterlist(const char *func) > kallsyms_lookup_name(func); > report_filterlist.sorted = false; > >-out: >- spin_unlock_irqrestore(&report_filterlist_lock, flags); >+ raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); >+ >+ if (delay_free) >+ kfree(delay_free); > > return ret; > } >@@ -204,13 +204,13 @@ static int show_info(struct seq_file *file, void *v) > } > > /* show filter functions, and filter type */ >- spin_lock_irqsave(&report_filterlist_lock, flags); >+ raw_spin_lock_irqsave(&report_filterlist_lock, flags); > seq_printf(file, "\n%s functions: %s\n", > report_filterlist.whitelist ? "whitelisted" : "blacklisted", > report_filterlist.used == 0 ? "none" : ""); > for (i = 0; i < report_filterlist.used; ++i) > seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]); >- spin_unlock_irqrestore(&report_filterlist_lock, flags); >+ raw_spin_unlock_irqrestore(&report_filterlist_lock, flags); > > return 0; > } >-- >2.46.1.824.gd892dcdcdd-goog > ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 4/4] kcsan, debugfs: avoid updating white/blacklist with the same value
2024-09-25 14:31 [PATCH 0/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
` (2 preceding siblings ...)
2024-09-25 14:31 ` [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
@ 2024-09-25 14:31 ` ran xiaokai
2024-10-01 13:56 ` Marco Elver
3 siblings, 1 reply; 11+ messages in thread
From: ran xiaokai @ 2024-09-25 14:31 UTC (permalink / raw)
To: elver, tglx, dvyukov; +Cc: kasan-dev, linux-kernel, Ran Xiaokai
From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

When userspace writes the same white/blacklist value as the one
currently in effect, the update is unnecessary.

Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 kernel/kcsan/debugfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c
index d5e624c37125..6b05115d5b73 100644
--- a/kernel/kcsan/debugfs.c
+++ b/kernel/kcsan/debugfs.c
@@ -142,6 +142,9 @@ static ssize_t set_report_filterlist_whitelist(bool whitelist)
 	old_list = rcu_dereference_protected(rp_flist,
 			lockdep_is_held(&rp_flist_mutex));
 
+	if (old_list->whitelist == whitelist)
+		goto out;
+
 	new_list = kzalloc(sizeof(*new_list), GFP_KERNEL);
 	if (!new_list) {
 		ret = -ENOMEM;
-- 
2.15.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] kcsan, debugfs: avoid updating white/blacklist with the same value 2024-09-25 14:31 ` [PATCH 4/4] kcsan, debugfs: avoid updating white/blacklist with the same value ran xiaokai @ 2024-10-01 13:56 ` Marco Elver 0 siblings, 0 replies; 11+ messages in thread From: Marco Elver @ 2024-10-01 13:56 UTC (permalink / raw) To: ran xiaokai; +Cc: tglx, dvyukov, kasan-dev, linux-kernel, Ran Xiaokai On Wed, 25 Sept 2024 at 16:32, ran xiaokai <ranxiaokai627@163.com> wrote: > > From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > When userspace passes a same white/blacklist value as it for now, > the update is actually not necessary. > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> > --- > kernel/kcsan/debugfs.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/kernel/kcsan/debugfs.c b/kernel/kcsan/debugfs.c > index d5e624c37125..6b05115d5b73 100644 > --- a/kernel/kcsan/debugfs.c > +++ b/kernel/kcsan/debugfs.c > @@ -142,6 +142,9 @@ static ssize_t set_report_filterlist_whitelist(bool whitelist) > old_list = rcu_dereference_protected(rp_flist, > lockdep_is_held(&rp_flist_mutex)); > > + if (old_list->whitelist == whitelist) > + goto out; Why is this in this patch? It seems like it could just be in the previous one. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads: [~2024-10-08 6:27 UTC | newest]

Thread overview: 11+ messages
2024-09-25 14:31 [PATCH 0/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
2024-09-25 14:31 ` [PATCH 1/4] kcsan, debugfs: Remove redundant call of kallsyms_lookup_name() ran xiaokai
2024-10-01 13:57   ` Marco Elver
2024-09-25 14:31 ` [PATCH 2/4] kcsan, debugfs: refactor set_report_filterlist_whitelist() to return a value ran xiaokai
2024-09-26  5:58   ` kernel test robot
2024-10-01 13:56   ` Marco Elver
2024-09-25 14:31 ` [PATCH 3/4] kcsan, debugfs: fix atomic sleep by converting spinlock_t to rcu lock ran xiaokai
2024-10-01 14:13   ` Marco Elver
2024-10-08  6:26     ` Ran Xiaokai
2024-09-25 14:31 ` [PATCH 4/4] kcsan, debugfs: avoid updating white/blacklist with the same value ran xiaokai
2024-10-01 13:56   ` Marco Elver