From: Boqun Feng <boqun.feng@gmail.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: "Suren Baghdasaryan" <surenb@google.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Kent Overstreet" <kent.overstreet@linux.dev>,
"Kees Cook" <keescook@chromium.org>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Alex Gaynor" <alex.gaynor@gmail.com>,
"Alice Ryhl" <aliceryhl@google.com>,
"Andreas Hindborg" <a.hindborg@samsung.com>,
"Benno Lossin" <benno.lossin@proton.me>,
"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
"Christoph Lameter" <cl@linux.com>,
"Dennis Zhou" <dennis@kernel.org>, "Gary Guo" <gary@garyguo.net>,
"Miguel Ojeda" <ojeda@kernel.org>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Tejun Heo" <tj@kernel.org>, "Vlastimil Babka" <vbabka@suse.cz>,
"Wedson Almeida Filho" <wedsonaf@gmail.com>,
linux-mm@kvack.org, lkmm@lists.linux.dev
Subject: Re: [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc
Date: Mon, 22 Jul 2024 10:47:30 -0700 [thread overview]
Message-ID: <Zp6bMoDnUMxNrKos@boqun-archlinux> (raw)
In-Reply-To: <202407191651.f24e499d-oliver.sang@intel.com>
On Mon, Jul 22, 2024 at 03:30:01PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc" on:
>
> commit: 24e44cc22aa3112082f2ee23137d048c73ca96d5 ("mm: percpu: enable per-cpu allocation tagging")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/mater 68b59730459e5d1fe4e0bbeb04ceb9df0f002270]
> [test failed on linux-next/master 73399b58e5e5a1b28a04baf42e321cfcfc663c2f]
>
> in testcase: trinity
> version: trinity-i386-abe9de86-1_20230429
> with following parameters:
>
> runtime: 300s
> group: group-04
> nr_groups: 5
>
>
>
> compiler: gcc-13
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> as we understand, this commit is not the root-cause of KCSAN issue, just make
> the issue change from form (1) to (2). furthermore, we don't understand why
> the issue happens randomly. but we failed to bisect the issue (1). so we just
> make out this report FYI, not sure if it could supply some hints to any real
> issues.
>
> 60fa4a9e23231721 24e44cc22aa3112082f2ee23137
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 281:999 -28% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_alloc <--- (1)
> 296:999 -30% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_block_update_hint_free
> 25:999 -3% :998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc/pcpu_chunk_populated
> :999 29% 292:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc <---(2)
> :999 27% 269:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_free
> :999 4% 44:998 dmesg.BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_chunk_populated
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202407191651.f24e499d-oliver.sang@intel.com
>
>
>
> [ 117.357897][ T3648] ==================================================================
> [ 117.358568][ T3648] BUG: KCSAN: data-race in pcpu_alloc_noprof / pcpu_block_update_hint_alloc
> [ 117.359222][ T3648]
> [ 117.359424][ T3648] write to 0xffffffff8559e600 of 4 bytes by task 3653 on cpu 1:
> [ 117.360048][ T3648] pcpu_block_update_hint_alloc (mm/percpu.c:602 mm/percpu.c:598 mm/percpu.c:923)
> [ 117.360554][ T3648] pcpu_alloc_area (mm/percpu.c:1260)
> [ 117.360960][ T3648] pcpu_alloc_noprof (mm/percpu.c:1835)
> [ 117.361393][ T3648] bpf_map_alloc_percpu (kernel/bpf/syscall.c:466)
> [ 117.361825][ T3648] prealloc_init (kernel/bpf/hashtab.c:338)
> [ 117.362211][ T3648] htab_map_alloc (kernel/bpf/hashtab.c:573)
> [ 117.362618][ T3648] map_create (kernel/bpf/syscall.c:1320)
> [ 117.362987][ T3648] __sys_bpf (kernel/bpf/syscall.c:5642)
> [ 117.363348][ T3648] __ia32_sys_bpf (kernel/bpf/syscall.c:5765)
> [ 117.363738][ T3648] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-013-20240713/./arch/x86/include/generated/asm/syscalls_32.h:358)
> [ 117.364151][ T3648] do_int80_emulation (arch/x86/entry/common.c:165 (discriminator 1) arch/x86/entry/common.c:253 (discriminator 1))
> [ 117.364578][ T3648] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
> [ 117.365004][ T3648]
> [ 117.365203][ T3648] read to 0xffffffff8559e600 of 4 bytes by task 3648 on cpu 0:
> [ 117.365797][ T3648] pcpu_alloc_noprof (mm/percpu.c:1894)
> [ 117.366210][ T3648] bpf_map_alloc_percpu (kernel/bpf/syscall.c:466)
> [ 117.366627][ T3648] prealloc_init (kernel/bpf/hashtab.c:338)
> [ 117.367000][ T3648] htab_map_alloc (kernel/bpf/hashtab.c:573)
> [ 117.367400][ T3648] map_create (kernel/bpf/syscall.c:1320)
> [ 117.367766][ T3648] __sys_bpf (kernel/bpf/syscall.c:5642)
> [ 117.368120][ T3648] __ia32_sys_bpf (kernel/bpf/syscall.c:5765)
> [ 117.368503][ T3648] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-013-20240713/./arch/x86/include/generated/asm/syscalls_32.h:358)
> [ 117.368907][ T3648] do_int80_emulation (arch/x86/entry/common.c:165 (discriminator 1) arch/x86/entry/common.c:253 (discriminator 1))
> [ 117.369341][ T3648] asm_int80_emulation (arch/x86/include/asm/idtentry.h:626)
> [ 117.369761][ T3648]
> [ 117.369968][ T3648] value changed: 0x00000003 -> 0x00000000
> [ 117.370454][ T3648]
> [ 117.370662][ T3648] Reported by Kernel Concurrency Sanitizer on:
> [ 117.371179][ T3648] CPU: 0 PID: 3648 Comm: trinity-c2 Not tainted 6.9.0-rc4-00089-g24e44cc22aa3 #1
> [ 117.371943][ T3648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 117.372811][ T3648] ==================================================================
>
This looks like a data race because we read pcpu_nr_empty_pop_pages out
of the lock for a best effort checking, @Tejun, maybe you could confirm
on this?
If so, a fix could be as the follow, i.e. telling KCSAN that the data
race is expected. Another fix is making all accesses to
pcpu_nr_empty_pop_pages atomics (via WRITE_ONCE() and READ_ONCE()).
Regards,
Boqun
(is there a way to tell the bot to issue a rerun with a diff? ;-))
------------------------>8
diff --git a/mm/percpu.c b/mm/percpu.c
index 20d91af8c033..0626ef12099b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1891,7 +1891,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
mutex_unlock(&pcpu_alloc_mutex);
}
- if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
+ /*
+ * Checks pcpu_nr_empty_pop_pages out of the pcpu_lock, data races may
+ * occur but this is just a best-effort checking, everything is synced
+ * in pcpu_balance_work.
+ */
+ if (data_race(pcpu_nr_empty_pop_pages) < PCPU_EMPTY_POP_PAGES_LOW)
pcpu_schedule_balance_work();
/* clear the areas and return address relative to base address */
next prev parent reply other threads:[~2024-07-22 17:47 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-22 7:30 [linus:master] [mm] 24e44cc22a: BUG:KCSAN:data-race_in_pcpu_alloc_noprof/pcpu_block_update_hint_alloc kernel test robot
2024-07-22 17:47 ` Boqun Feng [this message]
2024-07-22 17:52 ` Tejun Heo
2024-07-22 18:03 ` Boqun Feng
2024-07-22 18:15 ` Tejun Heo
2024-07-22 18:27 ` Dennis Zhou
2024-07-22 20:53 ` Boqun Feng
2024-07-23 5:50 ` Dennis Zhou
2024-07-23 6:09 ` Oliver Sang
2024-07-23 6:13 ` Dennis Zhou
2024-07-24 7:10 ` Oliver Sang
2024-07-23 21:14 ` Boqun Feng
2024-07-27 3:15 ` Dennis Zhou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zp6bMoDnUMxNrKos@boqun-archlinux \
--to=boqun.feng@gmail.com \
--cc=a.hindborg@samsung.com \
--cc=akpm@linux-foundation.org \
--cc=alex.gaynor@gmail.com \
--cc=aliceryhl@google.com \
--cc=benno.lossin@proton.me \
--cc=bjorn3_gh@protonmail.com \
--cc=cl@linux.com \
--cc=dennis@kernel.org \
--cc=gary@garyguo.net \
--cc=keescook@chromium.org \
--cc=kent.overstreet@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lkmm@lists.linux.dev \
--cc=lkp@intel.com \
--cc=oe-lkp@lists.linux.dev \
--cc=ojeda@kernel.org \
--cc=oliver.sang@intel.com \
--cc=pasha.tatashin@soleen.com \
--cc=peterz@infradead.org \
--cc=surenb@google.com \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=wedsonaf@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.