* [bigeasy-staging:futex_local_v6] [futex] 865221325b: WARNING:at_lib/rcuref.c:#rcuref_put_slowpath
From: kernel test robot @ 2024-12-31 8:41 UTC
To: Sebastian Andrzej Siewior; +Cc: oe-lkp, lkp, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_lib/rcuref.c:#rcuref_put_slowpath" on:
commit: 865221325b6a45c5d0777614ccad2e49aa82722b ("futex: Resize local futex hash table based on number of threads.")
https://git.kernel.org/cgit/linux/kernel/git/bigeasy/staging.git futex_local_v6
in testcase: phoronix-test-suite
version:
with the following parameters:
test: build-erlang-1.2.0
cpufreq_governor: performance
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
(please refer to the attached dmesg/kmsg for the entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241231/202412311453.9d7636a2-lkp@intel.com
[ 184.835701][T52159] ------------[ cut here ]------------
[ 184.841511][T52159] rcuref - imbalanced put()
[ 184.841536][T52159] WARNING: CPU: 64 PID: 52159 at lib/rcuref.c:267 rcuref_put_slowpath+0x66/0x70
[ 184.855547][T52159] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio qrtr sg binfmt_misc fuse loop dm_mod ip_tables overlay btrfs blake2b_generic xor raid6_pq libcrc32c intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irdma ast sd_mod crct10dif_pclmul crc32_pclmul drm_client_lib crc32c_intel ice snd_pcm drm_shmem_helper ghash_clmulni_intel snd_timer ipmi_ssif gnss drm_kms_helper ahci rapl snd ib_uverbs libahci intel_cstate mei_me acpi_power_meter soundcore ioatdma i2c_i801 intel_uncore drm ib_core libata mei pcspkr ipmi_si acpi_ipmi lpc_ich joydev intel_pch_thermal i2c_smbus dca wmi ipmi_devintf ipmi_msghandler acpi_pad
[ 184.944417][T52159] CPU: 64 UID: 0 PID: 52159 Comm: 1_aux Tainted: G S 6.13.0-rc3-00011-g865221325b6a #1
[ 184.955911][T52159] Tainted: [S]=CPU_OUT_OF_SPEC
[ 184.961148][T52159] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 184.972898][T52159] RIP: 0010:rcuref_put_slowpath+0x66/0x70
[ 184.979101][T52159] Code: 31 c0 eb f0 80 3d 76 fb bd 01 00 74 0a c7 03 00 00 00 e0 31 c0 eb dd 48 c7 c7 02 4e b6 82 c6 05 5c fb bd 01 01 e8 da 49 96 ff <0f> 0b eb df 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90
[ 184.999849][T52159] RSP: 0018:ffffc90025ef7ca0 EFLAGS: 00010282
[ 185.006454][T52159] RAX: 0000000000000000 RBX: ffff88c0a1e54000 RCX: 0000000000000027
[ 185.014975][T52159] RDX: ffff88bf7fe20c08 RSI: 0000000000000001 RDI: ffff88bf7fe20c00
[ 185.023484][T52159] RBP: ffffc90025ef7ce8 R08: 0000000000000000 R09: 0000000000000003
[ 185.031952][T52159] R10: ffffc90025ef7b40 R11: ffffffff831e10c8 R12: 0000000000000000
[ 185.040433][T52159] R13: ffff88c10e034080 R14: 00000000ffffffff R15: 00000000ffffffff
[ 185.048952][T52159] FS: 00007f8fcb36a6c0(0000) GS:ffff88bf7fe00000(0000) knlGS:0000000000000000
[ 185.058426][T52159] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 185.065570][T52159] CR2: 00007f8fcd37f5d0 CR3: 00000041449c0003 CR4: 00000000007726f0
[ 185.074137][T52159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 185.082692][T52159] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 185.091244][T52159] PKRU: 55555554
[ 185.095270][T52159] Call Trace:
[ 185.099146][T52159] <TASK>
[ 185.102658][T52159] ? __warn+0x85/0x130
[ 185.107272][T52159] ? rcuref_put_slowpath+0x66/0x70
[ 185.112930][T52159] ? report_bug+0x150/0x170
[ 185.117954][T52159] ? prb_read_valid+0x1b/0x30
[ 185.122926][T52159] ? handle_bug+0x53/0xb0
[ 185.127813][T52159] ? exc_invalid_op+0x17/0x70
[ 185.133022][T52159] ? asm_exc_invalid_op+0x1a/0x20
[ 185.138520][T52159] ? rcuref_put_slowpath+0x66/0x70
[ 185.144224][T52159] futex_hash_put+0x2c/0x30
[ 185.149286][T52159] futex_wait_queue+0x4a/0xb0
[ 185.154497][T52159] __futex_wait+0x146/0x1b0
[ 185.159536][T52159] ? __pfx_futex_wake_mark+0x10/0x10
[ 185.165341][T52159] futex_wait+0x6c/0x130
[ 185.170135][T52159] ? __handle_mm_fault+0x2f6/0x6f0
[ 185.175497][T52159] do_futex+0x11d/0x1f0
[ 185.179885][T52159] __x64_sys_futex+0x77/0x1f0
[ 185.184795][T52159] do_syscall_64+0x7d/0x170
[ 185.189523][T52159] ? do_user_addr_fault+0x342/0x670
[ 185.194951][T52159] ? clear_bhb_loop+0x45/0xa0
[ 185.199852][T52159] ? clear_bhb_loop+0x45/0xa0
[ 185.204743][T52159] ? clear_bhb_loop+0x45/0xa0
[ 185.209631][T52159] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 185.215723][T52159] RIP: 0033:0x7f900e516719
[ 185.220324][T52159] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
[ 185.240399][T52159] RSP: 002b:00007f8fcb369d28 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[ 185.248987][T52159] RAX: ffffffffffffffda RBX: 00007f8fcdc305d0 RCX: 00007f900e516719
[ 185.257134][T52159] RDX: 00000000ffffffff RSI: 0000000000000080 RDI: 00007f8fcdc305d0
[ 185.265258][T52159] RBP: 00007f8fcb369d90 R08: 0000000000000000 R09: 0000000000000000
[ 185.273369][T52159] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8fcdc305c0
[ 185.281471][T52159] R13: 0000000000000000 R14: 0000000000000002 R15: 00007f8fcb369e30
[ 185.289572][T52159] </TASK>
[ 185.292733][T52159] ---[ end trace 0000000000000000 ]---
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [bigeasy-staging:futex_local_v6] [futex] 865221325b: WARNING:at_lib/rcuref.c:#rcuref_put_slowpath
From: Sebastian Andrzej Siewior @ 2025-01-16 23:31 UTC
To: Thomas Gleixner, Peter Zijlstra; +Cc: oe-lkp, lkp, kernel test robot

On 2024-12-31 16:41:41 [+0800], kernel test robot wrote:
> kernel test robot noticed "WARNING:at_lib/rcuref.c:#rcuref_put_slowpath" on:
>
> commit: 865221325b6a45c5d0777614ccad2e49aa82722b ("futex: Resize local futex hash table based on number of threads.")
> https://git.kernel.org/cgit/linux/kernel/git/bigeasy/staging.git futex_local_v6
…
> [ 184.841511][T52159] rcuref - imbalanced put()
> [ 184.841536][T52159] WARNING: CPU: 64 PID: 52159 at lib/rcuref.c:267 rcuref_put_slowpath+0x66/0x70
> [ 184.944417][T52159] CPU: 64 UID: 0 PID: 52159 Comm: 1_aux Tainted: G S 6.13.0-rc3-00011-g865221325b6a #1
…
> [ 184.972898][T52159] RIP: 0010:rcuref_put_slowpath+0x66/0x70
…
> [ 185.095270][T52159] Call Trace:
> [ 185.099146][T52159] <TASK>
> [ 185.144224][T52159] futex_hash_put+0x2c/0x30
> [ 185.149286][T52159] futex_wait_queue+0x4a/0xb0
> [ 185.154497][T52159] __futex_wait+0x146/0x1b0
> [ 185.165341][T52159] futex_wait+0x6c/0x130
> [ 185.175497][T52159] do_futex+0x11d/0x1f0

after hours of yelling and cursing:

ref = 0 (via rcuref_init(ref, 1))
T1                                        T2
rcuref_put(ref)
-> atomic_add_negative_release(-1, ref)   # ref -> 0xffffffff
-> rcuref_put_slowpath(ref)
                                          rcuref_get(ref)
                                          -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                          -> return true;   # ref -> 0

                                          rcuref_put(ref)
                                          -> atomic_add_negative_release(-1, ref)   # ref -> 0xffffffff
                                          -> rcuref_put_slowpath()

                                          -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xffffffff / RCUREF_NOREF
                                          -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))   # ref -> 0xe0000000 / RCUREF_DEAD
                                          -> return true
-> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xe0000000 / RCUREF_DEAD
-> if (cnt > RCUREF_RELEASED)          # 0xe0000000 > 0xc0000000
-> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")

Sebastian
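
The interleaving above can be reproduced deterministically in userspace. The following is a minimal C11 sketch, not kernel code, and it rests on these assumptions:

- `<stdatomic.h>` operations stand in for the kernel's atomic_*() helpers, using the stronger sequentially consistent ordering.
- The RCUREF_* values are assumed to match include/linux/rcuref.h.
- old_put_fast(), old_put_slowpath(), replay_old_race() and the imbalanced_put_warnings counter are hypothetical names.

The slow path mirrors the pre-fix logic, which re-reads the live counter after the decrement.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counter zones; values assumed to mirror include/linux/rcuref.h. */
#define RCUREF_MAXREF    0x7FFFFFFFU
#define RCUREF_SATURATED 0xA0000000U
#define RCUREF_RELEASED  0xC0000000U
#define RCUREF_DEAD      0xE0000000U
#define RCUREF_NOREF     0xFFFFFFFFU

/* Hypothetical stand-in for WARN_ONCE(): counts how often it would fire. */
static int imbalanced_put_warnings;

/* Old fast path: only reports whether the decremented value went negative. */
static bool old_put_fast(atomic_uint *refcnt)
{
	unsigned int val = atomic_fetch_sub(refcnt, 1) - 1U;

	return (val & 0x80000000U) != 0;	/* sign bit set == negative as int */
}

/* Old slow path: re-reads the live counter instead of using the decrement result. */
static bool old_put_slowpath(atomic_uint *refcnt)
{
	unsigned int cnt = atomic_load(refcnt);	/* the racy re-read */

	if (cnt == RCUREF_NOREF)		/* final put: try NOREF -> DEAD */
		return atomic_compare_exchange_strong(refcnt, &cnt, RCUREF_DEAD);
	if (cnt >= RCUREF_RELEASED) {		/* dead zone: "rcuref - imbalanced put()" */
		imbalanced_put_warnings++;
		atomic_store(refcnt, RCUREF_DEAD);
		return false;
	}
	if (cnt > RCUREF_MAXREF)		/* saturation zone */
		atomic_store(refcnt, RCUREF_SATURATED);
	return false;
}

/* Replays the T1/T2 interleaving step by step; returns T2's put() result. */
static bool replay_old_race(void)
{
	atomic_uint refcnt = 0;				/* rcuref_init(ref, 1) stores 0 */

	bool t1_slow = old_put_fast(&refcnt);		/* T1: 0 -> 0xffffffff */
	atomic_fetch_add(&refcnt, 1);			/* T2: get() succeeds, back to 0 */
	bool t2_slow = old_put_fast(&refcnt);		/* T2: 0 -> 0xffffffff */
	bool t2_final = t2_slow && old_put_slowpath(&refcnt);	/* T2: NOREF -> DEAD */

	if (t1_slow)
		(void)old_put_slowpath(&refcnt);	/* T1 re-reads DEAD and warns */
	return t2_final;
}
```

replay_old_race() returns true because T2 legitimately performs the final put and marks the counter RCUREF_DEAD. Yet imbalanced_put_warnings ends at 1: T1's late re-read sees 0xe0000000 and reports an imbalance that never happened.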
* Re: [bigeasy-staging:futex_local_v6] [futex] 865221325b: WARNING:at_lib/rcuref.c:#rcuref_put_slowpath
From: Thomas Gleixner @ 2025-01-17 8:49 UTC
To: Sebastian Andrzej Siewior, Peter Zijlstra; +Cc: oe-lkp, lkp, kernel test robot

On Fri, Jan 17 2025 at 00:31, Sebastian Andrzej Siewior wrote:
> On 2024-12-31 16:41:41 [+0800], kernel test robot wrote:
>> [ 184.841511][T52159] rcuref - imbalanced put()
>> [ 184.841536][T52159] WARNING: CPU: 64 PID: 52159 at lib/rcuref.c:267 rcuref_put_slowpath+0x66/0x70
>
> after hours of yelling and cursing:
>
> ref = 0 (via rcuref_init(ref, 1))
> T1                                        T2
> rcuref_put(ref)
> -> atomic_add_negative_release(-1, ref)   # ref -> 0xffffffff
> -> rcuref_put_slowpath(ref)
>                                           rcuref_get(ref)
>                                           -> atomic_add_negative_relaxed(1, &ref->refcnt)
>                                           -> return true;   # ref -> 0
>
>                                           rcuref_put(ref)
>                                           -> atomic_add_negative_release(-1, ref)   # ref -> 0xffffffff
>                                           -> rcuref_put_slowpath()
>
>                                           -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xffffffff / RCUREF_NOREF
>                                           -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))   # ref -> 0xe0000000 / RCUREF_DEAD
>                                           -> return true
> -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xe0000000 / RCUREF_DEAD
> -> if (cnt > RCUREF_RELEASED)          # 0xe0000000 > 0xc0000000
> -> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")

Duh. Indeed.

That's a nasty one. Let me think about it.

* Re: [bigeasy-staging:futex_local_v6] [futex] 865221325b: WARNING:at_lib/rcuref.c:#rcuref_put_slowpath
From: Thomas Gleixner @ 2025-01-17 10:03 UTC
To: Sebastian Andrzej Siewior, Peter Zijlstra; +Cc: oe-lkp, lkp, kernel test robot

On Fri, Jan 17 2025 at 09:49, Thomas Gleixner wrote:
> On Fri, Jan 17 2025 at 00:31, Sebastian Andrzej Siewior wrote:
>>
>>                                           -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xffffffff / RCUREF_NOREF
>>                                           -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))   # ref -> 0xe0000000 / RCUREF_DEAD
>>                                           -> return true
>> -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xe0000000 / RCUREF_DEAD
>> -> if (cnt > RCUREF_RELEASED)          # 0xe0000000 > 0xc0000000
>> -> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")
>
> Duh. Indeed.
>
> That's a nasty one. Let me think about it.

The problem is the disconnect between the decrement and the read in the
slow path. The untested below should plug that hole. It results in two
extra instructions:

     412:  f0 83 43 40 ff        lock addl $0xffffffff,0x40(%rbx)
     417:  78 69                 js     482 <dst_release+0x92>

vs.

     412:  be ff ff ff ff        mov    $0xffffffff,%esi
     417:  f0 0f c1 73 40        lock xadd %esi,0x40(%rbx)
     41c:  83 ee 01              sub    $0x1,%esi
     41f:  78 69                 js     48a <dst_release+0x9a>

That's not the end of the world and could be further optimized, but
that's a minor detail.

Thanks,

        tglx
---
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@ static inline __must_check bool rcuref_g
 	return rcuref_get_slowpath(ref);
 }
 
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
 
 /*
  * Internal helper. Do not invoke directly.
  */
 static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
 {
+	int cnt;
+
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
 			 "suspicious rcuref_put_rcusafe() usage");
 	/*
 	 * Unconditionally decrease the reference count. The saturation and
 	 * dead zones provide enough tolerance for this.
 	 */
-	if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+	cnt = atomic_sub_return_release(1, &ref->refcnt);
+	if (likely(cnt >= 0))
 		return false;
 
 	/*
 	 * Handle the last reference drop and cases inside the saturation
 	 * and dead zones.
 	 */
-	return rcuref_put_slowpath(ref);
+	return rcuref_put_slowpath(ref, cnt);
 }
 
 /**
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
 /**
  * rcuref_put_slowpath - Slowpath of __rcuref_put()
  * @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
  *
  * Invoked when the reference count is outside of the valid zone.
  *
@@ -233,10 +234,8 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
  * with a concurrent get()/put() pair. Caller is not allowed to
  * deconstruct the protected object.
  */
-bool rcuref_put_slowpath(rcuref_t *ref)
+bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt)
 {
-	unsigned int cnt = atomic_read(&ref->refcnt);
-
 	/* Did this drop the last reference? */
 	if (likely(cnt == RCUREF_NOREF)) {
 		/*

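
With this change, the slow path no longer looks at the live counter. It classifies the value its own decrement produced, so a concurrent get()/put() pair can no longer change which branch runs. The decision logic therefore reduces to a pure function of that value.

The sketch below is hypothetical: classify_put_cnt() and enum put_action are illustrative names, not kernel API. It assumes the RCUREF_* constants from include/linux/rcuref.h.

```c
/* Zone boundaries; values assumed to mirror include/linux/rcuref.h. */
#define RCUREF_MAXREF    0x7FFFFFFFU
#define RCUREF_RELEASED  0xC0000000U
#define RCUREF_NOREF     0xFFFFFFFFU

/* Hypothetical labels for what the patched put() does with @cnt. */
enum put_action {
	PUT_FASTPATH,	/* cnt >= 0 as int: handled inline, no slow path */
	PUT_TRY_FINAL,	/* cnt == RCUREF_NOREF: attempt cmpxchg(NOREF -> DEAD) */
	PUT_IMBALANCED,	/* dead zone: WARN_ONCE() and restore RCUREF_DEAD */
	PUT_SATURATED,	/* saturation zone: restore RCUREF_SATURATED */
};

/* Classifies the value returned by the fastpath decrement (atomic_sub_return_release()). */
static enum put_action classify_put_cnt(unsigned int cnt)
{
	if (cnt <= RCUREF_MAXREF)
		return PUT_FASTPATH;
	if (cnt == RCUREF_NOREF)
		return PUT_TRY_FINAL;
	if (cnt >= RCUREF_RELEASED)
		return PUT_IMBALANCED;
	return PUT_SATURATED;
}
```

Only the exact value RCUREF_NOREF (0xffffffff) leads to the final cmpxchg(). The dead-zone branch is now taken only when this put()'s own decrement landed there, never because another thread moved the counter in between.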
* Re: [bigeasy-staging:futex_local_v6] [futex] 865221325b: WARNING:at_lib/rcuref.c:#rcuref_put_slowpath
From: Sebastian Andrzej Siewior @ 2025-01-20 17:08 UTC
To: Thomas Gleixner; +Cc: Peter Zijlstra, oe-lkp, lkp, kernel test robot

On 2025-01-17 11:03:38 [+0100], Thomas Gleixner wrote:
> >
> > That's a nasty one. Let me think about it.
>
> The problem is the disconnect between the decrement and the read in the
> slow path. The untested below should plug that hole. It results in two
> extra instructions:

This works, I'm going to reset it with the futex patches so the bot does not
choke on it again.

From: Thomas Gleixner <tglx@linutronix.de>
Date: Sun, 19 Jan 2025 00:55:32 +0100
Subject: [PATCH] rcuref: Avoid false positive "imbalanced put" report.

The kernel test robot reported an "imbalanced put" which turned out to be
a false positive. Consider the following race:

ref = 0 (via rcuref_init(ref, 1))
T1                                        T2
rcuref_put(ref)
-> atomic_add_negative_release(-1, ref)   # ref -> 0xffffffff
-> rcuref_put_slowpath(ref)
                                          rcuref_get(ref)
                                          -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                          -> return true;   # ref -> 0

                                          rcuref_put(ref)
                                          -> atomic_add_negative_release(-1, ref)   # ref -> 0xffffffff
                                          -> rcuref_put_slowpath()

                                          -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xffffffff / RCUREF_NOREF
                                          -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))   # ref -> 0xe0000000 / RCUREF_DEAD
                                          -> return true
-> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xe0000000 / RCUREF_DEAD
-> if (cnt > RCUREF_RELEASED)          # 0xe0000000 > 0xc0000000
-> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")

The problem is the additional read in the slow path (after it decremented
to RCUREF_NOREF), which can happen after the counter has been marked
RCUREF_DEAD.

Avoid the false positive by reusing the return value from the decrement.
Now every "final" put uses RCUREF_NOREF in the slow path and attempts the
final cmpxchg() to RCUREF_DEAD.

Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
Debugged-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/rcuref.h | 9 ++++++---
 lib/rcuref.c           | 5 ++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcuref.h b/include/linux/rcuref.h
index 2c8bfd0f1b6b3..6322d8c1c6b42 100644
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@ static inline __must_check bool rcuref_get(rcuref_t *ref)
 	return rcuref_get_slowpath(ref);
 }
 
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
 
 /*
  * Internal helper. Do not invoke directly.
  */
 static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
 {
+	int cnt;
+
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
 			 "suspicious rcuref_put_rcusafe() usage");
 	/*
 	 * Unconditionally decrease the reference count. The saturation and
 	 * dead zones provide enough tolerance for this.
 	 */
-	if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+	cnt = atomic_sub_return_release(1, &ref->refcnt);
+	if (likely(cnt >= 0))
 		return false;
 
 	/*
 	 * Handle the last reference drop and cases inside the saturation
 	 * and dead zones.
 	 */
-	return rcuref_put_slowpath(ref);
+	return rcuref_put_slowpath(ref, cnt);
 }
 
 /**
diff --git a/lib/rcuref.c b/lib/rcuref.c
index 97f300eca927c..5bd726b71e393 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
 /**
  * rcuref_put_slowpath - Slowpath of __rcuref_put()
  * @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
  *
  * Invoked when the reference count is outside of the valid zone.
  *
@@ -233,10 +234,8 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
  * with a concurrent get()/put() pair. Caller is not allowed to
  * deconstruct the protected object.
  */
-bool rcuref_put_slowpath(rcuref_t *ref)
+bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt)
 {
-	unsigned int cnt = atomic_read(&ref->refcnt);
-
 	/* Did this drop the last reference? */
 	if (likely(cnt == RCUREF_NOREF)) {
 		/*
-- 
2.47.1
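
The commit message claims that every final put now reaches the slow path with RCUREF_NOREF and merely attempts the cmpxchg(). To check this, the same T1/T2 interleaving can be replayed against the patched logic. This is again a userspace C11 sketch under stated assumptions:

- atomic_fetch_sub_explicit() minus one stands in for atomic_sub_return_release().
- The default sequentially consistent compare-exchange is stronger than the kernel's release/acquire ordering.
- The RCUREF_* values are assumed to match include/linux/rcuref.h.
- new_put_fast(), new_put_slowpath(), replay_new_race() and the warning counter are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counter zones; values assumed to mirror include/linux/rcuref.h. */
#define RCUREF_MAXREF    0x7FFFFFFFU
#define RCUREF_SATURATED 0xA0000000U
#define RCUREF_RELEASED  0xC0000000U
#define RCUREF_DEAD      0xE0000000U
#define RCUREF_NOREF     0xFFFFFFFFU

/* Hypothetical stand-in for WARN_ONCE(): counts how often it would fire. */
static int imbalanced_put_warnings;

/* New fast path: keep the value produced by this put()'s own decrement. */
static bool new_put_fast(atomic_uint *refcnt, unsigned int *cnt)
{
	*cnt = atomic_fetch_sub_explicit(refcnt, 1, memory_order_release) - 1U;
	return (*cnt & 0x80000000U) != 0;	/* negative as int: take the slow path */
}

/* New slow path: decides on @cnt alone, no re-read of the live counter. */
static bool new_put_slowpath(atomic_uint *refcnt, unsigned int cnt)
{
	if (cnt == RCUREF_NOREF)	/* fails harmlessly if the counter moved on */
		return atomic_compare_exchange_strong(refcnt, &cnt, RCUREF_DEAD);
	if (cnt >= RCUREF_RELEASED) {	/* genuine put() in the dead zone */
		imbalanced_put_warnings++;
		atomic_store(refcnt, RCUREF_DEAD);
		return false;
	}
	if (cnt > RCUREF_MAXREF)	/* saturation zone */
		atomic_store(refcnt, RCUREF_SATURATED);
	return false;
}

/* Same T1/T2 interleaving as the report; returns how many put()s won the final cmpxchg. */
static int replay_new_race(void)
{
	atomic_uint refcnt = 0;				/* rcuref_init(ref, 1) stores 0 */
	unsigned int t1_cnt, t2_cnt;
	int winners = 0;

	bool t1_slow = new_put_fast(&refcnt, &t1_cnt);	/* T1: cnt = 0xffffffff */
	atomic_fetch_add(&refcnt, 1);			/* T2: get() succeeds, back to 0 */
	bool t2_slow = new_put_fast(&refcnt, &t2_cnt);	/* T2: cnt = 0xffffffff */

	if (t2_slow && new_put_slowpath(&refcnt, t2_cnt))
		winners++;				/* T2: NOREF -> DEAD succeeds */
	if (t1_slow && new_put_slowpath(&refcnt, t1_cnt))
		winners++;				/* T1: cmpxchg sees DEAD and fails */
	return winners;
}
```

replay_new_race() returns 1 and imbalanced_put_warnings stays 0. T2 wins the NOREF -> DEAD transition, and T1's cmpxchg() simply fails because the counter is no longer RCUREF_NOREF, so T1 is told not to deconstruct the object and no warning is raised.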