Date: Tue, 14 Apr 2026 13:47:01 +0300
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Jiri Pirko, syzbot, linux-kernel@vger.kernel.org,
 linux-rdma@vger.kernel.org, syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [rdma?] WARNING in ib_dealloc_device
Message-ID: <20260414104701.GB361495@unreal>
References: <69dc3310.a00a0220.475f0.0018.GAE@google.com>
 <20260413154353.GK21470@unreal>
 <20260413174228.GQ3694781@ziepe.ca>
In-Reply-To: <20260413174228.GQ3694781@ziepe.ca>
X-Mailing-List: linux-rdma@vger.kernel.org

On Mon, Apr 13, 2026 at 02:42:28PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 13, 2026 at 04:12:09PM +0000, Jiri Pirko wrote:
> > Will check it tmrw
>
> I fed it to Claude and after 40 mins it is stumped too.. It should not
> be possible for this to happen.

Interesting, I used Chris's prompts for this debug and got the following
suggestions (CONFIG_PREEMPT_RT=y in this .config):

------------------------------------------------------------------------
REMAINING HYPOTHESES
------------------------------------------------------------------------

1. PREEMPT_RT rwsem behavior (most likely for syzkaller SOFTLOCKUP trigger):
   Under PREEMPT_RT, down_write/down_read use rt_mutex internally.
   Priority inheritance and preemption semantics differ from non-RT.
   There may be a window in the rwsem downgrade path inside
   enable_device_and_get (which downgrades from WRITE to READ after
   setting DEVICE_REGISTERED) that allows a concurrent disable_device
   to observe an inconsistent state.
   Specifically: enable_device_and_get does:
     down_write(devices_rwsem)
     xa_set_mark(DEVICE_REGISTERED)
     downgrade_write(devices_rwsem)   [WRITE -> READ]
     add_compat_devs()
     up_read(devices_rwsem)

   Under PREEMPT_RT, could disable_device acquire WRITE between the
   xa_set_mark and downgrade_write? If so, it would clear
   DEVICE_REGISTERED while add_compat_devs is about to run (but hasn't
   yet seen the mark cleared).

2. xa_for_each skipping entries during concurrent xa_erase restructuring:
   If rdma_dev_exit_net's remove_one_compat_dev erases an entry
   concurrently with remove_compat_devs iterating, xas_shrink (called
   inside xa_erase) could restructure the xarray tree. If xa_find_after
   then traverses a restructured tree and skips a subsequent entry,
   that entry remains in compat_devs.

   This is subtle: xa_erase takes the xarray spinlock (or rt_mutex),
   but xa_for_each calls xa_find_after under RCU. The RCU read side
   might see a partially-restructured tree that looks different from
   the spinlock-visible view. Under PREEMPT_RT, RCU critical sections
   can be longer.

3. rdma_compatdev_set (ib_devices_shared_netns sysctl) race:
   add_all_compat_devs() is guarded by DEVICE_REGISTERED +
   devices_rwsem, so the same analysis as T3a applies and the race is
   eliminated. However, if there is a remove_all_compat_devs()
   implementation, its interaction with the unregistration flow
   deserves verification.

------------------------------------------------------------------------
RECOMMENDED NEXT STEPS
------------------------------------------------------------------------

1. Add WARN_ON with stack trace inside add_one_compat_dev
   (compat_devs_mutex held) that fires if DEVICE_REGISTERED is not set.
   If this never fires, the insertion is always properly gated. If it
   fires, the unmarked insertion path is identified.

2. Add ftrace or kprobe on remove_compat_devs and add_one_compat_dev to
   capture the exact sequence of events.
   The key question: does any add_one_compat_dev call happen AFTER
   remove_compat_devs for the same device?

3. Check whether the bug exists on non-PREEMPT_RT kernels. If only
   PREEMPT_RT is affected, hypothesis 1 (rwsem downgrade race) or
   hypothesis 2 (RCU/xarray interaction) is more likely.

4. Look at the kernel version of the syzkaller report. Check git log
   for any changes to drivers/infiniband/core/device.c around the
   report date that may have introduced or fixed the issue.

5. Investigate enable_device_and_get's downgrade_write() path --
   specifically whether a concurrent disable_device can observe
   DEVICE_REGISTERED set between xa_set_mark and the downgrade, then
   fail to clear it before add_compat_devs runs.

------------------------------------------------------------------------
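
For next step 2, once a trace exists, the "key question" can be checked
offline. Below is a minimal sketch in Python, not a tested tool. It
assumes the trace has already been reduced to (timestamp, function,
device id) tuples; that tuple layout, the helper name find_late_adds and
the sample device ids are all hypothetical and not taken from any real
ftrace output. It also assumes the device id is unique per registration
(for example the ib_device pointer value), so that a legitimate
re-registration under the same name is not reported.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event layout: (timestamp in seconds, traced function, device id).
Event = Tuple[float, str, str]


def find_late_adds(events: Iterable[Event]) -> List[Event]:
    """Return add_one_compat_dev events seen after remove_compat_devs
    has already run for the same device id."""
    removed_at: Dict[str, float] = {}
    late: List[Event] = []
    # Process events in timestamp order; the sort is stable for equal stamps.
    for ts, func, dev in sorted(events, key=lambda e: e[0]):
        if func == "remove_compat_devs":
            # Remember only the first removal for each device id.
            removed_at.setdefault(dev, ts)
        elif func == "add_one_compat_dev" and dev in removed_at:
            late.append((ts, func, dev))
    return late


# Made-up sample data, for illustration only.
sample = [
    (1.0, "add_one_compat_dev", "dev_a"),
    (2.0, "remove_compat_devs", "dev_a"),
    (2.5, "add_one_compat_dev", "dev_a"),
    (3.0, "add_one_compat_dev", "dev_b"),
]

for event in find_late_adds(sample):
    print("add after remove:", event)
```

An empty result would suggest that insertion is always ordered before
removal for every traced device; any reported event would point at the
path to look into.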