From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: "Håkon Bugge" <haakon.bugge@oracle.com>,
"Allison Henderson" <allison.henderson@oracle.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Sasha Levin" <sashal@kernel.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
rds-devel@oss.oracle.com
Subject: [PATCH AUTOSEL 6.19-5.10] net/rds: Clear reconnect pending bit
Date: Sat, 14 Feb 2026 16:22:42 -0500 [thread overview]
Message-ID: <20260214212452.782265-17-sashal@kernel.org> (raw)
In-Reply-To: <20260214212452.782265-1-sashal@kernel.org>
From: Håkon Bugge <haakon.bugge@oracle.com>
[ Upstream commit b89fc7c2523b2b0750d91840f4e52521270d70ed ]
When canceling the reconnect worker, care must be taken to reset the
reconnect-pending bit. If the reconnect worker has not yet been
scheduled before it is canceled, the reconnect-pending bit will stay
on forever.
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260203055723.1085751-6-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
### 3. BUG MECHANISM — CLEAR AND CRITICAL
Now the full picture is clear:
**The flow:**
1. `rds_queue_reconnect()` at line 138 sets `RDS_RECONNECT_PENDING` and
queues the delayed work.
2. `rds_connect_worker()` (the worker function) at line 173 clears
`RDS_RECONNECT_PENDING` when it runs.
3. `rds_conn_path_connect_if_down()` at line 911 uses
`test_and_set_bit(RDS_RECONNECT_PENDING, ...)` — if the bit is
already set, it returns without queuing work, trusting that a
reconnect is already pending.
**The bug:**
In `rds_conn_shutdown()`, `cancel_delayed_work_sync()` cancels the
queued worker. If the worker hadn't started yet, it never runs
`clear_bit(RDS_RECONNECT_PENDING, ...)`. The bit stays set. Then when
`rds_conn_path_connect_if_down()` is later called (e.g., when trying to
send data), `test_and_set_bit` finds the bit already set and skips
queuing — **forever**. The connection can never reconnect.
This is a **permanent connection failure** bug — once triggered, the RDS
connection path is effectively dead until the system is rebooted or the
module is reloaded.
### 4. CLASSIFICATION
- **Bug type:** State corruption / logic bug leading to permanent loss
of network connectivity
- **Severity:** HIGH — RDS is used in production Oracle RAC clusters and
RDMA-based environments
- **Trigger:** Race between shutdown and reconnect scheduling —
realistic in production with network flaps
### 5. SCOPE AND RISK
- **Change size:** 1 functional line (plus 1 blank line) — extremely
small and surgical
- **Files changed:** 1 file (`net/rds/connection.c`)
- **Risk of regression:** Very low — `clear_bit` is idempotent. If the
worker already ran and cleared the bit, clearing it again is harmless.
If the worker didn't run, this is the correct fix.
- **The fix is placed correctly:** After `cancel_delayed_work_sync()`
guarantees the worker won't run, and before `rds_queue_reconnect()`
which will set the bit again if needed.
### 6. STABLE KERNEL CRITERIA
- **Obviously correct:** Yes — the logic is straightforward and well-
explained
- **Fixes a real bug:** Yes — permanent loss of RDS connectivity
- **Important issue:** Yes — affects network reliability for RDS users
(Oracle, RDMA environments)
- **Small and contained:** Yes — 1 line in 1 file
- **No new features:** Correct — purely a bug fix
- **No new APIs:** Correct
### 7. DEPENDENCY CHECK
This fix is self-contained. It only adds a `clear_bit()` call after an
existing `cancel_delayed_work_sync()` call. The `RDS_RECONNECT_PENDING`
flag and the surrounding code have been in the kernel for a long time.
This should apply cleanly to any stable tree that has the RDS subsystem.
### Conclusion
This is a textbook stable backport candidate: a tiny, obviously correct,
one-line fix for a serious bug (permanent connection failure) in
networking code used in production environments. The fix has zero risk
of regression due to the idempotent nature of `clear_bit`, and the bug
mechanism is clearly explained and verified through code analysis.
**YES**
net/rds/connection.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index ad8027e6f54ef..dbfea6fa11260 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -429,6 +429,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
* to the conn hash, so we never trigger a reconnect on this
* conn - the reconnect is always triggered by the active peer. */
cancel_delayed_work_sync(&cp->cp_conn_w);
+
+ clear_bit(RDS_RECONNECT_PENDING, &cp->cp_flags);
rcu_read_lock();
if (!hlist_unhashed(&conn->c_hash_node)) {
rcu_read_unlock();
--
2.51.0
next prev parent reply other threads:[~2026-02-14 21:25 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-14 21:22 [PATCH AUTOSEL 6.19-6.12] wifi: rtw89: ser: enable error IMR after recovering from L1 Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] wifi: ath11k: Fix failure to connect to a 6 GHz AP Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] myri10ge: avoid uninitialized variable use Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.1] wifi: rtw88: 8822b: Avoid WARNING in rtw8822b_config_trx_mode() Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] wifi: rtw89: 8922a: add digital compensation for 2GHz Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: iwlwifi: mld: Handle rate selection for NAN interface Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: pci: validate sequence number of TX release report Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.1] net: mctp-i2c: fix duplicate reception of old data Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.15] net: hns3: extend HCLGE_FD_AD_QID to 11 bits Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] wifi: iwlegacy: add missing mutex protection in il4965_store_tx_power() Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.6] wifi: rtw88: rtw8821cu: Add ID for Mercusys MU6H Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.6] dm: replace -EEXIST with -EBUSY Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] driver core: faux: stop using static struct device Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] net: wwan: mhi: Add network support for Foxconn T99W760 Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: Add support for MSI AX1800 Nano (GUAX18N) Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: mcc: reset probe counter when receiving beacon Sasha Levin
2026-02-14 21:22 ` Sasha Levin [this message]
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] net: usb: r8152: fix transmit queue timeout Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] PCI/bwctrl: Disable BW controller on Intel P45 using a quirk Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: setting TBTT AGG number when mac port initialization Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] netfilter: nf_conntrack: Add allow_clash to generic protocol handler Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: disable EHT protocol by chip capabilities Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] ipv6: annotate data-races over sysctl.flowlabel_reflect Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.1] gro: change the BUG_ON() in gro_pull_from_frag0() Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] wifi: ath10k: fix lock protection in ath10k_wmi_event_peer_sta_ps_state_chg() Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] wifi: iwlwifi: mld: Fix primary link selection logic Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] wifi: cfg80211: allow only one NAN interface, also in multi radio Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.6] wifi: ath12k: fix preferred hardware mode calculation Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.1] wifi: rtw88: fix DTIM period handling when conf->dtim_period is zero Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.18] Bluetooth: hci_qca: Fix SSR (SubSystem Restart) fail when BT_EN is pulled up by hw Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] wifi: rtw89: mac: correct page number for CSI response Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.15] ipv6: exthdrs: annotate data-race over multiple sysctl Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] wifi: rtw88: Fix inadvertent sharing of struct ieee80211_supported_band data Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19] wifi: rtw89: 8852au: add support for TP TX30U Plus Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] PCI: Mark Nvidia GB10 to avoid bus reset Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] wifi: ath11k: add pm quirk for Thinkpad Z13/Z16 Gen1 Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] Bluetooth: btusb: Add USB ID 0489:e112 for Realtek 8851BE Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] Bluetooth: btusb: Add support for MediaTek7920 0489:e158 Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19] wifi: rtw89: Add default ID 28de:2432 for RTL8832CU Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] wifi: ath12k: fix mac phy capability parsing Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.1] wifi: rtw89: pci: restore LDO setting after device resume Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: use reserved metadata blocks when splitting extent on endio Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] jfs: Add missing set_freezable() for freezable kthread Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.1] Bluetooth: btusb: Add new VID/PID for RTL8852CE Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] octeontx2-af: Workaround SQM/PSE stalls by disabling sticky Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] vmw_vsock: bypass false-positive Wnonnull warning with gcc-16 Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] wifi: iwlegacy: add missing mutex protection in il3945_store_measurement() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] PCI: dwc: ep: Cache MSI outbound iATU mapping Sasha Levin
2026-02-16 1:15 ` Koichiro Den
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] nfc: nxp-nci: remove interrupt trigger type Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19] wifi: rtw89: Add support for D-Link VR Air Bridge (DWA-F18) Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] PCI/AER: Clear stale errors on reporting agents upon probe Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] dm: remove fake timeout to avoid leak request Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] PCI: Add ACS quirk for Qualcomm Hamoa & Glymur Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19] PCI: cadence: Avoid signed 64-bit truncation and invalid sort Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] wifi: iwlwifi: mld: fix chandef start calculation Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] ext4: move ext4_percpu_param_init() before ext4_mb_init() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] wifi: rtw89: wow: add reason codes for disassociation in WoWLAN mode Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ipv6: annotate data-races in ip6_multipath_hash_{policy,fields}() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] ipv4: igmp: annotate data-races around idev->mr_maxdelay Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group add fast-commit ineligible Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] wifi: iwlwifi: fix 22000 series SMEM parsing Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] net/rds: No shortcut out of RDS_CONN_ERROR Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] ipv6: annotate data-races in net/ipv6/route.c Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] wifi: iwlwifi: mvm: check the validity of noa_len Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] wifi: rtw89: fix unable to receive probe responses under MLO connection Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] PCI: Enable ACS after configuring IOMMU for OF platforms Sasha Levin
2026-03-18 8:21 ` Thorsten Leemhuis
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.1] PCI: dw-rockchip: Disable BAR 0 and BAR 1 for Root Port Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: regd: 6 GHz power type marks default when inactive Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19] wifi: cfg80211: treat deprecated INDOOR_SP_AP_OLD control value as LPI mode Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] wifi: rtw88: Use devm_kmemdup() in rtw_set_supported_band() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] Bluetooth: hci_conn: Set link_policy on incoming ACL connections Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] wifi: libertas: fix WARNING in usb_tx_block Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] netfilter: xt_tcpmss: check remaining length before reading optlen Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] bnxt_en: Allow ntuple filters for drops Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] ext4: propagate flags to convert_initialized_extent() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] net: usb: sr9700: remove code to drive nonexistent multicast filter Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] ptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] jfs: nlink overflow in jfs_rename Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] PCI: Mark ASM1164 SATA controller to avoid bus reset Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ext4: mark group extend fast-commit ineligible Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] openrisc: define arch-specific version of nop() Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] PCI: imx6: Add CLKREQ# override to enable REFCLK for i.MX95 PCIe Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] fsnotify: Shutdown fsnotify before destroying sb's dcache Sasha Levin
2026-02-15 8:11 ` Amir Goldstein
2026-02-17 10:00 ` Jan Kara
2026-02-26 14:09 ` Sasha Levin
2026-02-26 15:57 ` Jan Kara
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] ipv4: fib: Annotate access to struct fib_alias.fa_state Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] PCI: Fix pci_slot_lock () device locking Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] net: sfp: add quirk for Lantech 8330-265D Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: pci: validate release report content before using for RTL8922DE Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] wifi: rtw89: 8922a: set random mac if efuse contains zeroes Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] Bluetooth: btusb: Add device ID for Realtek RTL8761BU Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] rtla: Fix NULL pointer dereference in actions_parse Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] wifi: rtw89: fix potential zero beacon interval in beacon tracking Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.6] iommu/amd: move wait_on_sem() out of spinlock Sasha Levin
2026-02-16 4:27 ` Ankit Soni
2026-02-14 21:24 ` [PATCH AUTOSEL 6.19-5.10] Bluetooth: hci_conn: use mod_delayed_work for active mode timeout Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260214212452.782265-17-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=allison.henderson@oracle.com \
--cc=haakon.bugge@oracle.com \
--cc=kuba@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=patches@lists.linux.dev \
--cc=rds-devel@oss.oracle.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox