All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>,
	Leon Romanovsky <leon@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	dennis.dalessandro@cornelisnetworks.com,
	linux-rdma@vger.kernel.org
Subject: [PATCH AUTOSEL 6.16-5.4] RDMA: hfi1: fix possible divide-by-zero in find_hw_thread_mask()
Date: Tue,  5 Aug 2025 09:09:37 -0400	[thread overview]
Message-ID: <20250805130945.471732-62-sashal@kernel.org> (raw)
In-Reply-To: <20250805130945.471732-1-sashal@kernel.org>

From: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>

[ Upstream commit 59f7d2138591ef8f0e4e4ab5f1ab674e8181ad3a ]

The function divides number of online CPUs by num_core_siblings, and
later checks the divider by zero. This implies a possibility to get
and divide-by-zero runtime error. Fix it by moving the check prior to
division. This also helps to save one indentation level.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Link: https://patch.msgid.link/20250604193947.11834-3-yury.norov@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## Bug Fix Analysis

The commit fixes a **genuine divide-by-zero bug** in the
`find_hw_thread_mask()` function. The code changes show:

1. **Original bug**: The division `num_cores_per_socket =
   node_affinity.num_online_cpus / affinity->num_core_siblings /
   node_affinity.num_online_nodes` occurs at lines 967-969 BEFORE
   checking if `num_core_siblings > 0` at line 972.

2. **The fix**: Moves the check `if (affinity->num_core_siblings == 0)
   return;` to line 973-974 (in the new code) BEFORE the division
   operation, preventing the divide-by-zero.

## When the Bug Can Trigger

The `num_core_siblings` value is initialized as:
```c
cpumask_weight(topology_sibling_cpumask(cpumask_first(&node_affinity.pro
c.mask)))
```

This can be 0 in several real-world scenarios:
- Single-core systems without SMT/hyperthreading
- Systems where SMT is disabled at runtime
- Virtualized environments with unusual CPU topology
- Certain ARM or other architectures where topology_sibling_cpumask()
  returns empty

## Stable Kernel Criteria Met

1. **Fixes a real bug**: ✓ - Prevents kernel divide-by-zero crash
2. **Small and contained**: ✓ - Only 20 lines changed in one function
3. **No side effects**: ✓ - Early return preserves existing behavior
   when num_core_siblings==0
4. **No architectural changes**: ✓ - Simple defensive programming fix
5. **Clear bug fix**: ✓ - Not a feature or optimization
6. **Low regression risk**: ✓ - Only adds safety check, doesn't change
   logic

## Impact Assessment

- **Severity**: Medium-High - Can cause kernel panic on affected systems
- **Affected systems**: HFI1 InfiniBand hardware on systems with
  specific CPU configurations
- **User impact**: System crash when loading HFI1 driver on vulnerable
  configurations

The commit message clearly states "fix possible divide-by-zero" and the
code change unambiguously moves a zero-check before a division operation
that uses that value as divisor. This is a textbook example of a bug fix
that should be backported to stable kernels to prevent crashes on
systems with certain CPU topologies.

 drivers/infiniband/hw/hfi1/affinity.c | 44 +++++++++++++++------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index 7ead8746b79b..f2c530ab85a5 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -964,31 +964,35 @@ static void find_hw_thread_mask(uint hw_thread_no, cpumask_var_t hw_thread_mask,
 				struct hfi1_affinity_node_list *affinity)
 {
 	int possible, curr_cpu, i;
-	uint num_cores_per_socket = node_affinity.num_online_cpus /
+	uint num_cores_per_socket;
+
+	cpumask_copy(hw_thread_mask, &affinity->proc.mask);
+
+	if (affinity->num_core_siblings == 0)
+		return;
+
+	num_cores_per_socket = node_affinity.num_online_cpus /
 					affinity->num_core_siblings /
 						node_affinity.num_online_nodes;
 
-	cpumask_copy(hw_thread_mask, &affinity->proc.mask);
-	if (affinity->num_core_siblings > 0) {
-		/* Removing other siblings not needed for now */
-		possible = cpumask_weight(hw_thread_mask);
-		curr_cpu = cpumask_first(hw_thread_mask);
-		for (i = 0;
-		     i < num_cores_per_socket * node_affinity.num_online_nodes;
-		     i++)
-			curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
-
-		for (; i < possible; i++) {
-			cpumask_clear_cpu(curr_cpu, hw_thread_mask);
-			curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
-		}
+	/* Removing other siblings not needed for now */
+	possible = cpumask_weight(hw_thread_mask);
+	curr_cpu = cpumask_first(hw_thread_mask);
+	for (i = 0;
+	     i < num_cores_per_socket * node_affinity.num_online_nodes;
+	     i++)
+		curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
 
-		/* Identifying correct HW threads within physical cores */
-		cpumask_shift_left(hw_thread_mask, hw_thread_mask,
-				   num_cores_per_socket *
-				   node_affinity.num_online_nodes *
-				   hw_thread_no);
+	for (; i < possible; i++) {
+		cpumask_clear_cpu(curr_cpu, hw_thread_mask);
+		curr_cpu = cpumask_next(curr_cpu, hw_thread_mask);
 	}
+
+	/* Identifying correct HW threads within physical cores */
+	cpumask_shift_left(hw_thread_mask, hw_thread_mask,
+			   num_cores_per_socket *
+			   node_affinity.num_online_nodes *
+			   hw_thread_no);
 }
 
 int hfi1_get_proc_affinity(int node)
-- 
2.39.5


  parent reply	other threads:[~2025-08-05 13:12 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-05 13:08 [PATCH AUTOSEL 6.16-6.6] mfd: axp20x: Set explicit ID for AXP313 regulator Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] MIPS: vpe-mt: add missing prototypes for vpe_{alloc,start,stop,free} Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.10] leds: leds-lp50xx: Handle reg to get correct multi_index Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] scsi: bfa: Double-free fix Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] pinctrl: stm32: Manage irq affinity settings Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16] PCI: dw-rockchip: Delay link training after hot reset in EP mode Sasha Levin
2025-08-05 13:08   ` Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.6] phy: rockchip-pcie: Properly disable TEST_WRITE strobe signal Sasha Levin
2025-08-05 13:08   ` Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.6] soundwire: Move handle_nested_irq outside of sdw_dev_lock Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] media: uvcvideo: Fix bandwidth issue for Alcor camera Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.15] crypto: hisilicon/hpre - fix dma unmap sequence Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.6] soundwire: amd: serialize amd manager resume sequence during pm_prepare Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.15] watchdog: sbsa: Adjust keepalive timeout to avoid MediaTek WS0 race condition Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.6] clk: qcom: ipq5018: keep XO clock always on Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16] media: i2c: vd55g1: Fix RATE macros not being expressed in bps Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] media: usb: hdpvr: disable zero-length read messages Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.15] media: raspberrypi: cfe: Fix min_reqbufs_allocation Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.1] hwmon: (emc2305) Set initial PWM minimum value during probe based on thermal state Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.12] media: uvcvideo: Add quirk for HP Webcam HD 2300 Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.1] drm/amd/display: Only finalize atomic_obj if it was initialized Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] vhost: fail early when __vhost_add_used() fails Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-6.12] scsi: lpfc: Ensure HBA_SETUP flag is used only for SLI4 in dev_loss_tmo_callbk Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16] ext4: limit the maximum folio order Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16-5.4] fs/orangefs: use snprintf() instead of sprintf() Sasha Levin
2025-08-05 13:08 ` [PATCH AUTOSEL 6.16] crypto: caam - Support iMX8QXP and variants thereof Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] crypto: ccp - Add missing bootloader info reg for pspv6 Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] scsi: lpfc: Check for hdwq null ptr when cleaning up lpfc_vport structure Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] media: dvb-frontends: dib7090p: fix null-ptr-deref in dib7090p_rw_on_apb() Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] scsi: pm80xx: Free allocated tags after failure Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] HID: rate-limit hid_warn to prevent log flooding Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16] media: i2c: vd55g1: Setup sensor external clock before patching Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.15] watchdog: iTCO_wdt: Report error if timeout configuration fails Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] media: iris: Add handling for corrupt and drop frames Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] phy: rockchip-pcie: Enable all four lanes if required Sasha Levin
2025-08-05 13:09   ` Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] watchdog: dw_wdt: Fix default timeout Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] MIPS: Don't crash in stack_top() for tasks without ABI or vDSO Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.6] crypto: jitter - fix intermediary handling Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.1] vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.1] MIPS: lantiq: falcon: sysctrl: fix request memory check logic Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] media: tc358743: Check I2C succeeded during probe Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.1] scsi: mpi3mr: Correctly handle ATA device errors Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] clk: renesas: rzg2l: Postpone updating priv->clks[] Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] scsi: mpt3sas: Correctly handle ATA device errors Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] smb: client: fix session setup against servers that require SPN Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.1] fbdev: fix potential buffer overflow in do_register_framebuffer() Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] sphinx: kernel_abi: fix performance regression with O=<dir> Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] media: tc358743: Return an appropriate colorspace from tc358743_set_fmt Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.6] drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] media: tc358743: Increase FIFO trigger level to 374 Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] jfs: truncate good inode pages when hard link is 0 Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.15] media: v4l2-common: Reduce warnings about missing V4L2_CID_LINK_FREQ control Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.1] dmaengine: stm32-dma: configure next sg only if there are more than 2 sgs Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] RDMA/bnxt_re: Fix size of uverbs_copy_to() in BNXT_RE_METHOD_GET_TOGGLE_MEM Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] cifs: Fix calling CIFSFindFirst() for root path without msearch Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.10] RDMA/core: reduce stack using in nldev_stat_get_doit() Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] scsi: libiscsi: Initialize iscsi_conn->dd_data only if memory is allocated Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] media: dvb-frontends: w7090p: fix null-ptr-deref in w7090p_tuner_write_serpar and w7090p_tuner_read_serpar Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] soundwire: amd: cancel pending slave status handling workqueue during remove sequence Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU Sasha Levin
2025-08-05 13:20   ` Marc Zyngier
2025-08-05 13:59     ` Sasha Levin
2025-08-05 18:09       ` Marc Zyngier
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.6] clk: tegra: periph: Fix error handling and resolve unsigned compare warning Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] drm/amd/display: Disable dsc_power_gate for dcn314 by default Sasha Levin
2025-08-05 13:09 ` Sasha Levin [this message]
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.15] crypto: octeontx2 - add timeout for load_fvc completion poll Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.6] power: supply: qcom_battmgr: Add lithium-polymer entry Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] media: ipu-bridge: Add _HID for OV5670 Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] media: hi556: Fix reset GPIO timings Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.12] clk: thead: Mark essential bus clocks as CLK_IGNORE_UNUSED Sasha Levin
2025-08-05 13:09   ` Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.15] media: uvcvideo: Set V4L2_CTRL_FLAG_DISABLED during queryctrl errors Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] jfs: Regular file corruption check Sasha Levin
2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-5.4] jfs: upper bound check of tree index in dbAllocAG Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250805130945.471732-62-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=dennis.dalessandro@cornelisnetworks.com \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.