From: Greg Kroah-Hartman
To: linux-cve-announce@vger.kernel.org
Cc: Greg Kroah-Hartman
Subject: CVE-2025-68310: s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump
Date: Tue, 16 Dec 2025 16:39:53 +0100
Message-ID: <2025121653-CVE-2025-68310-e0fc@gregkh>

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump

Do not block PCI config accesses through pci_cfg_access_lock() when executing the s390 variant of PCI error recovery: acquire just device_lock() instead of pci_dev_lock(), as powerpc's EEH and generic PCI AER processing do.

During error recovery testing, a pair of tasks was reported to be hung:

  mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working
  INFO: task kmcheck:72 blocked for more than 122 seconds.
        Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000
  Call Trace:
   [<000000065256f030>] __schedule+0x2a0/0x590
   [<000000065256f356>] schedule+0x36/0xe0
   [<000000065256f572>] schedule_preempt_disabled+0x22/0x30
   [<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8
   [<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core]
   [<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]
   [<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398
   [<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0
  INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.
        Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000
  Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
  Call Trace:
   [<000000065256f030>] __schedule+0x2a0/0x590
   [<000000065256f356>] schedule+0x36/0xe0
   [<0000000652172e28>] pci_wait_cfg+0x80/0xe8
   [<0000000652172f94>] pci_cfg_access_lock+0x74/0x88
   [<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]
   [<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]
   [<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]
   [<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168
   [<0000000652513212>] devlink_health_report+0x19a/0x230
   [<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]

No kernel log of the exact same error on an upstream kernel is available, but the very same deadlock situation can be constructed there, too:

- task: kmcheck
  mlx5_unload_one() tries to acquire the devlink lock while the PCI error recovery code has set pdev->block_cfg_access by way of pci_cfg_access_lock().
- task: kworker
  mlx5_crdump_collect() tries to set block_cfg_access through pci_cfg_access_lock() while devlink_health_report() has acquired the devlink lock.
A similar deadlock situation can be reproduced by requesting a crdump with

  > devlink health dump show pci/ reporter fw_fatal

while PCI error recovery is executed on the same physical function by mlx5_core's pci_error_handlers. On s390 this can be injected with

  > zpcictl --reset-fw

Tests with this patch failed to reproduce that second deadlock situation: the devlink command is rejected with "kernel answers: Permission denied", and the kernel logs the message

  mlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5

because the config read of VSC_SEMAPHORE is rejected by the underlying hardware.

Two prior attempts to address this issue have been discussed and ultimately rejected [see link], with the primary argument that s390's implementation of PCI error recovery imposes restrictions that neither powerpc's EEH nor PCI AER handling needs. Tests show that PCI error recovery on s390 runs to completion even without blocking access to PCI config space.

The Linux kernel CVE team has assigned CVE-2025-68310 to this issue.
Affected and fixed versions
===========================

  Issue introduced in 5.16 with commit 4cdf2f4e24ff0d345fc36ef6d6aec059333a261e and fixed in 6.1.159 with commit d0df2503bc3c2be385ca2fd96585daad1870c7c5
  Issue introduced in 5.16 with commit 4cdf2f4e24ff0d345fc36ef6d6aec059333a261e and fixed in 6.6.117 with commit b63c061be622b17b495cbf78a6d5f2d4c3147f8e
  Issue introduced in 5.16 with commit 4cdf2f4e24ff0d345fc36ef6d6aec059333a261e and fixed in 6.12.58 with commit 3591d56ea9bfd3e7fbbe70f749bdeed689d415f9
  Issue introduced in 5.16 with commit 4cdf2f4e24ff0d345fc36ef6d6aec059333a261e and fixed in 6.17.8 with commit 54f938d9f5693af8ed586a08db4af5d9da1f0f2d
  Issue introduced in 5.16 with commit 4cdf2f4e24ff0d345fc36ef6d6aec059333a261e and fixed in 6.18 with commit 0fd20f65df6aa430454a0deed8f43efa91c54835

Please see https://www.kernel.org for a full list of kernel versions currently supported by the kernel community.

Unaffected versions might change over time as fixes are backported to older supported kernel versions. The official CVE entry at https://cve.org/CVERecord/?id=CVE-2025-68310 will be updated if fixes are backported; please check it for the most up-to-date information about this issue.

Affected files
==============

The file(s) affected by this issue are:
  arch/s390/pci/pci_event.c

Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest stable kernel version for this and many other bugfixes. Individual changes are never tested alone, but rather as part of a larger kernel release. Cherry-picking individual commits is not recommended or supported by the Linux kernel community at all.
If, however, updating to the latest release is impossible, the individual changes that resolve this issue can be found in these commits:
  https://git.kernel.org/stable/c/d0df2503bc3c2be385ca2fd96585daad1870c7c5
  https://git.kernel.org/stable/c/b63c061be622b17b495cbf78a6d5f2d4c3147f8e
  https://git.kernel.org/stable/c/3591d56ea9bfd3e7fbbe70f749bdeed689d415f9
  https://git.kernel.org/stable/c/54f938d9f5693af8ed586a08db4af5d9da1f0f2d
  https://git.kernel.org/stable/c/0fd20f65df6aa430454a0deed8f43efa91c54835