patches.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Aliaksei Makarau <Aliaksei.Makarau@ibm.com>,
	Mahanta Jambigi <mjambigi@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Alexandra Winter <wintera@linux.ibm.com>,
	Simon Horman <horms@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.15 38/92] s390/ism: fix concurrency management in ism_cmd()
Date: Wed, 30 Jul 2025 11:35:46 +0200	[thread overview]
Message-ID: <20250730093232.224648260@linuxfoundation.org> (raw)
In-Reply-To: <20250730093230.629234025@linuxfoundation.org>

6.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Halil Pasic <pasic@linux.ibm.com>

[ Upstream commit 897e8601b9cff1d054cdd53047f568b0e1995726 ]

The s390x ISM device data sheet clearly states that only one
request-response sequence is allowable per ISM function at any point in
time.  Unfortunately as of today the s390/ism driver in Linux does not
honor that requirement. This patch aims to rectify that.

This problem was discovered based on Aliaksei's bug report which states
that for certain workloads the ISM functions end up entering error state
(with PEC 2 as seen from the logs) after a while and as a consequence
connections handled by the respective function break, and for future
connection requests the ISM device is not considered -- given it is in a
dysfunctional state. During further debugging PEC 3A was observed as
well.

A kernel message like
[ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a
is a reliable indicator of the stated function entering error state
with PEC 2. Let me also point out that a kernel message like
[ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery
is a reliable indicator that the ISM function won't be auto-recovered
because the ISM driver currently lacks support for it.

On a technical level, without this synchronization, commands (inputs to
the FW) may be partially or fully overwritten (corrupted) by another CPU
trying to issue commands on the same function. There is hard evidence that
this can lead to DMB token values being used as DMB IOVAs, leading to
PEC 2 PCI events indicating invalid DMA. But this is only one of the
failure modes imaginable. In theory even completely losing one command
and executing another one twice and then trying to interpret the outputs
as if the command we intended to execute was actually executed and not
the other one is also possible.  Frankly, I don't feel confident about
providing an exhaustive list of possible consequences.

Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722161817.1298473-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/s390/net/ism_drv.c | 3 +++
 include/linux/ism.h        | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index 60ed70a39d2cc..b8464d9433e96 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -130,6 +130,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
 	struct ism_req_hdr *req = cmd;
 	struct ism_resp_hdr *resp = cmd;
 
+	spin_lock(&ism->cmd_lock);
 	__ism_write_cmd(ism, req + 1, sizeof(*req), req->len - sizeof(*req));
 	__ism_write_cmd(ism, req, 0, sizeof(*req));
 
@@ -143,6 +144,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
 	}
 	__ism_read_cmd(ism, resp + 1, sizeof(*resp), resp->len - sizeof(*resp));
 out:
+	spin_unlock(&ism->cmd_lock);
 	return resp->ret;
 }
 
@@ -606,6 +608,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return -ENOMEM;
 
 	spin_lock_init(&ism->lock);
+	spin_lock_init(&ism->cmd_lock);
 	dev_set_drvdata(&pdev->dev, ism);
 	ism->pdev = pdev;
 	ism->dev.parent = &pdev->dev;
diff --git a/include/linux/ism.h b/include/linux/ism.h
index 5428edd909823..8358b4cd7ba6a 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -28,6 +28,7 @@ struct ism_dmb {
 
 struct ism_dev {
 	spinlock_t lock; /* protects the ism device */
+	spinlock_t cmd_lock; /* serializes cmds */
 	struct list_head list;
 	struct pci_dev *pdev;
 
-- 
2.39.5




  parent reply	other threads:[~2025-07-30  9:53 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-30  9:35 [PATCH 6.15 00/92] 6.15.9-rc1 review Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 01/92] x86/traps: Initialize DR7 by writing its architectural reset value Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 02/92] virtio_net: Enforce minimum TX ring size for reliability Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 03/92] virtio_ring: Fix error reporting in virtqueue_resize Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 04/92] drm/amd/display: Dont allow OLED to go down to fully off Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 05/92] regulator: core: fix NULL dereference on unbind due to stale coupling data Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 06/92] platform/x86: asus-nb-wmi: add DMI quirk for ASUS Zenbook Duo UX8406CA Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 07/92] RDMA/core: Rate limit GID cache warning messages Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 08/92] iio: fix potential out-of-bound write Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 09/92] interconnect: qcom: sc7280: Add missing num_links to xm_pcie3_1 node Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 10/92] interconnect: icc-clk: destroy nodes in case of memory allocation failures Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 11/92] iio: adc: ad7949: use spi_is_bpw_supported() Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 12/92] regmap: fix potential memory leak of regmap_bus Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 13/92] platform/mellanox: mlxbf-pmc: Remove newline char from event name input Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 14/92] platform/mellanox: mlxbf-pmc: Validate event/enable input Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 15/92] platform/mellanox: mlxbf-pmc: Use kstrtobool() to check 0/1 input Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 16/92] tools/hv: fcopy: Fix incorrect file path conversion Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 17/92] x86/hyperv: Fix usage of cpu_online_mask to get valid cpu Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 18/92] platform/x86: Fix initialization order for firmware_attributes_class Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 19/92] staging: vchiq_arm: Make vchiq_shutdown never fail Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 20/92] xfrm: state: initialize state_ptrs earlier in xfrm_state_find Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 21/92] xfrm: state: use a consistent pcpu_id " Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 22/92] xfrm: always initialize offload path Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 23/92] xfrm: Set transport header to fix UDP GRO handling Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 24/92] xfrm: ipcomp: adjust transport header after decompressing Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 25/92] xfrm: interface: fix use-after-free after changing collect_md xfrm interface Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 26/92] ASoC: mediatek: mt8365-dai-i2s: pass correct size to mt8365_dai_set_priv Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 27/92] net: ti: icssg-prueth: Fix buffer allocation for ICSSG Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 28/92] net/mlx5: Fix memory leak in cmd_exec() Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 29/92] net/mlx5: E-Switch, Fix peer miss rules to use peer eswitch Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 30/92] i40e: report VF tx_dropped with tx_errors instead of tx_discards Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 31/92] i40e: When removing VF MAC filters, only check PF-set MAC Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 32/92] net: appletalk: Fix use-after-free in AARP proxy probe Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 33/92] net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 34/92] can: netlink: can_changelink(): fix NULL pointer deref of struct can_priv::do_set_mode Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 35/92] drm/bridge: ti-sn65dsi86: Remove extra semicolon in ti_sn_bridge_probe() Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 36/92] ALSA: hda/realtek: Fix mute LED mask on HP OMEN 16 laptop Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 37/92] selftests: drv-net: wait for iperf client to stop sending Greg Kroah-Hartman
2025-07-30  9:35 ` Greg Kroah-Hartman [this message]
2025-07-30  9:35 ` [PATCH 6.15 39/92] net: hns3: fix concurrent setting vlan filter issue Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 40/92] net: hns3: disable interrupt when ptp init failed Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 41/92] net: hns3: fixed vf get max channels bug Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 42/92] net: hns3: default enable tx bounce buffer when smmu enabled Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 43/92] platform/x86: alienware-wmi-wmax: Fix `dmi_system_id` array Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 44/92] platform/x86: ideapad-laptop: Fix FnLock not remembered among boots Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 45/92] platform/x86: ideapad-laptop: Fix kbd backlight " Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 46/92] drm/i915/dp: Fix 2.7 Gbps DP_LINK_BW value on g4x Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 47/92] Revert "drm/prime: Use dma_buf from GEM object instance" Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 48/92] Revert "drm/gem-framebuffer: " Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 49/92] Revert "drm/gem-dma: " Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 50/92] drm/amdgpu: Reset the clear flag in buddy during resume Greg Kroah-Hartman
2025-07-30  9:35 ` [PATCH 6.15 51/92] drm/sched: Remove optimization that causes hang when killing dependent jobs Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 52/92] mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show() Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 53/92] ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 54/92] ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36 Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 55/92] timekeeping: Zero initialize system_counterval when querying time from phc drivers Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 56/92] i2c: qup: jump out of the loop in case of timeout Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 57/92] i2c: tegra: Fix reset error handling with ACPI Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 58/92] i2c: virtio: Avoid hang by using interruptible completion wait Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 59/92] bus: fsl-mc: Fix potential double device reference in fsl_mc_get_endpoint() Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 60/92] sprintf.h requires stdarg.h Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 61/92] ALSA: hda/realtek - Add mute LED support for HP Pavilion 15-eg0xxx Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 62/92] ALSA: hda/realtek - Add mute LED support for HP Victus 15-fa0xxx Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 63/92] arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 64/92] ASoC: mediatek: common: fix device and OF node leak Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 65/92] dpaa2-eth: Fix device reference count leak in MAC endpoint handling Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 66/92] dpaa2-switch: " Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 67/92] e1000e: disregard NVM checksum on tgp when valid checksum bit is not set Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 68/92] e1000e: ignore uninitialized checksum word on tgp Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 69/92] gve: Fix stuck TX queue for DQ queue format Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 70/92] ice: Fix a null pointer dereference in ice_copy_and_init_pkg() Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 71/92] kasan: use vmalloc_dump_obj() for vmalloc error reports Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 72/92] nilfs2: reject invalid file types when reading inodes Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 73/92] PCI/pwrctrl: Create pwrctrl devices only when CONFIG_PCI_PWRCTRL is enabled Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 74/92] resource: fix false warning in __request_region() Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 75/92] selftests: mptcp: connect: also cover alt modes Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 76/92] selftests: mptcp: connect: also cover checksum Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 77/92] selftests/mm: fix split_huge_page_test for folio_split() tests Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 78/92] mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 79/92] mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 80/92] selftests/bpf: Add tests with stack ptr register in conditional jmp Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 81/92] drm/xe: Make WA BB part of LRC BO Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 82/92] drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.h Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 83/92] drm/amdgpu: Implement SDMA soft reset directly for v5.x Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 84/92] drm/amdgpu: Fix SDMA engine reset with logical instance ID Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 85/92] drm/shmem-helper: Remove obsoleted is_iomem test Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 86/92] Revert "drm/gem-shmem: Use dma_buf from GEM object instance" Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 87/92] usb: typec: tcpm: allow to use sink in accessory mode Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 88/92] usb: typec: tcpm: allow switching to mode accessory to mux properly Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 89/92] usb: typec: tcpm: apply vbus before data bringup in tcpm_src_attach Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 90/92] spi: cadence-quadspi: fix cleanup of rx_chan on failure paths Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 91/92] ALSA: hda/tegra: Add Tegra264 support Greg Kroah-Hartman
2025-07-30  9:36 ` [PATCH 6.15 92/92] ALSA: hda: Add missing NVIDIA HDA codec IDs Greg Kroah-Hartman
2025-07-30 12:35 ` [PATCH 6.15 00/92] 6.15.9-rc1 review Ronald Warsow
2025-07-30 13:09 ` Christian Heusel
2025-07-30 14:09 ` Jon Hunter
2025-07-30 15:29 ` Mark Brown
2025-07-30 16:47 ` Achill Gilgenast
2025-07-30 17:21 ` Brett A C Sheffield
2025-07-30 20:08 ` Peter Schneider
2025-07-30 20:51 ` Shuah Khan
2025-07-31  2:30 ` Justin Forbes
2025-07-31  2:39 ` Takeshi Ogasawara
2025-07-31  8:36 ` Ron Economos
2025-07-31 10:17 ` Naresh Kamboju
2025-07-31 19:00 ` Miguel Ojeda
2025-08-01  1:31 ` Hardik Garg
2025-08-18 22:51   ` [PATCH 6.16 000/570] 6.16.2-rc1 review Hardik Garg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250730093232.224648260@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=Aliaksei.Makarau@ibm.com \
    --cc=horms@kernel.org \
    --cc=mjambigi@linux.ibm.com \
    --cc=pabeni@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=wintera@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).