* [PATCH 1/1] s390/ism: fix concurrency management in ism_cmd()
@ 2025-07-20 21:11 Halil Pasic
2025-07-21 7:30 ` Alexander Gordeev
0 siblings, 1 reply; 5+ messages in thread
From: Halil Pasic @ 2025-07-20 21:11 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandra Winter, Thorsten Winkler, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Sebastian Ott, Ursula Braun, netdev, linux-s390,
linux-kernel
Cc: Halil Pasic, Aliaksei Makarau, Mahanta Jambigi
The s390x ISM device data sheet clearly states that only one
request-response sequence is allowable per ISM function at any point in
time. Unfortunately, as of today, the s390/ism driver in Linux does not
honor that requirement. This patch aims to rectify that.
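As a userspace analogy only (this is not driver code), the requirement can be pictured as a single shared command slot that a caller must own for the whole write-request/read-response sequence. In this sketch, `fake_mailbox`, `fake_device_execute()`, `mailbox_cmd()` and `mailbox_demo()` are hypothetical names, and a pthread mutex stands in for the kernel spinlock `ism->cmd_lock` that this patch adds.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical single-slot command area shared by all callers; it stands
 * in for the per-function command region that the firmware reads. */
struct fake_mailbox {
	pthread_mutex_t lock;   /* plays the role of ism->cmd_lock */
	unsigned int req;       /* request written by the caller */
	unsigned int resp;      /* response produced by the "device" */
};

static struct fake_mailbox mbox = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Hypothetical device: answers every request with request * 2. */
static void fake_device_execute(struct fake_mailbox *m)
{
	m->resp = m->req * 2;
}

/* One complete request-response sequence. Holding the lock from the
 * request write until the response read guarantees that no other
 * thread touches the slot in between. */
static unsigned int mailbox_cmd(struct fake_mailbox *m, unsigned int req)
{
	unsigned int resp;

	pthread_mutex_lock(&m->lock);
	m->req = req;
	fake_device_execute(m);
	resp = m->resp;
	pthread_mutex_unlock(&m->lock);
	return resp;
}

#define NTHREADS 4
#define NCMDS 100000u

static int mismatches[NTHREADS];

/* Each worker issues NCMDS commands and counts responses that do not
 * belong to its own request. */
static void *worker(void *arg)
{
	int id = (int)(long)arg;

	for (unsigned int i = 0; i < NCMDS; i++) {
		unsigned int req = (unsigned int)id * NCMDS + i;

		if (mailbox_cmd(&mbox, req) != req * 2)
			mismatches[id]++;
	}
	return NULL;
}

/* Returns the total number of mismatched responses (0 expected), or -1
 * if not all worker threads could be started. */
int mailbox_demo(void)
{
	pthread_t tids[NTHREADS];
	int created, total = 0;

	memset(mismatches, 0, sizeof(mismatches));
	for (created = 0; created < NTHREADS; created++)
		if (pthread_create(&tids[created], NULL, worker,
				   (void *)(long)created) != 0)
			break;
	for (int t = 0; t < created; t++)
		pthread_join(tids[t], NULL);
	if (created != NTHREADS)
		return -1;
	for (int t = 0; t < NTHREADS; t++)
		total += mismatches[t];
	return total;
}
```

With the lock in place, `mailbox_demo()` returns 0. Removing the lock/unlock pair from `mailbox_cmd()` turns the slot accesses into a data race, and a result of 0 is then no longer guaranteed.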
This problem was discovered based on Aliaksei's bug report, which states
that for certain workloads the ISM functions end up entering error state
(with PEC 2, as seen from the logs) after a while. As a consequence,
connections handled by the respective function break, and the ISM device
is no longer considered for future connection requests, since it is in a
dysfunctional state. During further debugging, PEC 31 was observed as
well.
The kernel message
zpci: XXXX:00:00.0: Event 0x2 reports an error for PCI function XXXX
is a reliable indicator of the stated function entering error state
with PEC 2. Let me also point out that the kernel message
zpci: XXXX:00:00.0: The ism driver bound to the device does not support error recovery
is a reliable indicator that the ISM function won't be auto-recovered
because the ISM driver currently lacks support for it.
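The two messages above can be checked for mechanically. The following is a hypothetical helper, not part of the kernel or of any existing tool, that classifies a single log line by looking for the exact message text quoted above. It assumes that text appears verbatim and ignores any prefix such as a timestamp.

```c
#include <assert.h>
#include <string.h>

/* Kinds of zpci log lines relevant to the ISM failure described above. */
enum ism_log_kind {
	ISM_LOG_OTHER = 0,
	ISM_LOG_PEC2_ERROR,   /* function entered error state with PEC 2 */
	ISM_LOG_NO_RECOVERY,  /* function will not be auto-recovered */
};

/* Hypothetical helper: classifies one kernel log line by substring
 * match on the quoted message text; any prefix is ignored. */
enum ism_log_kind classify_zpci_line(const char *line)
{
	if (!strstr(line, "zpci: "))
		return ISM_LOG_OTHER;
	if (strstr(line, "Event 0x2 reports an error for PCI function"))
		return ISM_LOG_PEC2_ERROR;
	if (strstr(line, "The ism driver bound to the device does not support error recovery"))
		return ISM_LOG_NO_RECOVERY;
	return ISM_LOG_OTHER;
}
```

The helper only recognizes the two messages quoted here; it makes no attempt to recognize other PEC values.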
On a technical level, without this synchronization, commands (inputs to
the firmware) may be partially or fully overwritten (corrupted) by another
CPU trying to issue commands on the same function. There is hard evidence
that this can lead to DMB token values being used as DMB IOVAs, resulting
in PEC 2 PCI events that indicate invalid DMA. But this is only one of the
conceivable failure modes. In theory, it is even possible to lose one
command completely, execute another one twice, and then interpret the
outputs as if the intended command had been executed instead of the other
one. Frankly, I don't feel confident about providing an exhaustive list of
possible consequences.
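To make the corruption scenario concrete, here is a deterministic replay of one bad interleaving, as a simplified model rather than the real command layout. It assumes, for illustration only, that the command area holds one header field and one payload field, that each CPU writes its payload first and its header last (the order of the two __ism_write_cmd() calls in ism_cmd()), and that the header write is what hands the command to the firmware. The struct layouts and the command codes CMD_A and CMD_B are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shared command area of one function. */
struct cmd_area {
	uint32_t hdr_cmd;   /* command code from the request header */
	uint64_t payload;   /* e.g. an IOVA for one command, a DMB token for another */
};

/* Hypothetical command codes, for illustration only. */
enum { CMD_A = 1, CMD_B = 2 };

struct fake_req {
	uint32_t cmd;
	uint64_t payload;
};

static void write_payload(struct cmd_area *a, const struct fake_req *r)
{
	a->payload = r->payload;
}

static void write_header(struct cmd_area *a, const struct fake_req *r)
{
	a->hdr_cmd = r->cmd;
}

/* Replays one unsynchronized interleaving of two requests and returns
 * what the "firmware" would see when CPU A's header write triggers it. */
struct fake_req racy_interleaving(const struct fake_req *a,
				  const struct fake_req *b)
{
	struct cmd_area area = { 0 };
	struct fake_req seen;

	write_payload(&area, a);  /* CPU A: payload */
	write_payload(&area, b);  /* CPU B: payload, overwrites A's */
	write_header(&area, a);   /* CPU A: header, command is executed */

	seen.cmd = area.hdr_cmd;
	seen.payload = area.payload;
	return seen;
}
```

In this replay, the firmware executes CPU A's command with CPU B's payload. With cmd_lock held across the whole sequence, CPU B's payload write could not land between CPU A's two writes.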
Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
---
drivers/s390/net/ism_drv.c | 4 ++++
include/linux/ism.h | 1 +
2 files changed, 5 insertions(+)
diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index b7f15f303ea2..c3b79e22044c 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -129,7 +129,9 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
 {
 	struct ism_req_hdr *req = cmd;
 	struct ism_resp_hdr *resp = cmd;
+	unsigned long flags;
 
+	spin_lock_irqsave(&ism->cmd_lock, flags);
 	__ism_write_cmd(ism, req + 1, sizeof(*req), req->len - sizeof(*req));
 	__ism_write_cmd(ism, req, 0, sizeof(*req));
 
@@ -143,6 +145,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
 	}
 	__ism_read_cmd(ism, resp + 1, sizeof(*resp), resp->len - sizeof(*resp));
 out:
+	spin_unlock_irqrestore(&ism->cmd_lock, flags);
 	return resp->ret;
 }
 
@@ -606,6 +609,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return -ENOMEM;
 
 	spin_lock_init(&ism->lock);
+	spin_lock_init(&ism->cmd_lock);
 	dev_set_drvdata(&pdev->dev, ism);
 	ism->pdev = pdev;
 	ism->dev.parent = &pdev->dev;
diff --git a/include/linux/ism.h b/include/linux/ism.h
index 5428edd90982..8358b4cd7ba6 100644
--- a/include/linux/ism.h
+++ b/include/linux/ism.h
@@ -28,6 +28,7 @@ struct ism_dmb {
 struct ism_dev {
 	spinlock_t lock; /* protects the ism device */
+	spinlock_t cmd_lock; /* serializes cmds */
 	struct list_head list;
 	struct pci_dev *pdev;
base-commit: 07fa9cad54609df3eea00cd5b167df6088ce01a6
--
2.48.1
* Re: [PATCH 1/1] s390/ism: fix concurrency management in ism_cmd()
2025-07-20 21:11 [PATCH 1/1] s390/ism: fix concurrency management in ism_cmd() Halil Pasic
@ 2025-07-21 7:30 ` Alexander Gordeev
2025-07-21 8:17 ` Alexandra Winter
0 siblings, 1 reply; 5+ messages in thread
From: Alexander Gordeev @ 2025-07-21 7:30 UTC
To: Halil Pasic
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandra Winter, Thorsten Winkler, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger, Sven Schnelle,
Sebastian Ott, Ursula Braun, netdev, linux-s390, linux-kernel,
Aliaksei Makarau, Mahanta Jambigi
On Sun, Jul 20, 2025 at 11:11:09PM +0200, Halil Pasic wrote:
Hi Halil,
...
> @@ -129,7 +129,9 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
> {
> struct ism_req_hdr *req = cmd;
> struct ism_resp_hdr *resp = cmd;
> + unsigned long flags;
>
> + spin_lock_irqsave(&ism->cmd_lock, flags);
I only found smcd_handle_irq() scheduling a tasklet, but no commands issued.
Do we really need to disable interrupts?
> __ism_write_cmd(ism, req + 1, sizeof(*req), req->len - sizeof(*req));
> __ism_write_cmd(ism, req, 0, sizeof(*req));
>
> @@ -143,6 +145,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
> }
> __ism_read_cmd(ism, resp + 1, sizeof(*resp), resp->len - sizeof(*resp));
> out:
> + spin_unlock_irqrestore(&ism->cmd_lock, flags);
> return resp->ret;
> }
>
...
Thanks!
* Re: [PATCH 1/1] s390/ism: fix concurrency management in ism_cmd()
2025-07-21 7:30 ` Alexander Gordeev
@ 2025-07-21 8:17 ` Alexandra Winter
2025-07-21 9:56 ` Simon Horman
2025-07-21 10:35 ` Halil Pasic
0 siblings, 2 replies; 5+ messages in thread
From: Alexandra Winter @ 2025-07-21 8:17 UTC
To: Alexander Gordeev, Halil Pasic
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Thorsten Winkler, Heiko Carstens, Vasily Gorbik,
Christian Borntraeger, Sven Schnelle, Sebastian Ott, Ursula Braun,
netdev, linux-s390, linux-kernel, Aliaksei Makarau,
Mahanta Jambigi
On 21.07.25 09:30, Alexander Gordeev wrote:
> On Sun, Jul 20, 2025 at 11:11:09PM +0200, Halil Pasic wrote:
>
> Hi Halil,
>
> ...
>> @@ -129,7 +129,9 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
>> {
>> struct ism_req_hdr *req = cmd;
>> struct ism_resp_hdr *resp = cmd;
>> + unsigned long flags;
>>
>> + spin_lock_irqsave(&ism->cmd_lock, flags);
>
> I only found smcd_handle_irq() scheduling a tasklet, but no commands issued.
> Do we really need to disable interrupts?
You are right: in the current code, the interrupt and event handlers of ism and smcd
never issue a control command that calls ism_cmd().
OTOH, future ism clients could do that.
The control commands are not part of the data path, but of connection establishment.
So I don't really expect a performance impact.
It is on my to-do list to change this to threaded interrupts in the future.
So no strong opinion on my side.
Simple spin_lock is fine with me.
>
>> __ism_write_cmd(ism, req + 1, sizeof(*req), req->len - sizeof(*req));
>> __ism_write_cmd(ism, req, 0, sizeof(*req));
>>
>> @@ -143,6 +145,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
>> }
>> __ism_read_cmd(ism, resp + 1, sizeof(*resp), resp->len - sizeof(*resp));
>> out:
>> + spin_unlock_irqrestore(&ism->cmd_lock, flags);
>> return resp->ret;
>> }
>>
> ...
>
> Thanks!
>
* Re: [PATCH 1/1] s390/ism: fix concurrency management in ism_cmd()
2025-07-21 8:17 ` Alexandra Winter
@ 2025-07-21 9:56 ` Simon Horman
2025-07-21 10:35 ` Halil Pasic
1 sibling, 0 replies; 5+ messages in thread
From: Simon Horman @ 2025-07-21 9:56 UTC
To: Alexandra Winter
Cc: Alexander Gordeev, Halil Pasic, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Thorsten Winkler,
Heiko Carstens, Vasily Gorbik, Christian Borntraeger,
Sven Schnelle, Sebastian Ott, Ursula Braun, netdev, linux-s390,
linux-kernel, Aliaksei Makarau, Mahanta Jambigi
On Mon, Jul 21, 2025 at 10:17:30AM +0200, Alexandra Winter wrote:
>
>
> On 21.07.25 09:30, Alexander Gordeev wrote:
> > On Sun, Jul 20, 2025 at 11:11:09PM +0200, Halil Pasic wrote:
> >
> > Hi Halil,
> >
> > ...
> >> @@ -129,7 +129,9 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
> >> {
> >> struct ism_req_hdr *req = cmd;
> >> struct ism_resp_hdr *resp = cmd;
> >> + unsigned long flags;
> >>
> >> + spin_lock_irqsave(&ism->cmd_lock, flags);
> >
> > I only found smcd_handle_irq() scheduling a tasklet, but no commands issued.
> > Do we really need to disable interrupts?
>
> You are right: in the current code, the interrupt and event handlers of ism and smcd
> never issue a control command that calls ism_cmd().
> OTOH, future ism clients could do that.
> The control commands are not part of the data path, but of connection establishment.
> So I don't really expect a performance impact.
> It is on my to-do list to change this to threaded interrupts in the future.
> So no strong opinion on my side.
> Simple spin_lock is fine with me.
I would suggest using spin_lock() if it is sufficient.
I think it is generally assumed that the minimal locking primitive is used,
given the context the code is executed in. And it can always be updated
if the contexts in which this code executes subsequently change.
IOW, I'm suggesting this to avoid confusing someone who reads this code later.
...
* Re: [PATCH 1/1] s390/ism: fix concurrency management in ism_cmd()
2025-07-21 8:17 ` Alexandra Winter
2025-07-21 9:56 ` Simon Horman
@ 2025-07-21 10:35 ` Halil Pasic
1 sibling, 0 replies; 5+ messages in thread
From: Halil Pasic @ 2025-07-21 10:35 UTC
To: Alexandra Winter
Cc: Alexander Gordeev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Thorsten Winkler, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger, Sven Schnelle,
Sebastian Ott, Ursula Braun, netdev, linux-s390, linux-kernel,
Aliaksei Makarau, Mahanta Jambigi, Halil Pasic
On Mon, 21 Jul 2025 10:17:30 +0200
Alexandra Winter <wintera@linux.ibm.com> wrote:
> >> + spin_lock_irqsave(&ism->cmd_lock, flags);
> >
> > I only found smcd_handle_irq() scheduling a tasklet, but no commands issued.
> > Do we really need to disable interrupts?
>
> You are right: in the current code, the interrupt and event handlers of ism and smcd
> never issue a control command that calls ism_cmd().
> OTOH, future ism clients could do that.
> The control commands are not part of the data path, but of connection establishment.
> So I don't really expect a performance impact.
> It is on my to-do list to change this to threaded interrupts in the future.
> So no strong opinion on my side.
> Simple spin_lock is fine with me.
I agree!
My train of thought was: let's go with the safe option and see whether
the maintainers want something different. I didn't feel confident about
understanding the details, including the contract between the clients
and the driver.
If nobody objects by the end of the day, I will change to a simple
spin_lock(), since the sentiment seems to be going in this direction,
and spin a v2 no later than Wednesday.
Thanks for having a look!
Regards,
Halil