Re: [PATCH v3] s390/pci: fix CPU address in MSI for directed IRQ

public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed

From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Halil Pasic <pasic@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] s390/pci: fix CPU address in MSI for directed IRQ
Date: Mon, 30 Nov 2020 10:50:02 +0100	[thread overview]
Message-ID: <bc52205b-9ae8-98c6-0eb0-9fff551c7868@linux.ibm.com> (raw)
In-Reply-To: <20201130095549.27da927f.pasic@linux.ibm.com>

On 11/30/20 9:55 AM, Halil Pasic wrote:
> On Mon, 30 Nov 2020 09:30:33 +0100
> Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> 
>> I'm not really familiar, with it but I think this is closely related
>> to what I asked Bernd Nerz. I fear that if CPUs go away we might already
>> be in trouble at the firmware/hardware/platform level because the CPU Address is
>> "programmed into the device" so to speak. Thus a directed interrupt from
>> a device may race with anything reordering/removing CPUs even if
>> CPU addresses of dead CPUs are not reused and the mapping is stable.
> 
> From your answer, I read that CPU hot-unplug is supported for LPAR. 

I'm not sure about hot hot-unplug and firmware
telling us about removed CPUs but at the very least there is:

echo 0 > /sys/devices/system/cpu/cpu6/online

>>
>> Furthermore our floating fallback path will try to send a SIGP
>> to the target CPU which clearly doesn't work when that is permanently
>> gone. Either way I think these issues are out of scope for this fix
>> so I will go ahead and merge this.
> 
> I agree, it makes on sense to delay this fix.
> 
> But if CPU hot-unplug is supported, I believe we should react when
> a CPU is unplugged, that is a target of directed interrupts. My guess
> is, that in this scenario transient hiccups are unavoidable, and thus
> should be accepted, but we should make sure that we recover.

I agree, I just tested the above command on a firmware test system and
deactivated 4 of 8 CPUs.
This is in /proc/interrupts after that:

...
  3:       9392          0          0          0   PCI-MSI  mlx5_async@pci:0001:00:00.0
  4:     282741          0          0          0   PCI-MSI  mlx5_comp0@pci:0001:00:00.0
  5:          0          2          0          0   PCI-MSI  mlx5_comp1@pci:0001:00:00.0
  6:          0          0        104          0   PCI-MSI  mlx5_comp2@pci:0001:00:00.0
  7:          0          0          0          2   PCI-MSI  mlx5_comp3@pci:0001:00:00.0
  8:          0          0          0          0   PCI-MSI  mlx5_comp4@pci:0001:00:00.0
  9:          0          0          0          0   PCI-MSI  mlx5_comp5@pci:0001:00:00.0
 10:          0          0          0          0   PCI-MSI  mlx5_comp6@pci:0001:00:00.0
 11:          0          0          0          0   PCI-MSI  mlx5_comp7@pci:0001:00:00.0
...

So it looks like we are left with registered interrupts
for CPUs which are offline. However I'm not sure how to
trigger a problem with that. I think the drivers would
usually only do a directed interrupt to a CPU that
is currently running the process that triggered the
I/O (I tested this assumption with "taskset -c 2 ping ...").
Now with the CPU offline there cannot be such a
process. So I think for the most part the queue would
just remain unused. Still, if we do get a directed
interrupt for it's my understanding that currently
we will lose that.

I think this could be fixed with
something I tried in a prototype code a while back that
is in zpci_handle_fallback_irq() I handled the IRQ locally.
Back then it looked like Directed IRQs would make it to z15 GA 1.5 and
this was done to help Bernd to debug a Millicode issue (Jup 905371).
I also had a version of that code meant as a possible performance
improvement that would check if the target CPU is
available and only then send the SIGP and otherwise handle it
locally.

> 
> Regards,
> Halil
>

next prev parent reply	other threads:[~2020-11-30  9:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-26 17:00 [PATCH v3] s390/pci: fix CPU address in MSI for directed IRQ Alexander Gordeev
2020-11-27  8:56 ` Halil Pasic
2020-11-27 10:08   ` Niklas Schnelle
2020-11-27 15:39     ` Halil Pasic
2020-11-30  8:30       ` Niklas Schnelle
2020-11-30  8:55         ` Halil Pasic
2020-11-30  9:50           ` Niklas Schnelle [this message]
2020-12-09 15:08 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc52205b-9ae8-98c6-0eb0-9fff551c7868@linux.ibm.com \
    --to=schnelle@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pasic@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox