* [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job
@ 2023-12-13 0:36 Igor Mammedov
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
` (3 more replies)
0 siblings, 4 replies; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 0:36 UTC (permalink / raw)
To: linux-kernel
Cc: Dongli Zhang, linux-acpi, linux-pci, imammedo, mst, rafael, lenb,
bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
Hacks to mask a race between HBA scan job and bridge re-configuration(s)
during hotplug.
I don't like it a bit, but it is something that could be done quickly
and solves the problems that were reported.
Other options to discuss/possibly more invasive:
1. make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
the bridge if it's not necessary.
2. make the SCSI_SCAN_ASYNC job wait until hotplug is finished for all slots on
the bridge, or somehow restart the job if it fails
3. any other ideas?
1st reported: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@proxmox.com
CC: Dongli Zhang <dongli.zhang@oracle.com>
CC: linux-acpi@vger.kernel.org
CC: linux-pci@vger.kernel.org
CC: imammedo@redhat.com
CC: mst@redhat.com
CC: rafael@kernel.org
CC: lenb@kernel.org
CC: bhelgaas@google.com
CC: mika.westerberg@linux.intel.com
CC: boris.ostrovsky@oracle.com
CC: joe.jin@oracle.com
CC: stable@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Fiona Ebner <f.ebner@proxmox.com>
CC: Thomas Lamprecht <t.lamprecht@proxmox.com>
Igor Mammedov (2):
PCI: acpiphp: enable slot only if it hasn't been enabled already
PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
time
drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--
2.39.3
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
2023-12-13 0:36 [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Igor Mammedov
@ 2023-12-13 0:36 ` Igor Mammedov
2023-12-13 0:37 ` kernel test robot
` (2 more replies)
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
` (2 subsequent siblings)
3 siblings, 3 replies; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 0:36 UTC (permalink / raw)
To: linux-kernel
Cc: Dongli Zhang, linux-acpi, linux-pci, imammedo, mst, rafael, lenb,
bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
adding a device to the bus and enabling it will kick off an async host scan
scsi_scan_host+0x21/0x1f0
virtscsi_probe+0x2dd/0x350
..
driver_probe_device+0x19/0x80
...
driver_probe_device+0x19/0x80
pci_bus_add_device+0x53/0x80
pci_bus_add_devices+0x2b/0x70
...
which will schedule a job for the async scan. That, however, breaks
if there is more than one SCSI host behind the bridge, since
acpiphp_check_bridge() will walk over all slots and try to
enable each of them regardless of whether they were already
enabled.
As a result, the bridge might be reconfigured several times
and trigger the following sequence:
[cpu 0] acpiphp_check_bridge()
[cpu 0] enable_slot(a)
[cpu 0] configure bridge
[cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
[cpu 0] enable_slot(b)
...
[cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
...
[cpu 0] configure bridge <- temporarily disables bridge
and cause do_scsi_scan_host() to fail.
The same race affects SHPC (but it manages to avoid hitting the race due to
the 1 sec delay when enabling a slot).
To cover the case of single-device hotplug (one at a time), do not attempt to
enable a slot that has already been enabled.
Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Reported-by: iona Ebner <f.ebner@proxmox.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 601129772b2d..6b11609927d6 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
trim_stale_devices(dev);
/* configure all functions */
- enable_slot(slot, true);
+ if (slot->flags != SLOT_ENABLED) {
+ enable_slot(slot, true);
+ }
} else {
disable_slot(slot);
}
--
2.39.3
* [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 0:36 [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Igor Mammedov
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
@ 2023-12-13 0:36 ` Igor Mammedov
2023-12-13 7:26 ` Greg KH
` (3 more replies)
2023-12-13 8:12 ` [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Dongli Zhang
2023-12-13 18:11 ` Bjorn Helgaas
3 siblings, 4 replies; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 0:36 UTC (permalink / raw)
To: linux-kernel
Cc: Dongli Zhang, linux-acpi, linux-pci, imammedo, mst, rafael, lenb,
bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
The previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already")
introduced a workaround to avoid a race between the SCSI_SCAN_ASYNC job and
bridge reconfiguration in the case of single HBA hotplug.
However, in a virt environment it's possible to pause the machine, hotplug several
HBAs, and let the machine run. That can hit the same race when the 2nd hotplugged
HBA starts re-configuring the bridge.
Do the same thing as SHPC and throttle hotplug of the 2nd and subsequent
devices within a single hotplug event.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 6b11609927d6..30bca2086b24 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -37,6 +37,7 @@
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/acpi.h>
+#include <linux/delay.h>
#include "../pci.h"
#include "acpiphp.h"
@@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
{
struct acpiphp_slot *slot;
+ int nr_hp_slots = 0;
/* Bail out if the bridge is going away. */
if (bridge->is_going_away)
@@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
/* configure all functions */
if (slot->flags != SLOT_ENABLED) {
+ if (nr_hp_slots)
+ msleep(1000);
+
+ ++nr_hp_slots;
enable_slot(slot, true);
}
} else {
--
2.39.3
* Re: [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
@ 2023-12-13 0:37 ` kernel test robot
2023-12-13 9:47 ` Fiona Ebner
2023-12-13 13:01 ` Rafael J. Wysocki
2 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2023-12-13 0:37 UTC (permalink / raw)
To: Igor Mammedov; +Cc: stable, oe-kbuild-all
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
Link: https://lore.kernel.org/stable/20231213003614.1648343-2-imammedo%40redhat.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
@ 2023-12-13 7:26 ` Greg KH
2023-12-13 8:13 ` Dongli Zhang
` (2 subsequent siblings)
3 siblings, 0 replies; 23+ messages in thread
From: Greg KH @ 2023-12-13 7:26 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, rafael,
lenb, bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 01:36:14AM +0100, Igor Mammedov wrote:
> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> bridge reconfiguration in case of single HBA hotplug.
> However in virt environment it's possible to pause machine hotplug several
> HBAs and let machine run. That can hit the same race when 2nd hotplugged
> HBA will start re-configuring bridge.
> Do the same thing as SHPC and throttle down hotplug of 2nd and up
> devices within single hotplug event.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 6b11609927d6..30bca2086b24 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -37,6 +37,7 @@
> #include <linux/mutex.h>
> #include <linux/slab.h>
> #include <linux/acpi.h>
> +#include <linux/delay.h>
>
> #include "../pci.h"
> #include "acpiphp.h"
> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> {
> struct acpiphp_slot *slot;
> + int nr_hp_slots = 0;
>
> /* Bail out if the bridge is going away. */
> if (bridge->is_going_away)
> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>
> /* configure all functions */
> if (slot->flags != SLOT_ENABLED) {
> + if (nr_hp_slots)
> + msleep(1000);
> +
> + ++nr_hp_slots;
> enable_slot(slot, true);
> }
> } else {
> --
> 2.39.3
>
>
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
* Re: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job
2023-12-13 0:36 [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Igor Mammedov
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
@ 2023-12-13 8:12 ` Dongli Zhang
2023-12-13 18:11 ` Bjorn Helgaas
3 siblings, 0 replies; 23+ messages in thread
From: Dongli Zhang @ 2023-12-13 8:12 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-acpi, linux-pci, mst, rafael, lenb, bhelgaas,
mika.westerberg, boris.ostrovsky, joe.jin, stable, Fiona Ebner,
Thomas Lamprecht, linux-kernel
Hi Igor,
I am no longer able to reproduce the issue with the two patches applied on
top of mainline Linux.
Thank you very much!
Dongli Zhang
On 12/12/23 16:36, Igor Mammedov wrote:
> Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> during hotplug.
>
> I don't like it a bit but it something that could be done quickly
> and solves problems that were reported.
>
> Other options to discuss/possibly more invasive:
> 1: make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
> bridge if it's not necessary.
> 2. make SCSI_SCAN_ASYNC job wait till hotplug is finished for all slots on
> the bridge or somehow restart the job if it fails
> 3. any other ideas?
>
>
> 1st reported: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@proxmox.com
>
> CC: Dongli Zhang <dongli.zhang@oracle.com>
> CC: linux-acpi@vger.kernel.org
> CC: linux-pci@vger.kernel.org
> CC: imammedo@redhat.com
> CC: mst@redhat.com
> CC: rafael@kernel.org
> CC: lenb@kernel.org
> CC: bhelgaas@google.com
> CC: mika.westerberg@linux.intel.com
> CC: boris.ostrovsky@oracle.com
> CC: joe.jin@oracle.com
> CC: stable@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: Fiona Ebner <f.ebner@proxmox.com>
> CC: Thomas Lamprecht <t.lamprecht@proxmox.com>
>
> Igor Mammedov (2):
> PCI: acpiphp: enable slot only if it hasn't been enabled already
> PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
> time
>
> drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
2023-12-13 7:26 ` Greg KH
@ 2023-12-13 8:13 ` Dongli Zhang
2023-12-13 10:05 ` Igor Mammedov
2023-12-13 9:47 ` Fiona Ebner
2023-12-13 13:07 ` Rafael J. Wysocki
3 siblings, 1 reply; 23+ messages in thread
From: Dongli Zhang @ 2023-12-13 8:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-acpi, linux-pci, mst, rafael, lenb, bhelgaas,
mika.westerberg, boris.ostrovsky, joe.jin, stable, Fiona Ebner,
Thomas Lamprecht, linux-kernel
Hi Igor,
On 12/12/23 16:36, Igor Mammedov wrote:
> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> bridge reconfiguration in case of single HBA hotplug.
> However in virt environment it's possible to pause machine hotplug several
> HBAs and let machine run. That can hit the same race when 2nd hotplugged
Would you mind explaining what "pause machine hotplug several HBAs and
let machine run" means?
Thank you very much!
Dongli Zhang
> HBA will start re-configuring bridge.
> Do the same thing as SHPC and throttle down hotplug of 2nd and up
> devices within single hotplug event.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 6b11609927d6..30bca2086b24 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -37,6 +37,7 @@
> #include <linux/mutex.h>
> #include <linux/slab.h>
> #include <linux/acpi.h>
> +#include <linux/delay.h>
>
> #include "../pci.h"
> #include "acpiphp.h"
> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> {
> struct acpiphp_slot *slot;
> + int nr_hp_slots = 0;
>
> /* Bail out if the bridge is going away. */
> if (bridge->is_going_away)
> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>
> /* configure all functions */
> if (slot->flags != SLOT_ENABLED) {
> + if (nr_hp_slots)
> + msleep(1000);
> +
> + ++nr_hp_slots;
> enable_slot(slot, true);
> }
> } else {
* Re: [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
2023-12-13 0:37 ` kernel test robot
@ 2023-12-13 9:47 ` Fiona Ebner
2023-12-13 10:07 ` Igor Mammedov
2023-12-13 13:01 ` Rafael J. Wysocki
2 siblings, 1 reply; 23+ messages in thread
From: Fiona Ebner @ 2023-12-13 9:47 UTC (permalink / raw)
To: Igor Mammedov, linux-kernel
Cc: Dongli Zhang, linux-acpi, linux-pci, mst, rafael, lenb, bhelgaas,
mika.westerberg, boris.ostrovsky, joe.jin, stable,
Thomas Lamprecht
On 13.12.23 at 01:36, Igor Mammedov wrote:
> When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> adding device to bus and enabling it will kick in async host scan
>
> scsi_scan_host+0x21/0x1f0
> virtscsi_probe+0x2dd/0x350
> ..
> driver_probe_device+0x19/0x80
> ...
> driver_probe_device+0x19/0x80
> pci_bus_add_device+0x53/0x80
> pci_bus_add_devices+0x2b/0x70
> ...
>
> which will schedule a job for async scan. That however breaks
> if there are more than one SCSI host behind bridge, since
> acpiphp_check_bridge() will walk over all slots and try to
> enable each of them regardless of whether they were already
> enabled.
> As result the bridge might be reconfigured several times
> and trigger following sequence:
>
> [cpu 0] acpiphp_check_bridge()
> [cpu 0] enable_slot(a)
> [cpu 0] configure bridge
> [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
> [cpu 0] enable_slot(b)
> ...
> [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
> ...
> [cpu 0] configure bridge <- temporarily disables bridge
>
> and cause do_scsi_scan_host() failure.
> The same race affects SHPC (but it manages to avoid hitting the race due to
> 1sec delay when enabling slot).
> To cover case of single device hotplug (at a time) do not attempt to
> enable slot that have already been enabled.
>
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>
Missing an F here ;)
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Thank you! Works for me:
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
2023-12-13 7:26 ` Greg KH
2023-12-13 8:13 ` Dongli Zhang
@ 2023-12-13 9:47 ` Fiona Ebner
2023-12-13 13:07 ` Rafael J. Wysocki
3 siblings, 0 replies; 23+ messages in thread
From: Fiona Ebner @ 2023-12-13 9:47 UTC (permalink / raw)
To: Igor Mammedov, linux-kernel
Cc: Dongli Zhang, linux-acpi, linux-pci, mst, rafael, lenb, bhelgaas,
mika.westerberg, boris.ostrovsky, joe.jin, stable,
Thomas Lamprecht
On 13.12.23 at 01:36, Igor Mammedov wrote:
> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> bridge reconfiguration in case of single HBA hotplug.
> However in virt environment it's possible to pause machine hotplug several
> HBAs and let machine run. That can hit the same race when 2nd hotplugged
> HBA will start re-configuring bridge.
> Do the same thing as SHPC and throttle down hotplug of 2nd and up
> devices within single hotplug event.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
With only the first patch applied, I could reproduce the issue described
here, i.e. pausing the vCPUs while doing multiple hotplugs and this
patch makes that scenario work too:
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 8:13 ` Dongli Zhang
@ 2023-12-13 10:05 ` Igor Mammedov
2023-12-13 17:25 ` Dongli Zhang
0 siblings, 1 reply; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 10:05 UTC (permalink / raw)
To: Dongli Zhang
Cc: linux-acpi, linux-pci, mst, rafael, lenb, bhelgaas,
mika.westerberg, boris.ostrovsky, joe.jin, stable, Fiona Ebner,
Thomas Lamprecht, linux-kernel
On Wed, 13 Dec 2023 00:13:37 -0800
Dongli Zhang <dongli.zhang@oracle.com> wrote:
> Hi Igor,
>
>
> On 12/12/23 16:36, Igor Mammedov wrote:
> > previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> > introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> > bridge reconfiguration in case of single HBA hotplug.
> > However in virt environment it's possible to pause machine hotplug several
> > HBAs and let machine run. That can hit the same race when 2nd hotplugged
>
> Would you mind helping explain what does "pause machine hotplug several HBAs and
> let machine run" indicate?
A QEMU example would be:
(qemu) stop
(qemu) device_add vhost-scsi-pci,wwpn=naa.5001405324af0985,id=vhost01,bus=bridge1,addr=8
(qemu) device_add vhost-scsi-pci,wwpn=naa.5001405324af0986,id=vhost02,bus=bridge1,addr=0
(qemu) cont
This way, when the machine continues to run, the acpiphp code will see 2 HBAs at once
and try to process one right after another. So the [1/2] patch is not enough
to cover the above case, hence the same hack SHPC employs: adding a delay.
However, 2 separate hotplug events, as in your reproducer, should be covered
by the 1st patch.
> Thank you very much!
>
> Dongli Zhang
>
> > HBA will start re-configuring bridge.
> > Do the same thing as SHPC and throttle down hotplug of 2nd and up
> > devices within single hotplug event.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > index 6b11609927d6..30bca2086b24 100644
> > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -37,6 +37,7 @@
> > #include <linux/mutex.h>
> > #include <linux/slab.h>
> > #include <linux/acpi.h>
> > +#include <linux/delay.h>
> >
> > #include "../pci.h"
> > #include "acpiphp.h"
> > @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> > static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > {
> > struct acpiphp_slot *slot;
> > + int nr_hp_slots = 0;
> >
> > /* Bail out if the bridge is going away. */
> > if (bridge->is_going_away)
> > @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >
> > /* configure all functions */
> > if (slot->flags != SLOT_ENABLED) {
> > + if (nr_hp_slots)
> > + msleep(1000);
> > +
> > + ++nr_hp_slots;
> > enable_slot(slot, true);
> > }
> > } else {
>
* Re: [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
2023-12-13 9:47 ` Fiona Ebner
@ 2023-12-13 10:07 ` Igor Mammedov
0 siblings, 0 replies; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 10:07 UTC (permalink / raw)
To: Fiona Ebner
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, rafael,
lenb, bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Thomas Lamprecht
On Wed, 13 Dec 2023 10:47:27 +0100
Fiona Ebner <f.ebner@proxmox.com> wrote:
> On 13.12.23 at 01:36, Igor Mammedov wrote:
> > When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> > adding device to bus and enabling it will kick in async host scan
> >
> > scsi_scan_host+0x21/0x1f0
> > virtscsi_probe+0x2dd/0x350
> > ..
> > driver_probe_device+0x19/0x80
> > ...
> > driver_probe_device+0x19/0x80
> > pci_bus_add_device+0x53/0x80
> > pci_bus_add_devices+0x2b/0x70
> > ...
> >
> > which will schedule a job for async scan. That however breaks
> > if there are more than one SCSI host behind bridge, since
> > acpiphp_check_bridge() will walk over all slots and try to
> > enable each of them regardless of whether they were already
> > enabled.
> > As result the bridge might be reconfigured several times
> > and trigger following sequence:
> >
> > [cpu 0] acpiphp_check_bridge()
> > [cpu 0] enable_slot(a)
> > [cpu 0] configure bridge
> > [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
> > [cpu 0] enable_slot(b)
> > ...
> > [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
> > ...
> > [cpu 0] configure bridge <- temporarily disables bridge
> >
> > and cause do_scsi_scan_host() failure.
> > The same race affects SHPC (but it manages to avoid hitting the race due to
> > 1sec delay when enabling slot).
> > To cover case of single device hotplug (at a time) do not attempt to
> > enable slot that have already been enabled.
> >
> > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Reported-by: iona Ebner <f.ebner@proxmox.com>
>
> Missing an F here ;)
Sorry for the copy-paste mistake, I'll fix it up in the next submission.
>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> Thank you! Works for me:
>
> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
>
* Re: [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
2023-12-13 0:37 ` kernel test robot
2023-12-13 9:47 ` Fiona Ebner
@ 2023-12-13 13:01 ` Rafael J. Wysocki
2023-12-13 16:06 ` Igor Mammedov
2 siblings, 1 reply; 23+ messages in thread
From: Rafael J. Wysocki @ 2023-12-13 13:01 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, rafael,
lenb, bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>
> When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> adding device to bus and enabling it will kick in async host scan
>
> scsi_scan_host+0x21/0x1f0
> virtscsi_probe+0x2dd/0x350
> ..
> driver_probe_device+0x19/0x80
> ...
> driver_probe_device+0x19/0x80
> pci_bus_add_device+0x53/0x80
> pci_bus_add_devices+0x2b/0x70
> ...
>
> which will schedule a job for async scan. That however breaks
> if there are more than one SCSI host behind bridge, since
> acpiphp_check_bridge() will walk over all slots and try to
> enable each of them regardless of whether they were already
> enabled.
> As result the bridge might be reconfigured several times
> and trigger following sequence:
>
> [cpu 0] acpiphp_check_bridge()
> [cpu 0] enable_slot(a)
> [cpu 0] configure bridge
> [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
> [cpu 0] enable_slot(b)
> ...
> [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
> ...
> [cpu 0] configure bridge <- temporarily disables bridge
>
> and cause do_scsi_scan_host() failure.
> The same race affects SHPC (but it manages to avoid hitting the race due to
> 1sec delay when enabling slot).
> To cover case of single device hotplug (at a time) do not attempt to
> enable slot that have already been enabled.
>
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 601129772b2d..6b11609927d6 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> trim_stale_devices(dev);
>
> /* configure all functions */
> - enable_slot(slot, true);
> + if (slot->flags != SLOT_ENABLED) {
> + enable_slot(slot, true);
> + }
Shouldn't this be following the acpiphp_enable_slot() pattern, that is
if (!(slot->flags & SLOT_ENABLED))
enable_slot(slot, true);
Also the braces are redundant.
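
To illustrate why the two checks above are not equivalent: the equality test treats every flag combination other than exactly SLOT_ENABLED as "not enabled", while the bitmask test looks only at the SLOT_ENABLED bit. The following is a user-space sketch, not kernel code; the flag values and the helper names are assumptions made for illustration and should be checked against acpiphp.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values assumed for illustration; see acpiphp.h for the real ones. */
#define SLOT_ENABLED        0x1u
#define SLOT_IS_GOING_AWAY  0x2u

/* Check as written in the RFC: true unless flags is exactly SLOT_ENABLED. */
static bool needs_enable_equality(unsigned int flags)
{
	return flags != SLOT_ENABLED;
}

/* Check suggested in review: true only when the SLOT_ENABLED bit is clear. */
static bool needs_enable_bitmask(unsigned int flags)
{
	return !(flags & SLOT_ENABLED);
}

/* Hypothetical helper walking through the cases where the checks agree or differ. */
static void check_examples(void)
{
	/* Both agree for a plain enabled or a plain disabled slot. */
	assert(!needs_enable_equality(SLOT_ENABLED));
	assert(!needs_enable_bitmask(SLOT_ENABLED));
	assert(needs_enable_equality(0));
	assert(needs_enable_bitmask(0));

	/* They differ once any other flag is set on an enabled slot. */
	assert(needs_enable_equality(SLOT_ENABLED | SLOT_IS_GOING_AWAY));
	assert(!needs_enable_bitmask(SLOT_ENABLED | SLOT_IS_GOING_AWAY));
}
```

With the equality form, a slot that is enabled but carries any additional flag would be passed to enable_slot() again, which is the situation the patch is trying to avoid.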
> } else {
> disable_slot(slot);
> }
> --
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
` (2 preceding siblings ...)
2023-12-13 9:47 ` Fiona Ebner
@ 2023-12-13 13:07 ` Rafael J. Wysocki
2023-12-13 16:49 ` Igor Mammedov
3 siblings, 1 reply; 23+ messages in thread
From: Rafael J. Wysocki @ 2023-12-13 13:07 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, rafael,
lenb, bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>
> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> bridge reconfiguration in case of single HBA hotplug.
> However in virt environment it's possible to pause machine hotplug several
> HBAs and let machine run. That can hit the same race when 2nd hotplugged
> HBA will start re-configuring bridge.
> Do the same thing as SHPC and throttle down hotplug of 2nd and up
> devices within single hotplug event.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 6b11609927d6..30bca2086b24 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -37,6 +37,7 @@
> #include <linux/mutex.h>
> #include <linux/slab.h>
> #include <linux/acpi.h>
> +#include <linux/delay.h>
>
> #include "../pci.h"
> #include "acpiphp.h"
> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> {
> struct acpiphp_slot *slot;
> + int nr_hp_slots = 0;
>
> /* Bail out if the bridge is going away. */
> if (bridge->is_going_away)
> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>
> /* configure all functions */
> if (slot->flags != SLOT_ENABLED) {
> + if (nr_hp_slots)
> + msleep(1000);
Why is 1000 considered the most suitable number here? Any chance to
define a symbol for it?
And won't this affect the cases when the race in question is not a concern?
Also, adding arbitrary timeouts is not the most robust way of
addressing race conditions IMV. Wouldn't it be better to add some
proper synchronization between the pieces of code that can race with
each other?
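
As a rough illustration of the first two questions, the counter logic from the patch can be modelled in user space with the delay given a name. This is only a sketch: ACPIPHP_SLOT_ENABLE_DELAY_MS and total_throttle_ms() are hypothetical names that do not exist in the kernel, and the addition stands in for the msleep() call.

```c
#include <assert.h>

/* Hypothetical symbol for the delay; the RFC hard-codes 1000 in msleep(). */
#define ACPIPHP_SLOT_ENABLE_DELAY_MS 1000u

/*
 * User-space model of the loop in the RFC: every slot that still needs
 * enabling, except the first one, is preceded by one delay.
 * Returns the total time that would be spent sleeping, in milliseconds.
 */
static unsigned int total_throttle_ms(unsigned int nr_slots_to_enable)
{
	unsigned int nr_hp_slots = 0;
	unsigned int slept_ms = 0;
	unsigned int i;

	for (i = 0; i < nr_slots_to_enable; i++) {
		if (nr_hp_slots)
			slept_ms += ACPIPHP_SLOT_ENABLE_DELAY_MS; /* stands in for msleep() */
		++nr_hp_slots;
	}
	return slept_ms;
}
```

Under this model a single newly populated slot incurs no delay, while N slots in one event cost (N - 1) delays, for example `assert(total_throttle_ms(3) == 2000)`, regardless of whether any of the devices is a SCSI HBA.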
> +
> + ++nr_hp_slots;
> enable_slot(slot, true);
> }
> } else {
> --
* Re: [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already
2023-12-13 13:01 ` Rafael J. Wysocki
@ 2023-12-13 16:06 ` Igor Mammedov
0 siblings, 0 replies; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 16:06 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, lenb,
bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 2:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > When SCSI_SCAN_ASYNC is enabled (either via config or via cmd line),
> > adding device to bus and enabling it will kick in async host scan
> >
> > scsi_scan_host+0x21/0x1f0
> > virtscsi_probe+0x2dd/0x350
> > ..
> > driver_probe_device+0x19/0x80
> > ...
> > driver_probe_device+0x19/0x80
> > pci_bus_add_device+0x53/0x80
> > pci_bus_add_devices+0x2b/0x70
> > ...
> >
> > which will schedule a job for async scan. That however breaks
> > if there are more than one SCSI host behind bridge, since
> > acpiphp_check_bridge() will walk over all slots and try to
> > enable each of them regardless of whether they were already
> > enabled.
> > As result the bridge might be reconfigured several times
> > and trigger following sequence:
> >
> > [cpu 0] acpiphp_check_bridge()
> > [cpu 0] enable_slot(a)
> > [cpu 0] configure bridge
> > [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
> > [cpu 0] enable_slot(b)
> > ...
> > [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
> > ...
> > [cpu 0] configure bridge <- temporarily disables bridge
> >
> > and cause do_scsi_scan_host() failure.
> > The same race affects SHPC (but it manages to avoid hitting the race due to
> > 1sec delay when enabling slot).
> > To cover case of single device hotplug (at a time) do not attempt to
> > enable slot that have already been enabled.
> >
> > Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Reported-by: Fiona Ebner <f.ebner@proxmox.com>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > index 601129772b2d..6b11609927d6 100644
> > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > trim_stale_devices(dev);
> >
> > /* configure all functions */
> > - enable_slot(slot, true);
> > + if (slot->flags != SLOT_ENABLED) {
> > + enable_slot(slot, true);
> > + }
>
> Shouldn't this be following the acpiphp_enable_slot() pattern, that is
>
> if (!(slot->flags & SLOT_ENABLED))
> enable_slot(slot, true);
>
> Also the braces are redundant.
I'll fix it up on respin if Bjorn is fine with the approach in general.
The patches need a respin anyway to fix the botched whitespace.
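To illustrate why the two checks differ: the equality test treats a slot as
"not enabled" whenever any additional flag bit is set, while the mask test
looks only at the enabled bit. A minimal userspace sketch, assuming
illustrative flag values and helper names (the EX_* macros and the
needs_enable_*() helpers are hypothetical, not taken from acpiphp.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits only; the real definitions live in acpiphp.h. */
#define EX_SLOT_ENABLED     0x1u
#define EX_SLOT_OTHER_FLAG  0x2u   /* hypothetical second flag bit */

/* Equality test as posted in the RFC: any extra bit makes it report
 * "needs enabling", even though the enabled bit is set. */
static inline bool needs_enable_eq(unsigned int flags)
{
	return flags != EX_SLOT_ENABLED;
}

/* Mask test as suggested in review: looks at the enabled bit only. */
static inline bool needs_enable_mask(unsigned int flags)
{
	return !(flags & EX_SLOT_ENABLED);
}
```

With flags set to EX_SLOT_ENABLED | EX_SLOT_OTHER_FLAG, the equality variant
would ask for the slot to be enabled again, while the mask variant would not.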
>
> > } else {
> > disable_slot(slot);
> > }
> > --
>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 13:07 ` Rafael J. Wysocki
@ 2023-12-13 16:49 ` Igor Mammedov
2023-12-13 16:54 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 16:49 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, lenb,
bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> > introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> > bridge reconfiguration in case of single HBA hotplug.
> > However in virt environment it's possible to pause machine hotplug several
> > HBAs and let machine run. That can hit the same race when 2nd hotplugged
> > HBA will start re-configuring bridge.
> > Do the same thing as SHPC and throttle down hotplug of 2nd and up
> > devices within single hotplug event.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > index 6b11609927d6..30bca2086b24 100644
> > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > @@ -37,6 +37,7 @@
> > #include <linux/mutex.h>
> > #include <linux/slab.h>
> > #include <linux/acpi.h>
> > +#include <linux/delay.h>
> >
> > #include "../pci.h"
> > #include "acpiphp.h"
> > @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> > static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > {
> > struct acpiphp_slot *slot;
> > + int nr_hp_slots = 0;
> >
> > /* Bail out if the bridge is going away. */
> > if (bridge->is_going_away)
> > @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >
> > /* configure all functions */
> > if (slot->flags != SLOT_ENABLED) {
> > + if (nr_hp_slots)
> > + msleep(1000);
>
> Why is 1000 considered the most suitable number here? Any chance to
> define a symbol for it?
The timeout was borrowed from the SHPC hotplug workflow, where it apparently
makes the race harder to reproduce
(though that's no excuse to add more timeouts elsewhere).
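As a rough sketch of what a named symbol and the cost of the throttling could
look like: the macro and helper names below are hypothetical, and the model
only mirrors the loop in this RFC (first not-yet-enabled slot is enabled
immediately, each later one is preceded by one sleep):

```c
#include <assert.h>

/* Hypothetical name; the RFC patch passes a bare 1000 to msleep(). */
#define EX_HP_SLOT_DELAY_MS 1000

/* Minimum total sleep added within one bridge check when nr_new_slots
 * slots that were not yet enabled are found: no delay before the first
 * slot, one delay before each subsequent slot. */
static inline long ex_added_delay_ms(int nr_new_slots)
{
	if (nr_new_slots <= 1)
		return 0;
	return (long)(nr_new_slots - 1) * EX_HP_SLOT_DELAY_MS;
}
```

Under these assumptions a single hotplugged device sees no extra delay, while
four devices handled in one event would add at least 3 seconds.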
> And won't this affect the cases when the race in question is not a concern?
In practice it's not likely, since even in the virt scenario the hypervisor
won't stop the VM to hotplug a device (which defeats the whole purpose of
hotplug).
But in the case of a very slow VM (overcommit case) it's possible for
several HBAs to be hotplugged by the time acpiphp gets a chance
to handle the 1st hotplug event. SHPC is more or less 'safe' with its
1sec delay.
> Also, adding arbitrary timeouts is not the most robust way of
> addressing race conditions IMV. Wouldn't it be better to add some
> proper synchronization between the pieces of code that can race with
> each other?
I don't like it either; it's a stop-gap measure to hide a regression on
short notice, which I can fix up without much risk in the short time left
before folks leave on holidays.
It's fine to drop the patch, as the chances of this happening are small.
[1/2] should cover the reported cases.
Since it's an RFC, I'm basically asking for opinions on a proper way to fix
SCSI_SCAN_ASYNC running wild while the hotplug is in progress (and maybe
SCSI is not the only user that schedules an async job from device probe).
Adding synchronisation and testing it would take time (not something I'd do
this late in the cycle).
So far I'm thinking about adding an rw mutex to the bridge, with the PCI
hotplug subsystem being the writer while SCSI scan jobs would be readers
and wait until the hotplug code says it's safe to proceed.
I plan to work in this direction and give it some testing, unless
someone has a better idea.
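A minimal userspace sketch of that reader/writer idea, using C11 atomics as a
stand-in (in-kernel code would presumably use something like an rw_semaphore
instead). All names here are hypothetical, and the "try" variants are used
only so the behaviour can be shown without threads; a real scan job would
block or retry rather than give up:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-bridge gate:
 *   -1    hotplug (writer) is reconfiguring the bridge
 *    0    idle
 *    N>0  N scan jobs (readers) are accessing devices behind the bridge */
struct ex_bridge_gate {
	atomic_int state;
};

static inline void ex_gate_init(struct ex_bridge_gate *g)
{
	atomic_init(&g->state, 0);
}

/* Writer side: succeeds only when no reader and no writer is active. */
static inline bool ex_hotplug_tryenter(struct ex_bridge_gate *g)
{
	int idle = 0;

	return atomic_compare_exchange_strong(&g->state, &idle, -1);
}

static inline void ex_hotplug_exit(struct ex_bridge_gate *g)
{
	atomic_store(&g->state, 0);
}

/* Reader side: succeeds unless the writer currently holds the gate. */
static inline bool ex_scan_tryenter(struct ex_bridge_gate *g)
{
	int cur = atomic_load(&g->state);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&g->state, &cur, cur + 1))
			return true;
	}
	return false;
}

static inline void ex_scan_exit(struct ex_bridge_gate *g)
{
	atomic_fetch_sub(&g->state, 1);
}
```

In this model a scan job that enters first keeps hotplug from touching the
bridge until it leaves, and a scan job arriving while the bridge is being
reconfigured is turned away instead of issuing MMIO through a disabled window.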
>
> > +
> > + ++nr_hp_slots;
> > enable_slot(slot, true);
> > }
> > } else {
> > --
>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 16:49 ` Igor Mammedov
@ 2023-12-13 16:54 ` Michael S. Tsirkin
2023-12-13 17:09 ` Dongli Zhang
2023-12-13 18:50 ` Igor Mammedov
0 siblings, 2 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2023-12-13 16:54 UTC (permalink / raw)
To: Igor Mammedov
Cc: Rafael J. Wysocki, linux-kernel, Dongli Zhang, linux-acpi,
linux-pci, lenb, bhelgaas, mika.westerberg, boris.ostrovsky,
joe.jin, stable, Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> > >
> > > previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> > > introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> > > bridge reconfiguration in case of single HBA hotplug.
> > > However in virt environment it's possible to pause machine hotplug several
> > > HBAs and let machine run. That can hit the same race when 2nd hotplugged
> > > HBA will start re-configuring bridge.
> > > Do the same thing as SHPC and throttle down hotplug of 2nd and up
> > > devices within single hotplug event.
> > >
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > > index 6b11609927d6..30bca2086b24 100644
> > > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > > @@ -37,6 +37,7 @@
> > > #include <linux/mutex.h>
> > > #include <linux/slab.h>
> > > #include <linux/acpi.h>
> > > +#include <linux/delay.h>
> > >
> > > #include "../pci.h"
> > > #include "acpiphp.h"
> > > @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> > > static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > > {
> > > struct acpiphp_slot *slot;
> > > + int nr_hp_slots = 0;
> > >
> > > /* Bail out if the bridge is going away. */
> > > if (bridge->is_going_away)
> > > @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > >
> > > /* configure all functions */
> > > if (slot->flags != SLOT_ENABLED) {
> > > + if (nr_hp_slots)
> > > + msleep(1000);
> >
> > Why is 1000 considered the most suitable number here? Any chance to
> > define a symbol for it?
>
> Timeout was borrowed from SHPC hotplug workflow where it apparently
> makes race harder to reproduce.
> (though it's not excuse to add more timeouts elsewhere)
>
> > And won't this affect the cases when the race in question is not a concern?
>
> In practice it's not likely, since even in virt scenario hypervisor won't
> stop VM to hotplug device (which beats whole purpose of hotplug).
>
> But in case of a very slow VM (overcommit case) it's possible for
> several HBA's to be hotplugged by the time acpiphp gets a chance
> to handle the 1st hotplug event. SHPC is more or less 'safe' with its
> 1sec delay.
>
> > Also, adding arbitrary timeouts is not the most robust way of
> > addressing race conditions IMV. Wouldn't it be better to add some
> > proper synchronization between the pieces of code that can race with
> > each other?
>
> I don't like it either, it's a stop gap measure to hide regression on
> short notice,
> which I can fixup without much risk in short time left, before folks
> leave on holidays.
> It's fine to drop the patch as chances of this happening are small.
> [1/2] should cover reported cases.
>
> Since it's RFC, I basically ask for opinions on a proper way to fix
> SCSI_SCAN_ASYNC running wild while the hotplug is in progress (and maybe
> SCSI is not the only user that schedules async job from device probe).
Of course not. And things don't have to be scheduled from probe right?
Can be triggered by an interrupt or userspace activity.
> So adding synchronisation and testing
> would take time (not something I'd do this late in the cycle).
>
> So far I'm thinking about adding rw mutex to bridge with the PCI
> hotplug subsystem
> being a writer while scsi scan jobs would be readers and wait till hotplug code
> says it's safe to proceed.
> I plan to work in this direction and give it some testing, unless
> someone has a better idea.
> >
> > > +
> > > + ++nr_hp_slots;
> > > enable_slot(slot, true);
> > > }
> > > } else {
> > > --
> >
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 16:54 ` Michael S. Tsirkin
@ 2023-12-13 17:09 ` Dongli Zhang
2024-01-03 9:54 ` Igor Mammedov
2023-12-13 18:50 ` Igor Mammedov
1 sibling, 1 reply; 23+ messages in thread
From: Dongli Zhang @ 2023-12-13 17:09 UTC (permalink / raw)
To: Michael S. Tsirkin, Igor Mammedov
Cc: Rafael J. Wysocki, linux-kernel, linux-acpi, linux-pci, lenb,
bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
Hi Igor,
On 12/13/23 08:54, Michael S. Tsirkin wrote:
> On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
>> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>
>>> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>>>
>>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
>>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
>>>> bridge reconfiguration in case of single HBA hotplug.
>>>> However in virt environment it's possible to pause machine hotplug several
>>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged
>>>> HBA will start re-configuring bridge.
>>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up
>>>> devices within single hotplug event.
>>>>
>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>> ---
>>>> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>> index 6b11609927d6..30bca2086b24 100644
>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>> @@ -37,6 +37,7 @@
>>>> #include <linux/mutex.h>
>>>> #include <linux/slab.h>
>>>> #include <linux/acpi.h>
>>>> +#include <linux/delay.h>
>>>>
>>>> #include "../pci.h"
>>>> #include "acpiphp.h"
>>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
>>>> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>> {
>>>> struct acpiphp_slot *slot;
>>>> + int nr_hp_slots = 0;
>>>>
>>>> /* Bail out if the bridge is going away. */
>>>> if (bridge->is_going_away)
>>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>>
>>>> /* configure all functions */
>>>> if (slot->flags != SLOT_ENABLED) {
>>>> + if (nr_hp_slots)
>>>> + msleep(1000);
>>>
>>> Why is 1000 considered the most suitable number here? Any chance to
>>> define a symbol for it?
>>
>> Timeout was borrowed from SHPC hotplug workflow where it apparently
>> makes race harder to reproduce.
>> (though it's not excuse to add more timeouts elsewhere)
>>
>>> And won't this affect the cases when the race in question is not a concern?
>>
>> In practice it's not likely, since even in virt scenario hypervisor won't
>> stop VM to hotplug device (which beats whole purpose of hotplug).
>>
>> But in case of a very slow VM (overcommit case) it's possible for
>> several HBA's to be hotplugged by the time acpiphp gets a chance
>> to handle the 1st hotplug event. SHPC is more or less 'safe' with its
>> 1sec delay.
>>
>>> Also, adding arbitrary timeouts is not the most robust way of
>>> addressing race conditions IMV. Wouldn't it be better to add some
>>> proper synchronization between the pieces of code that can race with
>>> each other?
>>
>> I don't like it either, it's a stop gap measure to hide regression on
>> short notice,
>> which I can fixup without much risk in short time left, before folks
>> leave on holidays.
>> It's fine to drop the patch as chances of this happening are small.
>> [1/2] should cover reported cases.
>>
>> Since it's RFC, I basically ask for opinions on a proper way to fix
>> SCSI_SCAN_ASYNC running wild while the hotplug is in progress (and maybe
>> SCSI is not the only user that schedules async job from device probe).
>
> Of course not. And things don't have to be scheduled from probe right?
> Can be triggered by an interrupt or userspace activity.
I agree with Michael. TBH, I am curious whether the two patches can
work around/resolve the issue.
Would you mind explaining whether running enable_slot() for a new PCI device
can impact the other PCI devices already on the bridge?
E.g.:
1. Attach several virtio-scsi or virtio-net devices on the same bridge.
2. Trigger workload for those PCI devices. They may do mmio writes to kick the
doorbell (to trigger KVM/QEMU ioeventfd) very frequently.
3. Now hot-add an extra PCI device. Since the slot has never been enabled, it
is enabled via enable_slot().
Can I assume the last enable_slot() will temporarily re-configure the bridge
window, so that mmio from all other PCI devices has no effect at that point
in time?
Since drivers only kick the doorbell conditionally, a lost kick may leave
them hanging forever.
As I have reported, we used to have a similar issue.
PCI: Probe bridge window attributes once at enumeration-time
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521
Therefore, can I assume the issue is not caused by re-enabling an
already-enabled slot, but by touching the bridge window more than once?
Thank you very much!
Dongli Zhang
>
>> So adding synchronisation and testing
>> would take time (not something I'd do this late in the cycle).
>>
>> So far I'm thinking about adding rw mutex to bridge with the PCI
>> hotplug subsystem
>> being a writer while scsi scan jobs would be readers and wait till hotplug code
>> says it's safe to proceed.
>> I plan to work in this direction and give it some testing, unless
>> someone has a better idea.
>
>>>
>>>> +
>>>> + ++nr_hp_slots;
>>>> enable_slot(slot, true);
>>>> }
>>>> } else {
>>>> --
>>>
>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 10:05 ` Igor Mammedov
@ 2023-12-13 17:25 ` Dongli Zhang
0 siblings, 0 replies; 23+ messages in thread
From: Dongli Zhang @ 2023-12-13 17:25 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-acpi, linux-pci, mst, rafael, lenb, bhelgaas,
mika.westerberg, boris.ostrovsky, joe.jin, stable, Fiona Ebner,
Thomas Lamprecht, linux-kernel
Hi Igor,
On 12/13/23 02:05, Igor Mammedov wrote:
> On Wed, 13 Dec 2023 00:13:37 -0800
> Dongli Zhang <dongli.zhang@oracle.com> wrote:
>
>> Hi Igor,
>>
>>
>> On 12/12/23 16:36, Igor Mammedov wrote:
>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
>>> bridge reconfiguration in case of single HBA hotplug.
>>> However in virt environment it's possible to pause machine hotplug several
>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged
>>
>> Would you mind helping explain what does "pause machine hotplug several HBAs and
>> let machine run" indicate?
>
> qemu example would be:
> (qemu) stop
> (qemu) device_add vhost-scsi-pci,wwpn=naa.5001405324af0985,id=vhost01,bus=bridge1,addr=8
> (qemu) device_add vhost-scsi-pci,wwpn=naa.5001405324af0986,id=vhost02,bus=bridge1,addr=0
> (qemu) cont
>
> this way when machine continues to run acpiphp code will see 2 HBAs at once
> and try to process one right after another. So [1/2] patch is not enough
> to cover above case, and hence the same hack SHPC employs by adding delay.
> However 2 separate hotplug events as in your reproducer should be covered
> by the 1st patch.
Thank you very much for the explanation.
That indicates the two PCI devices will be detected and enabled in the same
event. Neither of the two PCI devices was enabled before.
As mentioned in another email, I do not think this is the way to even work
around the issue, because there are other ways to do mmio at the same point
in time.
Dongli Zhang
>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>> HBA will start re-configuring bridge.
>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up
>>> devices within single hotplug event.
>>>
>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>> ---
>>> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>> index 6b11609927d6..30bca2086b24 100644
>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>> @@ -37,6 +37,7 @@
>>> #include <linux/mutex.h>
>>> #include <linux/slab.h>
>>> #include <linux/acpi.h>
>>> +#include <linux/delay.h>
>>>
>>> #include "../pci.h"
>>> #include "acpiphp.h"
>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
>>> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>> {
>>> struct acpiphp_slot *slot;
>>> + int nr_hp_slots = 0;
>>>
>>> /* Bail out if the bridge is going away. */
>>> if (bridge->is_going_away)
>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>
>>> /* configure all functions */
>>> if (slot->flags != SLOT_ENABLED) {
>>> + if (nr_hp_slots)
>>> + msleep(1000);
>>> +
>>> + ++nr_hp_slots;
>>> enable_slot(slot, true);
>>> }
>>> } else {
>>
>
* Re: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job
2023-12-13 0:36 [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Igor Mammedov
` (2 preceding siblings ...)
2023-12-13 8:12 ` [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Dongli Zhang
@ 2023-12-13 18:11 ` Bjorn Helgaas
2023-12-13 18:12 ` Rafael J. Wysocki
3 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2023-12-13 18:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: linux-kernel, Dongli Zhang, linux-acpi, linux-pci, mst, rafael,
lenb, bhelgaas, mika.westerberg, boris.ostrovsky, joe.jin, stable,
Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 01:36:12AM +0100, Igor Mammedov wrote:
> Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> during hotplug.
>
> I don't like it a bit but it something that could be done quickly
> and solves problems that were reported.
I agree, I don't like it either. Adding a 1s delay doesn't address
the real problem, and putting in a band-aid like this means the real
problem would likely never be addressed.
At this point the best option I see is to revert these:
cc22522fd55e2 ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
I hate the fact that reverting them would mean the root bus hotplug
and ACPI bus check notifications would become issues again.
But keeping these commits even though they add a new different problem
that breaks things for somebody else seems worse to me.
Bjorn
> Other options to discuss/possibly more invasive:
> 1: make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
> bridge if it's not necessary.
> 2. make SCSI_SCAN_ASYNC job wait till hotplug is finished for all slots on
> the bridge or somehow restart the job if it fails
> 3. any other ideas?
>
>
> 1st reported: https://lore.kernel.org/r/9eb669c0-d8f2-431d-a700-6da13053ae54@proxmox.com
>
> CC: Dongli Zhang <dongli.zhang@oracle.com>
> CC: linux-acpi@vger.kernel.org
> CC: linux-pci@vger.kernel.org
> CC: imammedo@redhat.com
> CC: mst@redhat.com
> CC: rafael@kernel.org
> CC: lenb@kernel.org
> CC: bhelgaas@google.com
> CC: mika.westerberg@linux.intel.com
> CC: boris.ostrovsky@oracle.com
> CC: joe.jin@oracle.com
> CC: stable@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: Fiona Ebner <f.ebner@proxmox.com>
> CC: Thomas Lamprecht <t.lamprecht@proxmox.com>
>
> Igor Mammedov (2):
> PCI: acpiphp: enable slot only if it hasn't been enabled already
> PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
> time
>
> drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> --
> 2.39.3
>
* Re: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job
2023-12-13 18:11 ` Bjorn Helgaas
@ 2023-12-13 18:12 ` Rafael J. Wysocki
0 siblings, 0 replies; 23+ messages in thread
From: Rafael J. Wysocki @ 2023-12-13 18:12 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Igor Mammedov, linux-kernel, Dongli Zhang, linux-acpi, linux-pci,
mst, rafael, lenb, bhelgaas, mika.westerberg, boris.ostrovsky,
joe.jin, stable, Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 7:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 01:36:12AM +0100, Igor Mammedov wrote:
> > Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> > during hotplug.
> >
> > I don't like it a bit but it something that could be done quickly
> > and solves problems that were reported.
>
> I agree, I don't like it either. Adding a 1s delay doesn't address
> the real problem, and putting in a band-aid like this means the real
> problem would likely never be addressed.
>
> At this point the best option I see is to revert these:
>
> cc22522fd55e2 ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
>
> I hate the fact that reverting them would mean the root bus hotplug
> and ACPI bus check notifications would become issues again.
>
> But keeping these commits even though they add a new different problem
> that breaks things for somebody else seems worse to me.
Agreed.
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 16:54 ` Michael S. Tsirkin
2023-12-13 17:09 ` Dongli Zhang
@ 2023-12-13 18:50 ` Igor Mammedov
1 sibling, 0 replies; 23+ messages in thread
From: Igor Mammedov @ 2023-12-13 18:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Rafael J. Wysocki, linux-kernel, Dongli Zhang, linux-acpi,
linux-pci, lenb, bhelgaas, mika.westerberg, boris.ostrovsky,
joe.jin, stable, Fiona Ebner, Thomas Lamprecht
On Wed, Dec 13, 2023 at 5:54 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
> > On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> > > >
> > > > previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> > > > introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> > > > bridge reconfiguration in case of single HBA hotplug.
> > > > However in virt environment it's possible to pause machine hotplug several
> > > > HBAs and let machine run. That can hit the same race when 2nd hotplugged
> > > > HBA will start re-configuring bridge.
> > > > Do the same thing as SHPC and throttle down hotplug of 2nd and up
> > > > devices within single hotplug event.
> > > >
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > ---
> > > > drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> > > > index 6b11609927d6..30bca2086b24 100644
> > > > --- a/drivers/pci/hotplug/acpiphp_glue.c
> > > > +++ b/drivers/pci/hotplug/acpiphp_glue.c
> > > > @@ -37,6 +37,7 @@
> > > > #include <linux/mutex.h>
> > > > #include <linux/slab.h>
> > > > #include <linux/acpi.h>
> > > > +#include <linux/delay.h>
> > > >
> > > > #include "../pci.h"
> > > > #include "acpiphp.h"
> > > > @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> > > > static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > > > {
> > > > struct acpiphp_slot *slot;
> > > > + int nr_hp_slots = 0;
> > > >
> > > > /* Bail out if the bridge is going away. */
> > > > if (bridge->is_going_away)
> > > > @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > > >
> > > > /* configure all functions */
> > > > if (slot->flags != SLOT_ENABLED) {
> > > > + if (nr_hp_slots)
> > > > + msleep(1000);
> > >
> > > Why is 1000 considered the most suitable number here? Any chance to
> > > define a symbol for it?
> >
> > Timeout was borrowed from SHPC hotplug workflow where it apparently
> > makes race harder to reproduce.
> > (though it's not excuse to add more timeouts elsewhere)
> >
> > > And won't this affect the cases when the race in question is not a concern?
> >
> > In practice it's not likely, since even in virt scenario hypervisor won't
> > stop VM to hotplug device (which beats whole purpose of hotplug).
> >
> > But in case of a very slow VM (overcommit case) it's possible for
> > several HBA's to be hotplugged by the time acpiphp gets a chance
> > to handle the 1st hotplug event. SHPC is more or less 'safe' with its
> > 1sec delay.
> >
> > > Also, adding arbitrary timeouts is not the most robust way of
> > > addressing race conditions IMV. Wouldn't it be better to add some
> > > proper synchronization between the pieces of code that can race with
> > > each other?
> >
> > I don't like it either, it's a stop gap measure to hide regression on
> > short notice,
> > which I can fixup without much risk in short time left, before folks
> > leave on holidays.
> > It's fine to drop the patch as chances of this happening are small.
> > [1/2] should cover reported cases.
> >
> > > Since it's RFC, I basically ask for opinions on a proper way to fix
> > > SCSI_SCAN_ASYNC running wild while the hotplug is in progress (and maybe
> > > SCSI is not the only user that schedules async job from device probe).
>
> Of course not. And things don't have to be scheduled from probe right?
> Can be triggered by an interrupt or userspace activity.
Maybe, but it would probably depend on the driver/device.
For the HBA case, we probably can't depend on an irq or userspace activity.
The current expectation is that after hotplug the HBA shows up along with
the drives attached to it. I suppose udev could kick off a scan on the HBA
after the device appears, but that just moves the same race elsewhere,
not to mention making the whole system more complicated/fragile.
Async scan during hotplug also raises a question: it does speed up
the boot process with several HBAs, but how much sense does it make
at hotplug time, where resources are plugged on demand
(a synchronous scan might even be better)?
> > So adding synchronisation and testing
> > would take time (not something I'd do this late in the cycle).
> >
> > So far I'm thinking about adding rw mutex to bridge with the PCI
> > hotplug subsystem
> > being a writer while scsi scan jobs would be readers and wait till hotplug code
> > says it's safe to proceed.
> > I plan to work in this direction and give it some testing, unless
> > someone has a better idea.
>
> > >
> > > > +
> > > > + ++nr_hp_slots;
> > > > enable_slot(slot, true);
> > > > }
> > > > } else {
> > > > --
> > >
>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2023-12-13 17:09 ` Dongli Zhang
@ 2024-01-03 9:54 ` Igor Mammedov
2024-01-03 16:19 ` Dongli Zhang
0 siblings, 1 reply; 23+ messages in thread
From: Igor Mammedov @ 2024-01-03 9:54 UTC (permalink / raw)
To: Dongli Zhang
Cc: Michael S. Tsirkin, Rafael J. Wysocki, linux-kernel, linux-acpi,
linux-pci, lenb, bhelgaas, mika.westerberg, boris.ostrovsky,
joe.jin, stable, Fiona Ebner, Thomas Lamprecht
On Wed, 13 Dec 2023 09:09:18 -0800
Dongli Zhang <dongli.zhang@oracle.com> wrote:
> Hi Igor,
>
> On 12/13/23 08:54, Michael S. Tsirkin wrote:
> > On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
> >> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>>
> >>> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >>>>
> >>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
> >>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> >>>> bridge reconfiguration in case of single HBA hotplug.
> >>>> However in virt environment it's possible to pause machine hotplug several
> >>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged
> >>>> HBA will start re-configuring bridge.
> >>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up
> >>>> devices within single hotplug event.
> >>>>
> >>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>>> ---
> >>>> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> >>>> 1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> >>>> index 6b11609927d6..30bca2086b24 100644
> >>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
> >>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> >>>> @@ -37,6 +37,7 @@
> >>>> #include <linux/mutex.h>
> >>>> #include <linux/slab.h>
> >>>> #include <linux/acpi.h>
> >>>> +#include <linux/delay.h>
> >>>>
> >>>> #include "../pci.h"
> >>>> #include "acpiphp.h"
> >>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> >>>> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >>>> {
> >>>> struct acpiphp_slot *slot;
> >>>> + int nr_hp_slots = 0;
> >>>>
> >>>> /* Bail out if the bridge is going away. */
> >>>> if (bridge->is_going_away)
> >>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >>>>
> >>>> /* configure all functions */
> >>>> if (slot->flags != SLOT_ENABLED) {
> >>>> + if (nr_hp_slots)
> >>>> + msleep(1000);
> >>>
> >>> Why is 1000 considered the most suitable number here? Any chance to
> >>> define a symbol for it?
> >>
> >> Timeout was borrowed from SHPC hotplug workflow where it apparently
> >> makes race harder to reproduce.
> >> (though it's not excuse to add more timeouts elsewhere)
> >>
> >>> And won't this affect the cases when the race in question is not a concern?
> >>
> >> In practice it's not likely, since even in virt scenario hypervisor won't
> >> stop VM to hotplug device (which beats whole purpose of hotplug).
> >>
> >> But in case of a very slow VM (overcommit case) it's possible for
> >> several HBA's to be hotplugged by the time acpiphp gets a chance
> >> to handle the 1st hotplug event. SHPC is more or less 'safe' with its
> >> 1sec delay.
> >>
> >>> Also, adding arbitrary timeouts is not the most robust way of
> >>> addressing race conditions IMV. Wouldn't it be better to add some
> >>> proper synchronization between the pieces of code that can race with
> >>> each other?
> >>
> >> I don't like it either, it's a stop gap measure to hide regression on
> >> short notice,
> >> which I can fixup without much risk in short time left, before folks
> >> leave on holidays.
> >> It's fine to drop the patch as chances of this happening are small.
> >> [1/2] should cover reported cases.
> >>
> >> Since it's RFC, I basically ask for opinions on a proper way to fix
> >> SCSI_ASYNC_SCAN
> >> running wild while the hotplug is in progress (and maybe SCSI is not
> >> the only user that
> >> schedules async job from device probe).
> >
> > Of course not. And things don't have to be scheduled from probe right?
> > Can be triggered by an interrupt or userspace activity.
>
> I agree with Michael. TBH, I am curious if the two patches can
> workaround/resolve the issue.
>
> Would you mind helping explain if to run enable_slot() for a new PCI device can
> impact the other PCI devices existing on the bridge?
>
> E.g.,:
>
> 1. Attach several virtio-scsi or virtio-net on the same bridge.
>
> 2. Trigger workload for those PCI devices. They may do mmio write to kick the
> doorbell (to trigger KVM/QEMU ioeventfd) very frequently.
>
> 3. Now hot-add an extra PCI device. Since the slot is never enabled, it enables
> the slot via enable_slot().
>
> Can I assume the last enable_slot() will temporarily re-configure the bridge
> window so that all other PCI devices' mmio will lose effect at that time point?
That's likely what would happen.
The same issue should apply to native PCIe and SHPC hotplug, as they also use
pci_assign_unassigned_bridge_resources().

Perhaps drivers have to be taught that the PCI tree is being reconfigured, or
some other approach can be used to deal with it.
Do you have any ideas?

I'm comparing with a Windows guest, which manages to reconfigure the PCI
hierarchy on the fly (though I haven't tested that under heavy load with
several devices on a bridge).
> Since drivers always kick the doorbell conditionally, they may hang forever.
>
> As I have reported, we used to have the similar issue.
>
> PCI: Probe bridge window attributes once at enumeration-time
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521
>
>
> Therefore, can I assume the issue is not because to re-enable an already-enabled
> slot, but to touch the bridge window for more than once?
>
> Thank you very much!
>
> Dongli Zhang
>
> >
> >> So adding synchronisation and testing
> >> would take time (not something I'd do this late in the cycle).
> >>
> >> So far I'm thinking about adding rw mutex to bridge with the PCI
> >> hotplug subsystem
> >> being a writer while scsi scan jobs would be readers and wait till hotplug code
> >> says it's safe to proceed.
> >> I plan to work in this direction and give it some testing, unless
> >> someone has a better idea.
> >
> >>>
> >>>> +
> >>>> + ++nr_hp_slots;
> >>>> enable_slot(slot, true);
> >>>> }
> >>>> } else {
> >>>> --
> >>>
> >
>
* Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
2024-01-03 9:54 ` Igor Mammedov
@ 2024-01-03 16:19 ` Dongli Zhang
0 siblings, 0 replies; 23+ messages in thread
From: Dongli Zhang @ 2024-01-03 16:19 UTC (permalink / raw)
To: Igor Mammedov
Cc: Michael S. Tsirkin, Rafael J. Wysocki, linux-kernel, linux-acpi,
linux-pci, lenb, bhelgaas, mika.westerberg, boris.ostrovsky,
joe.jin, stable, Fiona Ebner, Thomas Lamprecht
Hi Igor,
On 1/3/24 01:54, Igor Mammedov wrote:
> On Wed, 13 Dec 2023 09:09:18 -0800
> Dongli Zhang <dongli.zhang@oracle.com> wrote:
>
>> Hi Igor,
>>
>> On 12/13/23 08:54, Michael S. Tsirkin wrote:
>>> On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
>>>> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>>>
>>>>> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>>>>>
>>>>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
>>>>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
>>>>>> bridge reconfiguration in case of single HBA hotplug.
>>>>>> However in virt environment it's possible to pause machine hotplug several
>>>>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged
>>>>>> HBA will start re-configuring bridge.
>>>>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up
>>>>>> devices within single hotplug event.
>>>>>>
>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>>>> ---
>>>>>> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
>>>>>> 1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>>>> index 6b11609927d6..30bca2086b24 100644
>>>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>>>> @@ -37,6 +37,7 @@
>>>>>> #include <linux/mutex.h>
>>>>>> #include <linux/slab.h>
>>>>>> #include <linux/acpi.h>
>>>>>> +#include <linux/delay.h>
>>>>>>
>>>>>> #include "../pci.h"
>>>>>> #include "acpiphp.h"
>>>>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
>>>>>> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>>>> {
>>>>>> struct acpiphp_slot *slot;
>>>>>> + int nr_hp_slots = 0;
>>>>>>
>>>>>> /* Bail out if the bridge is going away. */
>>>>>> if (bridge->is_going_away)
>>>>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>>>>
>>>>>> /* configure all functions */
>>>>>> if (slot->flags != SLOT_ENABLED) {
>>>>>> + if (nr_hp_slots)
>>>>>> + msleep(1000);
>>>>>
>>>>> Why is 1000 considered the most suitable number here? Any chance to
>>>>> define a symbol for it?
>>>>
>>>> Timeout was borrowed from SHPC hotplug workflow where it apparently
>>>> makes race harder to reproduce.
>>>> (though it's not excuse to add more timeouts elsewhere)
>>>>
>>>>> And won't this affect the cases when the race in question is not a concern?
>>>>
>>>> In practice it's not likely, since even in virt scenario hypervisor won't
>>>> stop VM to hotplug device (which beats whole purpose of hotplug).
>>>>
>>>> But in case of a very slow VM (overcommit case) it's possible for
>>>> several HBA's to be hotplugged by the time acpiphp gets a chance
>>>> to handle the 1st hotplug event. SHPC is more or less 'safe' with its
>>>> 1sec delay.
>>>>
>>>>> Also, adding arbitrary timeouts is not the most robust way of
>>>>> addressing race conditions IMV. Wouldn't it be better to add some
>>>>> proper synchronization between the pieces of code that can race with
>>>>> each other?
>>>>
>>>> I don't like it either, it's a stop gap measure to hide regression on
>>>> short notice,
>>>> which I can fixup without much risk in short time left, before folks
>>>> leave on holidays.
>>>> It's fine to drop the patch as chances of this happening are small.
>>>> [1/2] should cover reported cases.
>>>>
>>>> Since it's RFC, I basically ask for opinions on a proper way to fix
>>>> SCSI_ASYNC_SCAN
>>>> running wild while the hotplug is in progress (and maybe SCSI is not
>>>> the only user that
>>>> schedules async job from device probe).
>>>
>>> Of course not. And things don't have to be scheduled from probe right?
>>> Can be triggered by an interrupt or userspace activity.
>>
>> I agree with Michael. TBH, I am curious if the two patches can
>> workaround/resolve the issue.
>>
>> Would you mind helping explain if to run enable_slot() for a new PCI device can
>> impact the other PCI devices existing on the bridge?
>>
>> E.g.,:
>>
>> 1. Attach several virtio-scsi or virtio-net on the same bridge.
>>
>> 2. Trigger workload for those PCI devices. They may do mmio write to kick the
>> doorbell (to trigger KVM/QEMU ioeventfd) very frequently.
>>
>> 3. Now hot-add an extra PCI device. Since the slot is never enabled, it enables
>> the slot via enable_slot().
>>
>> Can I assume the last enable_slot() will temporarily re-configure the bridge
>> window so that all other PCI devices' mmio will lose effect at that time point?
>
> That's likely what would happen.
> The same issue should apply to native PCIe and SHPC hotplug, as they also use
> pci_assign_unassigned_bridge_resources().
>
> Perhaps drivers have to be taught that PCI tree is being reconfigured or some
> another approach can be used to deal with it.
> Do you have any ideas?
This is not limited to kernel space. The kernel may remap/expose the MMIO
region to userspace, and a userspace program may then interact with the PCI
device directly as well (DPDK?).
How about using a stop-machine mechanism whenever we need to touch the PCI
bridge window, to guarantee that no CPU is actively accessing the MMIO?
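
As a rough illustration of the "stop everything, then update" pattern, here is
a small userspace model. It is not kernel code and does not use the kernel's
stop_machine(); struct quiesce_ctl, worker_checkpoint() and
update_window_stopped() are hypothetical names, and threads parked on a
condition variable stand in for stopped CPUs.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct quiesce_ctl {
	pthread_mutex_t lock;
	pthread_cond_t changed;
	int nr_workers;		/* threads that may touch the window */
	int nr_parked;		/* workers currently parked at a checkpoint */
	bool stop_requested;	/* set while an update is pending or running */
	uint64_t window_base;	/* start of the (modelled) MMIO window */
};

static void quiesce_init(struct quiesce_ctl *q, int nr_workers, uint64_t base)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->changed, NULL);
	q->nr_workers = nr_workers;
	q->nr_parked = 0;
	q->stop_requested = false;
	q->window_base = base;
}

/* Workers call this between accesses; it blocks while an update is running. */
static void worker_checkpoint(struct quiesce_ctl *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->stop_requested) {
		q->nr_parked++;
		pthread_cond_broadcast(&q->changed);	/* tell the updater */
		while (q->stop_requested)
			pthread_cond_wait(&q->changed, &q->lock);
		q->nr_parked--;
	}
	pthread_mutex_unlock(&q->lock);
}

/* Updater: wait until every worker is parked, move the window, resume. */
static void update_window_stopped(struct quiesce_ctl *q, uint64_t new_base)
{
	pthread_mutex_lock(&q->lock);
	q->stop_requested = true;
	while (q->nr_parked < q->nr_workers)
		pthread_cond_wait(&q->changed, &q->lock);
	q->window_base = new_base;	/* no worker is running at this point */
	q->stop_requested = false;
	pthread_cond_broadcast(&q->changed);	/* release parked workers */
	pthread_mutex_unlock(&q->lock);
}
```

In this model the update only completes once every registered worker has
reached a checkpoint, so a worker that never calls worker_checkpoint() would
block the updater indefinitely.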
Thank you very much!
Dongli Zhang
>
> I'm comparing with Windows guest, which manages to reconfigure PCI hierarchy
> on the fly. (though I haven't tested that under heavy load with several
> devices on a bridge).
>
>> Since drivers always kick the doorbell conditionally, they may hang forever.
>>
>> As I have reported, we used to have the similar issue.
>>
>> PCI: Probe bridge window attributes once at enumeration-time
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521
>>
>>
>> Therefore, can I assume the issue is not because to re-enable an already-enabled
>> slot, but to touch the bridge window for more than once?
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>>
>>>> So adding synchronisation and testing
>>>> would take time (not something I'd do this late in the cycle).
>>>>
>>>> So far I'm thinking about adding rw mutex to bridge with the PCI
>>>> hotplug subsystem
>>>> being a writer while scsi scan jobs would be readers and wait till hotplug code
>>>> says it's safe to proceed.
>>>> I plan to work in this direction and give it some testing, unless
>>>> someone has a better idea.
>>>
>>>>>
>>>>>> +
>>>>>> + ++nr_hp_slots;
>>>>>> enable_slot(slot, true);
>>>>>> }
>>>>>> } else {
>>>>>> --
>>>>>
>>>
>>
>
Thread overview: 23+ messages
2023-12-13 0:36 [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Igor Mammedov
2023-12-13 0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
2023-12-13 0:37 ` kernel test robot
2023-12-13 9:47 ` Fiona Ebner
2023-12-13 10:07 ` Igor Mammedov
2023-12-13 13:01 ` Rafael J. Wysocki
2023-12-13 16:06 ` Igor Mammedov
2023-12-13 0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
2023-12-13 7:26 ` Greg KH
2023-12-13 8:13 ` Dongli Zhang
2023-12-13 10:05 ` Igor Mammedov
2023-12-13 17:25 ` Dongli Zhang
2023-12-13 9:47 ` Fiona Ebner
2023-12-13 13:07 ` Rafael J. Wysocki
2023-12-13 16:49 ` Igor Mammedov
2023-12-13 16:54 ` Michael S. Tsirkin
2023-12-13 17:09 ` Dongli Zhang
2024-01-03 9:54 ` Igor Mammedov
2024-01-03 16:19 ` Dongli Zhang
2023-12-13 18:50 ` Igor Mammedov
2023-12-13 8:12 ` [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Dongli Zhang
2023-12-13 18:11 ` Bjorn Helgaas
2023-12-13 18:12 ` Rafael J. Wysocki