From: Igor Mammedov <imammedo@redhat.com>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-pci@vger.kernel.org, lenb@kernel.org, bhelgaas@google.com,
mika.westerberg@linux.intel.com, boris.ostrovsky@oracle.com,
joe.jin@oracle.com, stable@vger.kernel.org,
Fiona Ebner <f.ebner@proxmox.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
Date: Wed, 3 Jan 2024 10:54:58 +0100
Message-ID: <20240103105458.1f548f33@imammedo.users.ipa.redhat.com>
In-Reply-To: <a8db0ed6-05f4-7c2d-c63e-5f2976d25a45@oracle.com>

On Wed, 13 Dec 2023 09:09:18 -0800
Dongli Zhang <dongli.zhang@oracle.com> wrote:

> Hi Igor,
>
> On 12/13/23 08:54, Michael S. Tsirkin wrote:
> > On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
> >> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>>
> >>> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> >>>>
> >>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already")
> >>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
> >>>> bridge reconfiguration in case of single HBA hotplug.
> >>>> However in virt environment it's possible to pause machine hotplug several
> >>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged
> >>>> HBA will start re-configuring bridge.
> >>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up
> >>>> devices within single hotplug event.
> >>>>
> >>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>>> ---
> >>>> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
> >>>> 1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> >>>> index 6b11609927d6..30bca2086b24 100644
> >>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
> >>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> >>>> @@ -37,6 +37,7 @@
> >>>>  #include <linux/mutex.h>
> >>>>  #include <linux/slab.h>
> >>>>  #include <linux/acpi.h>
> >>>> +#include <linux/delay.h>
> >>>>
> >>>>  #include "../pci.h"
> >>>>  #include "acpiphp.h"
> >>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
> >>>>  static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >>>>  {
> >>>>  	struct acpiphp_slot *slot;
> >>>> +	int nr_hp_slots = 0;
> >>>>
> >>>>  	/* Bail out if the bridge is going away. */
> >>>>  	if (bridge->is_going_away)
> >>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> >>>>
> >>>>  			/* configure all functions */
> >>>>  			if (slot->flags != SLOT_ENABLED) {
> >>>> +				if (nr_hp_slots)
> >>>> +					msleep(1000);
> >>>
> >>> Why is 1000 considered the most suitable number here? Any chance to
> >>> define a symbol for it?
> >>
> >> Timeout was borrowed from SHPC hotplug workflow where it apparently
> >> makes race harder to reproduce.
> >> (though it's not excuse to add more timeouts elsewhere)
> >>
> >>> And won't this affect the cases when the race in question is not a concern?
> >>
> >> In practice it's not likely, since even in virt scenario hypervisor won't
> >> stop VM to hotplug device (which beats whole purpose of hotplug).
> >>
> >> But in case of a very slow VM (overcommit case) it's possible for
> >> several HBA's to be hotplugged by the time acpiphp gets a chance
> >> to handle the 1st hotplug event. SHPC is more or less 'safe' with its
> >> 1sec delay.
> >>
> >>> Also, adding arbitrary timeouts is not the most robust way of
> >>> addressing race conditions IMV. Wouldn't it be better to add some
> >>> proper synchronization between the pieces of code that can race with
> >>> each other?
> >>
> >> I don't like it either, it's a stop gap measure to hide regression on
> >> short notice,
> >> which I can fixup without much risk in short time left, before folks
> >> leave on holidays.
> >> It's fine to drop the patch as chances of this happening are small.
> >> [1/2] should cover reported cases.
> >>
> >> Since it's RFC, I basically ask for opinions on a proper way to fix
> >> SCSI_ASYNC_SCAN
> >> running wild while the hotplug is in progress (and maybe SCSI is not
> >> the only user that
> >> schedules async job from device probe).
> >
> > Of course not. And things don't have to be scheduled from probe right?
> > Can be triggered by an interrupt or userspace activity.
>
> I agree with Michael. TBH, I am curious if the two patches can
> workaround/resolve the issue.
>
> Would you mind helping explain if to run enable_slot() for a new PCI device can
> impact the other PCI devices existing on the bridge?
>
> E.g.,:
>
> 1. Attach several virtio-scsi or virtio-net on the same bridge.
>
> 2. Trigger workload for those PCI devices. They may do mmio write to kick the
> doorbell (to trigger KVM/QEMU ioeventfd) very frequently.
>
> 3. Now hot-add an extra PCI device. Since the slot is never enabled, it enables
> the slot via enable_slot().
>
> Can I assume the last enable_slot() will temporarily re-configure the bridge
> window so that all other PCI devices' mmio will lose effect at that time point?

That's likely what would happen.
The same issue should apply to native PCIe and SHPC hotplug, as they also use
pci_assign_unassigned_bridge_resources().

Perhaps drivers have to be taught that the PCI tree is being reconfigured, or
some other approach can be used to deal with it.
Do you have any ideas?

For comparison, a Windows guest manages to reconfigure the PCI hierarchy
on the fly (though I haven't tested that under heavy load with several
devices on a bridge).
> Since drivers always kick the doorbell conditionally, they may hang forever.
>
> As I have reported, we used to have the similar issue.
>
> PCI: Probe bridge window attributes once at enumeration-time
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521
>
>
> Therefore, can I assume the issue is not because to re-enable an already-enabled
> slot, but to touch the bridge window for more than once?
>
> Thank you very much!
>
> Dongli Zhang
>
> >
> >> So adding synchronisation and testing
> >> would take time (not something I'd do this late in the cycle).
> >>
> >> So far I'm thinking about adding rw mutex to bridge with the PCI
> >> hotplug subsystem
> >> being a writer while scsi scan jobs would be readers and wait till hotplug code
> >> says it's safe to proceed.
> >> I plan to work in this direction and give it some testing, unless
> >> someone has a better idea.
> >
> >>>
> >>>> +
> >>>> +				++nr_hp_slots;
> >>>>  				enable_slot(slot, true);
> >>>>  			}
> >>>>  		} else {
> >>>> --
> >>>
> >
>