Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, lenb@kernel.org, bhelgaas@google.com,
	mika.westerberg@linux.intel.com, boris.ostrovsky@oracle.com,
	joe.jin@oracle.com, stable@vger.kernel.org,
	Fiona Ebner <f.ebner@proxmox.com>,
	Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time
Date: Wed, 13 Dec 2023 09:09:18 -0800	[thread overview]
Message-ID: <a8db0ed6-05f4-7c2d-c63e-5f2976d25a45@oracle.com> (raw)
In-Reply-To: <20231213115248-mutt-send-email-mst@kernel.org>

Hi Igor,

On 12/13/23 08:54, Michael S. Tsirkin wrote:
> On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote:
>> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>
>>> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>>>
>>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already"
>>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and
>>>> bridge reconfiguration in case of single HBA hotplug.
>>>> However in virt environment it's possible to pause machine hotplug several
>>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged
>>>> HBA will start re-configuring bridge.
>>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up
>>>> devices within single hotplug event.
>>>>
>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>> ---
>>>>  drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
>>>> index 6b11609927d6..30bca2086b24 100644
>>>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>>>> @@ -37,6 +37,7 @@
>>>>  #include <linux/mutex.h>
>>>>  #include <linux/slab.h>
>>>>  #include <linux/acpi.h>
>>>> +#include <linux/delay.h>
>>>>
>>>>  #include "../pci.h"
>>>>  #include "acpiphp.h"
>>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev)
>>>>  static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>>  {
>>>>         struct acpiphp_slot *slot;
>>>> +        int nr_hp_slots = 0;
>>>>
>>>>         /* Bail out if the bridge is going away. */
>>>>         if (bridge->is_going_away)
>>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
>>>>
>>>>                         /* configure all functions */
>>>>                         if (slot->flags != SLOT_ENABLED) {
>>>> +                               if (nr_hp_slots)
>>>> +                                       msleep(1000);
>>>
>>> Why is 1000 considered the most suitable number here?  Any chance to
>>> define a symbol for it?
>>
>> Timeout was borrowed from SHPC hotplug workflow where it apparently
>> makes race harder to reproduce.
>> (though it's not excuse to add more timeouts elsewhere)
>>
>>> And won't this affect the cases when the race in question is not a concern?
>>
>> In practice it's not likely, since even in virt scenario hypervisor won't
>> stop VM to hotplug device (which beats whole purpose of hotplug).
>>
>> But in case of a very slow VM (overcommit case) it's possible for
>> several HBA's to be hotplugged by the time acpiphp gets a chance
>> to handle the 1st hotplug event. SHPC is more or less 'safe' with its
>> 1sec delay.
>>
>>> Also, adding arbitrary timeouts is not the most robust way of
>>> addressing race conditions IMV.  Wouldn't it be better to add some
>>> proper synchronization between the pieces of code that can race with
>>> each other?
>>
>> I don't like it either, it's a stop gap measure to hide regression on
>> short notice,
>> which I can fixup without much risk in short time left, before folks
>> leave on holidays.
>> It's fine to drop the patch as chances of this happening are small.
>> [1/2] should cover reported cases.
>>
>> Since it's RFC, I basically ask for opinions on a proper way to fix
>> SCSI_ASYNC_SCAN
>> running wild while the hotplug is in progress (and maybe SCSI is not
>> the only user that
>> schedules async job from device probe).
> 
> Of course not. And things don't have to be scheduled from probe right?
> Can be triggered by an interrupt or userspace activity.

I agree with Michael. TBH, I am curious if the two patches can
workaround/resolve the issue.

Would you mind helping explain if to run enable_slot() for a new PCI device can
impact the other PCI devices existing on the bridge?

E.g.,:

1. Attach several virtio-scsi or virtio-net on the same bridge.

2. Trigger workload for those PCI devices. They may do mmio write to kick the
doorbell (to trigger KVM/QEMU ioeventfd) very frequently.

3. Now hot-add an extra PCI device. Since the slot is never enabled, it enables
the slot via enable_slot().

Can I assume the last enable_slot() will temporarily re-configure the bridge
window so that all other PCI devices' mmio will lose effect at that time point?

Since drivers always kick the doorbell conditionally, they may hang forever.

As I have reported, we used to have the similar issue.

PCI: Probe bridge window attributes once at enumeration-time
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521


Therefore, can I assume the issue is not because to re-enable an already-enabled
slot, but to touch the bridge window for more than once?

Thank you very much!

Dongli Zhang

> 
>> So adding synchronisation and testing
>> would take time (not something I'd do this late in the cycle).
>>
>> So far I'm thinking about adding rw mutex to bridge with the PCI
>> hotplug subsystem
>> being a writer while scsi scan jobs would be readers and wait till hotplug code
>> says it's safe to proceed.
>> I plan to work in this direction and give it some testing, unless
>> someone has a better idea.
> 
>>>
>>>> +
>>>> +                                ++nr_hp_slots;
>>>>                                 enable_slot(slot, true);
>>>>                         }
>>>>                 } else {
>>>> --
>>>
> 

  reply	other threads:[~2023-12-13 17:09 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-13  0:36 [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Igor Mammedov
2023-12-13  0:36 ` [RFC 1/2] PCI: acpiphp: enable slot only if it hasn't been enabled already Igor Mammedov
2023-12-13  9:47   ` Fiona Ebner
2023-12-13 10:07     ` Igor Mammedov
2023-12-13 13:01   ` Rafael J. Wysocki
2023-12-13 16:06     ` Igor Mammedov
2023-12-13  0:36 ` [RFC 2/2] PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a time Igor Mammedov
2023-12-13  7:26   ` Greg KH
2023-12-13  8:13   ` Dongli Zhang
2023-12-13 10:05     ` Igor Mammedov
2023-12-13 17:25       ` Dongli Zhang
2023-12-13  9:47   ` Fiona Ebner
2023-12-13 13:07   ` Rafael J. Wysocki
2023-12-13 16:49     ` Igor Mammedov
2023-12-13 16:54       ` Michael S. Tsirkin
2023-12-13 17:09         ` Dongli Zhang [this message]
2024-01-03  9:54           ` Igor Mammedov
2024-01-03 16:19             ` Dongli Zhang
2023-12-13 18:50         ` Igor Mammedov
2023-12-13  8:12 ` [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job Dongli Zhang
2023-12-13 18:11 ` Bjorn Helgaas
2023-12-13 18:12   ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a8db0ed6-05f4-7c2d-c63e-5f2976d25a45@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=bhelgaas@google.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=f.ebner@proxmox.com \
    --cc=imammedo@redhat.com \
    --cc=joe.jin@oracle.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mst@redhat.com \
    --cc=rafael@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=t.lamprecht@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox