From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"helgaas@kernel.org" <helgaas@kernel.org>,
"anatoli.antonovitch@amd.com" <anatoli.antonovitch@amd.com>,
"Kumar1, Rahul" <Rahul.Kumar1@amd.com>,
"Alexander.Deucher@amd.com" <Alexander.Deucher@amd.com>
Subject: Re: Question about deadlock between AER and pceihp interrupts during resume from S3 with unplugged device
Date: Fri, 10 Jun 2022 17:25:57 -0400
Message-ID: <952f49bc-81f9-68d3-89a7-b89ea173f6df@amd.com>
In-Reply-To: <f3645499-f9ce-4625-60c7-a4a75384870f@amd.com>
On 2022-02-10 09:39, Andrey Grodzovsky wrote:
> Thanks a lot for quick response, we will give this a try.
>
> Andrey
>
> On 2022-02-10 01:23, Lukas Wunner wrote:
>> On Wed, Feb 09, 2022 at 02:54:06PM -0500, Andrey Grodzovsky wrote:
>>> Hi, on kernel based on 5.4.2 we are observing a deadlock between
>>> reset_lock semaphore and device_lock (dev->mutex). The scenario
>>> we do is putting the system to sleep, disconnecting the eGPU
>>> from the PCIe bus (through a special SBIOS setting) or by simply
>>> removing power to external PCIe cage and waking the
>>> system up.
>>>
>>> I attached the log. Please advise if you have any idea how
>>> to work around it ? Since the kernel is old, does anyone
>>> have an idea if this issue is known and already solved in later
>>> kernels ?
>>> We cannot try with latest since our kernel is custom for that platform.
>>
>> It is a known issue. Here's a fix I submitted during the v5.9 cycle:
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-pci%2F908047f7699d9de9ec2efd6b79aa752d73dab4b6.1595329748.git.lukas%40wunner.de%2F&data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cba698967471548d739c108d9ec5dcf6c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637800710411446272%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=hrRVL77%2FNRvojfG2WDamDLO5dsqn3Cv6XxNbP0eGum0%3D&reserved=0
>>
>>
>> The fix hasn't been applied yet. I think I need to rework the patch,
>> just haven't found the time.
Hey Lukas - just checking again whether you had a chance to push this change
through? It's essential to us in one of our customer projects, so we
wonder if you have any estimate of when it will be upstreamed and whether
we can help with this. We would also need to backport it to the 5.11 and
5.4 kernels after it's upstreamed.

Another point I want to mention is that this patch has a negative
side effect on plug-back times: it causes a regression in the
delay to light up the display at resume time, related to back-ported AER.
Anatoli is working on resolving this, so maybe he can add his
comment here and maybe you can help him with a proper resolution for this.

Andrey
>>
>> Since the trigger in your case are AER-handled errors during a
>> system sleep transition, you may also want to consider the
>> following 2-patch series by Kai-Heng Feng which is currently
>> under discussion:
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-pci%2F20220127025418.1989642-1-kai.heng.feng%40canonical.com%2F&data=04%7C01%7Candrey.grodzovsky%40amd.com%7Cba698967471548d739c108d9ec5dcf6c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637800710411446272%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&sdata=tnLUa6J%2FLqFrlm4CfZ9l26io0bOQ7ip30d26ax05st4%3D&reserved=0
>>
>>
>> That series disables AER during a system sleep transition and
>> should thus prevent the flood of AER-handled errors you're seeing.
>> Once AER is disabled, the reset-induced deadlocks should go away as well.
>>
>> Thanks,
>>
>> Lukas
Thread overview: 15+ messages
[not found] <0fc31d9a-f414-a412-3765-5519cbb9b7ff@amd.com>
2022-02-09 21:28 ` Question about deadlock between AER and pceihp interrupts during resume from S3 with unplugged device Andrey Grodzovsky
2022-02-10 6:23 ` Lukas Wunner
2022-02-10 14:39 ` Andrey Grodzovsky
2022-06-10 21:25 ` Andrey Grodzovsky [this message]
2022-06-14 18:07 ` Andrey Grodzovsky
2022-06-14 18:22 ` Sathyanarayanan Kuppuswamy
2022-06-14 20:35 ` Andrey Grodzovsky
2022-06-15 15:14 ` Sathyanarayanan Kuppuswamy
2022-06-15 15:49 ` Andrey Grodzovsky
2022-02-10 20:47 ` Andrey Grodzovsky
2022-02-10 21:37 ` Lukas Wunner
2022-02-10 23:12 ` Andrey Grodzovsky
2022-02-11 14:42 ` Kumar1, Rahul
2022-02-15 7:02 ` Lukas Wunner
2022-02-15 8:18 ` Kumar1, Rahul