* Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8
       [not found] <wL3vtEh_zTQSCqS6d5YCJReErDDy_dw-dW5L9TSpp9VFDVHfkSN8lNo8i1ZVUD9NU-eIvF2M84nhfdt2O7spGu2Nv5-oz9FLohYO7SuJzWQ=@micha.zone>
@ 2024-05-05  4:59 ` Linux regression tracking (Thorsten Leemhuis)
  2024-05-05 12:37   ` Mario Limonciello
  0 siblings, 1 reply; 6+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-05-05  4:59 UTC
To: Micha Albert
Cc: regressions@lists.linux.dev, stable@vger.kernel.org,
    linux-kernel@vger.kernel.org, Mario Limonciello

[CCing Mario, who asked for the two suspected commits to be backported]

On 05.05.24 03:12, Micha Albert wrote:
>
> I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
> In 6.8.7, this works as expected, and my Plymouth screen (including the
> LUKS password prompt) shows on my 2 monitors connected to the GPU as
> well as my main laptop screen. Upon entering the password, I'm put into
> userspace as expected. However, upon upgrading to 6.8.8, I will be
> greeted with the regular password prompt, but after entering my password
> and waiting for it to be accepted, my eGPU will reset and not function.
> I can tell that it resets since I can hear the click of my ATX power
> supply turning off and on again, and the status LED of the eGPU board
> goes from green to blue and back to green, all in less than a second.
>
> I talked to a friend, and we found out that the kernel parameter
> thunderbolt.host_reset=false fixes the issue. He also thinks that
> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
> suspicious. I've attached the output of dmesg when the error was
> occurring, since I'm still able to use my laptop normally when this
> happens, just not with my eGPU and its connected displays.

Thx for the report. Could you please test if 6.9-rc6 (or a later
snapshot; or -rc7, which should be out in about ~18 hours) is affected
as well? That would be really important to know.

It would also be great if you could try reverting the two patches you
mentioned and see if they are really what's causing this. There iirc are
two more; maybe you might need to revert some or all of them in the
order they were applied.

Ciao, Thorsten

P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: eGPU disconnected during boot
* Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8
  2024-05-05  4:59 ` [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8 Linux regression tracking (Thorsten Leemhuis)
@ 2024-05-05 12:37   ` Mario Limonciello
  2024-05-05 14:17     ` Mario Limonciello
  0 siblings, 1 reply; 6+ messages in thread
From: Mario Limonciello @ 2024-05-05 12:37 UTC
To: Linux regressions mailing list, Micha Albert
Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org, Mario Limonciello

On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing Mario, who asked for the two suspected commits to be backported]
>
> On 05.05.24 03:12, Micha Albert wrote:
>>
>> I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
>> In 6.8.7, this works as expected, and my Plymouth screen (including the
>> LUKS password prompt) shows on my 2 monitors connected to the GPU as
>> well as my main laptop screen. Upon entering the password, I'm put into
>> userspace as expected. However, upon upgrading to 6.8.8, I will be
>> greeted with the regular password prompt, but after entering my password
>> and waiting for it to be accepted, my eGPU will reset and not function.
>> I can tell that it resets since I can hear the click of my ATX power
>> supply turning off and on again, and the status LED of the eGPU board
>> goes from green to blue and back to green, all in less than a second.
>>
>> I talked to a friend, and we found out that the kernel parameter
>> thunderbolt.host_reset=false fixes the issue. He also thinks that
>> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
>> suspicious. I've attached the output of dmesg when the error was
>> occurring, since I'm still able to use my laptop normally when this
>> happens, just not with my eGPU and its connected displays.
>
> Thx for the report. Could you please test if 6.9-rc6 (or a later
> snapshot; or -rc7, which should be out in about ~18 hours) is affected
> as well? That would be really important to know.
>
> It would also be great if you could try reverting the two patches you
> mentioned and see if they are really what's causing this. There iirc are
> two more; maybe you might need to revert some or all of them in the
> order they were applied.

There are two other things that I think would be good to check in order
to understand this issue.

1) Is it related to trusted devices handling?

You can try applying the following commit to either 6.8.y or 6.9-rc:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iommu/fixes&id=0f91d0795741c12cee200667648669a91b568735

2) Is it because you have amdgpu in your initramfs but not thunderbolt?

If so, there's very likely an ordering issue.

[    2.325788] [drm] GPU posting now...
[   30.360701] ACPI: bus type thunderbolt registered

Can you remove amdgpu from your initramfs and wait for it to start up
after you pivot rootfs? Does this still happen?

>
> Ciao, Thorsten
>
> P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.8.7..v6.8.8
> #regzbot title thunderbolt: eGPU disconnected during boot
>
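The ordering question in point 2 of the message above can be checked against a saved kernel log. What follows is only a minimal sketch: the helper names (first_timestamp, amdgpu_before_thunderbolt) are made up for illustration, and it assumes the log uses the default "[seconds.microseconds]" dmesg prefix and contains the two messages quoted in the mail.

```python
import re

# Matches the "[  seconds.micro] message" prefix printed by dmesg.
_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_timestamp(log_text, needle):
    """Return the timestamp in seconds of the first line containing needle, or None."""
    for line in log_text.splitlines():
        match = _LINE.match(line.strip())
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def amdgpu_before_thunderbolt(log_text):
    """True if the GPU was posted before the thunderbolt bus type was registered.

    Returns None when either message is missing from the log.
    """
    gpu = first_timestamp(log_text, "GPU posting now")
    tbt = first_timestamp(log_text, "bus type thunderbolt registered")
    if gpu is None or tbt is None:
        return None
    return gpu < tbt


# The two lines quoted in the mail, used here as sample input.
SAMPLE = """\
[    2.325788] [drm] GPU posting now...
[   30.360701] ACPI: bus type thunderbolt registered
"""

if __name__ == "__main__":
    print(amdgpu_before_thunderbolt(SAMPLE))
```

To use it on a real system, the output of dmesg or journalctl -k would be passed in as log_text in place of SAMPLE. A True result only shows the load order; it does not by itself prove that the ordering causes the reset.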
* Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8
  2024-05-05 12:37   ` Mario Limonciello
@ 2024-05-05 14:17     ` Mario Limonciello
  0 siblings, 0 replies; 6+ messages in thread
From: Mario Limonciello @ 2024-05-05 14:17 UTC
To: Mario Limonciello, Linux regressions mailing list, Micha Albert
Cc: stable@vger.kernel.org, linux-kernel@vger.kernel.org

On 5/5/2024 07:37, Mario Limonciello wrote:
>
>
> On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [CCing Mario, who asked for the two suspected commits to be backported]
>>
>> On 05.05.24 03:12, Micha Albert wrote:
>>>
>>> I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
>>> In 6.8.7, this works as expected, and my Plymouth screen (including the
>>> LUKS password prompt) shows on my 2 monitors connected to the GPU as
>>> well as my main laptop screen. Upon entering the password, I'm put into
>>> userspace as expected. However, upon upgrading to 6.8.8, I will be
>>> greeted with the regular password prompt, but after entering my password
>>> and waiting for it to be accepted, my eGPU will reset and not function.
>>> I can tell that it resets since I can hear the click of my ATX power
>>> supply turning off and on again, and the status LED of the eGPU board
>>> goes from green to blue and back to green, all in less than a second.
>>>
>>> I talked to a friend, and we found out that the kernel parameter
>>> thunderbolt.host_reset=false fixes the issue. He also thinks that
>>> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
>>> suspicious. I've attached the output of dmesg when the error was
>>> occurring, since I'm still able to use my laptop normally when this
>>> happens, just not with my eGPU and its connected displays.
>>
>> Thx for the report. Could you please test if 6.9-rc6 (or a later
>> snapshot; or -rc7, which should be out in about ~18 hours) is affected
>> as well? That would be really important to know.
>>
>> It would also be great if you could try reverting the two patches you
>> mentioned and see if they are really what's causing this. There iirc are
>> two more; maybe you might need to revert some or all of them in the
>> order they were applied.
>
> There are two other things that I think would be good to check in order
> to understand this issue.
>
> 1) Is it related to trusted devices handling?
>
> You can try applying the following commit to either 6.8.y or 6.9-rc:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iommu/fixes&id=0f91d0795741c12cee200667648669a91b568735
>
> 2) Is it because you have amdgpu in your initramfs but not thunderbolt?
>
> If so, there's very likely an ordering issue.
>
> [    2.325788] [drm] GPU posting now...
> [   30.360701] ACPI: bus type thunderbolt registered
>
> Can you remove amdgpu from your initramfs and wait for it to start up
> after you pivot rootfs? Does this still happen?
>

One more thought. When you say it does "not function", is it authorized
in the thunderbolt sysfs? See
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/thunderbolt.rst

Is it still showing up in lspci?

>>
>> Ciao, Thorsten
>>
>> P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
>> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced v6.8.7..v6.8.8
>> #regzbot title thunderbolt: eGPU disconnected during boot
>>
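The authorization question in the message above can be answered by reading the per-device "authorized" attribute that the linked admin guide describes. The following is a rough sketch, not an official tool: read_authorized is a made-up helper name, and it assumes the usual /sys/bus/thunderbolt/devices layout, where domain entries such as domain0 carry no "authorized" file and are therefore skipped.

```python
from pathlib import Path


def read_authorized(base="/sys/bus/thunderbolt/devices"):
    """Map each Thunderbolt device name to the content of its 'authorized' attribute.

    Entries without that attribute are skipped. An unreadable attribute is
    reported as None. A missing base directory yields an empty dict.
    """
    result = {}
    root = Path(base)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        attr = dev / "authorized"
        if attr.is_file():
            try:
                result[dev.name] = attr.read_text().strip()
            except OSError:
                result[dev.name] = None
    return result


if __name__ == "__main__":
    for name, state in read_authorized().items():
        print(f"{name}: authorized={state}")
```

According to the admin guide, a value of 0 means the device is not authorized. Comparing the output on 6.8.7 and 6.8.8 would show whether the eGPU is still present on the Thunderbolt bus after the reset, which complements the lspci check.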
[parent not found: <CAHe5sWavQcUTg2zTYaryRsMywSBgBgETG=R1jRexg4qDqwCfdw@mail.gmail.com>]
* Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8
       [not found] <CAHe5sWavQcUTg2zTYaryRsMywSBgBgETG=R1jRexg4qDqwCfdw@mail.gmail.com>
@ 2024-05-06 12:53 ` Linux regression tracking (Thorsten Leemhuis)
  2024-05-20  9:19   ` Gia
  0 siblings, 1 reply; 6+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-05-06 12:53 UTC
To: Gia
Cc: linux-kernel, regressions, stable@vger.kernel.org, kernel, Mario Limonciello

[CCing Mario, who asked for the two suspected commits to be backported]

On 06.05.24 14:24, Gia wrote:
> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> TS3 Plus Thunderbolt 3 dock.
>
> After the update I see this message on boot "xHCI host controller not
> responding, assume dead" and the dock is not working anymore. Kernel
> 6.8.7 works great.

Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
accessible somewhere?

And have you looked into the other stuff that Mario suggested in the
other thread? See the following mail and the reply to it for details:

https://lore.kernel.org/all/1eb96465-0a81-4187-b8e7-607d85617d5f@gmail.com/T/#u

Ciao, Thorsten

P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: TB3 dock problems, xHCI host controller not responding, assume dead
* Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8
  2024-05-06 12:53 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-05-20  9:19   ` Gia
  2024-05-20 13:43     ` Mario Limonciello
  0 siblings, 1 reply; 6+ messages in thread
From: Gia @ 2024-05-20  9:19 UTC
To: Linux regressions mailing list
Cc: linux-kernel, stable@vger.kernel.org, kernel, Mario Limonciello

Hi Thorsten,

I'll try to provide a kernel log ASAP, it's not that easy because when
I run into this issue my keyboard isn't working. The kernel parameter
that Mario suggested, thunderbolt.host_reset=false, fixes the issue!

I can add that without the suggested kernel parameter the issue
persists with the latest Archlinux kernel 6.9.1.

I also found another report of the issue on Archlinux forum:
https://bbs.archlinux.org/viewtopic.php?id=295824


On Mon, May 6, 2024 at 2:53 PM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> [CCing Mario, who asked for the two suspected commits to be backported]
>
> On 06.05.24 14:24, Gia wrote:
> > Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> > TS3 Plus Thunderbolt 3 dock.
> >
> > After the update I see this message on boot "xHCI host controller not
> > responding, assume dead" and the dock is not working anymore. Kernel
> > 6.8.7 works great.
>
> Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
> accessible somewhere?
>
> And have you looked into the other stuff that Mario suggested in the
> other thread? See the following mail and the reply to it for details:
>
> https://lore.kernel.org/all/1eb96465-0a81-4187-b8e7-607d85617d5f@gmail.com/T/#u
>
> Ciao, Thorsten
>
> P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.8.7..v6.8.8
> #regzbot title thunderbolt: TB3 dock problems, xHCI host controller not
> responding, assume dead
* Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8
  2024-05-20  9:19   ` Gia
@ 2024-05-20 13:43     ` Mario Limonciello
  0 siblings, 0 replies; 6+ messages in thread
From: Mario Limonciello @ 2024-05-20 13:43 UTC
To: Gia, Linux regressions mailing list
Cc: linux-kernel, stable@vger.kernel.org, kernel

Can we please get kernel logs for each of these two cases, with the
following on the kernel command line?

thunderbolt.dyndbg=+p
thunderbolt.dyndbg=+p thunderbolt.host_reset=false

Also, what is the value of:

$ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection

That won't change between the two cases, but it will be really helpful
for understanding this issue.

On 5/20/2024 04:19, Gia wrote:
> Hi Thorsten,
>
> I'll try to provide a kernel log ASAP, it's not that easy because when
> I run into this issue my keyboard isn't working. The kernel parameter
> that Mario suggested, thunderbolt.host_reset=false, fixes the issue!
>
> I can add that without the suggested kernel parameter the issue
> persists with the latest Archlinux kernel 6.9.1.
>
> I also found another report of the issue on Archlinux forum:
> https://bbs.archlinux.org/viewtopic.php?id=295824
>
>
> On Mon, May 6, 2024 at 2:53 PM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> [CCing Mario, who asked for the two suspected commits to be backported]
>>
>> On 06.05.24 14:24, Gia wrote:
>>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
>>> TS3 Plus Thunderbolt 3 dock.
>>>
>>> After the update I see this message on boot "xHCI host controller not
>>> responding, assume dead" and the dock is not working anymore. Kernel
>>> 6.8.7 works great.
>>
>> Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
>> accessible somewhere?
>>
>> And have you looked into the other stuff that Mario suggested in the
>> other thread? See the following mail and the reply to it for details:
>>
>> https://lore.kernel.org/all/1eb96465-0a81-4187-b8e7-607d85617d5f@gmail.com/T/#u
>>
>> Ciao, Thorsten
>>
>> P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
>> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced v6.8.7..v6.8.8
>> #regzbot title thunderbolt: TB3 dock problems, xHCI host controller not
>> responding, assume dead
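When attaching the two logs requested above, it helps to record which thunderbolt parameters were actually active for each boot, together with the iommu_dma_protection value. The sketch below is only an illustration: thunderbolt_params and read_iommu_dma_protection are hypothetical helper names, and the parsing assumes simple space-separated "thunderbolt.name=value" tokens without quoting.

```python
from pathlib import Path


def thunderbolt_params(cmdline):
    """Extract thunderbolt.* module parameters from a kernel command line string."""
    prefix = "thunderbolt."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix):
            key, _, value = token.partition("=")
            params[key[len(prefix):]] = value
    return params


def read_iommu_dma_protection(
    path="/sys/bus/thunderbolt/devices/domain0/iommu_dma_protection",
):
    """Return the attribute content as a string, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("thunderbolt parameters:", thunderbolt_params(cmdline))
    print("iommu_dma_protection:", read_iommu_dma_protection())
```

Running it once per boot and pasting the output next to each dmesg capture makes it unambiguous which log belongs to the host_reset=false case.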
Thread overview: 6+ messages
[not found] <wL3vtEh_zTQSCqS6d5YCJReErDDy_dw-dW5L9TSpp9VFDVHfkSN8lNo8i1ZVUD9NU-eIvF2M84nhfdt2O7spGu2Nv5-oz9FLohYO7SuJzWQ=@micha.zone>
2024-05-05 4:59 ` [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8 Linux regression tracking (Thorsten Leemhuis)
2024-05-05 12:37 ` Mario Limonciello
2024-05-05 14:17 ` Mario Limonciello
[not found] <CAHe5sWavQcUTg2zTYaryRsMywSBgBgETG=R1jRexg4qDqwCfdw@mail.gmail.com>
2024-05-06 12:53 ` Linux regression tracking (Thorsten Leemhuis)
2024-05-20 9:19 ` Gia
2024-05-20 13:43 ` Mario Limonciello