* [PATCH v2 0/2] Fix Navi3x boot and hotplug problems
@ 2023-10-05 17:52 Mario Limonciello
2023-10-05 17:52 ` [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope Mario Limonciello
2023-10-05 17:52 ` [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0" Mario Limonciello
0 siblings, 2 replies; 8+ messages in thread
From: Mario Limonciello @ 2023-10-05 17:52 UTC (permalink / raw)
To: Heikki Krogerus, Greg Kroah-Hartman, Wolfram Sang,
Sebastian Reichel
Cc: Alex Deucher, linux-usb, linux-kernel, amd-gfx, Mario Limonciello
On some OEM systems, multiple Navi3x dGPUs are triggering RAS errors
and BACO errors.
These errors come from elements of the OEM system that weren't part of
the original test environment. This series addresses those problems.
NOTE: Although this series touches two subsystems, I would prefer to
take this all through DRM because there is a workaround in
amd-staging-drm-next that I would like to be reverted at the same
time as picking up the fix.
v1->v2:
* Drop _PR3 patch from series, it was cherry picked and is on its way
to 6.6-rcX already.
* Rather than changing global policy, fix the problematic power supply
driver.
v1: https://lore.kernel.org/linux-pm/20230926225955.386553-1-mario.limonciello@amd.com/
Mario Limonciello (2):
usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power
supply scope
Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
13.0.0"
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++-
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
drivers/usb/typec/ucsi/psy.c | 10 ++++++++++
3 files changed, 13 insertions(+), 1 deletion(-)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
2023-10-05 17:52 [PATCH v2 0/2] Fix Navi3x boot and hotplug problems Mario Limonciello
@ 2023-10-05 17:52 ` Mario Limonciello
2023-10-05 19:13 ` Greg Kroah-Hartman
2023-10-05 20:35 ` Sebastian Reichel
2023-10-05 17:52 ` [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0" Mario Limonciello
1 sibling, 2 replies; 8+ messages in thread
From: Mario Limonciello @ 2023-10-05 17:52 UTC (permalink / raw)
To: Heikki Krogerus, Greg Kroah-Hartman, Wolfram Sang,
Sebastian Reichel
Cc: Alex Deucher, linux-usb, linux-kernel, amd-gfx, Mario Limonciello, Kai-Heng Feng, Alex Deucher, Richard Gong

On some OEM systems, adding a W7900 dGPU triggers RAS errors and hangs
at a black screen on startup. This issue occurs only if `ucsi_acpi` has
loaded before `amdgpu` has loaded. The reason for this failure is that
`amdgpu` uses power_supply_is_system_supplied() to determine if running
on AC or DC power at startup. If this value is reported incorrectly the
dGPU will also be programmed incorrectly and trigger errors.

power_supply_is_system_supplied() reports the wrong value because UCSI
power supplies provided as part of the system don't properly report the
scope as "DEVICE" scope (not powering the system).

In order to fix this issue check the capabilities reported from the UCSI
power supply to ensure that it supports charging a battery and that it can
be powered by AC. Mark the scope accordingly.

Fixes: a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
Link: https://www.intel.com/content/www/us/en/products/docs/io/universal-serial-bus/usb-type-c-ucsi-spec.html p28
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>>
Cc: Richard Gong <Richard.Gong@amd.com>
---
drivers/usb/typec/ucsi/psy.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
index 384b42267f1f..b35c6e07911e 100644
--- a/drivers/usb/typec/ucsi/psy.c
+++ b/drivers/usb/typec/ucsi/psy.c
@@ -37,6 +37,15 @@ static int ucsi_psy_get_scope(struct ucsi_connector *con,
 	struct device *dev = con->ucsi->dev;
 
 	device_property_read_u8(dev, "scope", &scope);
+	if (scope == POWER_SUPPLY_SCOPE_UNKNOWN) {
+		u32 mask = UCSI_CAP_ATTR_POWER_AC_SUPPLY |
+			   UCSI_CAP_ATTR_BATTERY_CHARGING;
+
+		if (con->ucsi->cap.attributes & mask)
+			scope = POWER_SUPPLY_SCOPE_SYSTEM;
+		else
+			scope = POWER_SUPPLY_SCOPE_DEVICE;
+	}
 	val->intval = scope;
 	return 0;
 }
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
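The scope fallback in the patch above, and its effect on AC detection, can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: the enum, the CAP_ATTR_* bit positions, struct psy_model, and system_supplied() are hypothetical stand-ins, and system_supplied() is a simplified guess at how power_supply_is_system_supplied() treats DEVICE-scope supplies (skip them, and assume AC when nothing is left to count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's POWER_SUPPLY_SCOPE_* values. */
enum psy_scope { SCOPE_UNKNOWN = 0, SCOPE_SYSTEM, SCOPE_DEVICE };

/* Assumed bit positions; the real UCSI_CAP_ATTR_* masks are defined by the
 * UCSI driver and may differ. */
#define CAP_ATTR_BATTERY_CHARGING (1u << 1)
#define CAP_ATTR_POWER_AC_SUPPLY  (1u << 8)

/* Mirrors the patch: only when no scope was reported, derive one from the
 * GET_CAPABILITY attributes. */
static enum psy_scope resolve_scope(enum psy_scope reported, uint32_t attrs)
{
	uint32_t mask = CAP_ATTR_POWER_AC_SUPPLY | CAP_ATTR_BATTERY_CHARGING;

	if (reported != SCOPE_UNKNOWN)
		return reported;
	return (attrs & mask) ? SCOPE_SYSTEM : SCOPE_DEVICE;
}

/* Hypothetical, reduced view of one registered power supply. */
struct psy_model {
	bool is_battery;
	bool online;
	enum psy_scope scope;
};

/* Simplified model (an assumption, not the kernel implementation):
 * DEVICE-scope supplies are ignored, any remaining online non-battery
 * supply means AC, and an empty set is treated as AC. */
static bool system_supplied(const struct psy_model *supplies, size_t n)
{
	size_t counted = 0;

	assert(supplies != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (supplies[i].scope == SCOPE_DEVICE)
			continue;
		counted++;
		if (!supplies[i].is_battery && supplies[i].online)
			return true;
	}
	return counted == 0;
}
```

Under this model, a desktop whose only supply is an unplugged UCSI port with no reported scope is counted as "on DC" before the fallback; once resolve_scope() marks it DEVICE, the port is skipped and the system is treated as AC powered.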
* Re: [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
2023-10-05 17:52 ` [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope Mario Limonciello
@ 2023-10-05 19:13 ` Greg Kroah-Hartman
2023-10-05 20:35 ` Sebastian Reichel
1 sibling, 0 replies; 8+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05 19:13 UTC (permalink / raw)
To: Mario Limonciello
Cc: Heikki Krogerus, Wolfram Sang, Sebastian Reichel, Alex Deucher, linux-usb, linux-kernel, amd-gfx, Kai-Heng Feng, Richard Gong

On Thu, Oct 05, 2023 at 12:52:29PM -0500, Mario Limonciello wrote:
> On some OEM systems, adding a W7900 dGPU triggers RAS errors and hangs
> at a black screen on startup. This issue occurs only if `ucsi_acpi` has
> loaded before `amdgpu` has loaded. The reason for this failure is that
> `amdgpu` uses power_supply_is_system_supplied() to determine if running
> on AC or DC power at startup. If this value is reported incorrectly the
> dGPU will also be programmed incorrectly and trigger errors.
>
> power_supply_is_system_supplied() reports the wrong value because UCSI
> power supplies provided as part of the system don't properly report the
> scope as "DEVICE" scope (not powering the system).
>
> In order to fix this issue check the capabilities reported from the UCSI
> power supply to ensure that it supports charging a battery and that it can
> be powered by AC. Mark the scope accordingly.
>
> Fixes: a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
> Link: https://www.intel.com/content/www/us/en/products/docs/io/universal-serial-bus/usb-type-c-ucsi-spec.html p28
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>>
> Cc: Richard Gong <Richard.Gong@amd.com>
> ---
> drivers/usb/typec/ucsi/psy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
> index 384b42267f1f..b35c6e07911e 100644
> --- a/drivers/usb/typec/ucsi/psy.c
> +++ b/drivers/usb/typec/ucsi/psy.c
> @@ -37,6 +37,15 @@ static int ucsi_psy_get_scope(struct ucsi_connector *con,
>  	struct device *dev = con->ucsi->dev;
>
>  	device_property_read_u8(dev, "scope", &scope);
> +	if (scope == POWER_SUPPLY_SCOPE_UNKNOWN) {
> +		u32 mask = UCSI_CAP_ATTR_POWER_AC_SUPPLY |
> +			   UCSI_CAP_ATTR_BATTERY_CHARGING;
> +
> +		if (con->ucsi->cap.attributes & mask)
> +			scope = POWER_SUPPLY_SCOPE_SYSTEM;
> +		else
> +			scope = POWER_SUPPLY_SCOPE_DEVICE;
> +	}
>  	val->intval = scope;
>  	return 0;
>  }
> --
> 2.34.1
>
>

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases. To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope
2023-10-05 17:52 ` [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope Mario Limonciello
2023-10-05 19:13 ` Greg Kroah-Hartman
@ 2023-10-05 20:35 ` Sebastian Reichel
1 sibling, 0 replies; 8+ messages in thread
From: Sebastian Reichel @ 2023-10-05 20:35 UTC (permalink / raw)
To: Mario Limonciello
Cc: Heikki Krogerus, Greg Kroah-Hartman, Wolfram Sang, Alex Deucher, linux-usb, linux-kernel, amd-gfx, Kai-Heng Feng, Richard Gong

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

Hi,

On Thu, Oct 05, 2023 at 12:52:29PM -0500, Mario Limonciello wrote:
> On some OEM systems, adding a W7900 dGPU triggers RAS errors and hangs
> at a black screen on startup. This issue occurs only if `ucsi_acpi` has
> loaded before `amdgpu` has loaded. The reason for this failure is that
> `amdgpu` uses power_supply_is_system_supplied() to determine if running
> on AC or DC power at startup. If this value is reported incorrectly the
> dGPU will also be programmed incorrectly and trigger errors.
>
> power_supply_is_system_supplied() reports the wrong value because UCSI
> power supplies provided as part of the system don't properly report the
> scope as "DEVICE" scope (not powering the system).
>
> In order to fix this issue check the capabilities reported from the UCSI
> power supply to ensure that it supports charging a battery and that it can
> be powered by AC. Mark the scope accordingly.
>
> Fixes: a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
> Link: https://www.intel.com/content/www/us/en/products/docs/io/universal-serial-bus/usb-type-c-ucsi-spec.html p28
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>>
> Cc: Richard Gong <Richard.Gong@amd.com>
> ---
> drivers/usb/typec/ucsi/psy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
> index 384b42267f1f..b35c6e07911e 100644
> --- a/drivers/usb/typec/ucsi/psy.c
> +++ b/drivers/usb/typec/ucsi/psy.c
> @@ -37,6 +37,15 @@ static int ucsi_psy_get_scope(struct ucsi_connector *con,
>  	struct device *dev = con->ucsi->dev;
>
>  	device_property_read_u8(dev, "scope", &scope);
> +	if (scope == POWER_SUPPLY_SCOPE_UNKNOWN) {
> +		u32 mask = UCSI_CAP_ATTR_POWER_AC_SUPPLY |
> +			   UCSI_CAP_ATTR_BATTERY_CHARGING;
> +
> +		if (con->ucsi->cap.attributes & mask)
> +			scope = POWER_SUPPLY_SCOPE_SYSTEM;
> +		else
> +			scope = POWER_SUPPLY_SCOPE_DEVICE;
> +	}
>  	val->intval = scope;
>  	return 0;
>  }

Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>

-- Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"
2023-10-05 17:52 [PATCH v2 0/2] Fix Navi3x boot and hotplug problems Mario Limonciello
2023-10-05 17:52 ` [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope Mario Limonciello
@ 2023-10-05 17:52 ` Mario Limonciello
2023-10-05 19:12 ` Greg Kroah-Hartman
1 sibling, 1 reply; 8+ messages in thread
From: Mario Limonciello @ 2023-10-05 17:52 UTC (permalink / raw)
To: Heikki Krogerus, Greg Kroah-Hartman, Wolfram Sang,
Sebastian Reichel
Cc: Alex Deucher, linux-usb, linux-kernel, amd-gfx, Mario Limonciello

This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++-
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 08cb9f8ce64e..9b62b45ebb7f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -1026,7 +1026,8 @@ static int smu_v13_0_process_pending_interrupt(struct smu_context *smu)
 {
 	int ret = 0;
 
-	if (smu_cmn_feature_is_enabled(smu, SMU_FEATURE_ACDC_BIT))
+	if (smu->dc_controlled_by_gpio &&
+	    smu_cmn_feature_is_enabled(smu, SMU_FEATURE_ACDC_BIT))
 		ret = smu_v13_0_allow_ih_interrupt(smu);
 
 	return ret;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index 07df5be063e2..0fb6be11a0cc 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2662,6 +2662,7 @@ static const struct pptable_funcs smu_v13_0_0_ppt_funcs = {
 	.enable_mgpu_fan_boost = smu_v13_0_0_enable_mgpu_fan_boost,
 	.get_power_limit = smu_v13_0_0_get_power_limit,
 	.set_power_limit = smu_v13_0_set_power_limit,
+	.set_power_source = smu_v13_0_set_power_source,
 	.get_power_profile_mode = smu_v13_0_0_get_power_profile_mode,
 	.set_power_profile_mode = smu_v13_0_0_set_power_profile_mode,
 	.run_btc = smu_v13_0_run_btc,
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"
2023-10-05 17:52 ` [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0" Mario Limonciello
@ 2023-10-05 19:12 ` Greg Kroah-Hartman
2023-10-05 19:15 ` Mario Limonciello
2023-10-05 19:17 ` Alex Deucher
0 siblings, 2 replies; 8+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-05 19:12 UTC (permalink / raw)
To: Mario Limonciello
Cc: Heikki Krogerus, Wolfram Sang, Sebastian Reichel, Alex Deucher, linux-usb, linux-kernel, amd-gfx

On Thu, Oct 05, 2023 at 12:52:30PM -0500, Mario Limonciello wrote:
> This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.
>
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

No explanation as to why this needs to be reverted? And does this need
to be backported anywhere?

thanks,

greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"
2023-10-05 19:12 ` Greg Kroah-Hartman
@ 2023-10-05 19:15 ` Mario Limonciello
2023-10-05 19:17 ` Alex Deucher
1 sibling, 0 replies; 8+ messages in thread
From: Mario Limonciello @ 2023-10-05 19:15 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Heikki Krogerus, Wolfram Sang, Sebastian Reichel, Alex Deucher, linux-usb, linux-kernel, amd-gfx

On 10/5/2023 14:12, Greg Kroah-Hartman wrote:
> On Thu, Oct 05, 2023 at 12:52:30PM -0500, Mario Limonciello wrote:
>> This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.
>>
>> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>
> No explanation as to why this needs to be reverted? And does this need
> to be backported anywhere?
>
> thanks,
>
> greg k-h

No need to be backported anywhere. The commit is only in
amd-staging-drm-next right now.

I think it's up to whether Alex includes the workaround commit in the
final 6.7 pull request. If he does, then yeah this could use a larger
write up to explain why it went in and out.

I was sort of thinking we could land both commits in
amd-staging-drm-next and then when Alex did the pull request the
workaround commit just wouldn't be part of the 6.7 PR since it's a
no-op with the revert.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"
2023-10-05 19:12 ` Greg Kroah-Hartman
2023-10-05 19:15 ` Mario Limonciello
@ 2023-10-05 19:17 ` Alex Deucher
1 sibling, 0 replies; 8+ messages in thread
From: Alex Deucher @ 2023-10-05 19:17 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Mario Limonciello, Heikki Krogerus, linux-usb, Sebastian Reichel, amd-gfx, linux-kernel, Wolfram Sang, Alex Deucher

On Thu, Oct 5, 2023 at 3:13 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Oct 05, 2023 at 12:52:30PM -0500, Mario Limonciello wrote:
> > This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.
> >
> > Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>
> No explanation as to why this needs to be reverted? And does this need
> to be backported anywhere?

This patch ultimately never went upstream, but there was some
confusion about whether it did or not. It can be ignored.

Alex
^ permalink raw reply [flat|nested] 8+ messages in thread