* [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
@ 2023-04-18 22:15 Guilherme G. Piccoli
2023-04-19 13:16 ` Deucher, Alexander
2023-04-22 15:19 ` Greg KH
0 siblings, 2 replies; 13+ messages in thread
From: Guilherme G. Piccoli @ 2023-04-18 22:15 UTC (permalink / raw)
To: stable
Cc: gregkh, sashal, amd-gfx, alexander.deucher, James.Zhu, leo.liu,
kernel, kernel-dev
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable@vger.kernel.org
Cc: James Zhu <James.Zhu@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
Hi folks, this was build/boot tested on Deck. I've also adjusted the
context, function was reworked on 6.2.
But what a surprise was for me not see this fix already in 6.1.y, since
I've CCed stable, and the reason for that is really peculiar:
$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432
Leo Liu <leo.liu@amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 82132ecc5432
v6.2-rc1~124^2~1^2~13
$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e
Leo Liu <leo.liu@amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
$ git describe --contains 9a8cc8cabc1e
v6.1-rc8~16^2^2
This is quite strange for me, we have 2 commit hashes pointing to the *same*
commit, and each one is present..in a different release !!?!
Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
Alex, do you have an idea why sometimes commits from the AMD tree appear
duplicate in mainline? Specially in different releases, this could cause
some confusion I guess.
Thanks in advance,
Guilherme
drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index ce64ca1c6e66..5c1193dd7d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
#include <linux/firmware.h>
#include <linux/module.h>
+#include <linux/dmi.h>
#include <linux/pci.h>
#include <linux/debugfs.h>
#include <drm/drm_drv.h>
@@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
{
unsigned long bo_size;
const char *fw_name;
+ const char *bios_ver;
const struct common_firmware_header *hdr;
unsigned char fw_check;
unsigned int fw_shared_size, log_offset;
@@ -159,6 +161,21 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
adev->vcn.indirect_sram = true;
+ /*
+ * Some Steam Deck's BIOS versions are incompatible with the
+ * indirect SRAM mode, leading to amdgpu being unable to get
+ * properly probed (and even potentially crashing the kernel).
+ * Hence, check for these versions here - notice this is
+ * restricted to Vangogh (Deck's APU).
+ */
+ bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+ if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+ !strncmp("F7A0114", bios_ver, 7))) {
+ adev->vcn.indirect_sram = false;
+ dev_info(adev->dev,
+ "Steam Deck quirk: indirect SRAM disabled on BIOS %s\n", bios_ver);
+ }
break;
case IP_VERSION(3, 0, 16):
fw_name = FIRMWARE_DIMGREY_CAVEFISH;
--
2.40.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* RE: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-18 22:15 [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes Guilherme G. Piccoli
@ 2023-04-19 13:16 ` Deucher, Alexander
2023-04-19 14:14 ` Guilherme G. Piccoli
2023-04-22 15:19 ` Greg KH
1 sibling, 1 reply; 13+ messages in thread
From: Deucher, Alexander @ 2023-04-19 13:16 UTC (permalink / raw)
To: Guilherme G. Piccoli, stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
[Public]
> -----Original Message-----
> From: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Sent: Tuesday, April 18, 2023 6:15 PM
> To: stable@vger.kernel.org
> Cc: gregkh@linuxfoundation.org; sashal@kernel.org; amd-
> gfx@lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Zhu, James <James.Zhu@amd.com>; Liu,
> Leo <Leo.Liu@amd.com>; kernel@gpiccoli.net; kernel-dev@igalia.com
> Subject: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh
> broken BIOSes
>
> commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
>
> The VCN firmware loading path enables the indirect SRAM mode if it's
> advertised as supported. We might have some cases of FW issues that
> prevents this mode to working properly though, ending-up in a failed probe.
> An example below, observed in the Steam Deck:
>
> [...]
> [drm] failed to load ucode VCN0_RAM(0x3A) [drm] psp gfx command
> LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000) amdgpu
> 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> vcn_dec_0 test failed (-110) [drm:amdgpu_device_init.cold [amdgpu]]
> *ERROR* hw_init of IP block <vcn_v3_0> failed -110 amdgpu 0000:04:00.0:
> amdgpu: amdgpu_device_ip_init failed amdgpu 0000:04:00.0: amdgpu: Fatal
> error during GPU init [...]
>
> Disabling the VCN block circumvents this, but it's a very invasive workaround
> that turns off the entire feature. So, let's add a quirk on VCN loading that
> checks for known problematic BIOSes on Vangogh, so we can proactively
> disable the indirect SRAM mode and allow the HW proper probe and VCN IP
> block to work fine.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
> Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram
> mode")
> Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram
> mode")
> Cc: stable@vger.kernel.org
> Cc: James Zhu <James.Zhu@amd.com>
> Cc: Leo Liu <leo.liu@amd.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>
> Hi folks, this was build/boot tested on Deck. I've also adjusted the context,
> function was reworked on 6.2.
>
> But what a surprise was for me not see this fix already in 6.1.y, since I've
> CCed stable, and the reason for that is really peculiar:
>
>
> $ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432 Leo Liu
> <leo.liu@amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
>
> $ git describe --contains 82132ecc5432
> v6.2-rc1~124^2~1^2~13
>
>
> $ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e Leo Liu
> <leo.liu@amd.com>: drm/amdgpu: enable Vangogh VCN indirect sram mode
>
> $ git describe --contains 9a8cc8cabc1e
> v6.1-rc8~16^2^2
>
>
> This is quite strange for me, we have 2 commit hashes pointing to the
> *same* commit, and each one is present..in a different release !!?!
> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
> misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
>
> Alex, do you have an idea why sometimes commits from the AMD tree
> appear duplicate in mainline? Specially in different releases, this could cause
> some confusion I guess.
This is pretty common in the drm. The problem is there is not a good way to get bug fixes into both -next and -fixes without constant back merges. So the patches end up in -next and if they are important for stable, they go to -fixes too. We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable. I don't know of a good way to make this smoother.
Alex
>
> Thanks in advance,
>
>
> Guilherme
>
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index ce64ca1c6e66..5c1193dd7d88 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -26,6 +26,7 @@
>
> #include <linux/firmware.h>
> #include <linux/module.h>
> +#include <linux/dmi.h>
> #include <linux/pci.h>
> #include <linux/debugfs.h>
> #include <drm/drm_drv.h>
> @@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> {
> unsigned long bo_size;
> const char *fw_name;
> + const char *bios_ver;
> const struct common_firmware_header *hdr;
> unsigned char fw_check;
> unsigned int fw_shared_size, log_offset; @@ -159,6 +161,21 @@ int
> amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> adev->vcn.indirect_sram = true;
> + /*
> + * Some Steam Deck's BIOS versions are incompatible with
> the
> + * indirect SRAM mode, leading to amdgpu being unable to
> get
> + * properly probed (and even potentially crashing the
> kernel).
> + * Hence, check for these versions here - notice this is
> + * restricted to Vangogh (Deck's APU).
> + */
> + bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
> +
> + if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
> + !strncmp("F7A0114", bios_ver, 7))) {
> + adev->vcn.indirect_sram = false;
> + dev_info(adev->dev,
> + "Steam Deck quirk: indirect SRAM disabled on
> BIOS %s\n", bios_ver);
> + }
> break;
> case IP_VERSION(3, 0, 16):
> fw_name = FIRMWARE_DIMGREY_CAVEFISH;
> --
> 2.40.0
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-19 13:16 ` Deucher, Alexander
@ 2023-04-19 14:14 ` Guilherme G. Piccoli
2023-04-19 20:04 ` Deucher, Alexander
0 siblings, 1 reply; 13+ messages in thread
From: Guilherme G. Piccoli @ 2023-04-19 14:14 UTC (permalink / raw)
To: Deucher, Alexander
Cc: stable@vger.kernel.org, gregkh@linuxfoundation.org,
sashal@kernel.org, amd-gfx@lists.freedesktop.org, Zhu, James,
Liu, Leo, kernel@gpiccoli.net, kernel-dev@igalia.com
On 19/04/2023 10:16, Deucher, Alexander wrote:
> [...]
>> This is quite strange for me, we have 2 commit hashes pointing to the
>> *same* commit, and each one is present..in a different release !!?!
>> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
>> misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
>>
>> Alex, do you have an idea why sometimes commits from the AMD tree
>> appear duplicate in mainline? Specially in different releases, this could cause
>> some confusion I guess.
>
> This is pretty common in the drm. The problem is there is not a good way to get bug fixes into both -next and -fixes without constant back merges. So the patches end up in -next and if they are important for stable, they go to -fixes too. We don't want -next to be broken, but we can't wait until the next kernel cycle to get the bug fixes into the current kernel and stable. I don't know of a good way to make this smoother.
>
> Alex
Thanks Alex, seems quite complicated indeed.
So, let me check if I understood properly: there are 2 trees (-fixes and
-next), and the problem is that their merge onto mainline happens apart
and there are kind of duplicate commits, that were first merged on
-fixes, then "re-merged" on -next, right?
Would be possible to clean the -next tree right before the merge, using
some script to find commits there that are going to be merged but are
already present? Then you'd have a -next-to-merge tree that is clean and
doesn't present dups, avoiding this.
Disclaimer: I'm not a maintainer, maybe there are smart ways of doing
that or perhaps my suggestion is silly and unfeasible heh
But seems likely that other maintainers face this problem as well and
came up with some solution - avoiding the dups would be a good thing,
IMO, if possible.
Cheers,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-19 14:14 ` Guilherme G. Piccoli
@ 2023-04-19 20:04 ` Deucher, Alexander
2023-04-20 12:36 ` Guilherme G. Piccoli
0 siblings, 1 reply; 13+ messages in thread
From: Deucher, Alexander @ 2023-04-19 20:04 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: stable@vger.kernel.org, gregkh@linuxfoundation.org,
sashal@kernel.org, amd-gfx@lists.freedesktop.org, Zhu, James,
Liu, Leo, kernel@gpiccoli.net, kernel-dev@igalia.com
[Public]
> -----Original Message-----
> From: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Sent: Wednesday, April 19, 2023 10:15 AM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: stable@vger.kernel.org; gregkh@linuxfoundation.org;
> sashal@kernel.org; amd-gfx@lists.freedesktop.org; Zhu, James
> <James.Zhu@amd.com>; Liu, Leo <Leo.Liu@amd.com>; kernel@gpiccoli.net;
> kernel-dev@igalia.com
> Subject: Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on
> Vangogh broken BIOSes
>
> On 19/04/2023 10:16, Deucher, Alexander wrote:
> > [...]
> >> This is quite strange for me, we have 2 commit hashes pointing to the
> >> *same* commit, and each one is present..in a different release !!?!
> >> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y
> >> stable misses it, since it only contains 9a8cc8cabc1e (which is the same
> patch!).
> >>
> >> Alex, do you have an idea why sometimes commits from the AMD tree
> >> appear duplicate in mainline? Specially in different releases, this
> >> could cause some confusion I guess.
> >
> > This is pretty common in the drm. The problem is there is not a good way
> to get bug fixes into both -next and -fixes without constant back merges. So
> the patches end up in -next and if they are important for stable, they go to -
> fixes too. We don't want -next to be broken, but we can't wait until the next
> kernel cycle to get the bug fixes into the current kernel and stable. I don't
> know of a good way to make this smoother.
> >
> > Alex
>
> Thanks Alex, seems quite complicated indeed.
>
> So, let me check if I understood properly: there are 2 trees (-fixes and -next),
> and the problem is that their merge onto mainline happens apart and there
> are kind of duplicate commits, that were first merged on -fixes, then "re-
> merged" on -next, right?
>
Each subsystem works on new features, then when the merge window opens, Linus pulls them into mainline. At that point, mainline goes into RCs and then mainline is bug fixes only. Meanwhile in parallel, each subsystem is working on new feature development for the next merge window (subsystem -next trees). The trees tend to lag behind mainline a bit. Most developers focus on them in support of upcoming features. They are usually also where bugs are fixed. If a bug is fixed in the -next tree, what's the best way to get it into mainline? I guess ideally it would be fixed in mainline and them mainline would be merged into everyone's -next branch, but that's not always practical. Often times new features depend on bug fixes and then you'd end up stalling new development waiting for a back merge, or you'd have to rebase your -next branches regularly so that they would shed any patches in mainline which is not great either. We try and cherry-pick -x when we can to show the relationship.
> Would be possible to clean the -next tree right before the merge, using
> some script to find commits there that are going to be merged but are
> already present? Then you'd have a -next-to-merge tree that is clean and
> doesn't present dups, avoiding this.
There's no real way to clean it without rewriting history. You'd basically need to do regular backmerges and rebases of the -next trees. Also then what do you do if you already have a feature in -next (say Dave or Daniel have already pulled in my latest amdgpu PR for -next) and you realize that one of those patches is an important bug fix for mainline? If the drm -next tree rebased then all of the other drm driver trees that feed into it would need to rebase as well otherwise they'd have stale history.
You also have the case of a fix in -next needing to land in mainline, but due to differences in the trees, it needs backporting to apply properly.
>
> Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or
> perhaps my suggestion is silly and unfeasible heh But seems likely that other
> maintainers face this problem as well and came up with some solution -
> avoiding the dups would be a good thing, IMO, if possible.
No worries. I agree. I haven't seen anything so far that really addresses handles this effectively.
Alex
>
> Cheers,
>
>
> Guilherme
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-19 20:04 ` Deucher, Alexander
@ 2023-04-20 12:36 ` Guilherme G. Piccoli
2023-04-20 12:42 ` gregkh
0 siblings, 1 reply; 13+ messages in thread
From: Guilherme G. Piccoli @ 2023-04-20 12:36 UTC (permalink / raw)
To: Deucher, Alexander, gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On 19/04/2023 17:04, Deucher, Alexander wrote:
> [...]
>> So, let me check if I understood properly: there are 2 trees (-fixes and -next),
>> and the problem is that their merge onto mainline happens apart and there
>> are kind of duplicate commits, that were first merged on -fixes, then "re-
>> merged" on -next, right?
>>
>
> Each subsystem works on new features, then when the merge window opens, Linus pulls them into mainline. At that point, mainline goes into RCs and then mainline is bug fixes only. Meanwhile in parallel, each subsystem is working on new feature development for the next merge window (subsystem -next trees). The trees tend to lag behind mainline a bit. Most developers focus on them in support of upcoming features. They are usually also where bugs are fixed. If a bug is fixed in the -next tree, what's the best way to get it into mainline? I guess ideally it would be fixed in mainline and them mainline would be merged into everyone's -next branch, but that's not always practical. Often times new features depend on bug fixes and then you'd end up stalling new development waiting for a back merge, or you'd have to rebase your -next branches regularly so that they would shed any patches in mainline which is not great either. We try and cherry-pick -x when we can to show the relationship.
>
>> Would be possible to clean the -next tree right before the merge, using
>> some script to find commits there that are going to be merged but are
>> already present? Then you'd have a -next-to-merge tree that is clean and
>> doesn't present dups, avoiding this.
>
> There's no real way to clean it without rewriting history. You'd basically need to do regular backmerges and rebases of the -next trees. Also then what do you do if you already have a feature in -next (say Dave or Daniel have already pulled in my latest amdgpu PR for -next) and you realize that one of those patches is an important bug fix for mainline? If the drm -next tree rebased then all of the other drm driver trees that feed into it would need to rebase as well otherwise they'd have stale history.
>
> You also have the case of a fix in -next needing to land in mainline, but due to differences in the trees, it needs backporting to apply properly.
>
>>
>> Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or
>> perhaps my suggestion is silly and unfeasible heh But seems likely that other
>> maintainers face this problem as well and came up with some solution -
>> avoiding the dups would be a good thing, IMO, if possible.
>
>
> No worries. I agree. I haven't seen anything so far that really addresses handles this effectively.
>
> Alex
Thanks a bunch Alex, I appreciate your time detailing the issue, which
seems...quite complex and annoying, indeed ={
What I'll do from now on is trying to check all hashes that match a
commit, so I can add more than one fixes tag if that makes sense. At
least, this way I can prevent missing fixes in stable like this patch heh
@Greg, can you pick this one? Thanks!
Cheers,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 12:36 ` Guilherme G. Piccoli
@ 2023-04-20 12:42 ` gregkh
2023-04-20 12:59 ` Guilherme G. Piccoli
0 siblings, 1 reply; 13+ messages in thread
From: gregkh @ 2023-04-20 12:42 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: Deucher, Alexander, stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On Thu, Apr 20, 2023 at 09:36:28AM -0300, Guilherme G. Piccoli wrote:
> On 19/04/2023 17:04, Deucher, Alexander wrote:
> > [...]
> >> So, let me check if I understood properly: there are 2 trees (-fixes and -next),
> >> and the problem is that their merge onto mainline happens apart and there
> >> are kind of duplicate commits, that were first merged on -fixes, then "re-
> >> merged" on -next, right?
> >>
> >
> > Each subsystem works on new features, then when the merge window opens, Linus pulls them into mainline. At that point, mainline goes into RCs and then mainline is bug fixes only. Meanwhile in parallel, each subsystem is working on new feature development for the next merge window (subsystem -next trees). The trees tend to lag behind mainline a bit. Most developers focus on them in support of upcoming features. They are usually also where bugs are fixed. If a bug is fixed in the -next tree, what's the best way to get it into mainline? I guess ideally it would be fixed in mainline and them mainline would be merged into everyone's -next branch, but that's not always practical. Often times new features depend on bug fixes and then you'd end up stalling new development waiting for a back merge, or you'd have to rebase your -next branches regularly so that they would shed any patches in mainline which is not great either. We try and cherry-pick -x when we can to show the relationship.
> >
> >> Would be possible to clean the -next tree right before the merge, using
> >> some script to find commits there that are going to be merged but are
> >> already present? Then you'd have a -next-to-merge tree that is clean and
> >> doesn't present dups, avoiding this.
> >
> > There's no real way to clean it without rewriting history. You'd basically need to do regular backmerges and rebases of the -next trees. Also then what do you do if you already have a feature in -next (say Dave or Daniel have already pulled in my latest amdgpu PR for -next) and you realize that one of those patches is an important bug fix for mainline? If the drm -next tree rebased then all of the other drm driver trees that feed into it would need to rebase as well otherwise they'd have stale history.
> >
> > You also have the case of a fix in -next needing to land in mainline, but due to differences in the trees, it needs backporting to apply properly.
> >
> >>
> >> Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or
> >> perhaps my suggestion is silly and unfeasible heh But seems likely that other
> >> maintainers face this problem as well and came up with some solution -
> >> avoiding the dups would be a good thing, IMO, if possible.
> >
> >
> > No worries. I agree. I haven't seen anything so far that really addresses handles this effectively.
> >
> > Alex
>
> Thanks a bunch Alex, I appreciate your time detailing the issue, which
> seems...quite complex and annoying, indeed ={
And it drives me crazy, I hate seeing drm patches show up in my stable
queue and I put them at the end of the list for this very reason. I've
complained about it for years, but oh well...
> @Greg, can you pick this one? Thanks!
Which "one" are you referring to here?
confused,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 12:42 ` gregkh
@ 2023-04-20 12:59 ` Guilherme G. Piccoli
2023-04-20 15:02 ` gregkh
0 siblings, 1 reply; 13+ messages in thread
From: Guilherme G. Piccoli @ 2023-04-20 12:59 UTC (permalink / raw)
To: gregkh@linuxfoundation.org
Cc: Deucher, Alexander, stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On 20/04/2023 09:42, gregkh@linuxfoundation.org wrote:
> [...]
>> @Greg, can you pick this one? Thanks!
>
> Which "one" are you referring to here?
>
> confused,
>
> greg k-h
This one, sent in this email thread.
The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
hash(es) is 542a56e8eb44 heh
Cheers,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 12:59 ` Guilherme G. Piccoli
@ 2023-04-20 15:02 ` gregkh
2023-04-20 15:26 ` Alex Deucher
2023-04-20 15:36 ` Guilherme G. Piccoli
0 siblings, 2 replies; 13+ messages in thread
From: gregkh @ 2023-04-20 15:02 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: Deucher, Alexander, stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On Thu, Apr 20, 2023 at 09:59:17AM -0300, Guilherme G. Piccoli wrote:
> On 20/04/2023 09:42, gregkh@linuxfoundation.org wrote:
> > [...]
> >> @Greg, can you pick this one? Thanks!
> >
> > Which "one" are you referring to here?
> >
> > confused,
> >
> > greg k-h
>
> This one, sent in this email thread.
I don't have "this email thread" anymore, remember, some of us get
thousand+ emails a day...
> The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
> Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
> hash(es) is 542a56e8eb44 heh
But that commit says it fixes a problem in the 6.2 tree, why is this
relevant for 6.1.y?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 15:02 ` gregkh
@ 2023-04-20 15:26 ` Alex Deucher
2023-04-20 15:36 ` Guilherme G. Piccoli
1 sibling, 0 replies; 13+ messages in thread
From: Alex Deucher @ 2023-04-20 15:26 UTC (permalink / raw)
To: gregkh@linuxfoundation.org
Cc: Guilherme G. Piccoli, sashal@kernel.org, kernel@gpiccoli.net,
amd-gfx@lists.freedesktop.org, stable@vger.kernel.org,
kernel-dev@igalia.com, Deucher, Alexander, Zhu, James, Liu, Leo
On Thu, Apr 20, 2023 at 11:04 AM gregkh@linuxfoundation.org
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Apr 20, 2023 at 09:59:17AM -0300, Guilherme G. Piccoli wrote:
> > On 20/04/2023 09:42, gregkh@linuxfoundation.org wrote:
> > > [...]
> > >> @Greg, can you pick this one? Thanks!
> > >
> > > Which "one" are you referring to here?
> > >
> > > confused,
> > >
> > > greg k-h
> >
> > This one, sent in this email thread.
>
> I don't have "this email thread" anymore, remember, some of us get
> thousand+ emails a day...
>
> > The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
> > Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
> > hash(es) is 542a56e8eb44 heh
>
> But that commit says it fixes a problem in the 6.2 tree, why is this
> relevant for 6.1.y?
It fixes a generic issue with certain sbios versions.
Alex
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 15:02 ` gregkh
2023-04-20 15:26 ` Alex Deucher
@ 2023-04-20 15:36 ` Guilherme G. Piccoli
2023-04-20 15:56 ` gregkh
1 sibling, 1 reply; 13+ messages in thread
From: Guilherme G. Piccoli @ 2023-04-20 15:36 UTC (permalink / raw)
To: gregkh@linuxfoundation.org
Cc: Deucher, Alexander, stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On 20/04/2023 12:02, gregkh@linuxfoundation.org wrote:
>> [...]
>>> Which "one" are you referring to here?
>>>
>>> confused,
>>>
>>> greg k-h
>>
>> This one, sent in this email thread.
>
> I don't have "this email thread" anymore, remember, some of us get
> thousand+ emails a day...
I don't really understand the issue to be honest, we are talking in the
very email thread! The email was sent April/18, it's not old or anything.
But in any case, for reference, this is the original email from the lore
archives:
https://lore.kernel.org/stable/20230418221522.1287942-1-gpiccoli@igalia.com/
>
>> The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
>> Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
>> hash(es) is 542a56e8eb44 heh
>
> But that commit says it fixes a problem in the 6.2 tree, why is this
> relevant for 6.1.y?
>
That is explained in the email and the very reason for that, is the
duplicate hashes we are discussing here.
The fix commit in question points the "Fixes:" tag to 82132ecc5432
("drm/amdgpu: enable Vangogh VCN indirect sram mode"), which appears to
be in 6.2 tree, right?
But notice that 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect
sram mode") is the *same* offender and..is present on 6.1 !
In other words, when I first wrote this fix, I just checked the tree
quickly and came up with "Fixes: 82132ecc5432", but to be thorough, I
should have pointed the fixes tag to 9a8cc8cabc1e, to pick it on 6.1.y.
tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
hereby requesting the merge. Some backport/context adjustment was
necessary and it was properly tested in the Steam Deck.
Thanks,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 15:36 ` Guilherme G. Piccoli
@ 2023-04-20 15:56 ` gregkh
2023-04-20 16:45 ` Guilherme G. Piccoli
0 siblings, 1 reply; 13+ messages in thread
From: gregkh @ 2023-04-20 15:56 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: Deucher, Alexander, stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On Thu, Apr 20, 2023 at 12:36:00PM -0300, Guilherme G. Piccoli wrote:
> On 20/04/2023 12:02, gregkh@linuxfoundation.org wrote:
> >> [...]
> >>> Which "one" are you referring to here?
> >>>
> >>> confused,
> >>>
> >>> greg k-h
> >>
> >> This one, sent in this email thread.
> >
> > I don't have "this email thread" anymore, remember, some of us get
> > thousand+ emails a day...
>
> I don't really understand the issue to be honest, we are talking in the
> very email thread! The email was sent April/18, it's not old or anything.
That's 3000+ emails ago for me :)
> But in any case, for reference, this is the original email from the lore
> archives:
> https://lore.kernel.org/stable/20230418221522.1287942-1-gpiccoli@igalia.com/
>
> >
> >> The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
> >> Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
> >> hash(es) is 542a56e8eb44 heh
> >
> > But that commit says it fixes a problem in the 6.2 tree, why is this
> > relevant for 6.1.y?
> >
>
> That is explained in the email and the very reason for that, is the
> duplicate hashes we are discussing here.
>
> The fix commit in question points the "Fixes:" tag to 82132ecc5432
> ("drm/amdgpu: enable Vangogh VCN indirect sram mode"), which appears to
> be in 6.2 tree, right?
>
> But notice that 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect
> sram mode") is the *same* offender and..is present on 6.1 !
>
> In other words, when I first wrote this fix, I just checked the tree
> quickly and came up with "Fixes: 82132ecc5432", but to be thorough, I
> should have pointed the fixes tag to 9a8cc8cabc1e, to pick it on 6.1.y.
>
>
> tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
> hereby requesting the merge. Some backport/context adjustment was
> necessary and it was properly tested in the Steam Deck.
Ok, we'll queue it up soon, thanks.
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-20 15:56 ` gregkh
@ 2023-04-20 16:45 ` Guilherme G. Piccoli
0 siblings, 0 replies; 13+ messages in thread
From: Guilherme G. Piccoli @ 2023-04-20 16:45 UTC (permalink / raw)
To: gregkh@linuxfoundation.org
Cc: Deucher, Alexander, stable@vger.kernel.org, sashal@kernel.org,
amd-gfx@lists.freedesktop.org, Zhu, James, Liu, Leo,
kernel@gpiccoli.net, kernel-dev@igalia.com
On 20/04/2023 12:56, gregkh@linuxfoundation.org wrote:
> [...]
> That's 3000+ emails ago for me :)
/head_exploding
this is > 1000 emails per day, wow...my sympathies to you heh
> [...]
>> tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
>> hereby requesting the merge. Some backport/context adjustment was
>> necessary and it was properly tested in the Steam Deck.
>
> Ok, we'll queue it up soon, thanks.
>
> greg k-h
Thanks =)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
2023-04-18 22:15 [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes Guilherme G. Piccoli
2023-04-19 13:16 ` Deucher, Alexander
@ 2023-04-22 15:19 ` Greg KH
1 sibling, 0 replies; 13+ messages in thread
From: Greg KH @ 2023-04-22 15:19 UTC (permalink / raw)
To: Guilherme G. Piccoli
Cc: stable, sashal, amd-gfx, alexander.deucher, James.Zhu, leo.liu,
kernel, kernel-dev
On Tue, Apr 18, 2023 at 07:15:22PM -0300, Guilherme G. Piccoli wrote:
> commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.
>
> The VCN firmware loading path enables the indirect SRAM mode if it's
> advertised as supported. We might have some cases of FW issues that
> prevents this mode to working properly though, ending-up in a failed
> probe. An example below, observed in the Steam Deck:
>
> [...]
> [drm] failed to load ucode VCN0_RAM(0x3A)
> [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
> amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
> [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
> amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
> amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
> [...]
>
> Disabling the VCN block circumvents this, but it's a very invasive
> workaround that turns off the entire feature. So, let's add a quirk
> on VCN loading that checks for known problematic BIOSes on Vangogh,
> so we can proactively disable the indirect SRAM mode and allow the
> HW proper probe and VCN IP block to work fine.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
> Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
> Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
> Cc: stable@vger.kernel.org
> Cc: James Zhu <James.Zhu@amd.com>
> Cc: Leo Liu <leo.liu@amd.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> ---
>
> Hi folks, this was build/boot tested on Deck. I've also adjusted the
> context, function was reworked on 6.2.
>
> But what a surprise was for me not see this fix already in 6.1.y, since
> I've CCed stable, and the reason for that is really peculiar:
Now queued up, thanks.
greg k-h
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-04-22 15:19 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-18 22:15 [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes Guilherme G. Piccoli
2023-04-19 13:16 ` Deucher, Alexander
2023-04-19 14:14 ` Guilherme G. Piccoli
2023-04-19 20:04 ` Deucher, Alexander
2023-04-20 12:36 ` Guilherme G. Piccoli
2023-04-20 12:42 ` gregkh
2023-04-20 12:59 ` Guilherme G. Piccoli
2023-04-20 15:02 ` gregkh
2023-04-20 15:26 ` Alex Deucher
2023-04-20 15:36 ` Guilherme G. Piccoli
2023-04-20 15:56 ` gregkh
2023-04-20 16:45 ` Guilherme G. Piccoli
2023-04-22 15:19 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox