All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] drm/nouveau: Fix runtime suspend on R570
@ 2026-06-25 23:10 ` Lyude Paul
  0 siblings, 0 replies; 12+ messages in thread
From: Lyude Paul @ 2026-06-25 23:10 UTC (permalink / raw)
  To: dri-devel, nouveau, linux-kernel
  Cc: Timur Tabi, Dave Airlie, Andy Shevchenko, Maarten Lankhorst,
	Ben Skeggs, Kees Cook, Simona Vetter, David Airlie,
	Thomas Zimmermann, Maxime Ripard, Mel Henning, Danilo Krummrich,
	Lyude Paul

Turns out our last attempt at fixing suspend/resume was not entirely
correct, only partially correct. This reverts the partial fix, while
leaving the piping of suspend/runtime suspend/etc. that we did in
preparation for this originally, since I have a strong feeling it will
come in use again in the future.

Lyude Paul (2):
  Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
  drm/nouveau/gsp/r570: Never enter Gcoff state

 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c  | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 8 ++++----
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h        | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 0/2] drm/nouveau: Fix runtime suspend on R570
@ 2026-06-25 23:10 ` Lyude Paul
  0 siblings, 0 replies; 12+ messages in thread
From: Lyude Paul @ 2026-06-25 23:10 UTC (permalink / raw)
  To: dri-devel, nouveau, linux-kernel
  Cc: Dave Airlie, Andy Shevchenko, Maarten Lankhorst, Ben Skeggs,
	Kees Cook, Simona Vetter, Maxime Ripard, Danilo Krummrich

Turns out our last attempt at fixing suspend/resume was not entirely
correct, only partially correct. This reverts the partial fix, while
leaving the piping of suspend/runtime suspend/etc. that we did in
preparation for this originally, since I have a strong feeling it will
come in use again in the future.

Lyude Paul (2):
  Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
  drm/nouveau/gsp/r570: Never enter Gcoff state

 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c  | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 8 ++++----
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h        | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
  2026-06-25 23:10 ` Lyude Paul
@ 2026-06-25 23:10   ` Lyude Paul
  -1 siblings, 0 replies; 12+ messages in thread
From: Lyude Paul @ 2026-06-25 23:10 UTC (permalink / raw)
  To: dri-devel, nouveau, linux-kernel
  Cc: stable, Dave Airlie, Kees Cook, Danilo Krummrich, Timur Tabi,
	Ben Skeggs, Andy Shevchenko, Mel Henning, Maarten Lankhorst,
	Simona Vetter, David Airlie, Thomas Zimmermann, Maxime Ripard,
	Lyude Paul

This reverts commit 8302d0afeaec0bc57d951dd085e0cffe997d4d18.

It turns out this looked like the right fix on some systems, but it's not -
as this causes runtime PM to actually fail on many a laptop.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
Cc: <stable@vger.kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mel Henning <mhenning@darkrefraction.com>
Cc: <stable@vger.kernel.org> # v6.19+
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c  | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 8 ++++----
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h        | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c
index 700cea5def357..035b575722db7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c
@@ -208,7 +208,7 @@ r535_fbsr_resume(struct nvkm_gsp *gsp)
 }
 
 static int
-r535_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
+r535_fbsr_suspend(struct nvkm_gsp *gsp)
 {
 	struct nvkm_subdev *subdev = &gsp->subdev;
 	struct nvkm_device *device = subdev->device;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
index f544afa12b6bb..4a3b771ded255 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
@@ -1749,7 +1749,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, enum nvkm_suspend_state suspend)
 		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.lvl0.addr;
 		sr->sizeOfSuspendResumeData = len;
 
-		ret = rm->api->fbsr->suspend(gsp, suspend == NVKM_RUNTIME_SUSPEND);
+		ret = rm->api->fbsr->suspend(gsp);
 		if (ret) {
 			nvkm_gsp_mem_dtor(&gsp->sr.meta);
 			nvkm_gsp_radix3_dtor(gsp, &gsp->sr.radix3);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
index 8ef8b4f655883..2945d5b4e5707 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
@@ -62,7 +62,7 @@ r570_fbsr_resume(struct nvkm_gsp *gsp)
 }
 
 static int
-r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtime)
+r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size)
 {
 	NV2080_CTRL_INTERNAL_FBSR_INIT_PARAMS *ctrl;
 	struct nvkm_gsp_object memlist;
@@ -81,7 +81,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtim
 	ctrl->hClient = gsp->internal.client.object.handle;
 	ctrl->hSysMem = memlist.handle;
 	ctrl->sysmemAddrOfSuspendResumeData = gsp->sr.meta.addr;
-	ctrl->bEnteringGcoffState = runtime ? 1 : 0;
+	ctrl->bEnteringGcoffState = 1;
 
 	ret = nvkm_gsp_rm_ctrl_wr(&gsp->internal.device.subdevice, ctrl);
 	if (ret)
@@ -92,7 +92,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtim
 }
 
 static int
-r570_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
+r570_fbsr_suspend(struct nvkm_gsp *gsp)
 {
 	struct nvkm_subdev *subdev = &gsp->subdev;
 	struct nvkm_device *device = subdev->device;
@@ -133,7 +133,7 @@ r570_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
 		return ret;
 
 	/* Initialise FBSR on RM. */
-	ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size, runtime);
+	ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size);
 	if (ret) {
 		nvkm_gsp_sg_free(device, &gsp->sr.fbsr);
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h
index a9af94adf9efc..0fb0e67406c67 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h
@@ -78,7 +78,7 @@ struct nvkm_rm_api {
 	} *device;
 
 	const struct nvkm_rm_api_fbsr {
-		int (*suspend)(struct nvkm_gsp *, bool runtime);
+		int (*suspend)(struct nvkm_gsp *);
 		void (*resume)(struct nvkm_gsp *);
 	} *fbsr;
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
@ 2026-06-25 23:10   ` Lyude Paul
  0 siblings, 0 replies; 12+ messages in thread
From: Lyude Paul @ 2026-06-25 23:10 UTC (permalink / raw)
  To: dri-devel, nouveau, linux-kernel
  Cc: stable, Dave Airlie, Kees Cook, Danilo Krummrich, Ben Skeggs,
	Andy Shevchenko, Maarten Lankhorst, Simona Vetter, Maxime Ripard

This reverts commit 8302d0afeaec0bc57d951dd085e0cffe997d4d18.

It turns out this looked like the right fix on some systems, but it's not -
as this causes runtime PM to actually fail on many a laptop.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
Cc: <stable@vger.kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mel Henning <mhenning@darkrefraction.com>
Cc: <stable@vger.kernel.org> # v6.19+
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c  | 2 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 8 ++++----
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h        | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c
index 700cea5def357..035b575722db7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fbsr.c
@@ -208,7 +208,7 @@ r535_fbsr_resume(struct nvkm_gsp *gsp)
 }
 
 static int
-r535_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
+r535_fbsr_suspend(struct nvkm_gsp *gsp)
 {
 	struct nvkm_subdev *subdev = &gsp->subdev;
 	struct nvkm_device *device = subdev->device;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
index f544afa12b6bb..4a3b771ded255 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/gsp.c
@@ -1749,7 +1749,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, enum nvkm_suspend_state suspend)
 		sr->sysmemAddrOfSuspendResumeData = gsp->sr.radix3.lvl0.addr;
 		sr->sizeOfSuspendResumeData = len;
 
-		ret = rm->api->fbsr->suspend(gsp, suspend == NVKM_RUNTIME_SUSPEND);
+		ret = rm->api->fbsr->suspend(gsp);
 		if (ret) {
 			nvkm_gsp_mem_dtor(&gsp->sr.meta);
 			nvkm_gsp_radix3_dtor(gsp, &gsp->sr.radix3);
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
index 8ef8b4f655883..2945d5b4e5707 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
@@ -62,7 +62,7 @@ r570_fbsr_resume(struct nvkm_gsp *gsp)
 }
 
 static int
-r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtime)
+r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size)
 {
 	NV2080_CTRL_INTERNAL_FBSR_INIT_PARAMS *ctrl;
 	struct nvkm_gsp_object memlist;
@@ -81,7 +81,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtim
 	ctrl->hClient = gsp->internal.client.object.handle;
 	ctrl->hSysMem = memlist.handle;
 	ctrl->sysmemAddrOfSuspendResumeData = gsp->sr.meta.addr;
-	ctrl->bEnteringGcoffState = runtime ? 1 : 0;
+	ctrl->bEnteringGcoffState = 1;
 
 	ret = nvkm_gsp_rm_ctrl_wr(&gsp->internal.device.subdevice, ctrl);
 	if (ret)
@@ -92,7 +92,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtim
 }
 
 static int
-r570_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
+r570_fbsr_suspend(struct nvkm_gsp *gsp)
 {
 	struct nvkm_subdev *subdev = &gsp->subdev;
 	struct nvkm_device *device = subdev->device;
@@ -133,7 +133,7 @@ r570_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
 		return ret;
 
 	/* Initialise FBSR on RM. */
-	ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size, runtime);
+	ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size);
 	if (ret) {
 		nvkm_gsp_sg_free(device, &gsp->sr.fbsr);
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h
index a9af94adf9efc..0fb0e67406c67 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/rm.h
@@ -78,7 +78,7 @@ struct nvkm_rm_api {
 	} *device;
 
 	const struct nvkm_rm_api_fbsr {
-		int (*suspend)(struct nvkm_gsp *, bool runtime);
+		int (*suspend)(struct nvkm_gsp *);
 		void (*resume)(struct nvkm_gsp *);
 	} *fbsr;
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] drm/nouveau/gsp/r570: Never enter Gcoff state
  2026-06-25 23:10 ` Lyude Paul
@ 2026-06-25 23:10   ` Lyude Paul
  -1 siblings, 0 replies; 12+ messages in thread
From: Lyude Paul @ 2026-06-25 23:10 UTC (permalink / raw)
  To: dri-devel, nouveau, linux-kernel
  Cc: stable, Dave Airlie, Kees Cook, Danilo Krummrich, Timur Tabi,
	Ben Skeggs, Andy Shevchenko, Mel Henning, Maarten Lankhorst,
	Simona Vetter, David Airlie, Thomas Zimmermann, Maxime Ripard,
	Lyude Paul

It turns out that the only reason our previous fixes looked like they
worked for this was because we would occasionally set the Gcoff state to 0
in the normal S3 path, which fixed suspend/resume on desktops - but not on
machines using runtime suspend.

The proper fix is to just never set this flag. Our current guess for the
reasoning behind this is that Gcoff likely coincides with GC6, and not
literally power off.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
Cc: <stable@vger.kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mel Henning <mhenning@darkrefraction.com>
Cc: <stable@vger.kernel.org> # v6.19+
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
index 2945d5b4e5707..af5aa5065c3dd 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
@@ -81,7 +81,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size)
 	ctrl->hClient = gsp->internal.client.object.handle;
 	ctrl->hSysMem = memlist.handle;
 	ctrl->sysmemAddrOfSuspendResumeData = gsp->sr.meta.addr;
-	ctrl->bEnteringGcoffState = 1;
+	ctrl->bEnteringGcoffState = 0;
 
 	ret = nvkm_gsp_rm_ctrl_wr(&gsp->internal.device.subdevice, ctrl);
 	if (ret)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] drm/nouveau/gsp/r570: Never enter Gcoff state
@ 2026-06-25 23:10   ` Lyude Paul
  0 siblings, 0 replies; 12+ messages in thread
From: Lyude Paul @ 2026-06-25 23:10 UTC (permalink / raw)
  To: dri-devel, nouveau, linux-kernel
  Cc: stable, Dave Airlie, Kees Cook, Danilo Krummrich, Ben Skeggs,
	Andy Shevchenko, Maarten Lankhorst, Simona Vetter, Maxime Ripard

It turns out that the only reason our previous fixes looked like they
worked for this was because we would occasionally set the Gcoff state to 0
in the normal S3 path, which fixed suspend/resume on desktops - but not on
machines using runtime suspend.

The proper fix is to just never set this flag. Our current guess for the
reasoning behind this is that Gcoff likely coincides with GC6, and not
literally power off.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
Cc: <stable@vger.kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Timur Tabi <ttabi@nvidia.com>
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Mel Henning <mhenning@darkrefraction.com>
Cc: <stable@vger.kernel.org> # v6.19+
---
 drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
index 2945d5b4e5707..af5aa5065c3dd 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
@@ -81,7 +81,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size)
 	ctrl->hClient = gsp->internal.client.object.handle;
 	ctrl->hSysMem = memlist.handle;
 	ctrl->sysmemAddrOfSuspendResumeData = gsp->sr.meta.addr;
-	ctrl->bEnteringGcoffState = 1;
+	ctrl->bEnteringGcoffState = 0;
 
 	ret = nvkm_gsp_rm_ctrl_wr(&gsp->internal.device.subdevice, ctrl);
 	if (ret)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] drm/nouveau/gsp/r570: Never enter Gcoff state
  2026-06-25 23:10   ` Lyude Paul
  (?)
@ 2026-06-25 23:25   ` sashiko-bot
  -1 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-25 23:25 UTC (permalink / raw)
  To: Lyude Paul; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Missing rollback of hardware state in suspend paths when operations fail, leaving the device hung.
--

commit ed5c98b04aafc61a13ebb42fe49a1f15121c1b4d
Author: Lyude Paul <lyude@redhat.com>

drm/nouveau/gsp/r570: Never enter Gcoff state

It turns out that the only reason our previous fixes looked like they
worked for this was because we would occasionally set the Gcoff state to 0
in the normal S3 path, which fixed suspend/resume on desktops - but not on
machines using runtime suspend.

The proper fix is to just never set this flag. Our current guess for the
reasoning behind this is that Gcoff likely coincides with GC6, and not
literally power off.

> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
> index 2945d5b4e5707..af5aa5065c3dd 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
> @@ -81,7 +81,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size)
>  	ctrl->hClient = gsp->internal.client.object.handle;
>  	ctrl->hSysMem = memlist.handle;
>  	ctrl->sysmemAddrOfSuspendResumeData = gsp->sr.meta.addr;
> -	ctrl->bEnteringGcoffState = 1;
> +	ctrl->bEnteringGcoffState = 0;
>  
>  	ret = nvkm_gsp_rm_ctrl_wr(&gsp->internal.device.subdevice, ctrl);
>  	if (ret)

[Severity: High]
This is a pre-existing issue, but I have a question about the suspend path's
error handling.

If nvkm_gsp_rm_ctrl_wr() fails here, r570_fbsr_init() returns an error back to
r570_fbsr_suspend():

drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c:r570_fbsr_suspend() {
    ...
    r570_fbsr_suspend_channels(gsp, true);
    ...
    device->bar->bar2 = false;
    ...
    ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size);
    if (ret) {
        nvkm_gsp_sg_free(device, &gsp->sr.fbsr);
        return ret;
    }
}

The same error path exists earlier in r570_fbsr_suspend() if
nvkm_instobj_save() or nvkm_gsp_sg() fail.

Since the PM core doesn't call the resume callback for devices that fail to
suspend, does this leave the GPU hung with suspended channels and BAR2
disabled?

Would it make sense to restore the channel and BAR state on error paths
in r570_fbsr_suspend() before returning?

>  		return ret;
>  
>  	nvkm_gsp_rm_free(&memlist);
>  	return 0;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625231252.89684-1-lyude@redhat.com?part=2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
  2026-06-25 23:10   ` Lyude Paul
  (?)
@ 2026-06-25 23:27   ` sashiko-bot
  -1 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-25 23:27 UTC (permalink / raw)
  To: Lyude Paul; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Hardware channels and BAR2 are left permanently disabled when r570_fbsr_suspend fails midway.
--

commit ba836a35e2d15cfdb3c33dd77c467b89aed9e52a
Author: Lyude Paul <lyude@redhat.com>

Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"

This reverts commit 8302d0afeaec0bc57d951dd085e0cffe997d4d18, which was
intended to fix a suspend/resume regression on r570 firmware but instead
caused runtime PM to fail on many laptops.

Cc: <stable@vger.kernel.org> # v6.19+

> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c

[ ... ]

> @@ -92,7 +92,7 @@ r570_fbsr_init(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size, bool runtim
>  }
>  
>  static int
> -r570_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
> +r570_fbsr_suspend(struct nvkm_gsp *gsp)
>  {
>  	struct nvkm_subdev *subdev = &gsp->subdev;
>  	struct nvkm_device *device = subdev->device;

[ ... ]

> @@ -133,7 +133,7 @@ r570_fbsr_suspend(struct nvkm_gsp *gsp, bool runtime)
>  		return ret;
>  
>  	/* Initialise FBSR on RM. */
> -	ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size, runtime);
> +	ret = r570_fbsr_init(gsp, &gsp->sr.fbsr, size);
>  	if (ret) {
>  		nvkm_gsp_sg_free(device, &gsp->sr.fbsr);
>  		return ret;

[Severity: High]
This is a pre-existing issue, but does this error path leave the GPU
unusable if it fails?

During the suspend sequence, r570_fbsr_suspend() disables channel
scheduling and BAR2 early on. If a subsequent operation like
r570_fbsr_init() (or nvkm_gsp_sg() earlier) fails, it returns an error
here without restoring the hardware state.

Since the PM core will abort the suspend on error and skip the resume
callback for this device, does this mean channels and BAR2 are never
re-enabled, leaving the device in an unrecoverable state until reboot?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625231252.89684-1-lyude@redhat.com?part=1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
  2026-06-25 23:10   ` Lyude Paul
@ 2026-06-26 12:31     ` Andy Shevchenko
  -1 siblings, 0 replies; 12+ messages in thread
From: Andy Shevchenko @ 2026-06-26 12:31 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, linux-kernel, stable, Dave Airlie, Kees Cook,
	Danilo Krummrich, Timur Tabi, Ben Skeggs, Mel Henning,
	Maarten Lankhorst, Simona Vetter, David Airlie, Thomas Zimmermann,
	Maxime Ripard

On Thu, Jun 25, 2026 at 07:10:54PM -0400, Lyude Paul wrote:
> This reverts commit 8302d0afeaec0bc57d951dd085e0cffe997d4d18.
> 
> It turns out this looked like the right fix on some systems, but it's not -
> as this causes runtime PM to actually fail on many a laptop.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")

> Cc: <stable@vger.kernel.org>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Timur Tabi <ttabi@nvidia.com>
> Cc: Ben Skeggs <bskeggs@nvidia.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mel Henning <mhenning@darkrefraction.com>
> Cc: <stable@vger.kernel.org> # v6.19+

Two times Cc to stable.
I also recommend to move the rest of the Cc list below cutter '---' line.

Will be something like this:

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v6.19+
Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
---
Cc: ...
Cc: ...
---

This will reduce the unneeded noise in the commit messages.


-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware"
@ 2026-06-26 12:31     ` Andy Shevchenko
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Shevchenko @ 2026-06-26 12:31 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, linux-kernel, stable, Dave Airlie, Kees Cook,
	Danilo Krummrich, Ben Skeggs, Maarten Lankhorst, Simona Vetter,
	Maxime Ripard

On Thu, Jun 25, 2026 at 07:10:54PM -0400, Lyude Paul wrote:
> This reverts commit 8302d0afeaec0bc57d951dd085e0cffe997d4d18.
> 
> It turns out this looked like the right fix on some systems, but it's not -
> as this causes runtime PM to actually fail on many a laptop.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")

> Cc: <stable@vger.kernel.org>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Timur Tabi <ttabi@nvidia.com>
> Cc: Ben Skeggs <bskeggs@nvidia.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mel Henning <mhenning@darkrefraction.com>
> Cc: <stable@vger.kernel.org> # v6.19+

Two times Cc to stable.
I also recommend to move the rest of the Cc list below cutter '---' line.

Will be something like this:

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: <stable@vger.kernel.org> # v6.19+
Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
---
Cc: ...
Cc: ...
---

This will reduce the unneeded noise in the commit messages.


-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] drm/nouveau/gsp/r570: Never enter Gcoff state
  2026-06-25 23:10   ` Lyude Paul
@ 2026-06-26 12:32     ` Andy Shevchenko
  -1 siblings, 0 replies; 12+ messages in thread
From: Andy Shevchenko @ 2026-06-26 12:32 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, linux-kernel, stable, Dave Airlie, Kees Cook,
	Danilo Krummrich, Timur Tabi, Ben Skeggs, Mel Henning,
	Maarten Lankhorst, Simona Vetter, David Airlie, Thomas Zimmermann,
	Maxime Ripard

On Thu, Jun 25, 2026 at 07:10:55PM -0400, Lyude Paul wrote:
> It turns out that the only reason our previous fixes looked like they
> worked for this was because we would occasionally set the Gcoff state to 0
> in the normal S3 path, which fixed suspend/resume on desktops - but not on
> machines using runtime suspend.
> 
> The proper fix is to just never set this flag. Our current guess for the
> reasoning behind this is that Gcoff likely coincides with GC6, and not
> literally power off.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
> Cc: <stable@vger.kernel.org>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Timur Tabi <ttabi@nvidia.com>
> Cc: Ben Skeggs <bskeggs@nvidia.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mel Henning <mhenning@darkrefraction.com>
> Cc: <stable@vger.kernel.org> # v6.19+
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Same comment here.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] drm/nouveau/gsp/r570: Never enter Gcoff state
@ 2026-06-26 12:32     ` Andy Shevchenko
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Shevchenko @ 2026-06-26 12:32 UTC (permalink / raw)
  To: Lyude Paul
  Cc: dri-devel, nouveau, linux-kernel, stable, Dave Airlie, Kees Cook,
	Danilo Krummrich, Ben Skeggs, Maarten Lankhorst, Simona Vetter,
	Maxime Ripard

On Thu, Jun 25, 2026 at 07:10:55PM -0400, Lyude Paul wrote:
> It turns out that the only reason our previous fixes looked like they
> worked for this was because we would occasionally set the Gcoff state to 0
> in the normal S3 path, which fixed suspend/resume on desktops - but not on
> machines using runtime suspend.
> 
> The proper fix is to just never set this flag. Our current guess for the
> reasoning behind this is that Gcoff likely coincides with GC6, and not
> literally power off.
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 8302d0afeaec ("nouveau/gsp: fix suspend/resume regression on r570 firmware")
> Cc: <stable@vger.kernel.org>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Kees Cook <kees@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Timur Tabi <ttabi@nvidia.com>
> Cc: Ben Skeggs <bskeggs@nvidia.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Mel Henning <mhenning@darkrefraction.com>
> Cc: <stable@vger.kernel.org> # v6.19+
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r570/fbsr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Same comment here.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-06-26 12:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-25 23:10 [PATCH 0/2] drm/nouveau: Fix runtime suspend on R570 Lyude Paul
2026-06-25 23:10 ` Lyude Paul
2026-06-25 23:10 ` [PATCH 1/2] Revert "nouveau/gsp: fix suspend/resume regression on r570 firmware" Lyude Paul
2026-06-25 23:10   ` Lyude Paul
2026-06-25 23:27   ` sashiko-bot
2026-06-26 12:31   ` Andy Shevchenko
2026-06-26 12:31     ` Andy Shevchenko
2026-06-25 23:10 ` [PATCH 2/2] drm/nouveau/gsp/r570: Never enter Gcoff state Lyude Paul
2026-06-25 23:10   ` Lyude Paul
2026-06-25 23:25   ` sashiko-bot
2026-06-26 12:32   ` Andy Shevchenko
2026-06-26 12:32     ` Andy Shevchenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.