All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] drm/nouveau: NVAC (MCP79) MSI rearm + SOR-disable NULL guard
@ 2026-06-11  7:26 Marek Czernohous
  2026-06-11  7:26 ` [PATCH v2 1/2] drm/nouveau/pci: use config-space MSI rearm on MCP79/MCP7A (NVAC) Marek Czernohous
  2026-06-11  7:26 ` [PATCH v2 2/2] drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable() Marek Czernohous
  0 siblings, 2 replies; 4+ messages in thread
From: Marek Czernohous @ 2026-06-11  7:26 UTC (permalink / raw)
  To: Lyude Paul, Danilo Krummrich; +Cc: dri-devel, nouveau, linux-kernel, Fab Stz

This is a v2 of the NVAC (MCP79/MCP7A) stability series, narrowed to the
two changes that are genuinely stable material and reworked after review
plus an independent second-machine test.

The original v1 [1] was a three-patch series. Since then it has had a
three-week soak on my Apple Mac mini (early 2009, MCP79) and an independent
test by Fab Stz on a second machine (iMac9,1 / MCP79, 6.12.90, X11/KDE 6),
who confirmed the system is more stable with the series and provided a
Tested-by. After an internal review pass:

  Patch 1 (MSI rearm): NVAC re-arms MSI through the MMIO mirror of PCI
    config space, which is unreliable on this IGP; a missed re-arm leaves
    the interrupt line dead and the GPU wedges. v1 switched the whole
    shared g94_pci_func (ten chipsets); v2 narrows this to a dedicated
    mcp79 pci func so only the tested chipset (0xac) changes, matching the
    existing g92 precedent.

  Patch 2 (SOR disable NULL guard): nv50_sor_atomic_disable() can run with
    a NULL nv_encoder->crtc under Wayland session teardown / VT switches,
    and because the deref goes through container_of() the NULL becomes a
    bogus pointer that faults. v2 restores the guard as drm_WARN_ON_ONCE()
    and, unlike v1, does not call nvif_outp_release() in the early return
    (that path is owned by the commit_tail release loop; the v1 form could
    release twice and detach the OR before the disable flush).

The v1 patch 3 (HPD link-check retry) is intentionally dropped from this
series: it is a behavioral workaround that sleeps under mode_config.mutex,
not an obviously-correct fix, so it does not meet the stable bar. I will
revisit it separately if it is still needed.

Both patches carry Fixes: and Cc: stable tags. Per-patch v1 -> v2
changelogs are under the --- in each patch.

Disclosure: this work was done with assistance from an AI coding assistant
(Anthropic's Claude, Opus 4.7 and 4.8); the analysis and conclusions are
mine and have been verified on hardware. Each patch also carries an
Assisted-by trailer to that effect.

[1] https://lore.kernel.org/nouveau/20260409172126.115441-1-marek@czernohous.de/

Marek Czernohous (2):
  drm/nouveau/pci: use config-space MSI rearm on MCP79/MCP7A (NVAC)
  drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable()

 drivers/gpu/drm/nouveau/dispnv50/disp.c       | 16 ++++++++-
 .../gpu/drm/nouveau/include/nvkm/subdev/pci.h |  2 ++
 .../gpu/drm/nouveau/nvkm/engine/device/base.c |  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/pci/Kbuild    |  1 +
 .../gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c   | 33 +++++++++++++++++++
 5 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c

-- 
2.53.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/2] drm/nouveau/pci: use config-space MSI rearm on MCP79/MCP7A (NVAC)
  2026-06-11  7:26 [PATCH v2 0/2] drm/nouveau: NVAC (MCP79) MSI rearm + SOR-disable NULL guard Marek Czernohous
@ 2026-06-11  7:26 ` Marek Czernohous
  2026-06-11  7:26 ` [PATCH v2 2/2] drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable() Marek Czernohous
  1 sibling, 0 replies; 4+ messages in thread
From: Marek Czernohous @ 2026-06-11  7:26 UTC (permalink / raw)
  To: Lyude Paul, Danilo Krummrich; +Cc: dri-devel, nouveau, linux-kernel, Fab Stz

From: Marek Czernohous <mczernohous@gmail.com>

From: Marek Czernohous <marek@czernohous.de>

NVAC (MCP79/MCP7A) uses g94_pci_func, whose .msi_rearm is
nv40_pci_msi_rearm(): a re-arm write through the MMIO mirror of PCI
config space.  On this IGP that path is unreliable; when a re-arm is
missed the interrupt line stays dead, command submission times out and
the GPU appears hung until reboot.  On an Apple Mac mini (early 2009,
MCP79, 0xac080b1) this showed as sporadic fifo timeouts and GPU hangs
under load unless MSI was disabled via config=NvMSI=0.

Give NVAC its own pci func that re-arms through real PCI config space
(nv46_pci_msi_rearm) instead.  This follows existing precedent: nv46.c
documents the MMIO-mirror re-arm as broken on several related parts,
and commit 5112abc6a433 ("drm/nouveau/pci/g92: Fix rearm") already
split g92 out of the shared table for the same reason.  The sibling
IGP NVAA (MCP77/MCP78) has MSI disabled entirely as "reported broken"
in nvkm_pci_new_(); NVAC works correctly once the re-arm goes through
config space, so disabling MSI is not necessary.

Only NVAC is switched: that is the hardware this has been validated
on.  The other users of g94_pci_func (G94/G96/G98/GT2xx and the
MCP77/MCP89 IGPs) keep their current behavior; MCP77 and MCP89
plausibly want the same treatment but were not tested.

Tested on the Mac mini as a daily driver for two months with MSI
enabled and zero fifo timeouts; independently confirmed stable on an
iMac9,1 (MCP79) running 6.12.90.

Fixes: 5112abc6a433 ("drm/nouveau/pci/g92: Fix rearm")
Cc: <stable@vger.kernel.org>
Tested-by: Fab Stz <fabstz-it@yahoo.fr>
Assisted-by: Claude:claude-opus-4-7
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Marek Czernohous <marek@czernohous.de>
---
v1 -> v2: narrowed to NVAC via a dedicated mcp79 pci func instead of
changing the shared g94 table (only chipset the fix was validated on);
added Fixes/stable/Assisted-by tags.
v1: https://lore.kernel.org/nouveau/20260409172126.115441-2-marek@czernohous.de/

 .../gpu/drm/nouveau/include/nvkm/subdev/pci.h |  2 ++
 .../gpu/drm/nouveau/nvkm/engine/device/base.c |  2 +-
 .../gpu/drm/nouveau/nvkm/subdev/pci/Kbuild    |  1 +
 .../gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c   | 33 +++++++++++++++++++
 4 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/pci.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/pci.h
index 112b674..88efe06 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/pci.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/pci.h
@@ -46,6 +46,8 @@ int nv4c_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct n
 int g84_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_pci **);
 int g92_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_pci **);
 int g94_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_pci **);
+int mcp79_pci_new(struct nvkm_device *device, enum nvkm_subdev_type type,
+		  int inst, struct nvkm_pci **ppci);
 int gf100_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_pci **);
 int gf106_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_pci **);
 int gk104_pci_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_pci **);
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
index b101e14..a809ec3 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
@@ -1237,7 +1237,7 @@ nvac_chipset = {
 	.mc       = { 0x00000001, g98_mc_new },
 	.mmu      = { 0x00000001, mcp77_mmu_new },
 	.mxm      = { 0x00000001, nv50_mxm_new },
-	.pci      = { 0x00000001, g94_pci_new },
+	.pci      = { 0x00000001, mcp79_pci_new },
 	.therm    = { 0x00000001, g84_therm_new },
 	.timer    = { 0x00000001, nv41_timer_new },
 	.volt     = { 0x00000001, nv40_volt_new },
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild
index a14ea0f..90f03ba 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/Kbuild
@@ -9,6 +9,7 @@ nvkm-y += nvkm/subdev/pci/nv4c.o
 nvkm-y += nvkm/subdev/pci/g84.o
 nvkm-y += nvkm/subdev/pci/g92.o
 nvkm-y += nvkm/subdev/pci/g94.o
+nvkm-y += nvkm/subdev/pci/mcp79.o
 nvkm-y += nvkm/subdev/pci/gf100.o
 nvkm-y += nvkm/subdev/pci/gf106.o
 nvkm-y += nvkm/subdev/pci/gk104.o
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c
new file mode 100644
index 0000000..9f6a6cd
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/mcp79.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: MIT
+/*
+ * MCP79/MCP7A (NVAC): like g94, but MSI re-arm goes through real PCI
+ * config space.  The MMIO-mirror re-arm is unreliable on this IGP and a
+ * missed re-arm kills the interrupt line (see the nv46 comment; g92 was
+ * split out of the shared table for the same reason).
+ */
+#include "priv.h"
+
+static const struct nvkm_pci_func
+mcp79_pci_func = {
+	.cfg = { .addr = 0x088000, .size = 0x1000 },
+
+	.init = g84_pci_init,
+	.msi_rearm = nv46_pci_msi_rearm,
+
+	.pcie.init = g84_pcie_init,
+	.pcie.set_link = g84_pcie_set_link,
+
+	.pcie.max_speed = g84_pcie_max_speed,
+	.pcie.cur_speed = g84_pcie_cur_speed,
+
+	.pcie.set_version = g84_pcie_set_version,
+	.pcie.version = g84_pcie_version,
+	.pcie.version_supported = g92_pcie_version_supported,
+};
+
+int
+mcp79_pci_new(struct nvkm_device *device, enum nvkm_subdev_type type, int inst,
+	      struct nvkm_pci **ppci)
+{
+	return nvkm_pci_new_(&mcp79_pci_func, device, type, inst, ppci);
+}
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable()
  2026-06-11  7:26 [PATCH v2 0/2] drm/nouveau: NVAC (MCP79) MSI rearm + SOR-disable NULL guard Marek Czernohous
  2026-06-11  7:26 ` [PATCH v2 1/2] drm/nouveau/pci: use config-space MSI rearm on MCP79/MCP7A (NVAC) Marek Czernohous
@ 2026-06-11  7:26 ` Marek Czernohous
  2026-06-11  7:39   ` sashiko-bot
  1 sibling, 1 reply; 4+ messages in thread
From: Marek Czernohous @ 2026-06-11  7:26 UTC (permalink / raw)
  To: Lyude Paul, Danilo Krummrich; +Cc: dri-devel, nouveau, linux-kernel, Fab Stz

From: Marek Czernohous <mczernohous@gmail.com>

From: Marek Czernohous <marek@czernohous.de>

nv50_sor_atomic_disable() unconditionally computes
nv50_head(nv_encoder->crtc) and dereferences the result a few lines
later.  nv_encoder->crtc is nouveau's own shadow pointer, set in
.atomic_enable and cleared at the end of .atomic_disable.
Commit f575f2bdb6c3 ("drm/nouveau/kms/nv50-: Remove (nv_encoder->crtc)
checks in ->disable callbacks") removed the NULL check here,
reasoning that the atomic helpers never call ->disable without a
crtc.  On NVAC (MCP79) under
Wayland sessions (observed with Weston's DRM backend and with
labwc/wlroots) we have hit the NULL case in practice during session
teardown and VT switches: disable runs without (or after) its matching
enable, and because nv50_head() is container_of(), the NULL does not
stay NULL but becomes a small bogus pointer, so the subsequent head
dereferences fault and the kernel oopses.

Restore the guard, as drm_WARN_ON_ONCE() instead of a silent return:
a NULL crtc here still indicates a state-tracking inconsistency that
should stay visible.  Return without touching the output; in this path
either enable never ran (nothing to tear down) or an earlier disable
already did the teardown, and the encoder release is handled by the
commit_tail release loop in both cases.

Fixes: f575f2bdb6c3 ("drm/nouveau/kms/nv50-: Remove (nv_encoder->crtc) checks in ->disable callbacks")
Cc: <stable@vger.kernel.org>
Tested-by: Fab Stz <fabstz-it@yahoo.fr>
Assisted-by: Claude:claude-opus-4-7
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Marek Czernohous <marek@czernohous.de>
---
v1 -> v2: dropped the nvif_outp_release() call from the early return
(the release is owned by the commit_tail release loop; calling it here
released twice and detached the OR before the disable flush); turned
the silent return into drm_WARN_ON_ONCE(); reworded the commit message
(the v1 "race between atomic_check and atomic_commit" wording did not
match the mechanism).
v1: https://lore.kernel.org/nouveau/20260409172126.115441-3-marek@czernohous.de/

 drivers/gpu/drm/nouveau/dispnv50/disp.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 6c3a871..597bc64 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -1565,14 +1565,28 @@ static void
 nv50_sor_atomic_disable(struct drm_encoder *encoder, struct drm_atomic_state *state)
 {
 	struct nouveau_encoder *nv_encoder = nouveau_encoder(encoder);
-	struct nv50_head *head = nv50_head(nv_encoder->crtc);
+	struct nv50_head *head;
 #ifdef CONFIG_DRM_NOUVEAU_BACKLIGHT
 	struct nouveau_connector *nv_connector = nv50_outp_get_old_connector(state, nv_encoder);
 	struct nouveau_drm *drm = nouveau_drm(nv_encoder->base.base.dev);
 	struct nouveau_backlight *backlight = nv_connector->backlight;
 	struct drm_dp_aux *aux = &nv_connector->aux;
 	int ret;
+#endif
+
+	/* nv_encoder->crtc is the driver's shadow pointer, set in
+	 * .atomic_enable and cleared at the end of this function.  NULL here
+	 * means disable-without-enable or a double disable; bail before
+	 * container_of() turns it into a bogus head pointer (checking the
+	 * result would not work, container_of(NULL) is never NULL).  The
+	 * encoder release is handled by the commit_tail release loop, so
+	 * there is nothing to clean up here.
+	 */
+	if (drm_WARN_ON_ONCE(encoder->dev, !nv_encoder->crtc))
+		return;
+	head = nv50_head(nv_encoder->crtc);
 
+#ifdef CONFIG_DRM_NOUVEAU_BACKLIGHT
 	if (backlight && backlight->uses_dpcd) {
 		ret = drm_edp_backlight_disable(aux, &backlight->edp_info);
 		if (ret < 0)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 2/2] drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable()
  2026-06-11  7:26 ` [PATCH v2 2/2] drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable() Marek Czernohous
@ 2026-06-11  7:39   ` sashiko-bot
  0 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-11  7:39 UTC (permalink / raw)
  To: Marek Czernohous; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The early return added to guard against a NULL `nv_encoder->crtc` during inconsistent atomic disables is placed too late. When `CONFIG_DRM_NOUVEAU_BACKLIGHT` is enabled, the code unconditionally initializes and dereferences `nv_connector` (via `nv_connector->backlight`) before the new `!nv_encoder->crtc` safeguard is reached. In the exact state-tracking inconsistencies this patch aims to fix (e.g., disable without a matching enable), `nv50_outp_get_old_connector()` can return NULL, leading to a NULL pointer dereference.
--

commit c0ee57413f11f85e156ca80153f2707e59b79de5
Author: Marek Czernohous <mczernohous@gmail.com>

drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable()

This commit restores a NULL guard for the crtc pointer in
nv50_sor_atomic_disable() to prevent a kernel oops during VT switches and
Wayland session teardowns. The guard replaces a silent return with a warning
to ensure state-tracking inconsistencies remain visible without crashing.

> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> index 6c3a8712d38ab..597bc64073a5c 100644
> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
> @@ -1565,14 +1565,28 @@ static void
>  nv50_sor_atomic_disable(struct drm_encoder *encoder, struct drm_atomic_state *state)
>  {
>  	struct nouveau_encoder *nv_encoder = nouveau_encoder(encoder);
> -	struct nv50_head *head = nv50_head(nv_encoder->crtc);
> +	struct nv50_head *head;
>  #ifdef CONFIG_DRM_NOUVEAU_BACKLIGHT
>  	struct nouveau_connector *nv_connector = nv50_outp_get_old_connector(state, nv_encoder);
>  	struct nouveau_drm *drm = nouveau_drm(nv_encoder->base.base.dev);
>  	struct nouveau_backlight *backlight = nv_connector->backlight;
>  	struct drm_dp_aux *aux = &nv_connector->aux;

[Severity: High]
This isn't a bug introduced by this patch, but does the new early return
happen too late if CONFIG_DRM_NOUVEAU_BACKLIGHT is enabled?

In the exact disable-without-enable scenario this patch is addressing, 
nv50_outp_get_old_connector() can return NULL. Because nv_connector is
unconditionally dereferenced here to initialize the backlight and aux
variables, won't this still cause a NULL pointer dereference in
nv50_sor_atomic_disable() before the newly added crtc guard is ever reached?

>  	int ret;
> +#endif
> +
> +	/* nv_encoder->crtc is the driver's shadow pointer, set in
> +	 * .atomic_enable and cleared at the end of this function.  NULL here
> +	 * means disable-without-enable or a double disable; bail before
> +	 * container_of() turns it into a bogus head pointer (checking the
> +	 * result would not work, container_of(NULL) is never NULL).  The
> +	 * encoder release is handled by the commit_tail release loop, so
> +	 * there is nothing to clean up here.
> +	 */
> +	if (drm_WARN_ON_ONCE(encoder->dev, !nv_encoder->crtc))
> +		return;
> +	head = nv50_head(nv_encoder->crtc);
>  
> +#ifdef CONFIG_DRM_NOUVEAU_BACKLIGHT
>  	if (backlight && backlight->uses_dpcd) {
>  		ret = drm_edp_backlight_disable(aux, &backlight->edp_info);
>  		if (ret < 0)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1781162589.git.marek@czernohous.de?part=2

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-11  7:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11  7:26 [PATCH v2 0/2] drm/nouveau: NVAC (MCP79) MSI rearm + SOR-disable NULL guard Marek Czernohous
2026-06-11  7:26 ` [PATCH v2 1/2] drm/nouveau/pci: use config-space MSI rearm on MCP79/MCP7A (NVAC) Marek Czernohous
2026-06-11  7:26 ` [PATCH v2 2/2] drm/nouveau/kms: guard NULL crtc in nv50_sor_atomic_disable() Marek Czernohous
2026-06-11  7:39   ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.