From: Imre Deak <imre.deak@intel.com>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: <intel-gfx@lists.freedesktop.org>,
<intel-xe@lists.freedesktop.org>,
Mohammed Thasleem <mohammed.thasleem@intel.com>,
Jani Nikula <jani.nikula@linux.intel.com>,
Tao Liu <ltao@redhat.com>, <stable@vger.kernel.org>,
Jani Nikula <jani.nikula@intel.com>
Subject: Re: [CI] drm/i915/dmc: Fix an unlikely NULL pointer deference at probe
Date: Tue, 10 Mar 2026 11:26:44 +0200
Message-ID: <aa_j1Gxa7iWEEYHi@ideak-desk.lan>
In-Reply-To: <aa_Y7shwd1Vqiy3i@intel.com>

On Tue, Mar 10, 2026 at 10:40:14AM +0200, Ville Syrjälä wrote:
> On Mon, Mar 09, 2026 at 06:48:03PM +0200, Imre Deak wrote:
> > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
> > initialized, and dmc is thus NULL.
> >
> > That would be the case when the call path is
> > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
> > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
> > intel_power_domains_init_hw() is called *before* intel_dmc_init().
> >
> > However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count()
> > conditionally, depending on the current and target DC states. At probe,
> > the target is disabled, but if DC6 is enabled, the function is called,
> > and an oops follows. Apparently it's quite unlikely that DC6 is enabled
> > at probe, as we haven't seen this failure mode before.
> >
> > It is also strange to have DC6 enabled at boot, since that would require
> > the DMC firmware (loaded by BIOS); the BIOS loading the DMC firmware and
> > the driver stopping / reprogramming the firmware is a poorly specified
> > sequence and as such unlikely an intentional BIOS behaviour. It's more
> > likely that BIOS is leaving an unintentionally enabled DC6 HW state
> > behind (without actually loading the required DMC firmware for this).
>
> Wasn't the original case some kdump kernel thing?

According to Jani, the original issue was a KASAN run in QEMU, see [1].
I'm not sure whether that also involved kexec/kdump.

However, the case Tao reported later is indeed related to kexec/kdump.

> I think that has a few issues:
> - loading full GPU drivers for a kdump kernel after the real kernel
> has crashed seems a bit risky. Who knows what state the hardware
> is in after the crash...
> - we should probably try to unload DMC at kexec time (to the extent
> that DMC can actually be unloaded)

AFAICS that involves calling pci_driver::shutdown, which (for both xe
and i915) ends up calling intel_power_domains_disable(), which at least
disables the DC states (hence the kexec'ed kernel should still not see
DC6 enabled). The DMC FW event handlers are not disabled in this case
though (which I presume is what you mean by unloading DMC), as opposed
to system/runtime suspend, where all the DMC events are disabled as
well.

I agree that the kexec->shutdown, driver remove etc. handlers should be
synced with the suspend handlers, at least wrt. the above DMC unloading.
However, I consider that a separate issue from the one fixed in this
patch, which is the incorrect use of the (unreliable) HW DC state to
track the DC6 allowed counter; the correct way is to use the SW DC
state instead. So are you still okay with going ahead with this patch
for now, and following up with syncing the above shutdown/driver remove
handlers with the suspend ones?

[1] https://lore.kernel.org/all/43c4d7f0d9fe4ba6acac828306b41d612dd4f085@intel.com
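
To illustrate the SW vs. HW state point, here is a minimal standalone
sketch, not the driver code. All toy_* names, TOY_DC6_BIT and the
counter values are made up for the example; only the start/stop
conditions are modeled on gen9_set_dc_state(). The assert() stands in
for the NULL dereference. Since the decision is taken from the SW
state, the probe-time call never reaches the counter update, even
though the (assumed) stale HW state reports DC6 as enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DC_STATE_EN_UPTO_DC6. */
#define TOY_DC6_BIT 0x2u

struct toy_dmc {
	uint32_t dc6_allowed_count;	/* accumulated DC6 allowed count */
	uint32_t dc5_start;		/* DC5 counter captured at start */
};

struct toy_display {
	struct toy_dmc *dmc;		/* NULL until the DMC is initialized */
	uint32_t hw_dc_state;		/* what the HW register would read back */
	uint32_t sw_dc_state;		/* what the driver last programmed */
	uint32_t dc5_counter;		/* free-running DC5 entry counter */
};

static void toy_update_dc6_allowed_count(struct toy_display *d, bool start)
{
	assert(d->dmc != NULL);		/* models the oops on a NULL dmc */

	if (start)
		d->dmc->dc5_start = d->dc5_counter;
	else
		d->dmc->dc6_allowed_count += d->dc5_counter - d->dmc->dc5_start;
}

static void toy_set_dc_state(struct toy_display *d, uint32_t state)
{
	bool enable_dc6 = state & TOY_DC6_BIT;
	/* Decide based on the SW state, never on d->hw_dc_state. */
	bool dc6_was_enabled = d->sw_dc_state & TOY_DC6_BIT;

	if (!dc6_was_enabled && enable_dc6)
		toy_update_dc6_allowed_count(d, true);

	d->hw_dc_state = state;

	if (!enable_dc6 && dc6_was_enabled)
		toy_update_dc6_allowed_count(d, false);

	d->sw_dc_state = state;
}

/* Probe with a stale HW DC6 bit and no DMC, then one DC6 on/off cycle. */
uint32_t toy_probe_then_cycle(void)
{
	struct toy_dmc dmc = { 0 };
	struct toy_display d = {
		.dmc = NULL,
		.hw_dc_state = TOY_DC6_BIT,	/* left behind by BIOS/kexec */
		.sw_dc_state = 0,
		.dc5_counter = 100,
	};

	toy_set_dc_state(&d, 0);		/* probe: no counter update */
	d.dmc = &dmc;				/* DMC gets initialized later */
	toy_set_dc_state(&d, TOY_DC6_BIT);	/* start: captures 100 */
	d.dc5_counter += 7;			/* 7 DC5 entries while allowed */
	toy_set_dc_state(&d, 0);		/* stop: accumulates the delta */

	return dmc.dc6_allowed_count;
}
```

With the SW state, every stop is paired with a start that captured the
DC5 counter, so toy_probe_then_cycle() returns 7. If dc6_was_enabled
were taken from hw_dc_state instead, the first toy_set_dc_state() call
would hit the assert.
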
> > The tracking of the DC6 allowed counter only works if starting /
> > stopping the counter depends on the _SW_ DC6 state vs. the current _HW_
> > DC6 state (since stopping the counter requires the DC5 counter captured
> > when the counter was started). Thus, using the HW DC6 state is incorrect
> > and it also leads to the above oops. Fix both issues by using the SW DC6
> > state for the tracking.
> >
> > This is v2 of the fix originally sent by Jani, updated based on the
> > first Link: discussion below.
> >
> > Link: https://lore.kernel.org/all/3626411dc9e556452c432d0919821b76d9991217@intel.com
> > Link: https://lore.kernel.org/all/20260228130946.50919-2-ltao@redhat.com
> > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter")
> > Cc: Mohammed Thasleem <mohammed.thasleem@intel.com>
> > Cc: Jani Nikula <jani.nikula@linux.intel.com>
> > Cc: Tao Liu <ltao@redhat.com>
> > Cc: <stable@vger.kernel.org> # v6.16+
> > Tested-by: Tao Liu <ltao@redhat.com>
> > Reviewed-by: Jani Nikula <jani.nikula@intel.com>
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > ---
> > drivers/gpu/drm/i915/display/intel_display_power_well.c | 2 +-
> > drivers/gpu/drm/i915/display/intel_dmc.c | 3 +--
> > 2 files changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_display_power_well.c b/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > index 1e03187dbd38a..f855f0f886946 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display_power_well.c
> > @@ -852,7 +852,7 @@ void gen9_set_dc_state(struct intel_display *display, u32 state)
> > power_domains->dc_state, val & mask);
> >
> > enable_dc6 = state & DC_STATE_EN_UPTO_DC6;
> > - dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
> > + dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > if (!dc6_was_enabled && enable_dc6)
> > intel_dmc_update_dc6_allowed_count(display, true);
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c
> > index c3b411259a0c5..90ba932d940ac 100644
> > --- a/drivers/gpu/drm/i915/display/intel_dmc.c
> > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c
> > @@ -1598,8 +1598,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 *
> > return false;
> >
> > mutex_lock(&power_domains->lock);
> > - dc6_enabled = intel_de_read(display, DC_STATE_EN) &
> > - DC_STATE_EN_UPTO_DC6;
> > + dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > if (dc6_enabled)
> > intel_dmc_update_dc6_allowed_count(display, false);
> >
> > --
> > 2.49.1
>
> --
> Ville Syrjälä
> Intel