From: Imre Deak <imre.deak@intel.com>
To: Jani Nikula <jani.nikula@intel.com>
Cc: <intel-gfx@lists.freedesktop.org>,
<intel-xe@lists.freedesktop.org>,
Mohammed Thasleem <mohammed.thasleem@intel.com>,
<stable@vger.kernel.org>
Subject: Re: [PATCH] drm/i915/dmc: fix an unlikely NULL pointer deference at probe
Date: Thu, 4 Dec 2025 15:02:06 +0200 [thread overview]
Message-ID: <aTGGThc98Il6FCTC@ideak-desk> (raw)
In-Reply-To: <7bfb6dabe5bf83028f695d4d248597b721ce0e0c@intel.com>
On Thu, Dec 04, 2025 at 01:30:27PM +0200, Jani Nikula wrote:
> On Wed, 03 Dec 2025, Imre Deak <imre.deak@intel.com> wrote:
> > On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote:
> >> On Tue, 02 Dec 2025, Imre Deak <imre.deak@intel.com> wrote:
> >> > On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
> >> >> On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
> >> >> > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
> >> >> > initialized, and dmc is thus NULL.
> >> >> >
> >> >> > That would be the case when the call path is
> >> >> > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
> >> >> > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
> >> >> > intel_power_domains_init_hw() is called *before* intel_dmc_init().
> >> >> >
> >> >> > However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count()
> >> >> > conditionally, depending on the current and target DC states. At probe,
> >> >> > the target is disabled, but if DC6 is enabled, the function is called,
> >> >> > and an oops follows. Apparently it's quite unlikely that DC6 is enabled
> >> >> > at probe, as we haven't seen this failure mode before.
> >> >> >
> >> >> > Add NULL checks and switch the dmc->display references to just display.
> >> >> >
> >> >> > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter")
> >> >> > Cc: Mohammed Thasleem <mohammed.thasleem@intel.com>
> >> >> > Cc: Imre Deak <imre.deak@intel.com>
> >> >> > Cc: <stable@vger.kernel.org> # v6.16+
> >> >> > Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> >> >> >
> >> >> > ---
> >> >> >
> >> >> > Rare case, but this may also throw off the rc6 counting in debugfs when
> >> >> > it does happen.
> >> >>
> >> >> Yes, I missed the case where the driver is being loaded while DC6 is
> >> >> enabled, this is what happens for the reporter:
> >> >>
> >> >> i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
> >> >>
> >> >> That's odd, as DC6 requires the DMC firmware, which - if it's indeed
> >> >> loaded by BIOS for instance - will be overwritten by the driver, not a
> >> >> well specified sequence (even though the driver is trying to handle it
> >> >> correctly by disabling any active firmware handler).
> >> >>
> >> >> But as you pointed out this would also throw off the cooked-up DC6
> >> >> counter tracking,
> >> >
> >> > Actually the patch would keep the counter working, as the counter
> >> > wouldn't be updated in the dmc==NULL case. However I still think the
> >> > correct fix would be to check the correct DC state, which from the POV
> >> > of the counter tracking is the driver's version of the state, not the HW
> >> > state.
> >>
> >> One thing I failed to mention is that this happens in a KASAN run in
> >> QEMU. So I'm kind of not surprised we haven't hit this before. And it
> >> impacts the deductions about the DC state.
> >
> > Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register
> > to a non-zero value then. But in any case the driver should handle it.
> >
> >> I'm not quite sure what exactly you're suggesting, maybe a draft patch
> >> would communicate the idea better than plain English? ;)
> >
> > intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as
> > the debugfs entry can be read at any point. With that, what I meant is:
> >
> > in gen9_set_dc_state():
> > ...
> > - dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
> > + dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
>
> I still don't understand why we can trust our own value rather than
> what's in the hardware in this case.
The BIOS/FW can set random flags in the register, as in the above case,
so it can't be trusted. The counter update ending the tracking of the
duration of a DC6 enabled state only works if the driver did in fact
enable DC6 previously and has started the tracking accordingly. This is
only guaranteed if the driver has set DC_STATE_EN_UPTO_DC6 in
power_domains->dc_state, the corrsponding HW flag doesn't guarantee it.
> For resume, we even call gen9_sanitize_dc_state(), but not for probe.
After system suspend, the driver enabling DC9 by setting the
corrsponding DC9 flag in the DC_STATE_EN register, the HW/firmware will
disable DC9 while resuming. The SW version of the DC state will be
updated accordingly in the above function to reflect the disabled DC9
state.
> > ...
> >
> > in intel_dmc_get_dc6_allowed_count():
> > ...
> > if (DISPLAY_VER(display) < 14)
> > return false;
> >
> > + if (!dmc) {
> > + *count = 0;
> > + return true;
> > + }
> > +
>
> This seems neat but is overkill. dmc is never NULL here, but I added the
> check for completeness.
intel_dmc_get_dc6_allowed_count() shouldn't fall back on DISPLAY_VER>=14
to report the DC6 residency in a way that only works for older
platforms. Hence the function should return true for DISPLAY_VER>=14.
> It's the intel_dmc_update_dc6_allowed_count() that's more fragile, and I
> want that to have the !dmc check, instead of relying on the subtle
> dependency on power_domains->dc_state.
The counter tracking should depend on the power_domians->dc_state SW
state as described above, so that's the correct thing to do there.
dmc==NULL in intel_dmc_update_dc6_allowed_count() would be only a bug in
the driver, if you wanted to check for that it should be a
WARN_ON(!dmc) check.
> > mutex_lock(&power_domains->lock);
> > - dc6_enabled = intel_de_read(display, DC_STATE_EN) &
> > - DC_STATE_EN_UPTO_DC6;
> > + dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
> > ...
> >
> >> Anyway, I think "not oopsing" is a lot better than "inaccurate DC
> >> counters in debugfs".
> >
> > Agreed, the above would ensure both.
> >
> >>
> >> BR,
> >> Jani.
> >>
> >>
> >> >
> >> >> so could instead the counter update depend on the
> >> >> driver's DC state instead of the HW state? I.e. set
> >> >> gen9_set_dc_state()/dc6_was_enabled,
> >> >> intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state
> >> >> says that DC6 was indeed enabled by the driver (instead of checking the
> >> >> HW state).
> >> >>
> >> >> That would fix the reporter's oops when calling
> >> >> intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling
> >> >> it if the driver hasn't actually enabled DC6 and it would also keep the
> >> >> DC6 counter tracking correct.
> >> >>
> >> >> intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also
> >> >> guaranteed to be called only once the firmware is loaded, as until that
> >> >> point enabling DC6 is blocked (by holding a reference on the DC_off
> >> >> power well).
> >> >>
> >> >> > ---
> >> >> > drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++---
> >> >> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >> >> >
> >> >> > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c
> >> >> > index 2fb6fec6dc99..169bbbc91f6d 100644
> >> >> > --- a/drivers/gpu/drm/i915/display/intel_dmc.c
> >> >> > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c
> >> >> > @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display,
> >> >> > struct intel_dmc *dmc = display_to_dmc(display);
> >> >> > u32 dc5_cur_count;
> >> >> >
> >> >> > - if (DISPLAY_VER(dmc->display) < 14)
> >> >> > + if (!dmc || DISPLAY_VER(display) < 14)
> >> >> > return;
> >> >> >
> >> >> > - dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
> >> >> > + dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
> >> >> >
> >> >> > if (!start_tracking)
> >> >> > dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start;
> >> >> > @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 *
> >> >> > struct intel_dmc *dmc = display_to_dmc(display);
> >> >> > bool dc6_enabled;
> >> >> >
> >> >> > - if (DISPLAY_VER(display) < 14)
> >> >> > + if (!dmc || DISPLAY_VER(display) < 14)
> >> >> > return false;
> >> >> >
> >> >> > mutex_lock(&power_domains->lock);
> >> >> > --
> >> >> > 2.47.3
> >> >> >
> >>
> >> --
> >> Jani Nikula, Intel
>
> --
> Jani Nikula, Intel
next prev parent reply other threads:[~2025-12-04 13:02 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-02 18:39 [PATCH] drm/i915/dmc: fix an unlikely NULL pointer deference at probe Jani Nikula
2025-12-02 21:24 ` Imre Deak
2025-12-02 21:35 ` Imre Deak
2025-12-03 8:13 ` Jani Nikula
2025-12-03 10:38 ` Imre Deak
2025-12-04 11:30 ` Jani Nikula
2025-12-04 13:02 ` Imre Deak [this message]
2026-01-07 11:16 ` Jani Nikula
2026-03-02 17:48 ` [PATCH v2] " Imre Deak
2026-03-03 2:44 ` Tao Liu
2026-03-06 10:12 ` Jani Nikula
2026-03-06 14:57 ` Imre Deak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTGGThc98Il6FCTC@ideak-desk \
--to=imre.deak@intel.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-xe@lists.freedesktop.org \
--cc=jani.nikula@intel.com \
--cc=mohammed.thasleem@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox