From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/dg2: Catch and log more unexpected values in DG1_MSTR_TILE_INTR
Date: Thu, 26 May 2022 11:29:49 +0100 [thread overview]
Message-ID: <1ac432c1-a8fb-71fd-4e66-94aa3f80debf@linux.intel.com> (raw)
In-Reply-To: <20220525181417.wnilka4bigyi5fle@ldmartin-desk2>
On 25/05/2022 19:14, Lucas De Marchi wrote:
> On Wed, May 25, 2022 at 05:03:13PM +0100, Tvrtko Ursulin wrote:
>>
>> On 24/05/2022 18:51, Matt Roper wrote:
>>> On Tue, May 24, 2022 at 10:43:39AM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Catch and log any garbage in the register, including no tiles
>>>> marked, or
>>>> multiple tiles marked.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>>> ---
>>>> We caught garbage in DG1_MSTR_TILE_INTR with DG2 (actual value
>>>> 0xF9D2C008)
>>>> during glmark and more badness. So I thought lets log all possible
>>>> failure
>>>> modes from here and also use per device logging.
>>>> ---
>>>> drivers/gpu/drm/i915/i915_irq.c | 33 ++++++++++++++++++++++-----------
>>>> drivers/gpu/drm/i915/i915_reg.h | 1 +
>>>> 2 files changed, 23 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c
>>>> b/drivers/gpu/drm/i915/i915_irq.c
>>>> index 73cebc6aa650..79853d3fc1ed 100644
>>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>>> @@ -2778,24 +2778,30 @@ static irqreturn_t dg1_irq_handler(int irq,
>>>> void *arg)
>>>> u32 gu_misc_iir;
>>>> if (!intel_irqs_enabled(i915))
>>>> - return IRQ_NONE;
>>>> + goto none;
>>>> master_tile_ctl = dg1_master_intr_disable(regs);
>>>> - if (!master_tile_ctl) {
>>>> - dg1_master_intr_enable(regs);
>>>> - return IRQ_NONE;
>>>> + if (!master_tile_ctl)
>>>> + goto enable_none;
>>>> +
>>>> + if (master_tile_ctl & ~(DG1_MSTR_IRQ | DG1_MSTR_TILE_MASK)) {
>>>> + drm_warn(&i915->drm, "Garbage in master_tile_ctl: 0x%08x!\n",
>>>> + master_tile_ctl);
>>>
>>> I know we have a bunch of them already, but shouldn't we be avoiding
>>> printk-based stuff like this inside interrupt handlers? Should we be
>>> migrating all these error messages over to trace_printk or something
>>> similar that's safer to use?
>>
>> Not sure - I kind of think some really unexpected and worrying
>> situations should be loud and on by default. Risk is then spam if not
>> ratelimited. Maybe we should instead ratelimit most errors/warnings
>> coming for irq handlers?
>>
>> In this particular case at least DRM_ERROR with no device info is the
>> odd one out in the entire file so I'd suggest changing at least that,
>> if the rest of my changes is of questionable benefit.
>
> I'd rather remove the printk's from irq rather than adding more. At the
> very
> least, they should be the _once variant or ratelimited. One of the few
> cases to even deserve a unlikely(), even to document this shouldn't
> really be happening.
I would support ratelimited for all the unexpected bits set, no bits
set, or similar conditions. I wouldn't remove such printks to
micro-optimize things. That would potentially lose important feedback in
cases when we hit truly unexpected situations.
But annotating them as unlikely would be a good thing.
> Our irq handlers (particularly on dgfx and multi-gt) are already too
> long running... I don't like making them any onger or slower.
How come? I mean which interrupts are a problem there? Isn't GuC
supposed to be taking on that load on itself, isn't that one of the main
selling points?
Regards,
Tvrtko
prev parent reply other threads:[~2022-05-26 10:29 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-24 9:43 [Intel-gfx] [PATCH] drm/i915/dg2: Catch and log more unexpected values in DG1_MSTR_TILE_INTR Tvrtko Ursulin
2022-05-24 10:01 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
2022-05-24 10:22 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-05-24 11:46 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-05-24 17:51 ` [Intel-gfx] [PATCH] " Matt Roper
2022-05-25 16:03 ` Tvrtko Ursulin
2022-05-25 18:05 ` Matt Roper
2022-05-26 10:18 ` Tvrtko Ursulin
2022-05-27 18:42 ` Matt Roper
2022-06-06 11:55 ` Tvrtko Ursulin
2022-06-06 15:21 ` Matt Roper
2022-06-07 9:20 ` Tvrtko Ursulin
2022-05-25 18:14 ` Lucas De Marchi
2022-05-26 10:29 ` Tvrtko Ursulin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1ac432c1-a8fb-71fd-4e66-94aa3f80debf@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=Intel-gfx@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox