From: Daniel Vetter <daniel@ffwll.ch>
To: John Harrison <john.c.harrison@intel.com>
Cc: Intel-GFX@lists.freedesktop.org, DRI-Devel@lists.freedesktop.org,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-gfx] [PATCH 0/2] Add support for dumping error captures via kernel logging
Date: Tue, 11 Apr 2023 18:50:53 +0200
Message-ID: <ZDWP7TRexJRphUNQ@phenom.ffwll.local>
In-Reply-To: <f4c5dfbf-6dc2-52cb-c31d-c6e78646bcac@intel.com>

On Tue, Apr 11, 2023 at 09:41:04AM -0700, John Harrison wrote:
> On 4/11/2023 07:41, Rodrigo Vivi wrote:
> > On Mon, Apr 10, 2023 at 12:25:21PM -0700, John.C.Harrison@Intel.com wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > Sometimes, the only effective way to debug an issue is to dump all the
> > > interesting information at the point of failure. So add support for
> > > doing that.
> > No! Please no!
> > We have some of this on Xe and I'm hating it. I'm going to try to remove
> > from there soon. It is horrible when you lose the ability to use dmesg
> > directly because it goes over the number of lines it saves... or even
> > with dmesg -w it goes over the number of lines of your terminal...
> > or the ssh and serial slowness when printing a bunch of information.
> > 
> > We probably want to be able to capture multiple error states and be
> > able to cross them with a kernel timeline, but definitely not overflood
> > our log terminals.
> I think you are missing the point.
> 
> This is the emergency backup plan for when nothing else works. It is not on
> by default. It should never happen on an end user system unless we
> specifically request them to run with a patched kernel to enable a dump at a
> specific point.
> 
> But there are (many) times when nothing else works. In those instances, it
> is extremely useful to be able to dump the system state in this manner.
> 
> It is code we have been using internally for some time and it has helped
> resolve a number of different difficult to debug bugs. As our Xe generation
> platforms are now out in the wild and no longer just internal, it is also
> proving important to have this facility available in upstream trees as well.
> And having it merged rather than floating around as random patches passed
> from person to person is far easier to manage and would also help reduce the
> internal tree burden.

Note that Xe needs to move over to devcoredump infrastructure, so if you
need dumping straight to dmesg that would be a patch for that subsystem in
the future.

Not sure how much fun you want to add here in the i915-gem dead end, I'll
leave that up to i915 maintainers.

Just figured this is a good place to drop this aside :-)
-Daniel

> 
> John.
> 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > 
> > > John Harrison (2):
> > >    drm/i915: Dump error capture to kernel log
> > >    drm/i915/guc: Dump error capture to dmesg on CTB error
> > > 
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  53 +++++++++
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |   6 +
> > >   drivers/gpu/drm/i915/i915_gpu_error.c     | 130 ++++++++++++++++++++++
> > >   drivers/gpu/drm/i915/i915_gpu_error.h     |   8 ++
> > >   4 files changed, 197 insertions(+)
> > > 
> > > -- 
> > > 2.39.1
> > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
