Intel-XE Archive on lore.kernel.org
From: "Souza, Jose" <jose.souza@intel.com>
To: "De Marchi, Lucas" <lucas.demarchi@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Vivi,  Rodrigo" <rodrigo.vivi@intel.com>,
	"quic_mojha@quicinc.com" <quic_mojha@quicinc.com>,
	"johannes@sipsolutions.net" <johannes@sipsolutions.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Cavitt, Jonathan" <jonathan.cavitt@intel.com>
Subject: Re: [PATCH v2 2/4] devcoredump: Add dev_coredumpm_timeout()
Date: Tue, 5 Mar 2024 14:21:45 +0000
Message-ID: <f7c2d3381e50dd9c5e9211461e0abe487f5059df.camel@intel.com>
In-Reply-To: <zfrpz4vuhjwmilbqft5d4qh4s3gs3okzyxbsh4lc5rhzjy5ktx@xuu32mxhun4c>

On Mon, 2024-03-04 at 17:55 -0600, Lucas De Marchi wrote:
> On Mon, Mar 04, 2024 at 02:29:03PM +0000, Jose Souza wrote:
> > On Fri, 2024-03-01 at 09:38 +0100, Johannes Berg wrote:
> > > > On Wed, 2024-02-28 at 17:56 +0000, Souza, Jose wrote:
> > > > > > 
> > > > > > In my opinion, the timeout should depend on the type of device driver.
> > > > > > 
> > > > > > In the case of server-class Ethernet cards, where corporate users automate most tasks, five minutes might even be considered excessive.
> > > > > > 
> > > > > > For our case, GPUs, users might experience minor glitches and only search for what happened after finishing their current task (writing an email,
> > > > > > ending a gaming match, watching a YouTube video, etc.).
> > > > > > If they land on https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html or the future Xe version of that page, following the
> > > > > > instructions alone may take inexperienced Linux users more than five minutes.
> > > > 
> > > > That's all not wrong, but I don't see why you wouldn't automate this
> > > > even on end user machines? I feel you're boxing the problem in by
> > > > wanting to solve it entirely in the kernel?
> > 
> > The other part of the stack that we provide is the set of libraries implementing the Vulkan and OpenGL APIs. I don't think we could ship scripts that need
> > elevated privileges to read and store the coredump.
> 
> it's still a very valid point though. Why are we doing this only on
> kernel side or mesa side rather than doing it in the proper place?  As
> Johannes said, this could very well be automated via udev rules.
> Distros automate getting the coredump already with systemd-coredump and
> the like.  Why wouldn't we do it similarly for GPU?  Handling this at
> the proper place leaves the policy for "how long to retain the
> log", "maximum size", "rotation", etc. outside of the kernel.

Where and how would these udev rules be distributed?
Is there a portable way to do that for distros that don't ship with systemd?
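
For reference, the userspace side being discussed could be as small as the sketch below. It is only an illustration, not something proposed in this series: the function name, the destination directory and the output file naming are assumptions. The only kernel interface it relies on is the existing devcoredump sysfs layout, where each pending dump shows up as /sys/class/devcoredump/devcd<N>/data and writing anything to that file frees the dump.

```python
import shutil
import time
from pathlib import Path

# Sysfs directory where the kernel exposes pending device coredumps.
DEVCD_ROOT = Path("/sys/class/devcoredump")


def collect_devcoredumps(dest_dir, root=DEVCD_ROOT):
    """Copy every pending dump under `root` into `dest_dir`, then free it.

    Hypothetical helper: name, arguments and file naming are illustrative.
    Returns the list of files written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in sorted(Path(root).glob("devcd*")):
        data = entry / "data"
        if not data.is_file():
            continue
        out = dest / f"{entry.name}-{int(time.time())}.bin"
        # Stream the dump to disk; it can be large.
        with data.open("rb") as src, out.open("wb") as dst:
            shutil.copyfileobj(src, dst)
        # Writing anything to `data` tells the kernel the dump was
        # collected, so it can release the memory before the timeout.
        with data.open("wb") as ack:
            ack.write(b"1")
        saved.append(out)
    return saved
```

Something like this would need to run with enough privileges to read the sysfs files, for example from a udev rule or another hotplug mechanism reacting to the devcoredump device being added. How that trigger is shipped per distro is exactly the open question above.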

> 
> For the purposes of reporting a bug, wouldn't it be better to instruct
> users to get the log that was saved to disk so they don't risk losing
> it? I view the timeout more as a "protection" from the kernel side to
> not waste memory if the complete stack is not in place. It shouldn't
> be viewed as a timeout for how long the *user* will take to get the log
> and create bug reports.
> 
> Lucas De Marchi
> 
> > 
> > > > 
> > > > > > I have set the timeout to one hour in the Xe driver, but this could increase if we start receiving user complaints.
> > > > 
> > > > At an hour now, people will probably start arguing that "indefinitely"
> > > > is about right? But at that point you're probably back to persisting
> > > > them on disk anyway? Or maybe glitches happen during logout/shutdown ...
> > 
> > The i915 driver doesn't use coredump; it keeps the error dump in memory until the user frees it or reboots, and we have received no complaints.
> > 
> > > > 
> > > > Anyway, I don't want to block this because I just don't care enough
> > > > about how you do things, but I think the kernel is the wrong place to
> > > > solve this problem... The intent here was to give some userspace time to
> > > > grab it (and yes for that 5 minutes is already way too long), not the
> > > > users. That's also part of the reason we only hold on to a single
> > > > instance, since I didn't want it to keep consuming more and more memory
> > > > if it happens repeatedly.
> > > > 
> > 
> > Okay, so I will move forward with another version applying your suggestion to make dev_coredumpm() a static inline and move it to the header.
> > 
> > thank you for the feedback
> > 
> > > > johannes
> > 
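
The retention policy described in the quoted text (a single dump held per device, dropped after a timeout unless userspace frees it first) can be modelled roughly as below. This is a simplified userspace model for discussion only, not the kernel implementation: the class and method names are made up, and time is passed in explicitly as seconds. The five-minute default and the one-hour Xe value are the figures mentioned in this thread.

```python
DEFAULT_TIMEOUT_S = 5 * 60   # five-minute default discussed in the thread
XE_TIMEOUT_S = 60 * 60       # one hour, the value proposed for Xe


class DumpStore:
    """Toy model: at most one pending dump per device, with an expiry."""

    def __init__(self):
        self._dumps = {}  # device name -> (data, expiry time in seconds)

    def _expire(self, now):
        # Drop every dump whose timeout has elapsed.
        expired = [d for d, (_, exp) in self._dumps.items() if exp <= now]
        for dev in expired:
            del self._dumps[dev]

    def add(self, device, data, now, timeout=DEFAULT_TIMEOUT_S):
        """Store a dump; return False if one is already pending."""
        self._expire(now)
        if device in self._dumps:
            return False  # the new dump is discarded, memory use stays bounded
        self._dumps[device] = (data, now + timeout)
        return True

    def read(self, device, now):
        """Return the pending dump, or None if there is none or it expired."""
        self._expire(now)
        entry = self._dumps.get(device)
        return entry[0] if entry else None

    def free(self, device):
        """Userspace has collected the dump; release it immediately."""
        self._dumps.pop(device, None)
```

With the default timeout, a second failure on the same device within five minutes is not recorded. A longer timeout widens that window, which is the memory-versus-convenience trade-off argued over above.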


Thread overview: 22+ messages
2024-02-28 16:57 [PATCH v2 1/4] devcoredump: Add dev_coredump_put() José Roberto de Souza
2024-02-28 16:57 ` [PATCH v2 2/4] devcoredump: Add dev_coredumpm_timeout() José Roberto de Souza
2024-02-28 17:05   ` Johannes Berg
2024-02-28 17:56     ` Souza, Jose
2024-03-01  8:38       ` Johannes Berg
2024-03-04 14:29         ` Souza, Jose
2024-03-04 23:55           ` Lucas De Marchi
2024-03-05 14:21             ` Souza, Jose [this message]
2024-03-05 15:22               ` Lucas De Marchi
2024-03-05 15:38                 ` Souza, Jose
2024-03-08 14:53                   ` Lucas De Marchi
2024-03-08 15:10                     ` Rodrigo Vivi
2024-02-28 16:57 ` [PATCH v2 3/4] drm/xe: Remove devcoredump during driver release José Roberto de Souza
2024-02-28 16:57 ` [PATCH v2 4/4] drm/xe: Increase devcoredump timeout José Roberto de Souza
2024-02-28 17:02 ` [PATCH v2 1/4] devcoredump: Add dev_coredump_put() Cavitt, Jonathan
2024-02-28 17:04 ` ✓ CI.Patch_applied: success for series starting with [v2,1/4] " Patchwork
2024-02-28 17:04 ` ✗ CI.checkpatch: warning " Patchwork
2024-02-28 17:05 ` ✓ CI.KUnit: success " Patchwork
2024-02-28 17:16 ` ✓ CI.Build: " Patchwork
2024-02-28 17:17 ` ✓ CI.Hooks: " Patchwork
2024-02-28 17:18 ` ✓ CI.checksparse: " Patchwork
2024-02-28 17:40 ` ✓ CI.BAT: " Patchwork
