Re: [Nouveau] [PATCH] drm/ttm/nouveau: add DRM_NOUVEAU_GEM_CPU_PREP_TIMEOUT

From: Marcin Slusarz <marcin.slusarz@gmail.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: nouveau@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Nouveau] [PATCH] drm/ttm/nouveau: add DRM_NOUVEAU_GEM_CPU_PREP_TIMEOUT
Date: Sun, 18 Sep 2011 16:30:04 +0200
Message-ID: <20110918143004.GA8929@joi.lan>
In-Reply-To: <20110918135950.GC2815@phenom.ffwll.local>

On Sun, Sep 18, 2011 at 03:59:50PM +0200, Daniel Vetter wrote:
> On Sun, Sep 18, 2011 at 03:18:57PM +0200, Marcin Slusarz wrote:
> > Currently the DRM_NOUVEAU_GEM_CPU_PREP ioctl is broken WRT handling of signals.
> > 
> > nouveau_gem_ioctl_cpu_prep calls ttm_bo_wait, which waits for the fence to
> > "signal" or for a 3-second timeout to pass.
> > But if it detects a pending signal, it returns ERESTARTSYS and goes back
> > to userspace. After the signal handler runs, userspace repeats the same
> > ioctl, which starts a _new 3-second loop_.
> > So when the application relies on signals, some ioctls may never finish
> > from the application's POV.
> > 
> > There is one important application which does this - Xorg. It uses SIGIO
> > (for input handling) and SIGALRM.
> > 
> > GPU lockups lead to an endless ioctl loop, which eventually manifests as a
> > crash with the "[mi] EQ overflowing. The server is probably stuck in an
> > infinite loop." message instead of being propagated to the DDX.
> > 
> > The solution is to add a new ioctl, NOUVEAU_GEM_CPU_PREP_TIMEOUT, with a
> > timeout parameter and decrease it on every signal.
> 
> Just fyi: We handle that issue in i915 by returning -EIO when the kernel
> decides that the gpu has died for good and that resetting doesn't help.
> Until then we rely on the ioctl restarting to kick everyone out of kernel
> mode so the reset handler can do its business. If the reset is
> successful, userspace continues (due to the ioctl being restarted)
> hopefully mostly undisturbed. While the gpu is hung, but not yet reset, we
> stall all ioctls before taking the struct_mutex (see i915_gem_wait_error
> in i915_mutex_lock_interruptible).
> 
> Imo the advantage of that approach is that the kernel ultimately decides
> when the gpu is gone, and userspace (lacking much of the required
> information) must not engage in such guessing-games, too.

This approach would be preferable, but we don't yet know how to reset
NVIDIA GPUs. Fixing this API bug could at least let us degrade to noaccel.
And I believe there are cases where ttm_bo_wait can fail with EBUSY and it
doesn't mean the GPU locked up...
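
To illustrate the signal problem and the proposed fix, here is a minimal
userspace sketch. It is not code from the patch: sim_cpu_prep, sim_wait_loop
and the simulated millisecond clock are hypothetical, and it assumes the
kernel writes the remaining timeout back to the caller on every interrupted
wait. SIM_ERESTARTSYS stands in for the kernel-internal ERESTARTSYS.

```c
#include <assert.h>
#include <errno.h>

/* ERESTARTSYS is kernel-internal (value 512) and not in userspace <errno.h>. */
#define SIM_ERESTARTSYS 512

/*
 * Hypothetical model of one CPU_PREP-style call. All times are in ms.
 *   now_ms        - simulated clock, advanced by the call
 *   fence_done_ms - when the fence signals; negative means never (hung GPU)
 *   sig_period_ms - a signal interrupts the wait this long after entry
 *   timeout_ms    - in: budget for this call; out: budget left over
 */
static int sim_cpu_prep(long *now_ms, long fence_done_ms,
                        long sig_period_ms, long *timeout_ms)
{
    assert(*timeout_ms >= 0 && sig_period_ms > 0);

    long deadline = *now_ms + *timeout_ms;
    long next_sig = *now_ms + sig_period_ms;
    long wake = next_sig < deadline ? next_sig : deadline;

    if (fence_done_ms >= 0 && fence_done_ms <= wake) {
        if (fence_done_ms > *now_ms)
            *now_ms = fence_done_ms;
        return 0;                       /* fence signalled in time */
    }
    *now_ms = wake;
    *timeout_ms = deadline - wake;      /* write back what is left */
    return next_sig < deadline ? -SIM_ERESTARTSYS : -EBUSY;
}

/*
 * Userspace side: repeat the call for as long as it asks to be restarted.
 * carry_timeout == 0 models the current ioctl (a fresh timeout on every
 * restart); carry_timeout != 0 models the proposed one (timeout shrinks).
 * max_restarts only exists so the simulation itself terminates.
 */
static int sim_wait_loop(long fence_done_ms, long sig_period_ms,
                         long timeout_ms, int carry_timeout,
                         int max_restarts, int *restarts)
{
    long now = 0, remaining = timeout_ms;

    *restarts = 0;
    for (;;) {
        long budget = carry_timeout ? remaining : timeout_ms;
        int ret = sim_cpu_prep(&now, fence_done_ms, sig_period_ms, &budget);

        if (ret != -SIM_ERESTARTSYS)
            return ret;
        remaining = budget;
        if (++*restarts >= max_restarts)
            return ret;                 /* still restarting: give up */
    }
}
```

With a hung GPU (fence_done_ms = -1), a signal every 1000 ms and a 3000 ms
timeout, the carry_timeout variant returns -EBUSY after two restarts, while
the variant with a fresh timeout keeps returning -SIM_ERESTARTSYS until
max_restarts is hit. When the fence does signal, both variants return 0.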

Marcin

Thread overview: 5+ messages
2011-09-18 13:18 [PATCH] drm/ttm/nouveau: add DRM_NOUVEAU_GEM_CPU_PREP_TIMEOUT Marcin Slusarz
2011-09-18 13:59 ` [Nouveau] " Daniel Vetter
2011-09-18 14:30   ` Marcin Slusarz [this message]
2011-09-18 15:07     ` Daniel Vetter
2011-09-18 16:25 ` Thomas Hellstrom
