public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel@ffwll.ch>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS
Date: Tue, 13 Oct 2015 14:23:57 +0200	[thread overview]
Message-ID: <20151013122357.GV26718@phenom.ffwll.local> (raw)
In-Reply-To: <20151013114405.GB16727@nuc-i3427.alporthouse.com>

On Tue, Oct 13, 2015 at 12:44:05PM +0100, Chris Wilson wrote:
> On Tue, Oct 13, 2015 at 01:26:36PM +0200, Daniel Vetter wrote:
> > On Mon, Oct 12, 2015 at 10:31:35AM +0100, Chris Wilson wrote:
> > > On Mon, Oct 12, 2015 at 10:06:23AM +0100, Tvrtko Ursulin wrote:
> > > > 
> > > > On 09/10/15 18:26, Chris Wilson wrote:
> > > > >On Fri, Oct 09, 2015 at 07:14:02PM +0200, Daniel Vetter wrote:
> > > > >>On Fri, Oct 09, 2015 at 10:03:14AM +0100, Tvrtko Ursulin wrote:
> > > > >>>
> > > > >>>On 09/10/15 09:55, Daniel Vetter wrote:
> > > > >>>>On Fri, Oct 09, 2015 at 09:40:53AM +0100, Chris Wilson wrote:
> > > > >>>>>On Fri, Oct 09, 2015 at 09:48:01AM +0200, Daniel Vetter wrote:
> > > > >>>>>>On Thu, Oct 08, 2015 at 10:45:47AM +0100, Tvrtko Ursulin wrote:
> > > > >>>>>>The concern is that this isn't how SIG_SEGV works, it's a signal the
> > > > >>>>>>thread who made the invalid access gets directly. You never get a SIG_SEGV
> > > > >>>>>>for bad access someone else has made. So essentially it's new ABI.
> > > > >>>>>
> > > > >>>>>SIGBUS. For which the answer is yes, you can and do get SIGBUS for
> > > > >>>>>actions taken by other processes.
> > > > >>>>
> > > > >>>>Oh right I always forget that SIGBUS aliases with SIGIO. Anyway if
> > > > >>>>userspace wants SIGIO we just need to provide it with a pollable fd and
> > > > >>>>then it can use fcntl to make that happen. That's imo a much better api
> > > > >>>>than unconditionally throwing around signals. Also we already have the
> > > > >>>>reset stats ioctl to tell userspace that its gpu context is toats. If
> > > > >>>>anyone wants that to be pollable (or even send SIGIO) I think we should
> > > > >>>>extend that, with all the usual "needs userspace&igt" stuff on top.
> > > > >>>
> > > > >>>I don't see that this notification can be optional. Process is confused
> > > > >>>about its memory map use so should die. :)
> > > > >>>
> > > > >>>This is not a GPU error/hang - this is the process doing stupid things.
> > > > >>>
> > > > >>>MMU notifiers do not support decision making otherwise we could say
> > > > >>>-ETXTBUSY or something on munmap, but we can't. Not even sure that it would
> > > > >>>help in all cases, would have to fail clone as well and who knows what.
> > > > >>
> > > > >>So what happens if the gpu just keeps using the memory? It'll all be
> > > > >>horribly undefined behaviour and eventually it'll die on an -EFAULT in
> > > > >>execbuf, but does anything else bad happen?
> > > > >
> > > > >We don't see an EFAULT unless a miracle occurs, and the stale pages
> > > > >continue to be read/written by other processes (as well as the client).
> > > > >Horribly undefined behaviour with a misinformation leak.
> > > > 
> > > > What other processes? Pages will still be referenced so won't be
> > > > reused so there is not information leak across unrelated processes.
> > > > Unless you meant ones involved in object sharing?
> > > 
> > > This client is trying to replace the userptr with a fresh set of pages.
> > > The GPU and other processes continue to see the old pages i.e. old
> > > information (misinformation :) leaks.
> > 
> > userptr explicitly doesn't support this. You need to tear down the
> > existing userptr object and then create a new one if you change the
> > mmap'ing. So that's really just a bug in userspace, and the question is
> > how do we tell userspace best that it's done something stupid.
> 
> Pardon? Note this also affects munmap if you don't accept mremap (which
> is not explicitly unsupported as it fits quite nicely within the
> existing rules).
> 
> > My stance that the best one is to either kill any ctx using that object or
> > at least indicate there's trouble using reset stats. But sending a
> > SIGBUS/SIG_SEGV (which can only happen to the thread that does a memory
> > access, not any other thread that's accidentally in the same process
> > group) is imo abuse.
> 
> The signal is sent to everything that inherited the mm, not bound to the
> single thread.
> 
> > Or we just need to make sure we do get the EFAULT on
> > the next execbuffer.
> > 
> > Or maybe it just doesn't matter, i.e. where is the userspace which a) does
> > silly stuff like this b) wants proper notification? Adding ABI just
> > because we can't isn't going to get merged.
> 
> No client wants to be killed just because it does something stupid, it
> is killed to protect the integrity of the system.

But killing the client won't get rid of the ctx/objects, so won't really
solve all that much? Especially it won't get rid of the framebuffers,
which is the real trouble here it seems. That's because logind keeps a
duped copy of the fd for it's own purposes.

So the better fix would be to make sure we don't accidentally pin a
userptr object, i.e. adding a check in intel_pin_and_fence_fb for that
(plus igt testcase). I somehow missed in all this discussion that this is
about pinned objects ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-10-13 12:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-23 20:07 [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS Chris Wilson
2015-09-24 10:23 ` Tvrtko Ursulin
2015-09-24 10:31   ` Chris Wilson
2015-09-24 10:55     ` Tvrtko Ursulin
2015-09-28 13:42 ` Daniel Vetter
2015-09-28 13:52   ` Chris Wilson
2015-09-28 14:14     ` Daniel Vetter
2015-10-08  9:45       ` Tvrtko Ursulin
2015-10-09  7:48         ` Daniel Vetter
2015-10-09  8:40           ` Chris Wilson
2015-10-09  8:55             ` Daniel Vetter
2015-10-09  8:59               ` Chris Wilson
2015-10-09  9:03               ` Tvrtko Ursulin
2015-10-09 17:14                 ` Daniel Vetter
2015-10-09 17:26                   ` Chris Wilson
2015-10-09 18:33                     ` Dave Gordon
2015-10-12  9:06                     ` Tvrtko Ursulin
2015-10-12  9:31                       ` Chris Wilson
2015-10-12 10:10                         ` Chris Wilson
2015-10-12 12:59                           ` Tvrtko Ursulin
2015-10-13 11:26                         ` Daniel Vetter
2015-10-13 11:44                           ` Chris Wilson
2015-10-13 12:23                             ` Daniel Vetter [this message]
2015-10-13 13:09                               ` Chris Wilson
2015-10-13 13:39                                 ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151013122357.GV26718@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox