Re: [PATCH] drm/nouveau/gem: tolerate a buffer specified multiple times

From: Bryan O'Donoghue <pure.logic@nexus-software.ie>
To: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Peter Hurley <peter@hurleysoftware.com>,
	Timo Aaltonen <tjaalton@debian.org>,
	Emil Velikov <emil.l.velikov@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Ben Skeggs <bskeggs@redhat.com>,
	Maarten Lankhorst <maarten.lankhorst@canonical.com>
Subject: Re: [PATCH] drm/nouveau/gem: tolerate a buffer specified multiple times
Date: Fri, 31 Jul 2015 17:43:35 +0100	[thread overview]
Message-ID: <55BBA5B7.9040809@nexus-software.ie> (raw)
In-Reply-To: <CAKb7UvgyQ_6V2_xH2sxaJ5daDwXZNJ-5KAPTbtSaxHRQA1rABQ@mail.gmail.com>

On 31/07/15 17:36, Ilia Mirkin wrote:
> On Fri, Jul 31, 2015 at 6:27 AM, Bryan O'Donoghue
> <pure.logic@nexus-software.ie> wrote:
>> ah no... 2.4.60 is right...
>>
>> Yes so Ilia - I've switched out 2.4.60 as per your suggestion to 2.4.56
>> (getting the version numbers right :) ) and it's still definitely giving me
>> the multiple instances message.
>
> This is going to sound like a stupid question, but I'll ask anyways --
> you *did* restart chrome after changing libdrm versions, right?

There are no stupid questions - just stupid answers like 'whaddya mean 
restart chrome'

Seriously though, I've restarted the machine each time I've tried to 
switch out those libraries, so it's definitely not that.

> I was going to mention that there were a handful of fixes in libdrm,
> potentially since 2.4.56 (I forget the exact versions), but if 2.4.60
> also fails, then that would have them.
>
> There was a final assert() added in 2.4.62, but that was to better
> isolate the cause of weirdo crashes (i.e. crash when the thing going
> wrong happens rather than stashing bad pointers for later very
> confusing dereference). Not GPU crashes.
>
> Just for your information,
>
> nouveau E[   PFIFO][0000:01:00.0] PFIFO: read fault at
> 0x0003e21000 [PAGE_NOT_PRESENT] from (unknown enum
> 0x00000000)/GPC0/(unknown enum 0x0000000f) on channel 0x007f80c000
> [unknown]
>
> means that there was VM fault from an unknown gpu unit (???) when
> reading some resource by the GPU.

OK, I was assuming it was a side effect of the -EINVAL when we get the 
multiple instances message.

> (The GPU has its own MMU.)
> Unfortunately this can happen for one of a million reasons, the
> biggest one being "unknown", but mesa definitely doesn't handle
> command submission failures particularly well... should probably add a
> "fail 1% of the time" thing to help fix that up.
>
> Do you have a reproducible way of achieving the multiple buffer on
> validation list thing? What GPU do you have? (Looking for a codename,
> not a marketing name... lspci should have it... GFxxx or GKxxx or

01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 
750M Mac Edition] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Apple Inc. Device 0130
	Flags: bus master, fast devsel, latency 0, IRQ 45
	Memory at c0000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 80000000 (64-bit, prefetchable) [size=256M]
	Memory at 90000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 1000 [size=128]
	Expansion ROM at c1000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau

Macbook pro retina 2014
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel