public inbox for linux-kernel@vger.kernel.org
From: Thomas Hellstrom <thellstrom@vmware.com>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Jerome Glisse <j.glisse@gmail.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"airlied@linux.ie" <airlied@linux.ie>,
	Michel Danzer <daenzer@vmware.com>
Subject: Re: Radeon RS780 - BUG: unable to handle kernel NULL pointer dereference
Date: Tue, 09 Nov 2010 10:53:11 +0100
Message-ID: <4CD91A07.1060308@vmware.com>
In-Reply-To: <20101109092920.GA1542@arch.trippelsdorf.de>

On 11/09/2010 10:29 AM, Markus Trippelsdorf wrote:
> On Mon, Nov 08, 2010 at 11:29:16PM +0100, Thomas Hellstrom wrote:
>    
>> On 11/08/2010 09:53 PM, Jerome Glisse wrote:
>>      
>>> On Mon, Nov 8, 2010 at 2:02 PM, Markus Trippelsdorf
>>> <markus@trippelsdorf.de>   wrote:
>>>        
>>>> On Mon, Nov 08, 2010 at 07:43:02PM +0100, Markus Trippelsdorf wrote:
>>>>          
>>>>> On Mon, Nov 08, 2010 at 06:07:37PM +0100, Markus Trippelsdorf wrote:
>>>>>            
>>>>>> On Mon, Nov 08, 2010 at 06:02:21PM +0100, Markus Trippelsdorf wrote:
>>>>>>              
>>>>>>> I can trigger a kernel crash on my system by simply loading this png
>>>>>>> image with firefox:
>>>>>>> http://mediaarchive.cern.ch/MediaArchive/Photo/Public/2010/1011251/1011251_01/1011251_01-A4-at-144-dpi.jpg
>>>>>>>                
>>>>>> Sorry the above link is wrong, this is the right one (that triggers the
>>>>>> crash):
>>>>>> http://cdsweb.cern.ch/record/1305179/files/HI-150431-630470-huge.png
>>>>>>              
>>>>> I triggered it a few more times and took the attached picture.
>>>>> It points to the BUG() call at drivers/gpu/drm/ttm/ttm_bo.c:1628 .
>>>>> (Sorry for the bad picture quality)
>>>>>            
>>>> And here the same BUG in plaintext (should be a bit easier to read):
>>>>
>>>> Nov  8 19:28:23 arch kernel: ------------[ cut here ]------------
>>>> Nov  8 19:28:23 arch kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:1628!
>>>>
>>>>          
>>> Thomas this bug seems to point to a case where we endup trying adding
>>> an entry to
>>> same offset in the rb tree for addr_space_mm. After reviewing
>>> carefully the locking
>>> around the rb tree modification & addr_space_mm i am fairly confident
>>> that no race can
>>> occur. Would you have any idea on what might go wrong here ? I guess i would
>>> ultimately need to dump mm & rb tree state when BUG get trigger to try
>>> to understand
>>> states of things.
>>>        
>> I agree there shouldn't be a race in this case.
>> The locking around these operations is simple and straightforward.
>>
>> So this IMHO should either be a memory corruption or a bug in the
>> range manager. I've never seen this BUG trigger before. Dumping mm /
>> rb tree contents or bisecting should probably find the culprit.
>>      
> OK I've found the buggy commit by bisection:
>
> e376573f7267390f4e1bdc552564b6fb913bce76 is the first bad commit
> commit e376573f7267390f4e1bdc552564b6fb913bce76
> Author: Michel Dänzer<daenzer@vmware.com>
> Date:   Thu Jul 8 12:43:28 2010 +1000
>
>      drm/radeon: fall back to GTT if bo creation/validation in VRAM fails.
>
>      This fixes a problem where on low VRAM cards we'd run out of space for validation.
>
>      [airlied: Tested on my M7, Thinkpad T42, compiz works with no problems.]
>
>      Signed-off-by: Michel Dänzer<daenzer@vmware.com>
>      Cc: stable@kernel.org
>      Signed-off-by: Dave Airlie<airlied@redhat.com>
>
> Please note that this is an old commit from 2.6.36-rc. When I revert it the
> kernel no longer crashes. Instead I see the following in my dmesg:
>
>    

Hmm, so this sounds like something in the Radeon eviction error path is 
causing corruption.
I had a similar problem with vmwgfx, when I tried to unref a BO _after_ 
ttm_bo_init() failed.
ttm_bo_init() is supposed to drop the reference itself when it fails, for
various reasons, so calling unref() or kfree() after a failed ttm_bo_init()
will cause corruption.
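
To make the ownership rule concrete, here is a minimal userspace sketch. It is
not the TTM code: struct demo_bo, demo_bo_init(), demo_bo_unref() and
demo_bo_create() are hypothetical names invented for illustration, and the
error value 12 only mirrors the -12 seen in the log. The point is that an init
function which releases the object on its own error path leaves the caller
with nothing to free.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_ENOMEM 12 /* assumption: mirrors the -12 in the log */

/* Hypothetical refcounted object; not the real TTM buffer object. */
struct demo_bo {
	int refcount;
	int *free_count; /* incremented on release, for demonstration only */
};

static void demo_bo_unref(struct demo_bo *bo)
{
	assert(bo->refcount > 0);
	if (--bo->refcount == 0) {
		if (bo->free_count)
			(*bo->free_count)++;
		free(bo);
	}
}

/*
 * Models the convention described above: on failure the init function
 * drops the reference itself, so bo is already gone when it returns.
 */
static int demo_bo_init(struct demo_bo *bo, int *free_count, int should_fail)
{
	bo->refcount = 1;
	bo->free_count = free_count;
	if (should_fail) {
		demo_bo_unref(bo); /* init owns cleanup on the error path */
		return -DEMO_ENOMEM;
	}
	return 0;
}

/* Correct caller: never touches bo after a failed init. */
static struct demo_bo *demo_bo_create(int *free_count, int should_fail)
{
	struct demo_bo *bo = malloc(sizeof(*bo));

	if (!bo)
		return NULL;
	if (demo_bo_init(bo, free_count, should_fail) != 0)
		return NULL; /* no unref, no free: bo was released by init */
	return bo;
}
```

A caller that added demo_bo_unref(bo) or free(bo) in the failure branch would
release the same memory twice, which is the kind of corruption meant here.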

In any case, the error below also suggests something is a bit fragile in 
the Radeon driver:

First, an accelerated eviction may fail, like in the message below, but
then there must always be a backup plan, like unaccelerated eviction to
system memory. On BO creation, there are a number of placement strategies,
but if all else fails, it should be possible to initially place the BO in
system memory.
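
The fallback idea can be sketched roughly as follows. This is a toy model, not
the Radeon or TTM placement code: the domain names, struct demo_pool and both
functions are assumptions made up for the example, and a plain free-page
counter ignores fragmentation entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placement domains, tried in this order. */
enum demo_domain { DEMO_VRAM, DEMO_GTT, DEMO_SYSTEM, DEMO_NONE };

/* Toy pool: only tracks how many pages are still free. */
struct demo_pool {
	size_t free_pages;
};

/* Returns 0 and reserves the pages on success, -1 if the pool is too full. */
static int demo_try_place(struct demo_pool *pool, size_t pages)
{
	if (pool->free_pages < pages)
		return -1;
	pool->free_pages -= pages;
	return 0;
}

/*
 * Try the preferred domain first and fall back step by step, with
 * system memory as the last resort.
 */
static enum demo_domain demo_place_with_fallback(struct demo_pool pools[3],
						 size_t pages)
{
	static const enum demo_domain order[] = {
		DEMO_VRAM, DEMO_GTT, DEMO_SYSTEM
	};
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (demo_try_place(&pools[order[i]], pages) == 0)
			return order[i];
	}
	return DEMO_NONE; /* every strategy failed */
}
```

With pools of 100, 1000 and 100000 free pages, a 25650-page request like the
one in the log would skip the first two domains and land in DEMO_SYSTEM.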

Second, if BO validation fails during a command submission due to
insufficient VRAM / TT, the driver should retry the complete
validation cycle after first blocking all other validators and then
evicting everything not pinned, to avoid failures due to fragmentation.
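
A small sketch of that retry cycle, under stated assumptions: a 16-page toy
address space, first-fit allocation, and an "exclusive" flag standing in for
whatever lock would really block other validators. All names are hypothetical
and nothing here is the actual driver code.

```c
#include <assert.h>

#define DEMO_PAGES 16

enum demo_slot { DEMO_FREE = 0, DEMO_USED, DEMO_PINNED };

struct demo_vram {
	enum demo_slot slot[DEMO_PAGES];
	int exclusive; /* stand-in for blocking other validators */
};

/* First-fit search for n contiguous free pages; returns start or -1. */
static int demo_alloc(struct demo_vram *v, int n)
{
	int run = 0;

	if (n <= 0)
		return -1;
	for (int i = 0; i < DEMO_PAGES; i++) {
		run = (v->slot[i] == DEMO_FREE) ? run + 1 : 0;
		if (run == n) {
			int start = i - n + 1;

			for (int j = start; j <= i; j++)
				v->slot[j] = DEMO_USED;
			return start;
		}
	}
	return -1;
}

/* Evict everything that is not pinned. */
static void demo_evict_unpinned(struct demo_vram *v)
{
	for (int i = 0; i < DEMO_PAGES; i++) {
		if (v->slot[i] == DEMO_USED)
			v->slot[i] = DEMO_FREE;
	}
}

/* Normal attempt first; on failure, go exclusive, evict, and retry once. */
static int demo_validate(struct demo_vram *v, int n)
{
	int start = demo_alloc(v, n);

	if (start >= 0)
		return start;

	v->exclusive = 1;
	demo_evict_unpinned(v);
	start = demo_alloc(v, n);
	v->exclusive = 0;
	return start; /* still -1 if pinned pages leave no room */
}
```

If every second page is in use, a 4-page request fails on the first pass even
though plenty of pages are free in total; after the unpinned pages are evicted
the retry succeeds, and only a request that cannot fit around the pinned pages
still fails.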

/Thomas


> [TTM] Failed to find memory space for buffer 0xffff880113e10e48 eviction.
> [TTM] No space for ffff880113e10e48 (25650 pages, 102600K, 100M)
> [TTM]   placement[0]=0x00070002 (1)
> [TTM]     has_type: 1
> [TTM]     use_type: 1
> [TTM]     flags: 0x0000000A
> [TTM]     gpu_offset: 0xA0000000
> [TTM]     size: 131072
> [TTM]     available_caching: 0x00070000
> [TTM]     default_caching: 0x00010000
> [TTM]  0x00000000-0x00000001:        1: used
> [TTM]  0x00000001-0x00000011:       16: used
> [TTM]  0x00000011-0x00000111:      256: used
> [TTM]  0x00000111-0x00000211:      256: used
> [TTM]  0x00000211-0x00000248:       55: free
> [TTM]  0x00000248-0x0000024c:        4: used
> [TTM]  0x0000024c-0x00001976:     5930: free
> [TTM]  0x00001976-0x000021aa:     2100: used
> [TTM]  0x000021aa-0x0000285f:     1717: free
> [TTM]  0x0000285f-0x00002860:        1: used
> [TTM]  0x00002860-0x00002873:       19: free
> [TTM]  0x00002873-0x000029b3:      320: used
> [TTM]  0x000029b3-0x00020000:   120397: free
> [TTM]  total: 131072, used 2954 free 128118
> [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
> radeon 0000:01:05.0: object_init failed for (117555200, 0x00000004)
> [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (117555200, 4, 4096, -12)
> radeon 0000:01:05.0: object_init failed for (117555200, 0x00000004)
> [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (117555200, 4, 4096, -12)
> radeon 0000:01:05.0: object_init failed for (117555200, 0x00000004)
> [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (117555200, 4, 4096, -12)
> radeon 0000:01:05.0: object_init failed for (117555200, 0x00000004)
> [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (117555200, 4, 4096, -12)
> radeon 0000:01:05.0: object_init failed for (117555200, 0x00000004)
> ...
>
> And the following in the xorg log buffer:
>
> Failed to alloc memory
> Failed to allocat:
>     size:     : 117555200 bytes
>     alignment : 0 bytes
>     domains   : 4
> ...
>
>    

