From: acourbot@nvidia.com (Alexandre Courbot)
To: linux-arm-kernel@lists.infradead.org
Subject: Nouveau crashes in 4.6-rc on arm64
Date: Wed, 20 Apr 2016 13:35:11 +0900
Message-ID: <571706FF.1010300@nvidia.com>
In-Reply-To: <570B50B4.4020304@nvidia.com>

On 04/11/2016 04:22 PM, Alexandre Courbot wrote:
> Hi Robin,
>
> On 04/09/2016 03:46 AM, Robin Murphy wrote:
>> Hi Alex,
>>
>> On 08/04/16 05:47, Alexandre Courbot wrote:
>>> Hi Robin,
>>>
>>> On 04/07/2016 08:50 PM, Robin Murphy wrote:
>>>> Hello,
>>>>
>>>> With 4.6-rc2 (and -rc1) I'm seeing Nouveau blowing up at boot, from the
>>>> look of it by dereferencing some offset from NULL inside
>>>> nouveau_fbcon_imageblit(). My setup is an old XFX 7600GT card plugged
>>>> into an ARM Juno r1 board, which works fine with 4.5 and earlier.
>>>>
>>>> Attached are a couple of logs from booting arm64 defconfig plus DRM and
>>>> Nouveau enabled - the second also has framebuffer console rotation
>>>> turned on, which interestingly seems to move the point of failure, and
>>>> the display does eventually come up to show the tail end of the panic in
>>>> that case.
>>>>
>>>> I might be able to find time for a full bisection next week if it isn't
>>>> something sufficiently obvious to anyone who knows this driver.
>>>
>>> Looking at the log it is not clear to me what could be causing this. I
>>> can boot 4.6-rc2 with a GM206 card without any issue. A bisect would
>>> indeed be useful here.
>>
>> OK, turns out the lure of writing something to remotely drive a Juno and
>> parse kernel bootlogs through an automatic bisection was too great to
>> resist on a Friday afternoon :D
>>
>> Bisection came down to 1733a2ad3674("drm/nouveau/device/pci: set as
>> non-CPU-coherent on ARM64"), and sure enough reverting that removes the
>> crash.
>
> Thanks for taking the time to bisect this. And apologies as it seems my
> commit is the reason for your troubles.
>
> The CPU coherency flag is used for two things: explicitly syncing buffer
> pages when required, and allocating buffers that are not explicitly
> synced (like fences or pushbuffers) using the DMA API. For this latter
> use, it also accesses the buffer's content using the mapping provided by
> dma_alloc_coherent() instead of creating a new one. All nouveau_bos are
> supposed to be accessed using nouveau_bo_rd32(), and this function
> handles the case of a DMA-API allocated object by detecting that the
> result of ttm_kmap_obj_virtual() is NULL.
>
> But as it turns out, OUT_RINGp() also calls ttm_kmap_obj_virtual() in
> order to perform a memcpy and uses its result directly - which means we
> are doing memcpy on a NULL pointer. We never caught this because we
> typically do not use Nouveau's fbcon with an ARM setup.
>
> I don't really like this special access for coherent objects, and
> actually had a patch in my tree to attempt to remove it (attached).
> Although it is not the whole solution (see below), the issue should at
> least not be visible with it applied - could you confirm?

Hi Robin, could you confirm whether the attached patch in my previous 
mail helps with your problem?

Thanks!
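
To make the failure mode in the quoted explanation concrete, here is a
minimal userspace sketch in plain C. It is a model written under stated
assumptions, not the driver code: struct model_bo, MODEL_PAGE_SIZE and
every model_* function are hypothetical names invented for illustration.
The kmap_virtual field stands in for the result of
ttm_kmap_obj_virtual(), which is NULL for DMA-API allocated buffers. The
accessors fall back to per-page CPU addresses in that case, and the bulk
copy goes through the same accessor rather than handing the possibly
NULL pointer straight to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* hypothetical page size for this model */

/* Hypothetical stand-in for a buffer object; NOT the real struct nouveau_bo. */
struct model_bo {
    void *kmap_virtual;    /* models ttm_kmap_obj_virtual(); NULL for DMA-API buffers */
    uint8_t **cpu_address; /* per-page CPU addresses of a DMA-API allocation */
};

/* Resolve a byte offset to a CPU pointer, handling the NULL kmap case. */
static void *model_bo_ptr(const struct model_bo *bo, size_t offset)
{
    if (bo->kmap_virtual)
        return (uint8_t *)bo->kmap_virtual + offset;

    /* No kmap: look the address up page by page instead. */
    assert(bo->cpu_address != NULL);
    return bo->cpu_address[offset / MODEL_PAGE_SIZE] + offset % MODEL_PAGE_SIZE;
}

/* 32-bit read that is safe for both kinds of buffer. */
static uint32_t model_bo_rd32(const struct model_bo *bo, size_t index)
{
    uint32_t val;

    memcpy(&val, model_bo_ptr(bo, index * sizeof(val)), sizeof(val));
    return val;
}

/* 32-bit write that is safe for both kinds of buffer. */
static void model_bo_wr32(struct model_bo *bo, size_t index, uint32_t val)
{
    memcpy(model_bo_ptr(bo, index * sizeof(val)), &val, sizeof(val));
}

/*
 * Bulk copy of count words starting at word index.
 * The pattern described in the mail would instead be a single
 * memcpy((uint8_t *)bo->kmap_virtual + offset, data, bytes), which
 * dereferences an offset from NULL when the buffer has no kmap.
 */
static void model_out_ringp(struct model_bo *bo, size_t index,
                            const uint32_t *data, size_t count)
{
    for (size_t i = 0; i < count; i++)
        model_bo_wr32(bo, index + i, data[i]);
}
```

Copying word by word is slower than one memcpy(), but it also keeps the
model correct when the pages of a DMA-API buffer are not contiguous in
the CPU's address space, which a single memcpy() could not assume.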

