Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND

From: Dmitry Osipenko <digetx@gmail.com>
To: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-tegra@vger.kernel.org, dri-devel@lists.freedesktop.org,
	Mikko Perttunen <mperttunen@nvidia.com>
Subject: Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
Date: Fri, 1 Feb 2019 17:40:59 +0300	[thread overview]
Message-ID: <e7cf07fb-b656-2dfd-39c3-d723fb4f480a@gmail.com> (raw)
In-Reply-To: <20190201141006.GC12829@ulmo>

01.02.2019 17:10, Thierry Reding пишет:
> On Fri, Feb 01, 2019 at 04:48:56PM +0300, Dmitry Osipenko wrote:
>> 01.02.2019 16:41, Thierry Reding пишет:
>>> On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:
>>>> 01.02.2019 16:28, Thierry Reding пишет:
>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>
>>>>> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
>>>>> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
>>>>> absolute address. This can cause SMMU faults if the CDMA fetches past a
>>>>> pushbuffer's IOMMU mapping.
>>>>>
>>>>> Properly setting the DMAEND prevents the CDMA from fetching beyond that
>>>>> address and avoid such issues. This is currently not observed because a
>>>>> whole (almost) page of essentially scratch space absorbs any excessive
>>>>> prefetching by CDMA. However, changing the number of slots in the push
>>>>> buffer can trigger these SMMU faults.
>>>>>
>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>> ---
>>>>>  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
>>>>> index 485aef5761af..a24c090ac96f 100644
>>>>> --- a/drivers/gpu/host1x/hw/cdma_hw.c
>>>>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
>>>>> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
>>>>>  
>>>>>  	cdma->last_pos = cdma->push_buffer.pos;
>>>>>  	start = cdma->push_buffer.dma;
>>>>> -	end = start + cdma->push_buffer.size + 4;
>>>>> +	end = cdma->push_buffer.size + 4;
>>>>>  
>>>>>  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>>>>>  			 HOST1X_CHANNEL_DMACTRL);
>>>>> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
>>>>>  			 HOST1X_CHANNEL_DMACTRL);
>>>>>  
>>>>>  	start = cdma->push_buffer.dma;
>>>>> -	end = start + cdma->push_buffer.size + 4;
>>>>> +	end = cdma->push_buffer.size + 4;
>>>>>  
>>>>>  	/* set base, end pointer (all of memory) */
>>>>>  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
>>>>>
>>>>
>>>> This seems fixes problem that was added by a previous patch in this
>>>> series, "gpu: host1x: Support 40-bit addressing". What about to just
>>>> squash the patches? 
>>>
>>> This actually fixes a bug that's always been there. This just happens to
>>> touch the same lines as an earlier patch as a result of some refactoring
>>> that the earlier patch did.
>>
>> Oh, wow. Indeed! That's a bit unfortunate :) Though it's quite
>> difficult to spot that bug by looking at the code, good that it got
>> caught. Was this bug triggered by trying to move IOVA allocation to
>> the end of the AS?
> 
> This was actually triggered because I noticed that we were using 512
> slots in the push buffer, which means 4096 bytes, but then we needed
> that extra 4 bytes for the RESTART opcode, which means that we're
> currently allocating 8192 bytes of which only 4092 are being used.
> 
> So I thought: "Well, we should be able to live with 511 slots per push
> buffer and save a full memory page. This is an easy optimization!" Boy
> was I wrong... After making that change I started to see SMMU faults
> for the address immediately after the push buffer mapping. I think the
> same error happens regardless of where the push buffer is located. The
> reason for the SMMU faults seem to be that CDMA happily keeps on pre-
> fetching (a lot of) commands before it wraps around because of the
> RESTART opcode. DMAEND is designed as a mechanism to prevent CDMA from
> excessively fetching commands, but if you program a really large value
> into it, it'll add that really large value to the DMASTART as offset
> and the mechanism doesn't work anymore.
> 
> We don't currently see this because the 4092 bytes in the "scratch"
> page of the push buffer allocation are enough to absorb the prefetching
> that CDMA does.
> 
> We would also likely never see it happen in non-SMMU cases because the
> CDMA would just keep on prefetching whatever memory happened to be after
> the push buffer.

Yeah, that's what usually happens when the code is getting improved. Good job! :)
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel