DMA Engine development
 help / color / mirror / Atom feed
* Re: [PATCH] dmaengine: xgene-dma: fix spelling mistake "descripto" -> "descriptor"
From: Vinod Koul @ 2019-04-26 11:27 UTC (permalink / raw)
  To: Colin King; +Cc: Dan Williams, dmaengine, kernel-janitors, linux-kernel
In-Reply-To: <20190415153028.24849-1-colin.king@canonical.com>

On 15-04-19, 16:30, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> There is a spelling mistake in a chan_dbg message, fix it.

Applied, thanks

-- 
~Vinod

^ permalink raw reply

* dmaengine: xgene-dma: fix spelling mistake "descripto" -> "descriptor"
From: Vinod Koul @ 2019-04-26 11:27 UTC (permalink / raw)
  To: Colin King; +Cc: Dan Williams, dmaengine, kernel-janitors, linux-kernel

On 15-04-19, 16:30, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> There is a spelling mistake in a chan_dbg message, fix it.

Applied, thanks

^ permalink raw reply

* Re: [PATCH v1] dmaengine: idma64: Move driver name to the header
From: Vinod Koul @ 2019-04-26 11:25 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: dmaengine, Lee Jones, linux-kernel
In-Reply-To: <20190409150219.15094-1-andriy.shevchenko@linux.intel.com>

On 09-04-19, 18:02, Andy Shevchenko wrote:
> There are two drivers that are relying on the iDMA 64-bit driver name
> to match. Instead of duplicating string in both of them, dedicate
> a header file and share it between users.

Applied, thanks

-- 
~Vinod

^ permalink raw reply

* [v1] dmaengine: idma64: Move driver name to the header
From: Vinod Koul @ 2019-04-26 11:25 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: dmaengine, Lee Jones, linux-kernel

On 09-04-19, 18:02, Andy Shevchenko wrote:
> There are two drivers that are relying on the iDMA 64-bit driver name
> to match. Instead of duplicating string in both of them, dedicate
> a header file and share it between users.

Applied, thanks

^ permalink raw reply

* Re: [PATCH] dmaengine: bcm2835: Drop duplicate capability setting.
From: Vinod Koul @ 2019-04-26 11:23 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: bcm-kernel-feedback-list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Dan Williams, Eric Anholt,
	Stefan Wahren, Florian Fainelli, Ray Jui, Scott Branden
In-Reply-To: <20190404172503.12155-1-msuchanek@suse.de>

On 04-04-19, 19:25, Michal Suchanek wrote:
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
>  drivers/dma/bcm2835-dma.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index ec8a291d62ba..e38b19dd2895 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -891,7 +891,6 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
>  	dma_cap_set(DMA_SLAVE, od->ddev.cap_mask);
>  	dma_cap_set(DMA_PRIVATE, od->ddev.cap_mask);
>  	dma_cap_set(DMA_CYCLIC, od->ddev.cap_mask);
> -	dma_cap_set(DMA_SLAVE, od->ddev.cap_mask);
>  	dma_cap_set(DMA_MEMCPY, od->ddev.cap_mask);
>  	od->ddev.device_alloc_chan_resources = bcm2835_dma_alloc_chan_resources;
>  	od->ddev.device_free_chan_resources = bcm2835_dma_free_chan_resources;

Applied, thanks

-- 
~Vinod

^ permalink raw reply

* dmaengine: bcm2835: Drop duplicate capability setting.
From: Vinod Koul @ 2019-04-26 11:23 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: bcm-kernel-feedback-list, dmaengine, linux-rpi-kernel,
	linux-arm-kernel, linux-kernel, Dan Williams, Eric Anholt,
	Stefan Wahren, Florian Fainelli, Ray Jui, Scott Branden

On 04-04-19, 19:25, Michal Suchanek wrote:
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> ---
>  drivers/dma/bcm2835-dma.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/dma/bcm2835-dma.c b/drivers/dma/bcm2835-dma.c
> index ec8a291d62ba..e38b19dd2895 100644
> --- a/drivers/dma/bcm2835-dma.c
> +++ b/drivers/dma/bcm2835-dma.c
> @@ -891,7 +891,6 @@ static int bcm2835_dma_probe(struct platform_device *pdev)
>  	dma_cap_set(DMA_SLAVE, od->ddev.cap_mask);
>  	dma_cap_set(DMA_PRIVATE, od->ddev.cap_mask);
>  	dma_cap_set(DMA_CYCLIC, od->ddev.cap_mask);
> -	dma_cap_set(DMA_SLAVE, od->ddev.cap_mask);
>  	dma_cap_set(DMA_MEMCPY, od->ddev.cap_mask);
>  	od->ddev.device_alloc_chan_resources = bcm2835_dma_alloc_chan_resources;
>  	od->ddev.device_free_chan_resources = bcm2835_dma_free_chan_resources;

Applied, thanks

^ permalink raw reply

* Re: [PATCH] dmaengine: pl330: _stop: clear interrupt status
From: Vinod Koul @ 2019-04-26 11:21 UTC (permalink / raw)
  To: Sugar Zhang
  Cc: heiko, broonie, linux-rockchip, Dan Williams, dmaengine,
	linux-kernel
In-Reply-To: <1554289582-43695-1-git-send-email-sugar.zhang@rock-chips.com>

On 03-04-19, 19:06, Sugar Zhang wrote:
> This patch kill instructs the DMAC to immediately terminate
> execution of a thread. and then clear the interrupt status,
> at last, stop generating interrupts for DMA_SEV. to guarantee
> the next dma start is clean. otherwise, one interrupt maybe leave
> to next start and make some mistake.

Applied, thanks

-- 
~Vinod

^ permalink raw reply

* dmaengine: pl330: _stop: clear interrupt status
From: Vinod Koul @ 2019-04-26 11:21 UTC (permalink / raw)
  To: Sugar Zhang
  Cc: heiko, broonie, linux-rockchip, Dan Williams, dmaengine,
	linux-kernel

On 03-04-19, 19:06, Sugar Zhang wrote:
> This patch kill instructs the DMAC to immediately terminate
> execution of a thread. and then clear the interrupt status,
> at last, stop generating interrupts for DMA_SEV. to guarantee
> the next dma start is clean. otherwise, one interrupt maybe leave
> to next start and make some mistake.

Applied, thanks

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Jon Hunter @ 2019-04-26 11:13 UTC (permalink / raw)
  To: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel
In-Reply-To: <4a315b63-bc71-3c3e-f1ae-8638bcf4033d@gmail.com>


On 26/04/2019 11:45, Dmitry Osipenko wrote:
> 26.04.2019 12:52, Jon Hunter пишет:
>>
>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>> The readl/writel functions are inserting memory barrier in order to
>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>> results in L2 cache syncing which isn't a cheapest operation. The
>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>> accesses, hence use the relaxed versions of the functions.
>>
>> Do you mean device-io accesses here as this is not generic memory?
> 
> Yes. The IOMEM accesses within are always ordered and uncached, while
> generic memory accesses are out-of-order and cached.
> 
>> Although there may not be any issues with this change, I think I need a
>> bit more convincing that we should do this given that we have had it
>> this way for sometime and I would not like to see us introduce any
>> regressions as this point without being 100% certain we would not.
>> Ideally, if I had some good extensive tests I could run to hammer the
>> DMA for all configurations with different combinations of channels
>> running simultaneously then we could test this, but right now I don't :-(
>>
>> Have you ...
>> 1. Tested both cyclic and scatter-gather transfers?
>> 2. Stress tested simultaneous transfers with various different
>>    configurations?
>> 3. Quantified the actual performance benefit of this change so we can
>>    understand how much of a performance boost this offers?
> 
> Actually I found a case where this change causes a problem, I'm seeing
> I2C transfer timeout for touchscreen and it breaks the touch input.
> Indeed, I haven't tested this patch very well.
> 
> And the fix is this:
> 
> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
> *dev)
>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>  	}
> 
> +	dsb();
> +
>  	clk_disable_unprepare(tdma->dma_clk);
> 
>  	return 0;
> 
> 
> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
> incoherent and CPU disables clock before writes are reaching DMA controller.
> 
> I'd say that cyclic and scatter-gather transfers are now tested. I also
> made some more testing of simultaneous transfers.
> 
> Quantifying performance probably won't be easy to make as the DMA
> read/writes are not on any kind of code's hot-path.

So why make the change?
> Jon, are you still insisting about to drop this patch or you will be
> fine with the v2 that will have the dsb() in place?

If we can't quantify the performance gain, then it is difficult to
justify the change. I would also be concerned if that is the only place
we need an explicit dsb.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Jon Hunter @ 2019-04-26 11:13 UTC (permalink / raw)
  To: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel

On 26/04/2019 11:45, Dmitry Osipenko wrote:
> 26.04.2019 12:52, Jon Hunter пишет:
>>
>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>> The readl/writel functions are inserting memory barrier in order to
>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>> results in L2 cache syncing which isn't a cheapest operation. The
>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>> accesses, hence use the relaxed versions of the functions.
>>
>> Do you mean device-io accesses here as this is not generic memory?
> 
> Yes. The IOMEM accesses within are always ordered and uncached, while
> generic memory accesses are out-of-order and cached.
> 
>> Although there may not be any issues with this change, I think I need a
>> bit more convincing that we should do this given that we have had it
>> this way for sometime and I would not like to see us introduce any
>> regressions as this point without being 100% certain we would not.
>> Ideally, if I had some good extensive tests I could run to hammer the
>> DMA for all configurations with different combinations of channels
>> running simultaneously then we could test this, but right now I don't :-(
>>
>> Have you ...
>> 1. Tested both cyclic and scatter-gather transfers?
>> 2. Stress tested simultaneous transfers with various different
>>    configurations?
>> 3. Quantified the actual performance benefit of this change so we can
>>    understand how much of a performance boost this offers?
> 
> Actually I found a case where this change causes a problem, I'm seeing
> I2C transfer timeout for touchscreen and it breaks the touch input.
> Indeed, I haven't tested this patch very well.
> 
> And the fix is this:
> 
> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
> *dev)
>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>  	}
> 
> +	dsb();
> +
>  	clk_disable_unprepare(tdma->dma_clk);
> 
>  	return 0;
> 
> 
> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
> incoherent and CPU disables clock before writes are reaching DMA controller.
> 
> I'd say that cyclic and scatter-gather transfers are now tested. I also
> made some more testing of simultaneous transfers.
> 
> Quantifying performance probably won't be easy to make as the DMA
> read/writes are not on any kind of code's hot-path.

So why make the change?
> Jon, are you still insisting about to drop this patch or you will be
> fine with the v2 that will have the dsb() in place?

If we can't quantify the performance gain, then it is difficult to
justify the change. I would also be concerned if that is the only place
we need an explicit dsb.

Cheers
Jon

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-26 10:45 UTC (permalink / raw)
  To: Jon Hunter, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel
In-Reply-To: <d789f1bc-44d1-abf6-a046-7cf835416e8f@nvidia.com>

26.04.2019 12:52, Jon Hunter пишет:
> 
> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>> The readl/writel functions are inserting memory barrier in order to
>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>> results in L2 cache syncing which isn't a cheapest operation. The
>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>> accesses, hence use the relaxed versions of the functions.
> 
> Do you mean device-io accesses here as this is not generic memory?

Yes. The IOMEM accesses within are always ordered and uncached, while
generic memory accesses are out-of-order and cached.

> Although there may not be any issues with this change, I think I need a
> bit more convincing that we should do this given that we have had it
> this way for sometime and I would not like to see us introduce any
> regressions as this point without being 100% certain we would not.
> Ideally, if I had some good extensive tests I could run to hammer the
> DMA for all configurations with different combinations of channels
> running simultaneously then we could test this, but right now I don't :-(
> 
> Have you ...
> 1. Tested both cyclic and scatter-gather transfers?
> 2. Stress tested simultaneous transfers with various different
>    configurations?
> 3. Quantified the actual performance benefit of this change so we can
>    understand how much of a performance boost this offers?

Actually I found a case where this change causes a problem, I'm seeing
I2C transfer timeout for touchscreen and it breaks the touch input.
Indeed, I haven't tested this patch very well.

And the fix is this:

@@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
*dev)
 						  TEGRA_APBDMA_CHAN_WCOUNT);
 	}

+	dsb();
+
 	clk_disable_unprepare(tdma->dma_clk);

 	return 0;


Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
incoherent and CPU disables clock before writes are reaching DMA controller.

I'd say that cyclic and scatter-gather transfers are now tested. I also
made some more testing of simultaneous transfers.

Quantifying performance probably won't be easy to make as the DMA
read/writes are not on any kind of code's hot-path.

Jon, are you still insisting about to drop this patch or you will be
fine with the v2 that will have the dsb() in place?

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-26 10:45 UTC (permalink / raw)
  To: Jon Hunter, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel

26.04.2019 12:52, Jon Hunter пишет:
> 
> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>> The readl/writel functions are inserting memory barrier in order to
>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>> results in L2 cache syncing which isn't a cheapest operation. The
>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>> accesses, hence use the relaxed versions of the functions.
> 
> Do you mean device-io accesses here as this is not generic memory?

Yes. The IOMEM accesses within are always ordered and uncached, while
generic memory accesses are out-of-order and cached.

> Although there may not be any issues with this change, I think I need a
> bit more convincing that we should do this given that we have had it
> this way for sometime and I would not like to see us introduce any
> regressions as this point without being 100% certain we would not.
> Ideally, if I had some good extensive tests I could run to hammer the
> DMA for all configurations with different combinations of channels
> running simultaneously then we could test this, but right now I don't :-(
> 
> Have you ...
> 1. Tested both cyclic and scatter-gather transfers?
> 2. Stress tested simultaneous transfers with various different
>    configurations?
> 3. Quantified the actual performance benefit of this change so we can
>    understand how much of a performance boost this offers?

Actually I found a case where this change causes a problem, I'm seeing
I2C transfer timeout for touchscreen and it breaks the touch input.
Indeed, I haven't tested this patch very well.

And the fix is this:

@@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
*dev)
 						  TEGRA_APBDMA_CHAN_WCOUNT);
 	}

+	dsb();
+
 	clk_disable_unprepare(tdma->dma_clk);

 	return 0;


Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
incoherent and CPU disables clock before writes are reaching DMA controller.

I'd say that cyclic and scatter-gather transfers are now tested. I also
made some more testing of simultaneous transfers.

Quantifying performance probably won't be easy to make as the DMA
read/writes are not on any kind of code's hot-path.

Jon, are you still insisting about to drop this patch or you will be
fine with the v2 that will have the dsb() in place?

^ permalink raw reply

* Re: [EXT] Re: [PATCH v2 06/15] spi: imx: fix ERR009165
From: Robin Gong @ 2019-04-26  9:53 UTC (permalink / raw)
  To: robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, dan.j.williams@intel.com,
	broonie@kernel.org, mark.rutland@arm.com, catalin.marinas@arm.com,
	will.deacon@arm.com, shawnguo@kernel.org, l.stach@pengutronix.de,
	s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, dmaengine@vger.kernel.org,
	dl-linux-imx, kernel@pengutronix.de, devicetree@vger.kernel.org
In-Reply-To: <1556271442.2584.29.camel@pengutronix.de>

On 2019-04-26 at 11:37 +0200, Lucas Stach wrote:
> Am Freitag, den 26.04.2019, 09:22 +0000 schrieb Robin Gong:
> > 
> > On 2019-04-26 at 11:07 +0200, Lucas Stach wrote:
> > > 
> > > 
> > > Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> > > > 
> > > >  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
> > > >  {
> > > > +     u32 tx_wml = 0;
> > > > +
> > > With a wml of 0 you might set the maxburst of the TX dma channel
> > > to
> > > fifosize to minimize the performance impact of this workaround.
> > > 
> > > Regards,
> > > Lucas
> > Unfortunately, this is a MUST part of errata which cause
> > performance
> > drop.
> I'm not talking about changing the WML, but when the WML must be 0
> you
> can safely increase the DMA burst size without overflowing the FIFO,
> which might recover some of of the performance loss.
> 
> Regards,
> Lucas
Good point, will add it in V3, thanks.

^ permalink raw reply

* [v2,06/15] spi: imx: fix ERR009165
From: Robin Gong @ 2019-04-26  9:53 UTC (permalink / raw)
  To: robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, dan.j.williams@intel.com,
	broonie@kernel.org, mark.rutland@arm.com, catalin.marinas@arm.com,
	will.deacon@arm.com, shawnguo@kernel.org, l.stach@pengutronix.de,
	s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, dmaengine@vger.kernel.org,
	dl-linux-imx, kernel@pengutronix.de, devicetree@vger.kernel.org

On 2019-04-26 at 11:37 +0200, Lucas Stach wrote:
> Am Freitag, den 26.04.2019, 09:22 +0000 schrieb Robin Gong:
> > 
> > On 2019-04-26 at 11:07 +0200, Lucas Stach wrote:
> > > 
> > > 
> > > Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> > > > 
> > > >  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
> > > >  {
> > > > +     u32 tx_wml = 0;
> > > > +
> > > With a wml of 0 you might set the maxburst of the TX dma channel
> > > to
> > > fifosize to minimize the performance impact of this workaround.
> > > 
> > > Regards,
> > > Lucas
> > Unfortunately, this is a MUST part of errata which cause
> > performance
> > drop.
> I'm not talking about changing the WML, but when the WML must be 0
> you
> can safely increase the DMA burst size without overflowing the FIFO,
> which might recover some of of the performance loss.
> 
> Regards,
> Lucas
Good point, will add it in V3, thanks.

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Jon Hunter @ 2019-04-26  9:52 UTC (permalink / raw)
  To: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel
In-Reply-To: <20190424231708.21219-1-digetx@gmail.com>


On 25/04/2019 00:17, Dmitry Osipenko wrote:
> The readl/writel functions are inserting memory barrier in order to
> ensure that memory stores are completed. On Tegra20 and Tegra30 this
> results in L2 cache syncing which isn't a cheapest operation. The
> tegra20-apb-dma driver doesn't need to synchronize generic memory
> accesses, hence use the relaxed versions of the functions.

Do you mean device-io accesses here as this is not generic memory?

Although there may not be any issues with this change, I think I need a
bit more convincing that we should do this given that we have had it
this way for sometime and I would not like to see us introduce any
regressions as this point without being 100% certain we would not.
Ideally, if I had some good extensive tests I could run to hammer the
DMA for all configurations with different combinations of channels
running simultaneously then we could test this, but right now I don't :-(

Have you ...
1. Tested both cyclic and scatter-gather transfers?
2. Stress tested simultaneous transfers with various different
   configurations?
3. Quantified the actual performance benefit of this change so we can
   understand how much of a performance boost this offers?

Cheers
Jon

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Jon Hunter @ 2019-04-26  9:52 UTC (permalink / raw)
  To: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Thierry Reding
  Cc: dmaengine, linux-tegra, linux-kernel

On 25/04/2019 00:17, Dmitry Osipenko wrote:
> The readl/writel functions are inserting memory barrier in order to
> ensure that memory stores are completed. On Tegra20 and Tegra30 this
> results in L2 cache syncing which isn't a cheapest operation. The
> tegra20-apb-dma driver doesn't need to synchronize generic memory
> accesses, hence use the relaxed versions of the functions.

Do you mean device-io accesses here as this is not generic memory?

Although there may not be any issues with this change, I think I need a
bit more convincing that we should do this given that we have had it
this way for sometime and I would not like to see us introduce any
regressions as this point without being 100% certain we would not.
Ideally, if I had some good extensive tests I could run to hammer the
DMA for all configurations with different combinations of channels
running simultaneously then we could test this, but right now I don't :-(

Have you ...
1. Tested both cyclic and scatter-gather transfers?
2. Stress tested simultaneous transfers with various different
   configurations?
3. Quantified the actual performance benefit of this change so we can
   understand how much of a performance boost this offers?

Cheers
Jon

^ permalink raw reply

* Re: [EXT] Re: [PATCH v2 06/15] spi: imx: fix ERR009165
From: Lucas Stach @ 2019-04-26  9:37 UTC (permalink / raw)
  To: Robin Gong, robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, broonie@kernel.org,
	dan.j.williams@intel.com, mark.rutland@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, shawnguo@kernel.org,
	s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <1556270196.9298.16.camel@nxp.com>

Am Freitag, den 26.04.2019, 09:22 +0000 schrieb Robin Gong:
> On 2019-04-26 at 11:07 +0200, Lucas Stach wrote:
> > 
> > Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> > >  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
> > >  {
> > > +     u32 tx_wml = 0;
> > > +
> > 
> > With a wml of 0 you might set the maxburst of the TX dma channel to
> > fifosize to minimize the performance impact of this workaround.
> > 
> > Regards,
> > Lucas
> 
> Unfortunately, this is a MUST part of errata which cause performance
> drop.

I'm not talking about changing the WML, but when the WML must be 0 you
can safely increase the DMA burst size without overflowing the FIFO,
which might recover some of of the performance loss.

Regards,
Lucas

^ permalink raw reply

* [v2,06/15] spi: imx: fix ERR009165
From: Lucas Stach @ 2019-04-26  9:37 UTC (permalink / raw)
  To: Robin Gong, robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, broonie@kernel.org,
	dan.j.williams@intel.com, mark.rutland@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, shawnguo@kernel.org,
	s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org

Am Freitag, den 26.04.2019, 09:22 +0000 schrieb Robin Gong:
> On 2019-04-26 at 11:07 +0200, Lucas Stach wrote:
> > 
> > Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> > >  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
> > >  {
> > > +     u32 tx_wml = 0;
> > > +
> > 
> > With a wml of 0 you might set the maxburst of the TX dma channel to
> > fifosize to minimize the performance impact of this workaround.
> > 
> > Regards,
> > Lucas
> 
> Unfortunately, this is a MUST part of errata which cause performance
> drop.

I'm not talking about changing the WML, but when the WML must be 0 you
can safely increase the DMA burst size without overflowing the FIFO,
which might recover some of of the performance loss.

Regards,
Lucas

^ permalink raw reply

* Re: [EXT] Re: [PATCH v2 07/15] spi: imx: remove ERR009165 workaround on i.mx6ul
From: Robin Gong @ 2019-04-26  9:32 UTC (permalink / raw)
  To: robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, broonie@kernel.org,
	dan.j.williams@intel.com, mark.rutland@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, shawnguo@kernel.org,
	l.stach@pengutronix.de, s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <1556269796.2584.25.camel@pengutronix.de>

On 2019-04-26 at 11:09 +0200, Lucas Stach wrote:
> 
> >  static inline int is_imx53_ecspi(struct spi_imx_data *d)
> > @@ -585,9 +587,16 @@ static int mx51_ecspi_prepare_transfer(struct
> > spi_imx_data *spi_imx,
> >       ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, &clk);
> >       spi_imx->spi_bus_clk = clk;
> > 
> > -     /* ERR009165: work in XHC mode as PIO */
> > -     if (spi_imx->usedma)
> > -             ctrl &= ~MX51_ECSPI_CTRL_SMC;
> > +     /*
> > +      * ERR009165: work in XHC mode instead of SMC as PIO on the
> > chips
> > +      * before i.mx6ul.
> > +      */
> > +     if (spi_imx->usedma) {
> > +             if (spi_imx->devtype_data->devtype == IMX6UL_ECSPI)
> > 
> > =, otherwise the workaround might be applied to later generations
> > of
> the core if more are added later.
> 
> Regards,
> Lucas
Understood your point, but for now choose different compatible name
could apply this workaround or not. I prefer to leave it for next ecspi
IP upgrade if it really come in the future, otherwise that '>=' bring
a little bit confuse if there is no update for ecspi IP...

^ permalink raw reply

* [v2,07/15] spi: imx: remove ERR009165 workaround on i.mx6ul
From: Robin Gong @ 2019-04-26  9:32 UTC (permalink / raw)
  To: robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, broonie@kernel.org,
	dan.j.williams@intel.com, mark.rutland@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, shawnguo@kernel.org,
	l.stach@pengutronix.de, s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org

On 2019-04-26 at 11:09 +0200, Lucas Stach wrote:
> 
> >  static inline int is_imx53_ecspi(struct spi_imx_data *d)
> > @@ -585,9 +587,16 @@ static int mx51_ecspi_prepare_transfer(struct
> > spi_imx_data *spi_imx,
> >       ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, &clk);
> >       spi_imx->spi_bus_clk = clk;
> > 
> > -     /* ERR009165: work in XHC mode as PIO */
> > -     if (spi_imx->usedma)
> > -             ctrl &= ~MX51_ECSPI_CTRL_SMC;
> > +     /*
> > +      * ERR009165: work in XHC mode instead of SMC as PIO on the
> > chips
> > +      * before i.mx6ul.
> > +      */
> > +     if (spi_imx->usedma) {
> > +             if (spi_imx->devtype_data->devtype == IMX6UL_ECSPI)
> > 
> > =, otherwise the workaround might be applied to later generations
> > of
> the core if more are added later.
> 
> Regards,
> Lucas
Understood your point, but for now choose different compatible name
could apply this workaround or not. I prefer to leave it for next ecspi
IP upgrade if it really come in the future, otherwise that '>=' bring
a little bit confuse if there is no update for ecspi IP...

^ permalink raw reply

* Re: [EXT] Re: [PATCH v2 06/15] spi: imx: fix ERR009165
From: Robin Gong @ 2019-04-26  9:22 UTC (permalink / raw)
  To: robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, broonie@kernel.org,
	dan.j.williams@intel.com, mark.rutland@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, shawnguo@kernel.org,
	l.stach@pengutronix.de, s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <1556269635.2584.23.camel@pengutronix.de>

On 2019-04-26 at 11:07 +0200, Lucas Stach wrote:
> 
> Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> >  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
> >  {
> > +     u32 tx_wml = 0;
> > +
> With a wml of 0 you might set the maxburst of the TX dma channel to
> fifosize to minimize the performance impact of this workaround.
> 
> Regards,
> Lucas
Unfortunately, this is a MUST part of errata which cause performance
drop.

^ permalink raw reply

* [v2,06/15] spi: imx: fix ERR009165
From: Robin Gong @ 2019-04-26  9:22 UTC (permalink / raw)
  To: robh+dt@kernel.org, u.kleine-koenig@pengutronix.de,
	festevam@gmail.com, plyatov@gmail.com, broonie@kernel.org,
	dan.j.williams@intel.com, mark.rutland@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, shawnguo@kernel.org,
	l.stach@pengutronix.de, s.hauer@pengutronix.de
  Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org

On 2019-04-26 at 11:07 +0200, Lucas Stach wrote:
> 
> Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> >  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
> >  {
> > +     u32 tx_wml = 0;
> > +
> With a wml of 0 you might set the maxburst of the TX dma channel to
> fifosize to minimize the performance impact of this workaround.
> 
> Regards,
> Lucas
Unfortunately, this is a MUST part of errata which cause performance
drop.

^ permalink raw reply

* Re: [PATCH v2 07/15] spi: imx: remove ERR009165 workaround on i.mx6ul
From: Lucas Stach @ 2019-04-26  9:09 UTC (permalink / raw)
  To: Robin Gong, broonie@kernel.org, shawnguo@kernel.org,
	s.hauer@pengutronix.de, festevam@gmail.com, robh+dt@kernel.org,
	mark.rutland@arm.com, u.kleine-koenig@pengutronix.de,
	plyatov@gmail.com, dan.j.williams@intel.com,
	catalin.marinas@arm.com, will.deacon@arm.com
  Cc: devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-spi@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	dmaengine@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <1556265512-9130-8-git-send-email-yibin.gong@nxp.com>

Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
>    ERR009165 fix on i.mx6ul and next chip, such as i.mx6ull/i.mx8mq/i.mx8mm.
> Remove workaround on those chips. Add new i.mx6ul type for that.
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index eb56eac..2e5e978 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -57,6 +57,7 @@ enum spi_imx_devtype {
>  	IMX35_CSPI,	/* CSPI on all i.mx except above */
>  	IMX51_ECSPI,	/* ECSPI on i.mx51 */
>  	IMX53_ECSPI,	/* ECSPI on i.mx53 and later */
> +	IMX6UL_ECSPI,	/* ERR009165 fix from i.mx6ul */
>  };
>  
>  struct spi_imx_data;
> @@ -128,7 +129,8 @@ static inline int is_imx35_cspi(struct spi_imx_data *d)
>  
>  static inline int is_imx51_ecspi(struct spi_imx_data *d)
>  {
> -	return d->devtype_data->devtype == IMX51_ECSPI;
> +	return d->devtype_data->devtype == IMX51_ECSPI ||
> +	       d->devtype_data->devtype == IMX6UL_ECSPI;
>  }
>  
>  static inline int is_imx53_ecspi(struct spi_imx_data *d)
> @@ -585,9 +587,16 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
>  	ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, &clk);
>  	spi_imx->spi_bus_clk = clk;
>  
> -	/* ERR009165: work in XHC mode as PIO */
> -	if (spi_imx->usedma)
> -		ctrl &= ~MX51_ECSPI_CTRL_SMC;
> +	/*
> +	 * ERR009165: work in XHC mode instead of SMC as PIO on the chips
> +	 * before i.mx6ul.
> +	 */
> +	if (spi_imx->usedma) {
> +		if (spi_imx->devtype_data->devtype == IMX6UL_ECSPI)

>=, otherwise the workaround might be applied to later generations of
the core if more are added later.

Regards,
Lucas

^ permalink raw reply

* [v2,07/15] spi: imx: remove ERR009165 workaround on i.mx6ul
From: Lucas Stach @ 2019-04-26  9:09 UTC (permalink / raw)
  To: Robin Gong, broonie@kernel.org, shawnguo@kernel.org,
	s.hauer@pengutronix.de, festevam@gmail.com, robh+dt@kernel.org,
	mark.rutland@arm.com, u.kleine-koenig@pengutronix.de,
	plyatov@gmail.com, dan.j.williams@intel.com,
	catalin.marinas@arm.com, will.deacon@arm.com
  Cc: devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-spi@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	dmaengine@vger.kernel.org, linux-arm-kernel@lists.infradead.org

Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
>    ERR009165 fix on i.mx6ul and next chip, such as i.mx6ull/i.mx8mq/i.mx8mm.
> Remove workaround on those chips. Add new i.mx6ul type for that.
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index eb56eac..2e5e978 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -57,6 +57,7 @@ enum spi_imx_devtype {
>  	IMX35_CSPI,	/* CSPI on all i.mx except above */
>  	IMX51_ECSPI,	/* ECSPI on i.mx51 */
>  	IMX53_ECSPI,	/* ECSPI on i.mx53 and later */
> +	IMX6UL_ECSPI,	/* ERR009165 fix from i.mx6ul */
>  };
>  
>  struct spi_imx_data;
> @@ -128,7 +129,8 @@ static inline int is_imx35_cspi(struct spi_imx_data *d)
>  
>  static inline int is_imx51_ecspi(struct spi_imx_data *d)
>  {
> -	return d->devtype_data->devtype == IMX51_ECSPI;
> +	return d->devtype_data->devtype == IMX51_ECSPI ||
> +	       d->devtype_data->devtype == IMX6UL_ECSPI;
>  }
>  
>  static inline int is_imx53_ecspi(struct spi_imx_data *d)
> @@ -585,9 +587,16 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
>  	ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, &clk);
>  	spi_imx->spi_bus_clk = clk;
>  
> -	/* ERR009165: work in XHC mode as PIO */
> -	if (spi_imx->usedma)
> -		ctrl &= ~MX51_ECSPI_CTRL_SMC;
> +	/*
> +	 * ERR009165: work in XHC mode instead of SMC as PIO on the chips
> +	 * before i.mx6ul.
> +	 */
> +	if (spi_imx->usedma) {
> +		if (spi_imx->devtype_data->devtype == IMX6UL_ECSPI)

>=, otherwise the workaround might be applied to later generations of
the core if more are added later.

Regards,
Lucas

^ permalink raw reply

* Re: [PATCH v2 06/15] spi: imx: fix ERR009165
From: Lucas Stach @ 2019-04-26  9:07 UTC (permalink / raw)
  To: Robin Gong, broonie@kernel.org, shawnguo@kernel.org,
	s.hauer@pengutronix.de, festevam@gmail.com, robh+dt@kernel.org,
	mark.rutland@arm.com, u.kleine-koenig@pengutronix.de,
	plyatov@gmail.com, dan.j.williams@intel.com,
	catalin.marinas@arm.com, will.deacon@arm.com
  Cc: devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-spi@vger.kernel.org, dl-linux-imx, kernel@pengutronix.de,
	dmaengine@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <1556265512-9130-7-git-send-email-yibin.gong@nxp.com>

Am Freitag, den 26.04.2019, 08:05 +0000 schrieb Robin Gong:
> Change to XCH  mode even in dma mode, please refer to the below
> errata:
> https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  drivers/spi/spi-imx.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
> index 09c9a1e..eb56eac 100644
> --- a/drivers/spi/spi-imx.c
> +++ b/drivers/spi/spi-imx.c
> @@ -585,8 +585,9 @@ static int mx51_ecspi_prepare_transfer(struct
> spi_imx_data *spi_imx,
>  	ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, &clk);
>  	spi_imx->spi_bus_clk = clk;
>  
> +	/* ERR009165: work in XHC mode as PIO */
>  	if (spi_imx->usedma)
> -		ctrl |= MX51_ECSPI_CTRL_SMC;
> +		ctrl &= ~MX51_ECSPI_CTRL_SMC;
>  
>  	writel(ctrl, spi_imx->base + MX51_ECSPI_CTRL);
>  
> @@ -612,12 +613,14 @@ static int mx51_ecspi_prepare_transfer(struct
> spi_imx_data *spi_imx,
>  
>  static void mx51_setup_wml(struct spi_imx_data *spi_imx)
>  {
> +	u32 tx_wml = 0;
> +

With a wml of 0 you might set the maxburst of the TX dma channel to
fifosize to minimize the performance impact of this workaround.

Regards,
Lucas

>  	/*
>  	 * Configure the DMA register: setup the watermark
>  	 * and enable DMA request.
>  	 */
>  	writel(MX51_ECSPI_DMA_RX_WML(spi_imx->wml - 1) |
> -		MX51_ECSPI_DMA_TX_WML(spi_imx->wml) |
> +		MX51_ECSPI_DMA_TX_WML(tx_wml) |
>  		MX51_ECSPI_DMA_RXT_WML(spi_imx->wml) |
>  		MX51_ECSPI_DMA_TEDEN | MX51_ECSPI_DMA_RXDEN |
>  		MX51_ECSPI_DMA_RXTDEN, spi_imx->base +
> MX51_ECSPI_DMA);
> @@ -1265,10 +1268,6 @@ static int spi_imx_sdma_init(struct device
> *dev, struct spi_imx_data *spi_imx,
>  {
>  	int ret;
>  
> -	/* use pio mode for i.mx6dl chip TKT238285 */
> -	if (of_machine_is_compatible("fsl,imx6dl"))
> -		return 0;
> -
>  	spi_imx->wml = spi_imx->devtype_data->fifo_size / 2;
>  
>  	/* Prepare for TX DMA: */
> -- 
> 2.7.4
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox