DMA Engine development

DMA Engine development
 help / color / mirror / Atom feed

* [RFC] dmaengine: add fifo_size member
From: Vinod Koul @ 2019-05-02  6:04 UTC (permalink / raw)
  To: Sameer Pujar; +Cc: dan.j.williams, tiwai, jonathanh, dmaengine, linux-kernel

On 30-04-19, 17:00, Sameer Pujar wrote:
> During the DMA transfers from memory to I/O, it was observed that transfers
> were inconsistent and resulted in glitches for audio playback. It happened
> because fifo size on DMA did not match with slave channel configuration.
> 
> currently 'dma_slave_config' structure does not have a field for fifo size.
> Hence the platform pcm driver cannot pass the fifo size as a slave_config.
> Note that 'snd_dmaengine_dai_dma_data' structure has fifo_size field which
> cannot be used to pass the size info. This patch introduces fifo_size field
> and the same can be populated on slave side. Users can set required size
> for slave peripheral (multiple channels can be independently running with
> different fifo sizes) and the corresponding sizes are programmed through
> dma_slave_config on DMA side.

FIFO size is a hardware property not sure why you would want an
interface to program that?

On mismatch, I guess you need to take care of src/dst_maxburst..

> 
> Request for feedback/suggestions.
> 
> Signed-off-by: Sameer Pujar <spujar@nvidia.com>
> ---
>  include/linux/dmaengine.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index d49ec5c..9ec198b 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -351,6 +351,8 @@ enum dma_slave_buswidth {
>   * @slave_id: Slave requester id. Only valid for slave channels. The dma
>   * slave peripheral will have unique id as dma requester which need to be
>   * pass as slave config.
> + * @fifo_size: Fifo size value. The dma slave peripheral can configure required
> + * fifo size and the same needs to be passed as slave config.
>   *
>   * This struct is passed in as configuration data to a DMA engine
>   * in order to set up a certain channel for DMA transport at runtime.
> @@ -376,6 +378,7 @@ struct dma_slave_config {
>  	u32 dst_port_window_size;
>  	bool device_fc;
>  	unsigned int slave_id;
> +	u32 fifo_size;
>  };
>  
>  /**
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH 4/7] dmaengine: sprd: Add device validation to support multiple controllers
From: Vinod Koul @ 2019-05-02  6:01 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Dan Williams, eric.long, Orson Zhai, Chunyan Zhang, Mark Brown,
	dmaengine, LKML
In-Reply-To: <CAMz4kuLFyckFdzVgL9FH0xW8036OoAbyjHqfOAVhibPyNssPDA@mail.gmail.com>

On 30-04-19, 16:53, Baolin Wang wrote:
> Hi Vinod,
> 
> On Tue, 30 Apr 2019 at 16:34, Baolin Wang <baolin.wang@linaro.org> wrote:
> >
> > On Tue, 30 Apr 2019 at 16:30, Vinod Koul <vkoul@kernel.org> wrote:
> > >
> > > On 30-04-19, 13:30, Baolin Wang wrote:
> > > > On Mon, 29 Apr 2019 at 22:05, Vinod Koul <vkoul@kernel.org> wrote:
> > > > >
> > > > > On 29-04-19, 20:20, Baolin Wang wrote:
> > > > > > On Mon, 29 Apr 2019 at 19:57, Vinod Koul <vkoul@kernel.org> wrote:
> > > > > > >
> > > > > > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > > > > > From: Eric Long <eric.long@unisoc.com>
> > > > > > > >
> > > > > > > > Since we can support multiple DMA engine controllers, we should add
> > > > > > > > device validation in filter function to check if the correct controller
> > > > > > > > to be requested.
> > > > > > > >
> > > > > > > > Signed-off-by: Eric Long <eric.long@unisoc.com>
> > > > > > > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > > > > > > ---
> > > > > > > >  drivers/dma/sprd-dma.c |    5 +++++
> > > > > > > >  1 file changed, 5 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > > > > > index 0f92e60..9f99d4b 100644
> > > > > > > > --- a/drivers/dma/sprd-dma.c
> > > > > > > > +++ b/drivers/dma/sprd-dma.c
> > > > > > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
> > > > > > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > > > > > >  {
> > > > > > > >       struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > > > > > +     struct of_phandle_args *dma_spec =
> > > > > > > > +             container_of(param, struct of_phandle_args, args[0]);
> > > > > > > >       u32 slave_id = *(u32 *)param;
> > > > > > > >
> > > > > > > > +     if (chan->device->dev->of_node != dma_spec->np)
> > > > > > >
> > > > > > > Are you not using of_dma_find_controller() that does this, so this would
> > > > > > > be useless!
> > > > > >
> > > > > > Yes, we can use of_dma_find_controller(), but that will be a little
> > > > > > complicated than current solution. Since we need introduce one
> > > > > > structure to save the node to validate in the filter function like
> > > > > > below, which seems make things complicated. But if you still like to
> > > > > > use of_dma_find_controller(), I can change to use it in next version.
> > > > >
> > > > > Sorry I should have clarified more..
> > > > >
> > > > > of_dma_find_controller() is called by xlate, so you already run this
> > > > > check, so why use this :)
> > > >
> > > > The of_dma_find_controller() can save the requested device node into
> > > > dma_spec, and in the of_dma_simple_xlate() function, it will call
> > > > dma_request_channel() to request one channel, but it did not validate
> > > > the device node to find the corresponding dma device in
> > > > dma_request_channel(). So we should in our filter function to validate
> > > > the device node with the device node specified by the dma_spec. Hope I
> > > > make things clear.
> > >
> > > But dma_request_channel() calls of_dma_request_slave_channel() which
> > > invokes of_dma_find_controller() why is it broken for you if that is the
> > > case..
> >
> > No,the calling process should be:
> > dma_request_slave_channel()
> > --->dma_request_chan()--->of_dma_request_slave_channel()---->of_dma_simple_xlate()
> > ----> dma_request_channel().

The thing is that this is a generic issue, so fix should be in the core
and not in the driver. Agree in you case of_dma_find_controller() is not
invoked, so we should fix that in core

> 
> You can check other drivers, they also will save the device node to
> validate in their filter function.
> For example the imx-sdma driver:
> https://elixir.bootlin.com/linux/v5.1-rc6/source/drivers/dma/imx-sdma.c#L1931

Exactly, more the reason this should be in core :)

-- 
~Vinod

^ permalink raw reply

* [4/7] dmaengine: sprd: Add device validation to support multiple controllers
From: Vinod Koul @ 2019-05-02  6:01 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Dan Williams, eric.long, Orson Zhai, Chunyan Zhang, Mark Brown,
	dmaengine, LKML

On 30-04-19, 16:53, Baolin Wang wrote:
> Hi Vinod,
> 
> On Tue, 30 Apr 2019 at 16:34, Baolin Wang <baolin.wang@linaro.org> wrote:
> >
> > On Tue, 30 Apr 2019 at 16:30, Vinod Koul <vkoul@kernel.org> wrote:
> > >
> > > On 30-04-19, 13:30, Baolin Wang wrote:
> > > > On Mon, 29 Apr 2019 at 22:05, Vinod Koul <vkoul@kernel.org> wrote:
> > > > >
> > > > > On 29-04-19, 20:20, Baolin Wang wrote:
> > > > > > On Mon, 29 Apr 2019 at 19:57, Vinod Koul <vkoul@kernel.org> wrote:
> > > > > > >
> > > > > > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > > > > > From: Eric Long <eric.long@unisoc.com>
> > > > > > > >
> > > > > > > > Since we can support multiple DMA engine controllers, we should add
> > > > > > > > device validation in filter function to check if the correct controller
> > > > > > > > to be requested.
> > > > > > > >
> > > > > > > > Signed-off-by: Eric Long <eric.long@unisoc.com>
> > > > > > > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > > > > > > ---
> > > > > > > >  drivers/dma/sprd-dma.c |    5 +++++
> > > > > > > >  1 file changed, 5 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > > > > > index 0f92e60..9f99d4b 100644
> > > > > > > > --- a/drivers/dma/sprd-dma.c
> > > > > > > > +++ b/drivers/dma/sprd-dma.c
> > > > > > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
> > > > > > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > > > > > >  {
> > > > > > > >       struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > > > > > +     struct of_phandle_args *dma_spec =
> > > > > > > > +             container_of(param, struct of_phandle_args, args[0]);
> > > > > > > >       u32 slave_id = *(u32 *)param;
> > > > > > > >
> > > > > > > > +     if (chan->device->dev->of_node != dma_spec->np)
> > > > > > >
> > > > > > > Are you not using of_dma_find_controller() that does this, so this would
> > > > > > > be useless!
> > > > > >
> > > > > > Yes, we can use of_dma_find_controller(), but that will be a little
> > > > > > complicated than current solution. Since we need introduce one
> > > > > > structure to save the node to validate in the filter function like
> > > > > > below, which seems make things complicated. But if you still like to
> > > > > > use of_dma_find_controller(), I can change to use it in next version.
> > > > >
> > > > > Sorry I should have clarified more..
> > > > >
> > > > > of_dma_find_controller() is called by xlate, so you already run this
> > > > > check, so why use this :)
> > > >
> > > > The of_dma_find_controller() can save the requested device node into
> > > > dma_spec, and in the of_dma_simple_xlate() function, it will call
> > > > dma_request_channel() to request one channel, but it did not validate
> > > > the device node to find the corresponding dma device in
> > > > dma_request_channel(). So we should in our filter function to validate
> > > > the device node with the device node specified by the dma_spec. Hope I
> > > > make things clear.
> > >
> > > But dma_request_channel() calls of_dma_request_slave_channel() which
> > > invokes of_dma_find_controller() why is it broken for you if that is the
> > > case..
> >
> > No,the calling process should be:
> > dma_request_slave_channel()
> > --->dma_request_chan()--->of_dma_request_slave_channel()---->of_dma_simple_xlate()
> > ----> dma_request_channel().

The thing is that this is a generic issue, so fix should be in the core
and not in the driver. Agree in you case of_dma_find_controller() is not
invoked, so we should fix that in core

> 
> You can check other drivers, they also will save the device node to
> validate in their filter function.
> For example the imx-sdma driver:
> https://elixir.bootlin.com/linux/v5.1-rc6/source/drivers/dma/imx-sdma.c#L1931

Exactly, more the reason this should be in core :)

^ permalink raw reply

* Re: [PATCH v2 10/15] dt-bindings: dma: imx-sdma: add i.mx6ul/6sx compatible name
From: Rob Herring @ 2019-05-01 20:07 UTC (permalink / raw)
  To: Robin Gong
  Cc: broonie@kernel.org, shawnguo@kernel.org, s.hauer@pengutronix.de,
	festevam@gmail.com, robh+dt@kernel.org, mark.rutland@arm.com,
	u.kleine-koenig@pengutronix.de, plyatov@gmail.com,
	dan.j.williams@intel.com, catalin.marinas@arm.com,
	will.deacon@arm.com, dl-linux-imx, linux-spi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	devicetree@vger.kernel.org, kernel@pengutronix.de
In-Reply-To: <1556265512-9130-11-git-send-email-yibin.gong@nxp.com>

On Fri, 26 Apr 2019 08:06:03 +0000, Robin Gong wrote:
> Add i.mx6ul and i.mx6sx compatible name.
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* [v2,10/15] dt-bindings: dma: imx-sdma: add i.mx6ul/6sx compatible name
From: Rob Herring @ 2019-05-01 20:07 UTC (permalink / raw)
  To: Robin Gong
  Cc: broonie@kernel.org, shawnguo@kernel.org, s.hauer@pengutronix.de,
	festevam@gmail.com, robh+dt@kernel.org, mark.rutland@arm.com,
	u.kleine-koenig@pengutronix.de, plyatov@gmail.com,
	dan.j.williams@intel.com, catalin.marinas@arm.com,
	will.deacon@arm.com, dl-linux-imx, linux-spi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	devicetree@vger.kernel.org, kernel@pengutronix.de

On Fri, 26 Apr 2019 08:06:03 +0000, Robin Gong wrote:
> Add i.mx6ul and i.mx6sx compatible name.
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [PATCH v2 08/15] dt-bindings: spi: imx: add i.mx6ul to state errata fixed
From: Rob Herring @ 2019-05-01 20:07 UTC (permalink / raw)
  To: Robin Gong
  Cc: broonie@kernel.org, shawnguo@kernel.org, s.hauer@pengutronix.de,
	festevam@gmail.com, mark.rutland@arm.com,
	u.kleine-koenig@pengutronix.de, plyatov@gmail.com,
	dan.j.williams@intel.com, catalin.marinas@arm.com,
	will.deacon@arm.com, dl-linux-imx, linux-spi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	devicetree@vger.kernel.org, kernel@pengutronix.de
In-Reply-To: <1556265512-9130-9-git-send-email-yibin.gong@nxp.com>

On Fri, Apr 26, 2019 at 08:05:51AM +0000, Robin Gong wrote:
> ERR009165 fixed from i.mx6ul, add it to show the errata fixed.
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
> index 2d32641..32c4263d 100644
> --- a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
> +++ b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
> @@ -10,6 +10,8 @@ Required properties:
>    - "fsl,imx35-cspi" for SPI compatible with the one integrated on i.MX35
>    - "fsl,imx51-ecspi" for SPI compatible with the one integrated on i.MX51
>    - "fsl,imx53-ecspi" for SPI compatible with the one integrated on i.MX53 and later Soc
> +  - "fsl,imx6ul-ecspi" ERR009165 fixed on i.MX6UL and later Soc
> +    (https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf)

What about other i.MX6 chips?

Seems like this is missing some fallbacks. The binding doc should make 
it clear what are all valid combinations of compatible strings.

>    - "fsl,imx8mq-ecspi" for SPI compatible with the one integrated on i.MX8M
>  - reg : Offset and length of the register set for the device
>  - interrupts : Should contain CSPI/eCSPI interrupt
> -- 
> 2.7.4
> 

^ permalink raw reply

* [v2,08/15] dt-bindings: spi: imx: add i.mx6ul to state errata fixed
From: Rob Herring @ 2019-05-01 20:07 UTC (permalink / raw)
  To: Robin Gong
  Cc: broonie@kernel.org, shawnguo@kernel.org, s.hauer@pengutronix.de,
	festevam@gmail.com, mark.rutland@arm.com,
	u.kleine-koenig@pengutronix.de, plyatov@gmail.com,
	dan.j.williams@intel.com, catalin.marinas@arm.com,
	will.deacon@arm.com, dl-linux-imx, linux-spi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	devicetree@vger.kernel.org, kernel@pengutronix.de

On Fri, Apr 26, 2019 at 08:05:51AM +0000, Robin Gong wrote:
> ERR009165 fixed from i.mx6ul, add it to show the errata fixed.
> 
> Signed-off-by: Robin Gong <yibin.gong@nxp.com>
> ---
>  Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
> index 2d32641..32c4263d 100644
> --- a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
> +++ b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
> @@ -10,6 +10,8 @@ Required properties:
>    - "fsl,imx35-cspi" for SPI compatible with the one integrated on i.MX35
>    - "fsl,imx51-ecspi" for SPI compatible with the one integrated on i.MX51
>    - "fsl,imx53-ecspi" for SPI compatible with the one integrated on i.MX53 and later Soc
> +  - "fsl,imx6ul-ecspi" ERR009165 fixed on i.MX6UL and later Soc
> +    (https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf)

What about other i.MX6 chips?

Seems like this is missing some fallbacks. The binding doc should make 
it clear what are all valid combinations of compatible strings.

>    - "fsl,imx8mq-ecspi" for SPI compatible with the one integrated on i.MX8M
>  - reg : Offset and length of the register set for the device
>  - interrupts : Should contain CSPI/eCSPI interrupt
> -- 
> 2.7.4
>

^ permalink raw reply

* Re: [PATCH] dma: tegra: add accurate reporting of dma state
From: Vinod Koul @ 2019-05-01 13:13 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Ben Dooks, linux-kernel, Dmitry Osipenko, Laxman Dewangan,
	Dan Williams, Thierry Reding, dmaengine, linux-tegra,
	linux-kernel
In-Reply-To: <71198258-40b8-3f7f-1401-58513bfaaab5@nvidia.com>

On 01-05-19, 09:33, Jon Hunter wrote:

> > @@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
> >  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
> >  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
> >  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> > -	/*
> > -	 * XXX The hardware appears to support
> > -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> > -	 * only used by this driver during tegra_dma_terminate_all()
> > -	 */
> > -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> > +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
> >  	tdma->dma_dev.device_config = tegra_dma_slave_config;
> >  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
> >  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;
> 
> In addition to Dmitry's comments, can you please make sure you run this
> through checkpatch.pl?

And use correct subsystem name !

-- 
~Vinod

^ permalink raw reply

* dma: tegra: add accurate reporting of dma state
From: Vinod Koul @ 2019-05-01 13:13 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Ben Dooks, linux-kernel, Dmitry Osipenko, Laxman Dewangan,
	Dan Williams, Thierry Reding, dmaengine, linux-tegra,
	linux-kernel

On 01-05-19, 09:33, Jon Hunter wrote:

> > @@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
> >  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
> >  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
> >  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> > -	/*
> > -	 * XXX The hardware appears to support
> > -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> > -	 * only used by this driver during tegra_dma_terminate_all()
> > -	 */
> > -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> > +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
> >  	tdma->dma_dev.device_config = tegra_dma_slave_config;
> >  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
> >  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;
> 
> In addition to Dmitry's comments, can you please make sure you run this
> through checkpatch.pl?

And use correct subsystem name !

^ permalink raw reply

* Re: [PATCH] dma: tegra: add accurate reporting of dma state
From: Ben Dooks @ 2019-05-01  8:58 UTC (permalink / raw)
  To: Dmitry Osipenko, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel
In-Reply-To: <dbe7c256-c7ad-a172-9e63-9251bec6f182@gmail.com>

On 24/04/2019 19:17, Dmitry Osipenko wrote:
> 24.04.2019 19:23, Ben Dooks пишет:
>> The tx_status callback does not report the state of the transfer
>> beyond complete segments. This causes problems with users such as
>> ALSA when applications want to know accurately how much data has
>> been moved.
>>
>> This patch addes a function tegra_dma_update_residual() to query
>> the hardware and modify the residual information accordinly. It
>> takes into account any hardware issues when trying to read the
>> state, such as delays between finishing a buffer and signalling
>> the interrupt.
>>
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> 
> Hello Ben,
> 
> Thank you very much for keeping it up. I have couple comments, please see them below.
> 
>> Cc: Dmitry Osipenko <digetx@gmail.com>
>> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
>> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
>> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
>> Cc: linux-kernel@vger.kernel.org (open list)
>> ---
>>   drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>>   1 file changed, 86 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
>> index cf462b1abc0b..544e7273e741 100644
>> --- a/drivers/dma/tegra20-apb-dma.c
>> +++ b/drivers/dma/tegra20-apb-dma.c
>> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>>   	return 0;
>>   }
>>   
>> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
>> +					      struct tegra_dma_sg_req *sg_req,
>> +					      struct tegra_dma_desc *dma_desc,
>> +					      unsigned int residual)
>> +{
>> +	unsigned long status = 0x0;
>> +	unsigned long wcount;
>> +	unsigned long ahbptr;
>> +	unsigned long tmp = 0x0;
>> +	unsigned int result;
> 
> You could pre-assign ahbptr=0xffffffff and result=residual here, then you could remove all the duplicated assigns below.

ok, ta.

>> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
>> +	int done;
>> +
>> +	/* if we're not the current request, then don't alter the residual */
>> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
>> +				       struct tegra_dma_sg_req, node)) {
>> +		result = residual;
>> +		ahbptr = 0xffffffff;
>> +		goto done;
>> +	}
>> +
>> +	/* loop until we have a reliable result for residual */
>> +	do {
>> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
>> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
>> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> 
> The "tmp" variable isn't used anywhere in the code, please remove it.

must have been left over.

>> +
>> +		/* check status, if channel isn't busy then skip */
>> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
>> +			result = residual;
>> +			break;
>> +		}
> 
> This doesn't look correct because TRM says "Busy bit gets set as soon as a channel is enabled and gets cleared after transfer completes", hence a cleared BUSY bit means that all transfers are completed and result=residual is incorrect here. Given that there is a check for EOC bit being set below, this hunk should be removed.

I'll check notes, but see below.

>> +
>> +		/* if we've got an interrupt pending on the channel, don't
>> +		 * try and deal with the residue as the hardware has likely
>> +		 * moved on to the next buffer. return all data moved.
>> +		 */
>> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
>> +			result = residual - sg_req->req_len;
>> +			break;
>> +		}
>> +
>> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
>> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
>> +		else
>> +			wcount = status;
>> +
>> +		/* If the request is at the full point, then there is a
>> +		 * chance that we have read the status register in the
>> +		 * middle of the hardware reloading the next buffer.
>> +		 *
>> +		 * The sequence seems to be at the end of the buffer, to
>> +		 * load the new word count before raising the EOC flag (or
>> +		 * changing the ping-pong flag which could have also been
>> +		 * used to determine a new buffer). This  means there is a
>> +		 * small window where we cannot determine zero-done for the
>> +		 * current buffer, or moved to next buffer.
>> +		 *
>> +		 * If done shows 0, then retry the load, as it may hit the
>> +		 * above hardware race. We will either get a new value which
>> +		 * is from the first buffer, or we get an EOC (new buffer)
>> +		 * or both a new value and an EOC...
>> +		 */
>> +		done = get_current_xferred_count(tdc, sg_req, wcount);
>> +		if (done != 0) {
>> +			result = residual - done;
>> +			break;
>> +		}
>> +
>> +		ndelay(100);
> 
> Please use udelay(1) because there is no ndelay on arm32 and ndelay(100) is getting rounded up to 1usec. AFAIK, arm64 doesn't have reliable ndelay on Tegra either because timer rate changes with the CPU frequency scaling.

I'll check, but last time it was implemented. This seems a backwards step.

> Secondly done=0 isn't a error case, technically this could be the case when tegra_dma_update_residual() is invoked just after starting the transfer. Hence I think this do-while loop and timeout checking aren't needed at all since done=0 is a perfectly valid case.

this is not checking for an error, it's checking for a possible
inaccurate reading.

> 
> Altogether seems the tegra_dma_update_residual() could be reduced to:
> 
> static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> 					      struct tegra_dma_sg_req *sg_req,
> 					      struct tegra_dma_desc *dma_desc,
> 					      unsigned int residual)
> {
> 	unsigned long status, wcount;
> 
> 	if (list_is_first(&sg_req->node, &tdc->pending_sg_req))
> 		return residual;
> 
> 	if (tdc->tdma->chip_data->support_separate_wcount_reg)
> 		wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> 
> 	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> 
> 	if (!tdc->tdma->chip_data->support_separate_wcount_reg)
> 		wcount = status;
> 
> 	if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
> 		return residual - sg_req->req_len;
> 
> 	return residual - get_current_xferred_count(tdc, sg_req, wcount);
> }

I'm not sure if that will work all the time. It took days of testing to
get reliable error data for the cases we're looking for here.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

^ permalink raw reply

* dma: tegra: add accurate reporting of dma state
From: Ben Dooks @ 2019-05-01  8:58 UTC (permalink / raw)
  To: Dmitry Osipenko, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

On 24/04/2019 19:17, Dmitry Osipenko wrote:
> 24.04.2019 19:23, Ben Dooks пишет:
>> The tx_status callback does not report the state of the transfer
>> beyond complete segments. This causes problems with users such as
>> ALSA when applications want to know accurately how much data has
>> been moved.
>>
>> This patch addes a function tegra_dma_update_residual() to query
>> the hardware and modify the residual information accordinly. It
>> takes into account any hardware issues when trying to read the
>> state, such as delays between finishing a buffer and signalling
>> the interrupt.
>>
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> 
> Hello Ben,
> 
> Thank you very much for keeping it up. I have couple comments, please see them below.
> 
>> Cc: Dmitry Osipenko <digetx@gmail.com>
>> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
>> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
>> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
>> Cc: linux-kernel@vger.kernel.org (open list)
>> ---
>>   drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>>   1 file changed, 86 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
>> index cf462b1abc0b..544e7273e741 100644
>> --- a/drivers/dma/tegra20-apb-dma.c
>> +++ b/drivers/dma/tegra20-apb-dma.c
>> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>>   	return 0;
>>   }
>>   
>> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
>> +					      struct tegra_dma_sg_req *sg_req,
>> +					      struct tegra_dma_desc *dma_desc,
>> +					      unsigned int residual)
>> +{
>> +	unsigned long status = 0x0;
>> +	unsigned long wcount;
>> +	unsigned long ahbptr;
>> +	unsigned long tmp = 0x0;
>> +	unsigned int result;
> 
> You could pre-assign ahbptr=0xffffffff and result=residual here, then you could remove all the duplicated assigns below.

ok, ta.

>> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
>> +	int done;
>> +
>> +	/* if we're not the current request, then don't alter the residual */
>> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
>> +				       struct tegra_dma_sg_req, node)) {
>> +		result = residual;
>> +		ahbptr = 0xffffffff;
>> +		goto done;
>> +	}
>> +
>> +	/* loop until we have a reliable result for residual */
>> +	do {
>> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
>> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
>> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> 
> The "tmp" variable isn't used anywhere in the code, please remove it.

must have been left over.

>> +
>> +		/* check status, if channel isn't busy then skip */
>> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
>> +			result = residual;
>> +			break;
>> +		}
> 
> This doesn't look correct because TRM says "Busy bit gets set as soon as a channel is enabled and gets cleared after transfer completes", hence a cleared BUSY bit means that all transfers are completed and result=residual is incorrect here. Given that there is a check for EOC bit being set below, this hunk should be removed.

I'll check notes, but see below.

>> +
>> +		/* if we've got an interrupt pending on the channel, don't
>> +		 * try and deal with the residue as the hardware has likely
>> +		 * moved on to the next buffer. return all data moved.
>> +		 */
>> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
>> +			result = residual - sg_req->req_len;
>> +			break;
>> +		}
>> +
>> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
>> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
>> +		else
>> +			wcount = status;
>> +
>> +		/* If the request is at the full point, then there is a
>> +		 * chance that we have read the status register in the
>> +		 * middle of the hardware reloading the next buffer.
>> +		 *
>> +		 * The sequence seems to be at the end of the buffer, to
>> +		 * load the new word count before raising the EOC flag (or
>> +		 * changing the ping-pong flag which could have also been
>> +		 * used to determine a new buffer). This  means there is a
>> +		 * small window where we cannot determine zero-done for the
>> +		 * current buffer, or moved to next buffer.
>> +		 *
>> +		 * If done shows 0, then retry the load, as it may hit the
>> +		 * above hardware race. We will either get a new value which
>> +		 * is from the first buffer, or we get an EOC (new buffer)
>> +		 * or both a new value and an EOC...
>> +		 */
>> +		done = get_current_xferred_count(tdc, sg_req, wcount);
>> +		if (done != 0) {
>> +			result = residual - done;
>> +			break;
>> +		}
>> +
>> +		ndelay(100);
> 
> Please use udelay(1) because there is no ndelay on arm32 and ndelay(100) is getting rounded up to 1usec. AFAIK, arm64 doesn't have reliable ndelay on Tegra either because timer rate changes with the CPU frequency scaling.

I'll check, but last time it was implemented. This seems a backwards step.

> Secondly done=0 isn't a error case, technically this could be the case when tegra_dma_update_residual() is invoked just after starting the transfer. Hence I think this do-while loop and timeout checking aren't needed at all since done=0 is a perfectly valid case.

this is not checking for an error, it's checking for a possible
inaccurate reading.

> 
> Altogether seems the tegra_dma_update_residual() could be reduced to:
> 
> static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> 					      struct tegra_dma_sg_req *sg_req,
> 					      struct tegra_dma_desc *dma_desc,
> 					      unsigned int residual)
> {
> 	unsigned long status, wcount;
> 
> 	if (list_is_first(&sg_req->node, &tdc->pending_sg_req))
> 		return residual;
> 
> 	if (tdc->tdma->chip_data->support_separate_wcount_reg)
> 		wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> 
> 	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> 
> 	if (!tdc->tdma->chip_data->support_separate_wcount_reg)
> 		wcount = status;
> 
> 	if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
> 		return residual - sg_req->req_len;
> 
> 	return residual - get_current_xferred_count(tdc, sg_req, wcount);
> }

I'm not sure if that will work all the time. It took days of testing to
get reliable error data for the cases we're looking for here.

^ permalink raw reply

* Re: [PATCH] dma: tegra: add accurate reporting of dma state
From: Jon Hunter @ 2019-05-01  8:33 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel
In-Reply-To: <20190424162348.23692-1-ben.dooks@codethink.co.uk>


On 24/04/2019 17:23, Ben Dooks wrote:
> The tx_status callback does not report the state of the transfer
> beyond complete segments. This causes problems with users such as
> ALSA when applications want to know accurately how much data has
> been moved.
> 
> This patch addes a function tegra_dma_update_residual() to query
> the hardware and modify the residual information accordinly. It
> takes into account any hardware issues when trying to read the
> state, such as delays between finishing a buffer and signalling
> the interrupt.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
> Cc: Dmitry Osipenko <digetx@gmail.com>
> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
> Cc: linux-kernel@vger.kernel.org (open list)
> ---
>  drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>  1 file changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> index cf462b1abc0b..544e7273e741 100644
> --- a/drivers/dma/tegra20-apb-dma.c
> +++ b/drivers/dma/tegra20-apb-dma.c
> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>  	return 0;
>  }
>  
> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> +					      struct tegra_dma_sg_req *sg_req,
> +					      struct tegra_dma_desc *dma_desc,
> +					      unsigned int residual)
> +{
> +	unsigned long status = 0x0;
> +	unsigned long wcount;
> +	unsigned long ahbptr;
> +	unsigned long tmp = 0x0;
> +	unsigned int result;
> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
> +	int done;
> +
> +	/* if we're not the current request, then don't alter the residual */
> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
> +				       struct tegra_dma_sg_req, node)) {
> +		result = residual;
> +		ahbptr = 0xffffffff;
> +		goto done;
> +	}
> +
> +	/* loop until we have a reliable result for residual */
> +	do {
> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> +
> +		/* check status, if channel isn't busy then skip */
> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
> +			result = residual;
> +			break;
> +		}
> +
> +		/* if we've got an interrupt pending on the channel, don't
> +		 * try and deal with the residue as the hardware has likely
> +		 * moved on to the next buffer. return all data moved.
> +		 */
> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> +			result = residual - sg_req->req_len;
> +			break;
> +		}
> +
> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> +		else
> +			wcount = status;
> +
> +		/* If the request is at the full point, then there is a
> +		 * chance that we have read the status register in the
> +		 * middle of the hardware reloading the next buffer.
> +		 *
> +		 * The sequence seems to be at the end of the buffer, to
> +		 * load the new word count before raising the EOC flag (or
> +		 * changing the ping-pong flag which could have also been
> +		 * used to determine a new buffer). This  means there is a
> +		 * small window where we cannot determine zero-done for the
> +		 * current buffer, or moved to next buffer.
> +		 *
> +		 * If done shows 0, then retry the load, as it may hit the
> +		 * above hardware race. We will either get a new value which
> +		 * is from the first buffer, or we get an EOC (new buffer)
> +		 * or both a new value and an EOC...
> +		 */
> +		done = get_current_xferred_count(tdc, sg_req, wcount);
> +		if (done != 0) {
> +			result = residual - done;
> +			break;
> +		}
> +
> +		ndelay(100);
> +	} while (--retries > 0);
> +
> +	if (retries <= 0) {
> +		dev_err(tdc2dev(tdc), "timeout waiting for dma load\n");
> +		result = residual;
> +	}
> +
> +done:	
> +	dev_dbg(tdc2dev(tdc), "residual: req %08lx, ahb@%08lx, wcount %08lx, done %d\n",
> +		 sg_req->ch_regs.ahb_ptr, ahbptr, wcount, done);
> +
> +	return result;
> +}
> +
>  static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  	dma_cookie_t cookie, struct dma_tx_state *txstate)
>  {
> @@ -849,6 +933,7 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  		residual = dma_desc->bytes_requested -
>  			   (dma_desc->bytes_transferred %
>  			    dma_desc->bytes_requested);
> +		residual = tegra_dma_update_residual(tdc, sg_req, dma_desc, residual);
>  		dma_set_residue(txstate, residual);
>  	}
>  
> @@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
>  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
>  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
>  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> -	/*
> -	 * XXX The hardware appears to support
> -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> -	 * only used by this driver during tegra_dma_terminate_all()
> -	 */
> -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>  	tdma->dma_dev.device_config = tegra_dma_slave_config;
>  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
>  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;

In addition to Dmitry's comments, can you please make sure you run this
through checkpatch.pl?

Thanks
Jon

-- 
nvpublic

^ permalink raw reply

* dma: tegra: add accurate reporting of dma state
From: Jon Hunter @ 2019-05-01  8:33 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

On 24/04/2019 17:23, Ben Dooks wrote:
> The tx_status callback does not report the state of the transfer
> beyond complete segments. This causes problems with users such as
> ALSA when applications want to know accurately how much data has
> been moved.
> 
> This patch addes a function tegra_dma_update_residual() to query
> the hardware and modify the residual information accordinly. It
> takes into account any hardware issues when trying to read the
> state, such as delays between finishing a buffer and signalling
> the interrupt.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
> Cc: Dmitry Osipenko <digetx@gmail.com>
> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
> Cc: linux-kernel@vger.kernel.org (open list)
> ---
>  drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>  1 file changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> index cf462b1abc0b..544e7273e741 100644
> --- a/drivers/dma/tegra20-apb-dma.c
> +++ b/drivers/dma/tegra20-apb-dma.c
> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>  	return 0;
>  }
>  
> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> +					      struct tegra_dma_sg_req *sg_req,
> +					      struct tegra_dma_desc *dma_desc,
> +					      unsigned int residual)
> +{
> +	unsigned long status = 0x0;
> +	unsigned long wcount;
> +	unsigned long ahbptr;
> +	unsigned long tmp = 0x0;
> +	unsigned int result;
> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
> +	int done;
> +
> +	/* if we're not the current request, then don't alter the residual */
> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
> +				       struct tegra_dma_sg_req, node)) {
> +		result = residual;
> +		ahbptr = 0xffffffff;
> +		goto done;
> +	}
> +
> +	/* loop until we have a reliable result for residual */
> +	do {
> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> +
> +		/* check status, if channel isn't busy then skip */
> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
> +			result = residual;
> +			break;
> +		}
> +
> +		/* if we've got an interrupt pending on the channel, don't
> +		 * try and deal with the residue as the hardware has likely
> +		 * moved on to the next buffer. return all data moved.
> +		 */
> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> +			result = residual - sg_req->req_len;
> +			break;
> +		}
> +
> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> +		else
> +			wcount = status;
> +
> +		/* If the request is at the full point, then there is a
> +		 * chance that we have read the status register in the
> +		 * middle of the hardware reloading the next buffer.
> +		 *
> +		 * The sequence seems to be at the end of the buffer, to
> +		 * load the new word count before raising the EOC flag (or
> +		 * changing the ping-pong flag which could have also been
> +		 * used to determine a new buffer). This  means there is a
> +		 * small window where we cannot determine zero-done for the
> +		 * current buffer, or moved to next buffer.
> +		 *
> +		 * If done shows 0, then retry the load, as it may hit the
> +		 * above hardware race. We will either get a new value which
> +		 * is from the first buffer, or we get an EOC (new buffer)
> +		 * or both a new value and an EOC...
> +		 */
> +		done = get_current_xferred_count(tdc, sg_req, wcount);
> +		if (done != 0) {
> +			result = residual - done;
> +			break;
> +		}
> +
> +		ndelay(100);
> +	} while (--retries > 0);
> +
> +	if (retries <= 0) {
> +		dev_err(tdc2dev(tdc), "timeout waiting for dma load\n");
> +		result = residual;
> +	}
> +
> +done:	
> +	dev_dbg(tdc2dev(tdc), "residual: req %08lx, ahb@%08lx, wcount %08lx, done %d\n",
> +		 sg_req->ch_regs.ahb_ptr, ahbptr, wcount, done);
> +
> +	return result;
> +}
> +
>  static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  	dma_cookie_t cookie, struct dma_tx_state *txstate)
>  {
> @@ -849,6 +933,7 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  		residual = dma_desc->bytes_requested -
>  			   (dma_desc->bytes_transferred %
>  			    dma_desc->bytes_requested);
> +		residual = tegra_dma_update_residual(tdc, sg_req, dma_desc, residual);
>  		dma_set_residue(txstate, residual);
>  	}
>  
> @@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
>  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
>  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
>  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> -	/*
> -	 * XXX The hardware appears to support
> -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> -	 * only used by this driver during tegra_dma_terminate_all()
> -	 */
> -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>  	tdma->dma_dev.device_config = tegra_dma_slave_config;
>  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
>  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;

In addition to Dmitry's comments, can you please make sure you run this
through checkpatch.pl?

Thanks
Jon

^ permalink raw reply

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
From: Vinod Koul @ 2019-04-30 17:18 UTC (permalink / raw)
  To: Arnaud Pouliquen
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel,
	dmaengine
In-Reply-To: <f7f28d56-a3e4-38a3-8580-b241fe0dcb49@st.com>

On 30-04-19, 16:58, Arnaud Pouliquen wrote:

> >> Hope that will help to clarify.
> > 
> > Yes that helps, maybe we should add these bits in code and changelog..
> > :)I will update the comments and commit message in a V2 in this way
> > 
> > And how does this impact non cyclic case where N descriptors maybe
> > issued. The driver seems to support non cyclic too...
> 
> Correct it supports SG as well, but double buffer mode is not used in
> such case. Hw is programmed under IT for every descriptors : no
> automatic register reloaded as in cyclic mode. We won't end up in the
> situation depicted below.

Okay sounds good then. Can you add a bit more of this in the code (this
was very helpful) not as a fix but documentation so that people (or you
down the line) will remember why this was done like this

Thanks

-- 
~Vinod

^ permalink raw reply

* dmaengine: stm32-dma: fix residue calculation in stm32-dma
From: Vinod Koul @ 2019-04-30 17:18 UTC (permalink / raw)
  To: Arnaud Pouliquen
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel,
	dmaengine

On 30-04-19, 16:58, Arnaud Pouliquen wrote:

> >> Hope that will help to clarify.
> > 
> > Yes that helps, maybe we should add these bits in code and changelog..
> > :)I will update the comments and commit message in a V2 in this way
> > 
> > And how does this impact non cyclic case where N descriptors maybe
> > issued. The driver seems to support non cyclic too...
> 
> Correct it supports SG as well, but double buffer mode is not used in
> such case. Hw is programmed under IT for every descriptors : no
> automatic register reloaded as in cyclic mode. We won't end up in the
> situation depicted below.

Okay sounds good then. Can you add a bit more of this in the code (this
was very helpful) not as a fix but documentation so that people (or you
down the line) will remember why this was done like this

Thanks

^ permalink raw reply

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
From: Arnaud Pouliquen @ 2019-04-30 14:58 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel,
	dmaengine
In-Reply-To: <20190430082255.GP3845@vkoul-mobl.Dlink>



On 4/30/19 10:22 AM, Vinod Koul wrote:
> On 29-04-19, 16:52, Arnaud Pouliquen wrote:
>>
>>
>> On 4/29/19 7:13 AM, Vinod Koul wrote:
>>> On 26-04-19, 15:41, Arnaud Pouliquen wrote:
>>>>>> During residue calculation. the DMA can switch to the next sg. When
>>>>>> this race condition occurs, the residue returned value is not valid.
>>>>>> Indeed the position in the sg returned by the hardware is the position
>>>>>> of the next sg, not the current sg.
>>>>>> Solution is to check the sg after the calculation to verify it.
>>>>>> If a transition is detected we consider that the DMA has switched to
>>>>>> the beginning of next sg.
>>>>>
>>>>> Now, that sounds like duct tape. Why should we bother doing that.
>>>>>
>>>>> Also looking back at the stm32_dma_desc_residue() and calls to it from
>>>>> stm32_dma_tx_status() am not sure we are doing the right thing
>>>> Please, could you explain what you have in mind here?
>>>
>>> So when we call vchan_find_desc() that tells us if the descriptor is in
>>> the issued queue or not..  Ideally it should not matter if we have one
>>> or N descriptors issued to hardware.
>>>
>>> So why should you bother checking for next_sg.
>>>
>>>>> why are we looking at next_sg here, can you explain me that please
>>>>
>>>> This solution is similar to one implemented in the at_hdmac.c driver
>>>> (atc_get_bytes_left function).
>>>>
>>>> Yes could be consider as a workaround for a hardware issue...
>>>>
>>>> In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
>>>> sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
>>>> mode is mainly use for audio transfer initiated by an ALSA driver.
>>>>
>>>> >From hardware point of view the DMA transfers first block based on sg1,
>>>> then it updates registers to prepare sg2 transfer, and then generates an
>>>> IRQ to inform that it issues the next transfer (sg2).
>>>>
>>>> Then driver can update sg1 to prepare the third transfer...
>>>>
>>>> In parallel the client driver can requests status to get the residue to
>>>> update internal pointer.
>>>> The issue is in the race condition between the call of the
>>>> device_tx_status ops and the update of the DMA register on sg switch.
>>>
>>> Sorry I do not agree! You are in stm32_dma_tx_status() hold the lock and
>>> IRQs are disabled, so even if sg2 was loaded, you will not get an
>>> interrupt and wont know. By looking at sg1 register you will see that
>>> sg1 is telling you that it has finished and residue can be zero. That is
>>> fine and correct to report.
>>>
>>> Most important thing here is that reside is for _requested_ descriptor
>>> and not _current_ descriptor, so looking into sg2 doesnt not fit.
>>>
>>>> During a short time the hardware updated the registers containing the
>>>> sg ID but not the transfer counter(SxNDTR). In this case there is a
>>>> mismatch between the Sg ID and the associated transfer counter.
>>>> So residue calculation is wrong.
>>>> Idea of this patch is to perform the calculation and then to crosscheck
>>>> that the hardware has not switched to the next sg during the
>>>> calculation. The way to crosscheck is to compare the the sg ID before
>>>> and after the calculation.
>>>>
>>>> I tested the solution to force a new recalculation but no real solution
>>>> to trust the registers during this phase. In this case an approximation
>>>> is to consider that the DMA is transferring the first bytes of the next sg.
>>>> So we return the residue corresponding to the beginning of the next buffer.
>>>
>>> And that is wrong!. The argument is 'cookie' and you return residue for
>>> that cookie.
>>>
>>> For example, if you have dma txn with cookie 1, 2, 3, 4 submitted, then currently HW
>>> is processing cookie 2, then for tx_status on:
>>> cookie 1: return DMA_COMPLETE, residue 0
>>> cookie 2: return DMA_IN_PROGRESS, residue (read from HW)
>>> cookie 3: return DMA_IN_PROGRESS, residue txn length
>>> cookie 4: return DMA_IN_PROGRESS, residue txn length
>>>
>>> Thanks
>>>
>> I think i miss something in my explanation, as from my humble POV (not
>> enough expert in DMA framework...) we only one cookie here as only one
>> cyclic transfer...
> 
>> Regarding your answers it looks like my sg explanation are not clear and
>> introduce confusions... Sorry for this, i was used sg for internal STM32
>> DMA driver, not for the framework API itself.
>>
>> Let try retry to re-explain you the stm32 DMA cyclic mode management.
>>
>> STM32 STM32 hardware:
>> -------------------
>> (ref manual:
>> https://www.st.com/content/ccc/resource/technical/document/reference_manual/group0/51/ba/9e/5e/78/5b/4b/dd/DM00327659/files/DM00327659.pdf/jcr:content/translations/en.DM00327659.pdf)
>>
>> The stm32 DMA supports cyclic mode using a hardware double
>> buffer mode.
>> In this double buffer, we can program up to 2 transfers. When one is
>> completed, the DMA automatically switch on the other. This could be see
>> as a hardware LLI with only 2 transfer descriptors.
>> A hardware bit CT (current target) is used to determine the
>> current transfer (CT = 0 or 1).
>> A hardware NDT (num of data to transfer) counter can be read to
>> determine DMA position in current transfer.
>> An IRQ is generated when this CT bit is updated to allows driver to
>> update the double buffer for the next transfer.
>>
>> On client side (a.e audio):
>> -------------------------
>> The client requests a cyclic transfer by calling
>> stm32_dma_prep_dma_cyclic. For instance it can request the transfer of a
>> buffer divided in 10 periods. In this case only one cookie submitted
>> (right?).
>>
>> At stm32dma driver level these 10 periods are registered in an internal
>> software table (desc->sg_req[]).As cyclic, the last sg_req point to the
>> first one.
>>
>> So to be able to transfer the whole software table, we have to update
>> the STM32 DMA double buffer at each end of transfer period.
>> The filed chan->next_sg points to the next sg_req in the software table.
>> that should be write in the STM32 DMA double buffer.
>>
>> Residue calculation:
>> -------------------
>> During a transfer we can get the position in a period thanks to the
>> NDT(num of data to transfer) bit-field.
>>
>> So the calculation is :
>> 1) Get the NDT field value
>> 3) add the periods remaining in the desc->sg_req[] table.
>>
>> In parallel the STM32 DMA hardware updates the transfer buffer in 3 steps:
>> 1) update CT register field.
>> 2) Update NDT register field.
>> 3) generate the IRQ (As you mention the IRQ is not treated during the
>> device_tx_status as protected from interrupts).
>>
>> We are facing issue when computing the residue during the update of the
>> CT and the NDT. The CT and NDT can as been updated ( both or only CT...)
>> without driver context update (IRQ disabled).
>> In this case we can point to the beginning of the current transfer(
>> completed) instead of the next_transfer. This generates a residue error
>> and for audio a time-stamp regression (so video freeze or audio plop).
>>
>> So the patch proposed consists in:
>> 1) getting the current NDT value
>> 2) reading CT and check that the hardware does not point to the next_sg.
>> 	if yes:
>> 	- CT has been updated by hardware but IRQ still not treated.
>> 	- By default we consider the current_sg as completed, so we
>> 	  point to the beginning of the next_sg buffer.
>>
>> Hope that will help to clarify.
> 
> Yes that helps, maybe we should add these bits in code and changelog..
> :)I will update the comments and commit message in a V2 in this way
> 
> And how does this impact non cyclic case where N descriptors maybe
> issued. The driver seems to support non cyclic too...

Correct it supports SG as well, but double buffer mode is not used in
such case. Hw is programmed under IT for every descriptors : no
automatic register reloaded as in cyclic mode. We won't end up in the
situation depicted below.

Thanks
-- 
~Arnaud

^ permalink raw reply

* dmaengine: stm32-dma: fix residue calculation in stm32-dma
From: Arnaud Pouliquen @ 2019-04-30 14:58 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel,
	dmaengine

On 4/30/19 10:22 AM, Vinod Koul wrote:
> On 29-04-19, 16:52, Arnaud Pouliquen wrote:
>>
>>
>> On 4/29/19 7:13 AM, Vinod Koul wrote:
>>> On 26-04-19, 15:41, Arnaud Pouliquen wrote:
>>>>>> During residue calculation. the DMA can switch to the next sg. When
>>>>>> this race condition occurs, the residue returned value is not valid.
>>>>>> Indeed the position in the sg returned by the hardware is the position
>>>>>> of the next sg, not the current sg.
>>>>>> Solution is to check the sg after the calculation to verify it.
>>>>>> If a transition is detected we consider that the DMA has switched to
>>>>>> the beginning of next sg.
>>>>>
>>>>> Now, that sounds like duct tape. Why should we bother doing that.
>>>>>
>>>>> Also looking back at the stm32_dma_desc_residue() and calls to it from
>>>>> stm32_dma_tx_status() am not sure we are doing the right thing
>>>> Please, could you explain what you have in mind here?
>>>
>>> So when we call vchan_find_desc() that tells us if the descriptor is in
>>> the issued queue or not..  Ideally it should not matter if we have one
>>> or N descriptors issued to hardware.
>>>
>>> So why should you bother checking for next_sg.
>>>
>>>>> why are we looking at next_sg here, can you explain me that please
>>>>
>>>> This solution is similar to one implemented in the at_hdmac.c driver
>>>> (atc_get_bytes_left function).
>>>>
>>>> Yes could be consider as a workaround for a hardware issue...
>>>>
>>>> In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
>>>> sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
>>>> mode is mainly use for audio transfer initiated by an ALSA driver.
>>>>
>>>> >From hardware point of view the DMA transfers first block based on sg1,
>>>> then it updates registers to prepare sg2 transfer, and then generates an
>>>> IRQ to inform that it issues the next transfer (sg2).
>>>>
>>>> Then driver can update sg1 to prepare the third transfer...
>>>>
>>>> In parallel the client driver can requests status to get the residue to
>>>> update internal pointer.
>>>> The issue is in the race condition between the call of the
>>>> device_tx_status ops and the update of the DMA register on sg switch.
>>>
>>> Sorry I do not agree! You are in stm32_dma_tx_status() hold the lock and
>>> IRQs are disabled, so even if sg2 was loaded, you will not get an
>>> interrupt and wont know. By looking at sg1 register you will see that
>>> sg1 is telling you that it has finished and residue can be zero. That is
>>> fine and correct to report.
>>>
>>> Most important thing here is that reside is for _requested_ descriptor
>>> and not _current_ descriptor, so looking into sg2 doesnt not fit.
>>>
>>>> During a short time the hardware updated the registers containing the
>>>> sg ID but not the transfer counter(SxNDTR). In this case there is a
>>>> mismatch between the Sg ID and the associated transfer counter.
>>>> So residue calculation is wrong.
>>>> Idea of this patch is to perform the calculation and then to crosscheck
>>>> that the hardware has not switched to the next sg during the
>>>> calculation. The way to crosscheck is to compare the the sg ID before
>>>> and after the calculation.
>>>>
>>>> I tested the solution to force a new recalculation but no real solution
>>>> to trust the registers during this phase. In this case an approximation
>>>> is to consider that the DMA is transferring the first bytes of the next sg.
>>>> So we return the residue corresponding to the beginning of the next buffer.
>>>
>>> And that is wrong!. The argument is 'cookie' and you return residue for
>>> that cookie.
>>>
>>> For example, if you have dma txn with cookie 1, 2, 3, 4 submitted, then currently HW
>>> is processing cookie 2, then for tx_status on:
>>> cookie 1: return DMA_COMPLETE, residue 0
>>> cookie 2: return DMA_IN_PROGRESS, residue (read from HW)
>>> cookie 3: return DMA_IN_PROGRESS, residue txn length
>>> cookie 4: return DMA_IN_PROGRESS, residue txn length
>>>
>>> Thanks
>>>
>> I think i miss something in my explanation, as from my humble POV (not
>> enough expert in DMA framework...) we only one cookie here as only one
>> cyclic transfer...
> 
>> Regarding your answers it looks like my sg explanation are not clear and
>> introduce confusions... Sorry for this, i was used sg for internal STM32
>> DMA driver, not for the framework API itself.
>>
>> Let try retry to re-explain you the stm32 DMA cyclic mode management.
>>
>> STM32 STM32 hardware:
>> -------------------
>> (ref manual:
>> https://www.st.com/content/ccc/resource/technical/document/reference_manual/group0/51/ba/9e/5e/78/5b/4b/dd/DM00327659/files/DM00327659.pdf/jcr:content/translations/en.DM00327659.pdf)
>>
>> The stm32 DMA supports cyclic mode using a hardware double
>> buffer mode.
>> In this double buffer, we can program up to 2 transfers. When one is
>> completed, the DMA automatically switch on the other. This could be see
>> as a hardware LLI with only 2 transfer descriptors.
>> A hardware bit CT (current target) is used to determine the
>> current transfer (CT = 0 or 1).
>> A hardware NDT (num of data to transfer) counter can be read to
>> determine DMA position in current transfer.
>> An IRQ is generated when this CT bit is updated to allows driver to
>> update the double buffer for the next transfer.
>>
>> On client side (a.e audio):
>> -------------------------
>> The client requests a cyclic transfer by calling
>> stm32_dma_prep_dma_cyclic. For instance it can request the transfer of a
>> buffer divided in 10 periods. In this case only one cookie submitted
>> (right?).
>>
>> At stm32dma driver level these 10 periods are registered in an internal
>> software table (desc->sg_req[]).As cyclic, the last sg_req point to the
>> first one.
>>
>> So to be able to transfer the whole software table, we have to update
>> the STM32 DMA double buffer at each end of transfer period.
>> The filed chan->next_sg points to the next sg_req in the software table.
>> that should be write in the STM32 DMA double buffer.
>>
>> Residue calculation:
>> -------------------
>> During a transfer we can get the position in a period thanks to the
>> NDT(num of data to transfer) bit-field.
>>
>> So the calculation is :
>> 1) Get the NDT field value
>> 3) add the periods remaining in the desc->sg_req[] table.
>>
>> In parallel the STM32 DMA hardware updates the transfer buffer in 3 steps:
>> 1) update CT register field.
>> 2) Update NDT register field.
>> 3) generate the IRQ (As you mention the IRQ is not treated during the
>> device_tx_status as protected from interrupts).
>>
>> We are facing issue when computing the residue during the update of the
>> CT and the NDT. The CT and NDT can as been updated ( both or only CT...)
>> without driver context update (IRQ disabled).
>> In this case we can point to the beginning of the current transfer(
>> completed) instead of the next_transfer. This generates a residue error
>> and for audio a time-stamp regression (so video freeze or audio plop).
>>
>> So the patch proposed consists in:
>> 1) getting the current NDT value
>> 2) reading CT and check that the hardware does not point to the next_sg.
>> 	if yes:
>> 	- CT has been updated by hardware but IRQ still not treated.
>> 	- By default we consider the current_sg as completed, so we
>> 	  point to the beginning of the next_sg buffer.
>>
>> Hope that will help to clarify.
> 
> Yes that helps, maybe we should add these bits in code and changelog..
> :)I will update the comments and commit message in a V2 in this way
> 
> And how does this impact non cyclic case where N descriptors maybe
> issued. The driver seems to support non cyclic too...

Correct it supports SG as well, but double buffer mode is not used in
such case. Hw is programmed under IT for every descriptors : no
automatic register reloaded as in cyclic mode. We won't end up in the
situation depicted below.

Thanks

^ permalink raw reply

* Re: [PATCH v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-30 12:25 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Jon Hunter, Laxman Dewangan, Vinod Koul, dmaengine, linux-tegra,
	linux-kernel
In-Reply-To: <20190426151157.GA19559@ulmo>

26.04.2019 18:11, Thierry Reding пишет:
> On Fri, Apr 26, 2019 at 04:03:08PM +0300, Dmitry Osipenko wrote:
>> 26.04.2019 15:42, Dmitry Osipenko пишет:
>>> 26.04.2019 15:18, Dmitry Osipenko пишет:
>>>> 26.04.2019 14:13, Jon Hunter пишет:
>>>>>
>>>>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
>>>>>> 26.04.2019 12:52, Jon Hunter пишет:
>>>>>>>
>>>>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>>>>>>> The readl/writel functions are inserting memory barrier in order to
>>>>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>>>>>>> results in L2 cache syncing which isn't a cheapest operation. The
>>>>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>>>>>>> accesses, hence use the relaxed versions of the functions.
>>>>>>>
>>>>>>> Do you mean device-io accesses here as this is not generic memory?
>>>>>>
>>>>>> Yes. The IOMEM accesses within are always ordered and uncached, while
>>>>>> generic memory accesses are out-of-order and cached.
>>>>>>
>>>>>>> Although there may not be any issues with this change, I think I need a
>>>>>>> bit more convincing that we should do this given that we have had it
>>>>>>> this way for sometime and I would not like to see us introduce any
>>>>>>> regressions as this point without being 100% certain we would not.
>>>>>>> Ideally, if I had some good extensive tests I could run to hammer the
>>>>>>> DMA for all configurations with different combinations of channels
>>>>>>> running simultaneously then we could test this, but right now I don't :-(
>>>>>>>
>>>>>>> Have you ...
>>>>>>> 1. Tested both cyclic and scatter-gather transfers?
>>>>>>> 2. Stress tested simultaneous transfers with various different
>>>>>>>    configurations?
>>>>>>> 3. Quantified the actual performance benefit of this change so we can
>>>>>>>    understand how much of a performance boost this offers?
>>>>>>
>>>>>> Actually I found a case where this change causes a problem, I'm seeing
>>>>>> I2C transfer timeout for touchscreen and it breaks the touch input.
>>>>>> Indeed, I haven't tested this patch very well.
>>>>>>
>>>>>> And the fix is this:
>>>>>>
>>>>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
>>>>>> *dev)
>>>>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>>>>>>  	}
>>>>>>
>>>>>> +	dsb();
>>>>>> +
>>>>>>  	clk_disable_unprepare(tdma->dma_clk);
>>>>>>
>>>>>>  	return 0;
>>>>>>
>>>>>>
>>>>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
>>>>>> incoherent and CPU disables clock before writes are reaching DMA controller.
>>>>>>
>>>>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
>>>>>> made some more testing of simultaneous transfers.
>>>>>>
>>>>>> Quantifying performance probably won't be easy to make as the DMA
>>>>>> read/writes are not on any kind of code's hot-path.
>>>>>
>>>>> So why make the change?
>>>>
>>>> For consistency.
>>>>
>>>>>> Jon, are you still insisting about to drop this patch or you will be
>>>>>> fine with the v2 that will have the dsb() in place?
>>>>>
>>>>> If we can't quantify the performance gain, then it is difficult to
>>>>> justify the change. I would also be concerned if that is the only place
>>>>> we need an explicit dsb.
>>>>
>>>> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
>>>> this patch for now.
>>>>
>>>
>>> Jon, it occurred to me that there still should be a problem with the
>>> writel() ordering in the driver because writel() ensures that memory
>>> stores are completed *before* the write occurs and hence translates into
>>> iowmb() + writel_relaxed() [0]. Thus the last write will always happen
>>> asynchronously in regards to clk accesses.
>>>
>>> [0]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311
>>>
>>
>> Also please note that iowmb() translates into wmb() if
>> CONFIG_ARM_DMA_MEM_BUFFERABLE=y and sometime ago I was profiling host1x
>> driver job submission performance and have seen cases where wmb() could
>> take up to 1ms on T20 due to L2 syncing if there are outstanding memory
>> writes in the cache (or even more, I don't remember exactly already how
>> bad it was..).
> 
> This looks to be primarily caused by the fact that we have the L2X0
> cache on Tegra20. So there's not really anything that can be done there
> without potentially compromising correctness of the code.
> 
>> Altogether, I think the usage of readl/writel in pretty much all of
>> Tegra drivers is plainly wrong and explicit dsb() shall be used in
>> places where hardware synchronization is really needed.
> 
> I don't think that's an accurate observation. readl()/writel() are more
> likely to be correct than the relaxed versions. You already saw yourself
> that using the relaxed versions can easily introduce regressions.
> 
> Granted, readl()/writel() might add more memory barriers than strictly
> necessary, and therefore they might in many cases be suboptimal. But, we
> can't just go and engage in a wholesale conversion of all drivers. If we
> do this, we need to very carefully audit every conversion to make sure
> no regressions are introduced. This is especially complicated because
> these would be subtle regressions and may be difficult to catch or
> reproduce.
> 
> Also, we should avoid using primitives such as dsb in driver code to
> avoid making the code too architecture specific.

I was testing this a bit more for a couple of days and my current
conclusion that there is likely some problem that is getting masked by
writel/readl because I tried to manually insert the syncing that
writel/readl does for the relaxed versions (and more) and that slight
shuffling of the code makes the problem to occur intermittently. My
observations show that it's only the I2C-DMA that has the trouble, other
DMA clients are working fine. Maybe there is some timing problem or
missing ready-state polling somewhere, for now I don't know what's the
actual problem is.

^ permalink raw reply

* [v1] dmaengine: tegra: Use relaxed versions of readl/writel
From: Dmitry Osipenko @ 2019-04-30 12:25 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Jon Hunter, Laxman Dewangan, Vinod Koul, dmaengine, linux-tegra,
	linux-kernel

26.04.2019 18:11, Thierry Reding пишет:
> On Fri, Apr 26, 2019 at 04:03:08PM +0300, Dmitry Osipenko wrote:
>> 26.04.2019 15:42, Dmitry Osipenko пишет:
>>> 26.04.2019 15:18, Dmitry Osipenko пишет:
>>>> 26.04.2019 14:13, Jon Hunter пишет:
>>>>>
>>>>> On 26/04/2019 11:45, Dmitry Osipenko wrote:
>>>>>> 26.04.2019 12:52, Jon Hunter пишет:
>>>>>>>
>>>>>>> On 25/04/2019 00:17, Dmitry Osipenko wrote:
>>>>>>>> The readl/writel functions are inserting memory barrier in order to
>>>>>>>> ensure that memory stores are completed. On Tegra20 and Tegra30 this
>>>>>>>> results in L2 cache syncing which isn't a cheapest operation. The
>>>>>>>> tegra20-apb-dma driver doesn't need to synchronize generic memory
>>>>>>>> accesses, hence use the relaxed versions of the functions.
>>>>>>>
>>>>>>> Do you mean device-io accesses here as this is not generic memory?
>>>>>>
>>>>>> Yes. The IOMEM accesses within are always ordered and uncached, while
>>>>>> generic memory accesses are out-of-order and cached.
>>>>>>
>>>>>>> Although there may not be any issues with this change, I think I need a
>>>>>>> bit more convincing that we should do this given that we have had it
>>>>>>> this way for sometime and I would not like to see us introduce any
>>>>>>> regressions as this point without being 100% certain we would not.
>>>>>>> Ideally, if I had some good extensive tests I could run to hammer the
>>>>>>> DMA for all configurations with different combinations of channels
>>>>>>> running simultaneously then we could test this, but right now I don't :-(
>>>>>>>
>>>>>>> Have you ...
>>>>>>> 1. Tested both cyclic and scatter-gather transfers?
>>>>>>> 2. Stress tested simultaneous transfers with various different
>>>>>>>    configurations?
>>>>>>> 3. Quantified the actual performance benefit of this change so we can
>>>>>>>    understand how much of a performance boost this offers?
>>>>>>
>>>>>> Actually I found a case where this change causes a problem, I'm seeing
>>>>>> I2C transfer timeout for touchscreen and it breaks the touch input.
>>>>>> Indeed, I haven't tested this patch very well.
>>>>>>
>>>>>> And the fix is this:
>>>>>>
>>>>>> @@ -1592,6 +1592,8 @@ static int tegra_dma_runtime_suspend(struct device
>>>>>> *dev)
>>>>>>  						  TEGRA_APBDMA_CHAN_WCOUNT);
>>>>>>  	}
>>>>>>
>>>>>> +	dsb();
>>>>>> +
>>>>>>  	clk_disable_unprepare(tdma->dma_clk);
>>>>>>
>>>>>>  	return 0;
>>>>>>
>>>>>>
>>>>>> Apparently the problem is that CLK/DMA (PPSB/APB) accesses are
>>>>>> incoherent and CPU disables clock before writes are reaching DMA controller.
>>>>>>
>>>>>> I'd say that cyclic and scatter-gather transfers are now tested. I also
>>>>>> made some more testing of simultaneous transfers.
>>>>>>
>>>>>> Quantifying performance probably won't be easy to make as the DMA
>>>>>> read/writes are not on any kind of code's hot-path.
>>>>>
>>>>> So why make the change?
>>>>
>>>> For consistency.
>>>>
>>>>>> Jon, are you still insisting about to drop this patch or you will be
>>>>>> fine with the v2 that will have the dsb() in place?
>>>>>
>>>>> If we can't quantify the performance gain, then it is difficult to
>>>>> justify the change. I would also be concerned if that is the only place
>>>>> we need an explicit dsb.
>>>>
>>>> Maybe it won't hurt to add dsb to the ISR as well. But okay, let's drop
>>>> this patch for now.
>>>>
>>>
>>> Jon, it occurred to me that there still should be a problem with the
>>> writel() ordering in the driver because writel() ensures that memory
>>> stores are completed *before* the write occurs and hence translates into
>>> iowmb() + writel_relaxed() [0]. Thus the last write will always happen
>>> asynchronously in regards to clk accesses.
>>>
>>> [0]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/io.h#n311
>>>
>>
>> Also please note that iowmb() translates into wmb() if
>> CONFIG_ARM_DMA_MEM_BUFFERABLE=y and sometime ago I was profiling host1x
>> driver job submission performance and have seen cases where wmb() could
>> take up to 1ms on T20 due to L2 syncing if there are outstanding memory
>> writes in the cache (or even more, I don't remember exactly already how
>> bad it was..).
> 
> This looks to be primarily caused by the fact that we have the L2X0
> cache on Tegra20. So there's not really anything that can be done there
> without potentially compromising correctness of the code.
> 
>> Altogether, I think the usage of readl/writel in pretty much all of
>> Tegra drivers is plainly wrong and explicit dsb() shall be used in
>> places where hardware synchronization is really needed.
> 
> I don't think that's an accurate observation. readl()/writel() are more
> likely to be correct than the relaxed versions. You already saw yourself
> that using the relaxed versions can easily introduce regressions.
> 
> Granted, readl()/writel() might add more memory barriers than strictly
> necessary, and therefore they might in many cases be suboptimal. But, we
> can't just go and engage in a wholesale conversion of all drivers. If we
> do this, we need to very carefully audit every conversion to make sure
> no regressions are introduced. This is especially complicated because
> these would be subtle regressions and may be difficult to catch or
> reproduce.
> 
> Also, we should avoid using primitives such as dsb in driver code to
> avoid making the code too architecture specific.

I was testing this a bit more for a couple of days and my current
conclusion that there is likely some problem that is getting masked by
writel/readl because I tried to manually insert the syncing that
writel/readl does for the relaxed versions (and more) and that slight
shuffling of the code makes the problem to occur intermittently. My
observations show that it's only the I2C-DMA that has the trouble, other
DMA clients are working fine. Maybe there is some timing problem or
missing ready-state polling somewhere, for now I don't know what's the
actual problem is.

^ permalink raw reply

* [PATCH] [RFC] dmaengine: add fifo_size member
From: Sameer Pujar @ 2019-04-30 11:30 UTC (permalink / raw)
  To: vkoul, dan.j.williams, tiwai
  Cc: jonathanh, dmaengine, linux-kernel, Sameer Pujar

During the DMA transfers from memory to I/O, it was observed that transfers
were inconsistent and resulted in glitches for audio playback. It happened
because fifo size on DMA did not match with slave channel configuration.

currently 'dma_slave_config' structure does not have a field for fifo size.
Hence the platform pcm driver cannot pass the fifo size as a slave_config.
Note that 'snd_dmaengine_dai_dma_data' structure has fifo_size field which
cannot be used to pass the size info. This patch introduces fifo_size field
and the same can be populated on slave side. Users can set required size
for slave peripheral (multiple channels can be independently running with
different fifo sizes) and the corresponding sizes are programmed through
dma_slave_config on DMA side.

Request for feedback/suggestions.

Signed-off-by: Sameer Pujar <spujar@nvidia.com>
---
 include/linux/dmaengine.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index d49ec5c..9ec198b 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -351,6 +351,8 @@ enum dma_slave_buswidth {
  * @slave_id: Slave requester id. Only valid for slave channels. The dma
  * slave peripheral will have unique id as dma requester which need to be
  * pass as slave config.
+ * @fifo_size: Fifo size value. The dma slave peripheral can configure required
+ * fifo size and the same needs to be passed as slave config.
  *
  * This struct is passed in as configuration data to a DMA engine
  * in order to set up a certain channel for DMA transport at runtime.
@@ -376,6 +378,7 @@ struct dma_slave_config {
 	u32 dst_port_window_size;
 	bool device_fc;
 	unsigned int slave_id;
+	u32 fifo_size;
 };

 /**
-- 
2.7.4

^ permalink raw reply related

* [RFC] dmaengine: add fifo_size member
From: Sameer Pujar @ 2019-04-30 11:30 UTC (permalink / raw)
  To: vkoul, dan.j.williams, tiwai
  Cc: jonathanh, dmaengine, linux-kernel, Sameer Pujar

During the DMA transfers from memory to I/O, it was observed that transfers
were inconsistent and resulted in glitches for audio playback. It happened
because fifo size on DMA did not match with slave channel configuration.

currently 'dma_slave_config' structure does not have a field for fifo size.
Hence the platform pcm driver cannot pass the fifo size as a slave_config.
Note that 'snd_dmaengine_dai_dma_data' structure has fifo_size field which
cannot be used to pass the size info. This patch introduces fifo_size field
and the same can be populated on slave side. Users can set required size
for slave peripheral (multiple channels can be independently running with
different fifo sizes) and the corresponding sizes are programmed through
dma_slave_config on DMA side.

Request for feedback/suggestions.

Signed-off-by: Sameer Pujar <spujar@nvidia.com>
---
 include/linux/dmaengine.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index d49ec5c..9ec198b 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -351,6 +351,8 @@ enum dma_slave_buswidth {
  * @slave_id: Slave requester id. Only valid for slave channels. The dma
  * slave peripheral will have unique id as dma requester which need to be
  * pass as slave config.
+ * @fifo_size: Fifo size value. The dma slave peripheral can configure required
+ * fifo size and the same needs to be passed as slave config.
  *
  * This struct is passed in as configuration data to a DMA engine
  * in order to set up a certain channel for DMA transport at runtime.
@@ -376,6 +378,7 @@ struct dma_slave_config {
 	u32 dst_port_window_size;
 	bool device_fc;
 	unsigned int slave_id;
+	u32 fifo_size;
 };

 /**

^ permalink raw reply related

* Re: [PATCH 4/7] dmaengine: sprd: Add device validation to support multiple controllers
From: Baolin Wang @ 2019-04-30  8:53 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, eric.long, Orson Zhai, Chunyan Zhang, Mark Brown,
	dmaengine, LKML
In-Reply-To: <CAMz4kuKV3J+aw9sic=QOhmcnr+H_pZ-pmq4pRbLX1R+XAR=phA@mail.gmail.com>

Hi Vinod,

On Tue, 30 Apr 2019 at 16:34, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> On Tue, 30 Apr 2019 at 16:30, Vinod Koul <vkoul@kernel.org> wrote:
> >
> > On 30-04-19, 13:30, Baolin Wang wrote:
> > > On Mon, 29 Apr 2019 at 22:05, Vinod Koul <vkoul@kernel.org> wrote:
> > > >
> > > > On 29-04-19, 20:20, Baolin Wang wrote:
> > > > > On Mon, 29 Apr 2019 at 19:57, Vinod Koul <vkoul@kernel.org> wrote:
> > > > > >
> > > > > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > > > > From: Eric Long <eric.long@unisoc.com>
> > > > > > >
> > > > > > > Since we can support multiple DMA engine controllers, we should add
> > > > > > > device validation in filter function to check if the correct controller
> > > > > > > to be requested.
> > > > > > >
> > > > > > > Signed-off-by: Eric Long <eric.long@unisoc.com>
> > > > > > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > > > > > ---
> > > > > > >  drivers/dma/sprd-dma.c |    5 +++++
> > > > > > >  1 file changed, 5 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > > > > index 0f92e60..9f99d4b 100644
> > > > > > > --- a/drivers/dma/sprd-dma.c
> > > > > > > +++ b/drivers/dma/sprd-dma.c
> > > > > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
> > > > > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > > > > >  {
> > > > > > >       struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > > > > +     struct of_phandle_args *dma_spec =
> > > > > > > +             container_of(param, struct of_phandle_args, args[0]);
> > > > > > >       u32 slave_id = *(u32 *)param;
> > > > > > >
> > > > > > > +     if (chan->device->dev->of_node != dma_spec->np)
> > > > > >
> > > > > > Are you not using of_dma_find_controller() that does this, so this would
> > > > > > be useless!
> > > > >
> > > > > Yes, we can use of_dma_find_controller(), but that will be a little
> > > > > complicated than current solution. Since we need introduce one
> > > > > structure to save the node to validate in the filter function like
> > > > > below, which seems make things complicated. But if you still like to
> > > > > use of_dma_find_controller(), I can change to use it in next version.
> > > >
> > > > Sorry I should have clarified more..
> > > >
> > > > of_dma_find_controller() is called by xlate, so you already run this
> > > > check, so why use this :)
> > >
> > > The of_dma_find_controller() can save the requested device node into
> > > dma_spec, and in the of_dma_simple_xlate() function, it will call
> > > dma_request_channel() to request one channel, but it did not validate
> > > the device node to find the corresponding dma device in
> > > dma_request_channel(). So we should in our filter function to validate
> > > the device node with the device node specified by the dma_spec. Hope I
> > > make things clear.
> >
> > But dma_request_channel() calls of_dma_request_slave_channel() which
> > invokes of_dma_find_controller() why is it broken for you if that is the
> > case..
>
> No,the calling process should be:
> dma_request_slave_channel()
> --->dma_request_chan()--->of_dma_request_slave_channel()---->of_dma_simple_xlate()
> ----> dma_request_channel().
>

You can check other drivers, they also will save the device node to
validate in their filter function.
For example the imx-sdma driver:
https://elixir.bootlin.com/linux/v5.1-rc6/source/drivers/dma/imx-sdma.c#L1931

-- 
Baolin Wang
Best Regards

^ permalink raw reply

* [4/7] dmaengine: sprd: Add device validation to support multiple controllers
From: Baolin Wang @ 2019-04-30  8:53 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, eric.long, Orson Zhai, Chunyan Zhang, Mark Brown,
	dmaengine, LKML

Hi Vinod,

On Tue, 30 Apr 2019 at 16:34, Baolin Wang <baolin.wang@linaro.org> wrote:
>
> On Tue, 30 Apr 2019 at 16:30, Vinod Koul <vkoul@kernel.org> wrote:
> >
> > On 30-04-19, 13:30, Baolin Wang wrote:
> > > On Mon, 29 Apr 2019 at 22:05, Vinod Koul <vkoul@kernel.org> wrote:
> > > >
> > > > On 29-04-19, 20:20, Baolin Wang wrote:
> > > > > On Mon, 29 Apr 2019 at 19:57, Vinod Koul <vkoul@kernel.org> wrote:
> > > > > >
> > > > > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > > > > From: Eric Long <eric.long@unisoc.com>
> > > > > > >
> > > > > > > Since we can support multiple DMA engine controllers, we should add
> > > > > > > device validation in filter function to check if the correct controller
> > > > > > > to be requested.
> > > > > > >
> > > > > > > Signed-off-by: Eric Long <eric.long@unisoc.com>
> > > > > > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > > > > > ---
> > > > > > >  drivers/dma/sprd-dma.c |    5 +++++
> > > > > > >  1 file changed, 5 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > > > > index 0f92e60..9f99d4b 100644
> > > > > > > --- a/drivers/dma/sprd-dma.c
> > > > > > > +++ b/drivers/dma/sprd-dma.c
> > > > > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
> > > > > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > > > > >  {
> > > > > > >       struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > > > > +     struct of_phandle_args *dma_spec =
> > > > > > > +             container_of(param, struct of_phandle_args, args[0]);
> > > > > > >       u32 slave_id = *(u32 *)param;
> > > > > > >
> > > > > > > +     if (chan->device->dev->of_node != dma_spec->np)
> > > > > >
> > > > > > Are you not using of_dma_find_controller() that does this, so this would
> > > > > > be useless!
> > > > >
> > > > > Yes, we can use of_dma_find_controller(), but that will be a little
> > > > > complicated than current solution. Since we need introduce one
> > > > > structure to save the node to validate in the filter function like
> > > > > below, which seems make things complicated. But if you still like to
> > > > > use of_dma_find_controller(), I can change to use it in next version.
> > > >
> > > > Sorry I should have clarified more..
> > > >
> > > > of_dma_find_controller() is called by xlate, so you already run this
> > > > check, so why use this :)
> > >
> > > The of_dma_find_controller() can save the requested device node into
> > > dma_spec, and in the of_dma_simple_xlate() function, it will call
> > > dma_request_channel() to request one channel, but it did not validate
> > > the device node to find the corresponding dma device in
> > > dma_request_channel(). So we should in our filter function to validate
> > > the device node with the device node specified by the dma_spec. Hope I
> > > make things clear.
> >
> > But dma_request_channel() calls of_dma_request_slave_channel() which
> > invokes of_dma_find_controller() why is it broken for you if that is the
> > case..
>
> No,the calling process should be:
> dma_request_slave_channel()
> --->dma_request_chan()--->of_dma_request_slave_channel()---->of_dma_simple_xlate()
> ----> dma_request_channel().
>

You can check other drivers, they also will save the device node to
validate in their filter function.
For example the imx-sdma driver:
https://elixir.bootlin.com/linux/v5.1-rc6/source/drivers/dma/imx-sdma.c#L1931

^ permalink raw reply

* Re: [PATCH 4/7] dmaengine: sprd: Add device validation to support multiple controllers
From: Baolin Wang @ 2019-04-30  8:34 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, eric.long, Orson Zhai, Chunyan Zhang, Mark Brown,
	dmaengine, LKML
In-Reply-To: <20190430082954.GQ3845@vkoul-mobl.Dlink>

On Tue, 30 Apr 2019 at 16:30, Vinod Koul <vkoul@kernel.org> wrote:
>
> On 30-04-19, 13:30, Baolin Wang wrote:
> > On Mon, 29 Apr 2019 at 22:05, Vinod Koul <vkoul@kernel.org> wrote:
> > >
> > > On 29-04-19, 20:20, Baolin Wang wrote:
> > > > On Mon, 29 Apr 2019 at 19:57, Vinod Koul <vkoul@kernel.org> wrote:
> > > > >
> > > > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > > > From: Eric Long <eric.long@unisoc.com>
> > > > > >
> > > > > > Since we can support multiple DMA engine controllers, we should add
> > > > > > device validation in filter function to check if the correct controller
> > > > > > to be requested.
> > > > > >
> > > > > > Signed-off-by: Eric Long <eric.long@unisoc.com>
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > > > > ---
> > > > > >  drivers/dma/sprd-dma.c |    5 +++++
> > > > > >  1 file changed, 5 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > > > index 0f92e60..9f99d4b 100644
> > > > > > --- a/drivers/dma/sprd-dma.c
> > > > > > +++ b/drivers/dma/sprd-dma.c
> > > > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
> > > > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > > > >  {
> > > > > >       struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > > > +     struct of_phandle_args *dma_spec =
> > > > > > +             container_of(param, struct of_phandle_args, args[0]);
> > > > > >       u32 slave_id = *(u32 *)param;
> > > > > >
> > > > > > +     if (chan->device->dev->of_node != dma_spec->np)
> > > > >
> > > > > Are you not using of_dma_find_controller() that does this, so this would
> > > > > be useless!
> > > >
> > > > Yes, we can use of_dma_find_controller(), but that will be a little
> > > > complicated than current solution. Since we need introduce one
> > > > structure to save the node to validate in the filter function like
> > > > below, which seems make things complicated. But if you still like to
> > > > use of_dma_find_controller(), I can change to use it in next version.
> > >
> > > Sorry I should have clarified more..
> > >
> > > of_dma_find_controller() is called by xlate, so you already run this
> > > check, so why use this :)
> >
> > The of_dma_find_controller() can save the requested device node into
> > dma_spec, and in the of_dma_simple_xlate() function, it will call
> > dma_request_channel() to request one channel, but it did not validate
> > the device node to find the corresponding dma device in
> > dma_request_channel(). So we should in our filter function to validate
> > the device node with the device node specified by the dma_spec. Hope I
> > make things clear.
>
> But dma_request_channel() calls of_dma_request_slave_channel() which
> invokes of_dma_find_controller() why is it broken for you if that is the
> case..

No,the calling process should be:
dma_request_slave_channel()
--->dma_request_chan()--->of_dma_request_slave_channel()---->of_dma_simple_xlate()
----> dma_request_channel().

-- 
Baolin Wang
Best Regards

^ permalink raw reply

* [4/7] dmaengine: sprd: Add device validation to support multiple controllers
From: Baolin Wang @ 2019-04-30  8:34 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, eric.long, Orson Zhai, Chunyan Zhang, Mark Brown,
	dmaengine, LKML

On Tue, 30 Apr 2019 at 16:30, Vinod Koul <vkoul@kernel.org> wrote:
>
> On 30-04-19, 13:30, Baolin Wang wrote:
> > On Mon, 29 Apr 2019 at 22:05, Vinod Koul <vkoul@kernel.org> wrote:
> > >
> > > On 29-04-19, 20:20, Baolin Wang wrote:
> > > > On Mon, 29 Apr 2019 at 19:57, Vinod Koul <vkoul@kernel.org> wrote:
> > > > >
> > > > > On 15-04-19, 20:14, Baolin Wang wrote:
> > > > > > From: Eric Long <eric.long@unisoc.com>
> > > > > >
> > > > > > Since we can support multiple DMA engine controllers, we should add
> > > > > > device validation in filter function to check if the correct controller
> > > > > > to be requested.
> > > > > >
> > > > > > Signed-off-by: Eric Long <eric.long@unisoc.com>
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
> > > > > > ---
> > > > > >  drivers/dma/sprd-dma.c |    5 +++++
> > > > > >  1 file changed, 5 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
> > > > > > index 0f92e60..9f99d4b 100644
> > > > > > --- a/drivers/dma/sprd-dma.c
> > > > > > +++ b/drivers/dma/sprd-dma.c
> > > > > > @@ -1020,8 +1020,13 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
> > > > > >  static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
> > > > > >  {
> > > > > >       struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
> > > > > > +     struct of_phandle_args *dma_spec =
> > > > > > +             container_of(param, struct of_phandle_args, args[0]);
> > > > > >       u32 slave_id = *(u32 *)param;
> > > > > >
> > > > > > +     if (chan->device->dev->of_node != dma_spec->np)
> > > > >
> > > > > Are you not using of_dma_find_controller() that does this, so this would
> > > > > be useless!
> > > >
> > > > Yes, we can use of_dma_find_controller(), but that will be a little
> > > > complicated than current solution. Since we need introduce one
> > > > structure to save the node to validate in the filter function like
> > > > below, which seems make things complicated. But if you still like to
> > > > use of_dma_find_controller(), I can change to use it in next version.
> > >
> > > Sorry I should have clarified more..
> > >
> > > of_dma_find_controller() is called by xlate, so you already run this
> > > check, so why use this :)
> >
> > The of_dma_find_controller() can save the requested device node into
> > dma_spec, and in the of_dma_simple_xlate() function, it will call
> > dma_request_channel() to request one channel, but it did not validate
> > the device node to find the corresponding dma device in
> > dma_request_channel(). So we should in our filter function to validate
> > the device node with the device node specified by the dma_spec. Hope I
> > make things clear.
>
> But dma_request_channel() calls of_dma_request_slave_channel() which
> invokes of_dma_find_controller() why is it broken for you if that is the
> case..

No,the calling process should be:
dma_request_slave_channel()
--->dma_request_chan()--->of_dma_request_slave_channel()---->of_dma_simple_xlate()
----> dma_request_channel().

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox