dmaengine.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Dooks <ben.dooks@codethink.co.uk>
To: Dmitry Osipenko <digetx@gmail.com>
Cc: dan.j.williams@intel.com, vkoul@kernel.org, ldewangan@nvidia.com,
	dmaengine@vger.kernel.org, linux-tegra@vger.kernel.org
Subject: [4/6] dma: tegra: add accurate reporting of dma state
Date: Mon, 05 Nov 2018 09:03:54 +0000	[thread overview]
Message-ID: <d978a3ee77d71106b3de7361779a1f6e@codethink.co.uk> (raw)

On 2018-11-03 12:24, Dmitry Osipenko wrote:
> On 31.10.2018 19:03, Ben Dooks wrote:
>> The tx_status callback does not report the state of the transfer
>> beyond complete segments. This causes problems with users such as
>> ALSA when applications want to know accurately how much data has
>> been moved.
>> 
>> This patch addes a function tegra_dma_update_residual() to query
>> the hardware and modify the residual information accordinly. It
>> takes into account any hardware issues when trying to read the
>> state, such as delays between finishing a buffer and signalling
>> the interrupt.
>> 
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> 
> Hello Ben,
> 
> Thank you very much for the patch! It fixes "farting sound" for
> chromium-browser and applications that use chromium-engine (tested on
> Tegra20) because apparently it tries to use low latency for everything
> and audio buffer is constantly underflowing without more detailed
> DMA-progress reporting. See couple more comments below.
> 
>> ---
>>  drivers/dma/tegra20-apb-dma.c | 94 
>> ++++++++++++++++++++++++++++++++---
>>  1 file changed, 87 insertions(+), 7 deletions(-)
>> 
>> diff --git a/drivers/dma/tegra20-apb-dma.c 
>> b/drivers/dma/tegra20-apb-dma.c
>> index 4f7d1e576d03..3fa3a1ac4f57 100644
>> --- a/drivers/dma/tegra20-apb-dma.c
>> +++ b/drivers/dma/tegra20-apb-dma.c
>> @@ -802,12 +802,96 @@ static int tegra_dma_terminate_all(struct 
>> dma_chan *dc)
>>  	return 0;
>>  }
>> 
>> +static unsigned int tegra_dma_update_residual(struct 
>> tegra_dma_channel *tdc,
>> +					      struct tegra_dma_sg_req *sg_req,
>> +					      struct tegra_dma_desc *dma_desc,
>> +					      unsigned int residual)
>> +{
>> +	unsigned long status = 0x0;
> 
> There is no need to initialize "status" variable.

ok, will check this.

>> +	unsigned long wcount;
>> +	unsigned long ahbptr;
>> +	unsigned long tmp = 0x0;
>> +	unsigned int result;
>> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
>> +	int done;
>> +
>> +	/* if we're not the current request, then don't alter the residual 
>> */
>> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
>> +				       struct tegra_dma_sg_req, node)) {
>> +		result = residual;
>> +		ahbptr = 0xffffffff;
>> +		goto done;
>> +	}
>> +
>> +	/* loop until we have a reliable result for residual */
>> +	do {
>> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
>> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> 
>> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> 
> Register 0x08 (DMA_BYTE_STA) doesn't present on Tegra20 and "tmp"
> isn't used anywhere in the code. Please remove it entirely.

ok, fixed

>> +
>> +		/* check status, if channel isn't busy then skip */
>> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
>> +			result = residual;
>> +			break;
>> +		}
> 
> If "BUSY" is unset, doesn't this mean that transaction could be
> completed already? I don't quite understand why you want to skip here.

I can't remember what the reasoning behind this was, this work was
originally done over a year ago and I am not sure if I can find any
of the notes from this.

>> +
>> +		/* if we've got an interrupt pending on the channel, don't
>> +		 * try and deal with the residue as the hardware has likely
>> +		 * moved on to the next buffer. return all data moved.
>> +		 */
>> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
>> +			result = residual - sg_req->req_len;
>> +			break;
>> +		}
>> +
>> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
>> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
>> +		else
>> +			wcount = status;
>> +
>> +		/* If the request is at the full point, then there is a
>> +		 * chance that we have read the status register in the
>> +		 * middle of the hardware reloading the next buffer.
>> +		 *
>> +		 * The sequence seems to be at the end of the buffer, to
>> +		 * load the new word count before raising the EOC flag (or
>> +		 * changing the ping-pong flag which could have also been
>> +		 * used to determine a new buffer). This  means there is a
>                                two whitespaces here-----|
> 
>> +		 * small window where we cannot determine zero-done for the
>> +		 * current buffer, or moved to next buffer.
>> +		 *
> 
>> +		 * If done shows 0, then retry the load, as it may hit the
>> +		 * above hardware race. We will either get a new value which
>> +		 * is from the first buffer, or we get an EOC (new buffer)
>> +		 * or both a new value and an EOC...
> 
> I think we just need to wait 20usec after reading out "words count"
> and then re-check interrupt status, so transfer is done if interrupt
> is set and otherwise "words count" value is actual and reliable.

At the moment I have no way of going back and re-testing this code
against the problem it originally fixed, so would rather not change
the algorithm in this.

> 
>> +		 */
>> +		done = get_current_xferred_count(tdc, sg_req, wcount);
>> +		if (done != 0) {
>> +			result = residual - done;
>> +			break;
>> +		}
>> +
>> +		ndelay(100);
> 
> There is no ndelay() on ARM, hence your 20usec timeout is 200usec.
> Please use udelay().

I thought there was one based on the lpj calculations, I will check 
later.

>> +	} while (--retries > 0);
>> +
>> +	if (retries <= 0) {
>> +		dev_err(tdc2dev(tdc), "timeout waiting for dma load\n");
>> +		result = residual;
>> +	}
>> +
>> +done:
> 
> Please rename goto label as it duplicates local variable name.

ok, fixed.

>> +	dev_dbg(tdc2dev(tdc), "residual: req %08lx, ahb@%08lx, wcount %08lx, 
>> done %d\n",
>> +		 sg_req->ch_regs.ahb_ptr, ahbptr, wcount, done);
> 
> Whitespace just after tab not needed.
> 
>> +
>> +	return result;
>> +}
>> +
>>  static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>>  	dma_cookie_t cookie, struct dma_tx_state *txstate)
>>  {
>>  	struct tegra_dma_channel *tdc = to_tegra_dma_chan(dc);
>>  	struct tegra_dma_desc *dma_desc;
>> -	struct tegra_dma_sg_req *sg_req;
>> +	struct tegra_dma_sg_req *sg_req = NULL;
>>  	enum dma_status ret;
>>  	unsigned long flags;
>>  	unsigned int residual;
>> @@ -843,6 +927,7 @@ static enum dma_status tegra_dma_tx_status(struct 
>> dma_chan *dc,
>>  		residual = dma_desc->bytes_requested -
>>  			   (dma_desc->bytes_transferred %
>>  			    dma_desc->bytes_requested);
>> +		residual = tegra_dma_update_residual(tdc, sg_req, dma_desc, 
>> residual);
>>  		dma_set_residue(txstate, residual);
>>  	}
>> 
>> @@ -1436,12 +1521,7 @@ static int tegra_dma_probe(struct 
>> platform_device *pdev)
>>  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
>>  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
>>  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | 
>> BIT(DMA_MEM_TO_DEV);
>> -	/*
>> -	 * XXX The hardware appears to support
>> -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
>> -	 * only used by this driver during tegra_dma_terminate_all()
>> -	 */
>> -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
>> +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>>  	tdma->dma_dev.device_config = tegra_dma_slave_config;
>>  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
>>  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;
>> 
> 
> Summarizing all of the comments above, patch may look like this:
> 
> diff --git a/drivers/dma/tegra20-apb-dma.c 
> b/drivers/dma/tegra20-apb-dma.c
> index 9a558e30c461..956e8130c059 100644
> --- a/drivers/dma/tegra20-apb-dma.c
> +++ b/drivers/dma/tegra20-apb-dma.c
> @@ -799,12 +799,73 @@ static int tegra_dma_terminate_all(struct 
> dma_chan *dc)
>  	return 0;
>  }
> 
> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel 
> *tdc,
> +					      struct tegra_dma_sg_req *sg_req,
> +					      struct tegra_dma_desc *dma_desc,
> +					      unsigned int residual)
> +{
> +	unsigned long status;
> +	unsigned int result;
> +	int done;
> +
> +	/* if we're not the current request, then don't alter the residual */
> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
> +				       struct tegra_dma_sg_req, node))
> +		return residual;
> +
> +	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +
> +	/* if we've got an interrupt pending on the channel, don't
> +	 * try and deal with the residue as the hardware has likely
> +	 * moved on to the next buffer. return all data moved.
> +	 */
> +	if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> +		result = residual - sg_req->req_len;
> +		goto out;
> +	}
> +
> +	if (tdc->tdma->chip_data->support_separate_wcount_reg)
> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> +
> +	/*
> +	 * If the request is at the full point, then there is a
> +	 * chance that we have read the status register in the
> +	 * middle of the hardware reloading the next buffer.
> +	 *
> +	 * The sequence seems to be at the end of the buffer, to
> +	 * load the new word count before raising the EOC flag (or
> +	 * changing the ping-pong flag which could have also been
> +	 * used to determine a new buffer). This  means there is a
> +	 * small window where we cannot determine zero-done for the
> +	 * current buffer, or moved to next buffer.
> +	 */
> +	done = get_current_xferred_count(tdc, sg_req, status);
> +
> +	udelay(TEGRA_APBDMA_BURST_COMPLETE_TIME);
> +
> +	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +
> +	if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
> +		result = residual - sg_req->req_len;
> +	else
> +		result = residual - done;
> +out:
> +#ifdef DEBUG
> +	ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
> +
> +	dev_dbg(tdc2dev(tdc), "residual: req %08lx, ahb@%08lx, wcount %08lx,
> done %d\n",
> +		sg_req->ch_regs.ahb_ptr, ahbptr, wcount, done);
> +#endif
> +
> +	return result;
> +}
> +
>  static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  	dma_cookie_t cookie, struct dma_tx_state *txstate)
>  {
>  	struct tegra_dma_channel *tdc = to_tegra_dma_chan(dc);
>  	struct tegra_dma_desc *dma_desc;
> -	struct tegra_dma_sg_req *sg_req;
> +	struct tegra_dma_sg_req *sg_req = NULL;
>  	enum dma_status ret;
>  	unsigned long flags;
>  	unsigned int residual;
> @@ -840,6 +901,7 @@ static enum dma_status tegra_dma_tx_status(struct
> dma_chan *dc,
>  		residual = dma_desc->bytes_requested -
>  			   (dma_desc->bytes_transferred %
>  			    dma_desc->bytes_requested);
> +		residual = tegra_dma_update_residual(tdc, sg_req, dma_desc, 
> residual);
>  		dma_set_residue(txstate, residual);
>  	}
> 
> @@ -1433,12 +1495,7 @@ static int tegra_dma_probe(struct 
> platform_device *pdev)
>  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
>  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
>  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> -	/*
> -	 * XXX The hardware appears to support
> -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> -	 * only used by this driver during tegra_dma_terminate_all()
> -	 */
> -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>  	tdma->dma_dev.device_config = tegra_dma_slave_config;
>  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
>  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;

             reply	other threads:[~2018-11-05  9:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-05  9:03 Ben Dooks [this message]
  -- strict thread matches above, loose matches on Subject: below --
2019-04-15 16:56 [4/6] dma: tegra: add accurate reporting of dma state Dmitry Osipenko
2019-04-15 15:01 Ben Dooks
2019-04-14 15:20 Dmitry Osipenko
2019-02-22 18:10 Dmitry Osipenko
2019-02-22 17:23 Ben Dooks
2019-02-21 13:02 Dmitry Osipenko
2019-02-21 10:06 Ben Dooks
2019-02-21  0:41 Dmitry Osipenko
2018-11-05 14:05 Dmitry Osipenko
2018-11-05 11:38 Jon Hunter
2018-11-05 11:34 Ben Dooks
2018-11-05 11:32 Jon Hunter
2018-11-03 12:24 Dmitry Osipenko
2018-10-31 16:03 Ben Dooks
2018-10-12 21:47 kbuild test robot
2018-10-12  9:44 Ben Dooks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d978a3ee77d71106b3de7361779a1f6e@codethink.co.uk \
    --to=ben.dooks@codethink.co.uk \
    --cc=dan.j.williams@intel.com \
    --cc=digetx@gmail.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=ldewangan@nvidia.com \
    --cc=linux-tegra@vger.kernel.org \
    --cc=vkoul@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).