From: Mathieu Poirier <mathieu.poirier@linaro.org>
To: Leo Yan <leo.yan@linaro.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>,
Mike Leach <mike.leach@linaro.org>,
Robin Murphy <robin.murphy@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4] coresight: tmc-etr: Speed up for bounce buffer in flat mode
Date: Tue, 14 Sep 2021 09:22:25 -0600 [thread overview]
Message-ID: <20210914152225.GA1719994@p14s> (raw)
In-Reply-To: <20210905032144.966766-1-leo.yan@linaro.org>
On Sun, Sep 05, 2021 at 11:21:44AM +0800, Leo Yan wrote:
> The AUX bounce buffer is allocated with API dma_alloc_coherent(), in the
> low level's architecture code, e.g. for Arm64, it maps the memory with
> the attribution "Normal non-cacheable"; this can be concluded from the
> definition for pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.
>
> Later when access the AUX bounce buffer, since the memory mapping is
> non-cacheable, it's low efficiency due to every load instruction must
> reach out DRAM.
>
> This patch changes to allocate pages with dma_alloc_noncoherent(), the
> driver can access the memory via cacheable mapping; therefore, load
> instructions can fetch data from cache lines rather than always read
> data from DRAM, the driver can boost memory performance. After using
> the cacheable mapping, the driver uses dma_sync_single_for_cpu() to
> invalidate cacheline prior to read bounce buffer so can avoid read stale
> trace data.
>
> By measurement the duration for function tmc_update_etr_buffer() with
> ftrace function_graph tracer, it shows the performance significant
> improvement for copying 4MiB data from bounce buffer:
>
> # echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise
> # echo tmc_update_etr_buffer > set_graph_function
> # echo function_graph > current_tracer
>
> before:
>
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 2) | tmc_update_etr_buffer() {
> ...
> 2) # 8148.320 us | }
>
> after:
>
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 2) | tmc_update_etr_buffer() {
> ...
> 2) # 2525.420 us | }
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
I have merged this patch.
Thanks,
Mathieu
> ---
>
> Changes from v3:
> Refined change to use dma_alloc_noncoherent()/dma_free_noncoherent()
> (Robin Murphy);
> Retested functionality and performance on Juno-r2 board.
>
> Changes from v2:
> Sync the entire buffer in one go when the tracing is wrap around
> (Suzuki);
> Add Suzuki's review tage.
>
> .../hwtracing/coresight/coresight-tmc-etr.c | 26 ++++++++++++++++---
> 1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index acdb59e0e661..a049b525a274 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -609,8 +609,9 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
> if (!flat_buf)
> return -ENOMEM;
>
> - flat_buf->vaddr = dma_alloc_coherent(real_dev, etr_buf->size,
> - &flat_buf->daddr, GFP_KERNEL);
> + flat_buf->vaddr = dma_alloc_noncoherent(real_dev, etr_buf->size,
> + &flat_buf->daddr,
> + DMA_FROM_DEVICE, GFP_KERNEL);
> if (!flat_buf->vaddr) {
> kfree(flat_buf);
> return -ENOMEM;
> @@ -631,14 +632,18 @@ static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
> if (flat_buf && flat_buf->daddr) {
> struct device *real_dev = flat_buf->dev->parent;
>
> - dma_free_coherent(real_dev, flat_buf->size,
> - flat_buf->vaddr, flat_buf->daddr);
> + dma_free_noncoherent(real_dev, etr_buf->size,
> + flat_buf->vaddr, flat_buf->daddr,
> + DMA_FROM_DEVICE);
> }
> kfree(flat_buf);
> }
>
> static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> {
> + struct etr_flat_buf *flat_buf = etr_buf->private;
> + struct device *real_dev = flat_buf->dev->parent;
> +
> /*
> * Adjust the buffer to point to the beginning of the trace data
> * and update the available trace data.
> @@ -648,6 +653,19 @@ static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
> etr_buf->len = etr_buf->size;
> else
> etr_buf->len = rwp - rrp;
> +
> + /*
> + * The driver always starts tracing at the beginning of the buffer,
> + * the only reason why we would get a wrap around is when the buffer
> + * is full. Sync the entire buffer in one go for this case.
> + */
> + if (etr_buf->offset + etr_buf->len > etr_buf->size)
> + dma_sync_single_for_cpu(real_dev, flat_buf->daddr,
> + etr_buf->size, DMA_FROM_DEVICE);
> + else
> + dma_sync_single_for_cpu(real_dev,
> + flat_buf->daddr + etr_buf->offset,
> + etr_buf->len, DMA_FROM_DEVICE);
> }
>
> static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
> --
> 2.25.1
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
prev parent reply other threads:[~2021-09-14 15:24 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-05 3:21 [PATCH v4] coresight: tmc-etr: Speed up for bounce buffer in flat mode Leo Yan
2021-09-13 17:56 ` Mathieu Poirier
2021-09-14 6:00 ` Suzuki K Poulose
2021-09-14 15:22 ` Mathieu Poirier [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210914152225.GA1719994@p14s \
--to=mathieu.poirier@linaro.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=coresight@lists.linaro.org \
--cc=leo.yan@linaro.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mike.leach@linaro.org \
--cc=robin.murphy@arm.com \
--cc=suzuki.poulose@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).