From: Takashi Iwai <tiwai@suse.de>
To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>, alsa-devel@alsa-project.org
Subject: Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
Date: Tue, 28 Sep 2021 11:01:49 +0200 [thread overview]
Message-ID: <s5hmtnxnkea.wl-tiwai@suse.de> (raw)
In-Reply-To: <203381d5-b3ce-c5ab-b96e-bd38d27cf813@linux.intel.com>
On Wed, 22 Sep 2021 00:18:14 +0200,
Pierre-Louis Bossart wrote:
>
> Sorry Takashi for the delay, I missed your reply.
>
> >>> The position reporting on Intel Skylake and later chips via
> >>> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> >>> A call for this alone doesn't sound too harmful. However, as the
> >>> pointer PCM ops is one of the hottest path in the PCM operations --
> >>> especially for the timer-scheduled operations like PulseAudio -- such
> >>> a delay hogs CPU usage significantly in the total performance.
> >>>
> >>> The function there was taken from the original code in ASoC SST
> >>> Skylake driver blindly. The udelay() is a workaround for the case
> >>> where the reported position is behind the period boundary at the
> >>> timing triggered from interrupts; applications often expect that the
> >>> full data is available for the whole period when returned (and also
> >>> that's the definition of the ALSA PCM period).
> >>
> >> that initial work-around from the Intel SST driver does not seem to be
> >> legit in the first place, when I checked with hardware folks recently no
> >> one understands why there are delays and special cases for capture. The
> >> only requirement wrt. DPIB is that the DDR update is valid across VC0
> >> and VC1, while the DPIB registers are only valid with VC0. For SOF we
> >> don't know of any VC1 use so will default to the DPIB vendor-specific
> >> registers.
> >
> > What are those VC0 and VC1 registers? I can't find the definitions in
> > the code, so I assume that none of ALSA/ASoC drivers use VC1.
>
> These are PCI concepts/capabilities. VC stands for "Virtual Channel",
> which are mapped to Traffic Class (TC). These registers are typically
> set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB
> updates are valid, the vendor-specific will report information can be
> misleading.
>
> The recommendation from hardware folks is to use DPIB updates in DDR,
> which are known to work across both VC0 and VC1.
>
> For SOF, we do know VC1 is never used so we'll use the vendor-specific
> registers.
OK, thanks for clarification.
> >> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
> >> for SOF.
> >>
> >> I don't have the time to look at this specific patch today but wanted to
> >> make sure you are aware of my on-going fixes.
> >>
> >> Note that the use of DPIB works best if you don't use the IOC interrupt.
> >> when the interrupt is thrown, there is likely a delay for the DPIB
> >> information to be refreshed.
> >
> > Thanks for the information! The delay could be the reason of the
> > udelay(), and that's superfluous as mentioned in the commit.
> >
> > So the remaining question seems to be which method is a better
> > approach for the capture stream: DPIB or posbuf. I kept the posbuf
> > just for conservatism, but judging from your comment, we may use DPIB
> > for both directions safely?
>
> sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports?
>
> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
> unconditionally, both for capture and playback, and remove the use of
> the skylake specific stuff and the delay.
When I measured, I saw a slight difference between values in DPIB and
posbuf, so I wonder which is actually more accurate. The latter is
sometimes a bit behind the former.
If the posbuf is more correct in the sense for the PCM pointer, we can
simply move back to the posbuf like other platforms.
thanks,
Takashi
> > In anyway, the additional mechanism to check the delayed position
> > report in this patch can be kept no matter which way (DPIB or posbuf)
> > is used.
>
> Agree!
>
> >
> >
> > Takashi
> >
> >>
> >>>
> >>> OTOH, HD-audio (legacy) driver has already some workarounds for the
> >>> delayed position reporting due to its relatively large FIFO, such as
> >>> the BDL position adjustment and the delayed period-elapsed call in the
> >>> work. That said, the udelay() is almost superfluous for HD-audio
> >>> driver unlike SST, and we can drop the udelay().
> >>>
> >>> Though, the current code doesn't guarantee the full period readiness
> >>> as mentioned in the above, but rather it checks the wallclock and
> >>> detects the unexpected jump. That's one missing piece, and the drop
> >>> of udelay() needs a bit more sanity checks for the delayed handling.
> >>>
> >>> This patch implements those: the drop of udelay() call in
> >>> azx_get_pos_skl() and the more proper check of hwptr in
> >>> azx_position_ok(). The latter change is applied only for the case
> >>> where the stream is running in the normal mode without
> >>> no_period_wakeup flag. When no_period_wakeup is set, it essentially
> >>> ignores the period handling and rather concentrates only on the
> >>> current position; which implies that we don't need to care about the
> >>> period boundary at all.
> >>>
> >>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> >>> Reported-by: Jens Axboe <axboe@kernel.dk>
> >>> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> >>> ---
> >>> sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
> >>> 1 file changed, 23 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> >>> index 3aa432d814a2..faeeeb923d5e 100644
> >>> --- a/sound/pci/hda/hda_intel.c
> >>> +++ b/sound/pci/hda/hda_intel.c
> >>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
> >>> * the update-IRQ timing. The IRQ is issued before actually the
> >>> * data is processed. So, we need to process it afterwords in a
> >>> * workqueue.
> >>> + *
> >>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
> >>> */
> >>> static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>> {
> >>> struct snd_pcm_substream *substream = azx_dev->core.substream;
> >>> + struct snd_pcm_runtime *runtime = substream->runtime;
> >>> int stream = substream->stream;
> >>> u32 wallclk;
> >>> unsigned int pos;
> >>> + snd_pcm_uframes_t hwptr, target;
> >>>
> >>> wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
> >>> if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> >>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>> /* NG - it's below the first next period boundary */
> >>> return chip->bdl_pos_adj ? 0 : -1;
> >>> azx_dev->core.start_wallclk += wallclk;
> >>> +
> >>> + if (azx_dev->core.no_period_wakeup)
> >>> + return 1; /* OK, no need to check period boundary */
> >>> +
> >>> + if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> >>> + return 1; /* OK, already in hwptr updating process */
> >>> +
> >>> + /* check whether the period gets really elapsed */
> >>> + pos = bytes_to_frames(runtime, pos);
> >>> + hwptr = runtime->hw_ptr_base + pos;
> >>> + if (hwptr < runtime->status->hw_ptr)
> >>> + hwptr += runtime->buffer_size;
> >>> + target = runtime->hw_ptr_interrupt + runtime->period_size;
> >>> + if (hwptr < target) {
> >>> + /* too early wakeup, process it later */
> >>> + return chip->bdl_pos_adj ? 0 : -1;
> >>> + }
> >>> +
> >>> return 1; /* OK, it's fine */
> >>> }
> >>>
> >>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >>> if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >>> return azx_skl_get_dpib_pos(chip, azx_dev);
> >>>
> >>> - /* For capture, we need to read posbuf, but it requires a delay
> >>> - * for the possible boundary overlap; the read of DPIB fetches the
> >>> - * actual posbuf
> >>> - */
> >>> - udelay(20);
> >>> + /* read of DPIB fetches the actual posbuf */
> >>> azx_skl_get_dpib_pos(chip, azx_dev);
> >>> return azx_get_pos_posbuf(chip, azx_dev);
> >>> }
> >>>
> >>
>
next prev parent reply other threads:[~2021-09-28 9:02 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-10 14:10 [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting Takashi Iwai
2021-09-10 17:04 ` Pierre-Louis Bossart
2021-09-13 5:48 ` Takashi Iwai
2021-09-21 22:18 ` Pierre-Louis Bossart
2021-09-28 9:01 ` Takashi Iwai [this message]
2021-09-28 15:00 ` Pierre-Louis Bossart
2021-09-29 7:14 ` Takashi Iwai
2021-09-13 2:36 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=s5hmtnxnkea.wl-tiwai@suse.de \
--to=tiwai@suse.de \
--cc=alsa-devel@alsa-project.org \
--cc=axboe@kernel.dk \
--cc=pierre-louis.bossart@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).