alsa-devel.alsa-project.org archive mirror
 help / color / mirror / Atom feed
From: Takashi Iwai <tiwai@suse.de>
To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>, alsa-devel@alsa-project.org
Subject: Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
Date: Tue, 28 Sep 2021 11:01:49 +0200	[thread overview]
Message-ID: <s5hmtnxnkea.wl-tiwai@suse.de> (raw)
In-Reply-To: <203381d5-b3ce-c5ab-b96e-bd38d27cf813@linux.intel.com>

On Wed, 22 Sep 2021 00:18:14 +0200,
Pierre-Louis Bossart wrote:
> 
> Sorry Takashi for the delay, I missed your reply.
> 
> >>> The position reporting on Intel Skylake and later chips via
> >>> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> >>> A call for this alone doesn't sound too harmful.  However, as the
> >>> pointer PCM ops is one of the hottest path in the PCM operations --
> >>> especially for the timer-scheduled operations like PulseAudio -- such
> >>> a delay hogs CPU usage significantly in the total performance.
> >>>
> >>> The function there was taken from the original code in ASoC SST
> >>> Skylake driver blindly.  The udelay() is a workaround for the case
> >>> where the reported position is behind the period boundary at the
> >>> timing triggered from interrupts; applications often expect that the
> >>> full data is available for the whole period when returned (and also
> >>> that's the definition of the ALSA PCM period).
> >>
> >> that initial work-around from the Intel SST driver does not seem to be
> >> legit in the first place, when I checked with hardware folks recently no
> >> one understands why there are delays and special cases for capture. The
> >> only requirement wrt. DPIB is that the DDR update is valid across VC0
> >> and VC1, while the DPIB registers are only valid with VC0. For SOF we
> >> don't know of any VC1 use so will default to the DPIB vendor-specific
> >> registers.
> > 
> > What are those VC0 and VC1 registers?  I can't find the definitions in
> > the code, so I assume that none of ALSA/ASoC drivers use VC1.
> 
> These are PCI concepts/capabilities. VC stands for "Virtual Channel",
> which are mapped to Traffic Class (TC). These registers are typically
> set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB
> updates are valid, the vendor-specific will report information can be
> misleading.
> 
> The recommendation from hardware folks is to use DPIB updates in DDR,
> which are known to work across both VC0 and VC1.
> 
> For SOF, we do know VC1 is never used so we'll use the vendor-specific
> registers.

OK, thanks for clarification.

> >> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
> >> for SOF.
> >>
> >> I don't have the time to look at this specific patch today but wanted to
> >> make sure you are aware of my on-going fixes.
> >>
> >> Note that the use of DPIB works best if you don't use the IOC interrupt.
> >> when the interrupt is thrown, there is likely a delay for the DPIB
> >> information to be refreshed.
> > 
> > Thanks for the information!  The delay could be the reason of the
> > udelay(), and that's superfluous as mentioned in the commit.
> > 
> > So the remaining question seems to be which method is a better
> > approach for the capture stream: DPIB or posbuf.  I kept the posbuf
> > just for conservatism, but judging from your comment, we may use DPIB
> > for both directions safely?
> 
> sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports?
> 
> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
> unconditionally, both for capture and playback, and remove the use of
> the skylake specific stuff and the delay.

When I measured, I saw a slight difference between values in DPIB and
posbuf, so I wonder which is actually more accurate.  The latter is
sometimes a bit behind the former.

If the posbuf is more correct in the sense for the PCM pointer, we can
simply move back to the posbuf like other platforms.


thanks,

Takashi


> > In anyway, the additional mechanism to check the delayed position
> > report in this patch can be kept no matter which way (DPIB or posbuf)
> > is used.
> 
> Agree!
> 
> > 
> > 
> > Takashi
> > 
> >>
> >>>
> >>> OTOH, HD-audio (legacy) driver has already some workarounds for the
> >>> delayed position reporting due to its relatively large FIFO, such as
> >>> the BDL position adjustment and the delayed period-elapsed call in the
> >>> work.  That said, the udelay() is almost superfluous for HD-audio
> >>> driver unlike SST, and we can drop the udelay().
> >>>
> >>> Though, the current code doesn't guarantee the full period readiness
> >>> as mentioned in the above, but rather it checks the wallclock and
> >>> detects the unexpected jump.  That's one missing piece, and the drop
> >>> of udelay() needs a bit more sanity checks for the delayed handling.
> >>>
> >>> This patch implements those: the drop of udelay() call in
> >>> azx_get_pos_skl() and the more proper check of hwptr in
> >>> azx_position_ok().  The latter change is applied only for the case
> >>> where the stream is running in the normal mode without
> >>> no_period_wakeup flag.  When no_period_wakeup is set, it essentially
> >>> ignores the period handling and rather concentrates only on the
> >>> current position; which implies that we don't need to care about the
> >>> period boundary at all.
> >>>
> >>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> >>> Reported-by: Jens Axboe <axboe@kernel.dk>
> >>> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> >>> ---
> >>>  sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
> >>>  1 file changed, 23 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> >>> index 3aa432d814a2..faeeeb923d5e 100644
> >>> --- a/sound/pci/hda/hda_intel.c
> >>> +++ b/sound/pci/hda/hda_intel.c
> >>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
> >>>   * the update-IRQ timing.  The IRQ is issued before actually the
> >>>   * data is processed.  So, we need to process it afterwords in a
> >>>   * workqueue.
> >>> + *
> >>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
> >>>   */
> >>>  static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>>  {
> >>>  	struct snd_pcm_substream *substream = azx_dev->core.substream;
> >>> +	struct snd_pcm_runtime *runtime = substream->runtime;
> >>>  	int stream = substream->stream;
> >>>  	u32 wallclk;
> >>>  	unsigned int pos;
> >>> +	snd_pcm_uframes_t hwptr, target;
> >>>  
> >>>  	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
> >>>  	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> >>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>>  		/* NG - it's below the first next period boundary */
> >>>  		return chip->bdl_pos_adj ? 0 : -1;
> >>>  	azx_dev->core.start_wallclk += wallclk;
> >>> +
> >>> +	if (azx_dev->core.no_period_wakeup)
> >>> +		return 1; /* OK, no need to check period boundary */
> >>> +
> >>> +	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> >>> +		return 1; /* OK, already in hwptr updating process */
> >>> +
> >>> +	/* check whether the period gets really elapsed */
> >>> +	pos = bytes_to_frames(runtime, pos);
> >>> +	hwptr = runtime->hw_ptr_base + pos;
> >>> +	if (hwptr < runtime->status->hw_ptr)
> >>> +		hwptr += runtime->buffer_size;
> >>> +	target = runtime->hw_ptr_interrupt + runtime->period_size;
> >>> +	if (hwptr < target) {
> >>> +		/* too early wakeup, process it later */
> >>> +		return chip->bdl_pos_adj ? 0 : -1;
> >>> +	}
> >>> +
> >>>  	return 1; /* OK, it's fine */
> >>>  }
> >>>  
> >>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >>>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >>>  		return azx_skl_get_dpib_pos(chip, azx_dev);
> >>>  
> >>> -	/* For capture, we need to read posbuf, but it requires a delay
> >>> -	 * for the possible boundary overlap; the read of DPIB fetches the
> >>> -	 * actual posbuf
> >>> -	 */
> >>> -	udelay(20);
> >>> +	/* read of DPIB fetches the actual posbuf */
> >>>  	azx_skl_get_dpib_pos(chip, azx_dev);
> >>>  	return azx_get_pos_posbuf(chip, azx_dev);
> >>>  }
> >>>
> >>
> 

  reply	other threads:[~2021-09-28  9:02 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-10 14:10 [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting Takashi Iwai
2021-09-10 17:04 ` Pierre-Louis Bossart
2021-09-13  5:48   ` Takashi Iwai
2021-09-21 22:18     ` Pierre-Louis Bossart
2021-09-28  9:01       ` Takashi Iwai [this message]
2021-09-28 15:00         ` Pierre-Louis Bossart
2021-09-29  7:14           ` Takashi Iwai
2021-09-13  2:36 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=s5hmtnxnkea.wl-tiwai@suse.de \
    --to=tiwai@suse.de \
    --cc=alsa-devel@alsa-project.org \
    --cc=axboe@kernel.dk \
    --cc=pierre-louis.bossart@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).