public inbox for linux-sound@vger.kernel.org
 help / color / mirror / Atom feed
* (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
@ 2026-03-23 13:34 Péter Ujfalusi
  2026-03-23 14:54 ` Jaroslav Kysela
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-23 13:34 UTC (permalink / raw)
  To: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

Hi,

For years the discussion around jumpy, bursty DMA has been popping up,
and we have always concluded that we are missing a good definition and
an agreed way between kernel and user space to handle this.

In ALSA it is expected that the DMA (hw_ptr) moves in steady, small
steps. There is a BATCH mode which indicates that the DMA cannot report
position at sub-period granularity.
Bursty DMA fits neither of these: it moves in bigger jumps (bursts),
yet it can still report position at byte level.

Typically these systems have a DSP which consumes data in variously
sized chunks, making the hw_ptr jump. Over time the hw_ptr does move at
the sampling rate, but when zooming in we see:
an initial jump of X frames, the hw_ptr stays static for about X frames'
worth of time, then the hw_ptr jumps ahead by X frames again, and again
the hw_ptr stops, ...

This can be a problem for user space which wants to write samples as
close to the hw_ptr as possible: if it is not aware of the bursty DMA,
a burst may jump over the sw_ptr, causing an xrun.

I was looking at what, how and where we should add this information in
the kernel, and how user space should take it into use, when I stumbled
across the 'fifo_size' member of the hw_params struct.

It is only set by a few drivers:
sound/arm/aaci.c
sound/arm/pxa2xx-pcm-lib.c
sound/soc/renesas/dma-sh7760.c
sound/soc/starfive/jh7110_tdm.c
sound/soc/tegra/tegra_pcm.c
sound/soc/xtensa/xtfpga-i2s.c
sound/usb/misc/ua101.c
sound/x86/intel_hdmi_audio.c

Its definition is, awkwardly, a bit different between kernel and alsa-lib:
in the kernel it can be in bytes or frames, but in user space it is always
in frames (snd_pcm_hw_params_get_fifo_size).

So far I could not find evidence that this is in active use by user
space. It is not used in alsa-utils, pipewire or pulseaudio at least,
and a web search came back empty-handed as well.

My proposal thus is to re-use, re-define and extend fifo_size as an
indication that the hw_ptr _can_ jump by at least fifo_size frames, so
applications must take this into account when doing direct updates in
the ALSA buffer for low latency.

Or should we add a new member (carved out from the reserved section of
the hw_params struct) specifically for this purpose, like dma_burst_size,
which would likely be equal to fifo_size if both are filled by the driver?

Or a new flag (SNDRV_PCM_INFO_) that PCM devices can use to indicate
that the DMA is bursting, in which case fifo_size holds the number of
frames it is expected to jump.
But we are slowly running out of bits, and I'm not sure it is a good
idea to dual-use kernel-internal bits for the user ABI
(SNDRV_PCM_INFO_DRAIN_TRIGGER, SNDRV_PCM_INFO_FIFO_IN_FRAMES).

https://github.com/thesofproject/linux/issues/5313
https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-23 13:34 (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA' Péter Ujfalusi
@ 2026-03-23 14:54 ` Jaroslav Kysela
  2026-03-23 16:16   ` Péter Ujfalusi
  2026-03-24  7:12 ` Péter Ujfalusi
  2026-03-30 14:27 ` Takashi Iwai
  2 siblings, 1 reply; 25+ messages in thread
From: Jaroslav Kysela @ 2026-03-23 14:54 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On 3/23/26 14:34, Péter Ujfalusi wrote:
> Hi,
> 
> For years the discussion around jumpy, bursty DMA has been popping up,
> and we have always concluded that we are missing a good definition and
> an agreed way between kernel and user space to handle this.

Hi,

thank you for taking care of this issue.

> Or should we add a new member (carved out from the reserved section of
> the hw_params struct) specifically for this purpose, like dma_burst_size,
> which would likely be equal to fifo_size if both are filled by the driver?

I would not change the fifo_size definition at this moment. It's not directly 
related to the sample transfers. This value just describes the additional 
latency for streaming.

The new member seems like a good way to go with more universal name like 
'chunk_size' (in frames) defining the transfer granularity information for the 
PCM buffer I/O.

					Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-23 14:54 ` Jaroslav Kysela
@ 2026-03-23 16:16   ` Péter Ujfalusi
  2026-03-24  8:58     ` Jaroslav Kysela
  0 siblings, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-23 16:16 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 23/03/2026 16:54, Jaroslav Kysela wrote:
>> Or should we add a new member (carved out from the reserved section of
>> the hw_params struct) specifically for this purpose, like dma_burst_size,
>> which would likely be equal to fifo_size if both are filled by the
>> driver?
> 
> I would not change the fifo_size definition at this moment. It's not
> directly related to the sample transfers. This value just describes the
> additional latency for streaming.

Don't we have the delay reported for this reason, which is actually in
use by user space?

> The new member seems like a good way to go with more universal name like
> 'chunk_size' (in frames) defining the transfer granularity information
> for the PCM buffer I/O.

Right, but (I think this is the main issue that blocked any progress
on this) this is not really about the chunk size; it is about telling
user space what the safe distance from the hw_ptr is for modifying
samples without the DMA jumping over the area.

I know how SOF and the tlv320dac33 do this; they have one mode in
common and one different mode each:

A. nSample mode (SOF with a larger than 8ms host buffer, or dac33 MODE1)
initially fill the FIFO; then, when only a threshold amount of data
remains in the FIFO, read N samples and wait for the threshold again.

B. keep-full (SOF with a smaller than 8ms host buffer)
initially fill the FIFO, then whenever 1ms is free, read 1ms.

C. two-threshold mode (dac33 MODE7)
initially fill the FIFO up to the upper threshold; then, when the low
threshold is reached, start reading data until the high threshold is
hit, then wait for the low threshold again before the next burst.

The initial burst for A and B is equal: it is the size of the FIFO. In
case of C it is somewhere above the upper threshold, since samples are
played out while the FIFO is being filled, so it is a moving target,
sort of.

But subsequent DMA activity differs:
A: it is nSample, which is smaller than the FIFO size
B: it is 1ms
C: roughly the difference between the two thresholds

There could be different FIFO and burst setups out there, but in all
cases the fifo_size is a hard safety limit that applications must be
aware of.

Fwiw, ASoC collects the delay from the CPU driver, the codec driver and
also from the platform (DMA) driver, while fifo_size has never been
configured, afaik.

And the fun comes when we put numbers to these! Let's say the FIFO is
100ms long and the low threshold is at 4ms (start the DMA when only 4ms
is left in the FIFO).
A
start: read 100ms, wait
after 96ms, read 96ms, wait
after 96ms, read 96ms, wait
The hw_ptr 20ms after start will be at 100ms
The hw_ptr 70ms after start will be at 100ms
The hw_ptr 100ms after start will be at 196ms

B (theoretical, as SOF only uses this in the 'small' FIFO case)
start: read 100ms, wait
after 1ms, read 1ms, wait
after 1ms, read 1ms, wait
The hw_ptr 20ms after start will be at 120ms
The hw_ptr 70ms after start will be at 170ms
The hw_ptr 100ms after start will be at 200ms

C
More like A, but with a reduced fifo_size, which is variable.

The application must initially keep a 2x fifo_size distance: if you
plan to check back after fifo_size time, B will already be one
fifo_size ahead, for example, and with A, if you check again at
fifo_size time, you risk racing with the coming burst, so you should
not poke too close either.

My idea is to tell user space to keep fifo_size of data as a minimum,
but initially this should be 2x fifo_size. How to document and do this
is still not clear, but the only thing the driver can tell is: this is
my fifo_size, and I burst based on this number.

And I'm really bad at naming things ;)

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-23 13:34 (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA' Péter Ujfalusi
  2026-03-23 14:54 ` Jaroslav Kysela
@ 2026-03-24  7:12 ` Péter Ujfalusi
  2026-03-30 14:27 ` Takashi Iwai
  2 siblings, 0 replies; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-24  7:12 UTC (permalink / raw)
  To: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 23/03/2026 15:34, Péter Ujfalusi wrote:
> Hi,
> 
> For years the discussion around jumpy, bursty DMA has been popping up,
> and we have always concluded that we are missing a good definition and
> an agreed way between kernel and user space to handle this.
> 
> In ALSA it is expected that the DMA (hw_ptr) moves in steady, small
> steps. There is a BATCH mode which indicates that the DMA cannot report
> position at sub-period granularity.
> Bursty DMA fits neither of these: it moves in bigger jumps (bursts),
> yet it can still report position at byte level.

I think it is worth explaining why this came up again...
In SOF we have a concept of DeepBuffer PCM devices, for which the
topology can specify the size of the host DMA buffer (this buffer is in
the DSP); this is where the host DMA transfers data from the ALSA buffer.

side note: the DeepBuffer PCMs are not used in Linux (by pipewire, for
example), so there is no immediate need

I'm changing how it is working in a fundamental way with
https://github.com/thesofproject/linux/pull/5673
from static to dynamic allocation.

The current, static mode is really simple: in the topology the DB is
set to, let's say, 100ms and this is how it will be configured.
For applications I place a constraint on the minimum period size to be
100ms.
Since Pipewire does not care about the period size, but it needs to
provide at least a DB size of data ahead of the hw_ptr, we have a
temporary way of using the minimum period size as headroom:
https://gitlab.freedesktop.org/pipewire/pipewire/-/merge_requests/2548
This avoids xruns in PW, and most ALSA applications also work fine.

But the change to dynamic mode will render the PW 'guessing' broken:
in dynamic mode the DB size will be configured based on the ALSA period
size, but capped at the DB value from the topology. If the DB was set to
100ms in the topology then this will be the maximum size of the DB, no
matter how big the ALSA period size is (see PR5673 for some numbers on
how this looks); something like this:
ALSA period:    8 -> dma buffer:   4 ms
ALSA period:   10 -> dma buffer:   6 ms
ALSA period:   16 -> dma buffer:  12 ms
ALSA period:   19 -> dma buffer:  15 ms
ALSA period:   20 -> dma buffer:  20 ms
ALSA period:   50 -> dma buffer:  50 ms
ALSA period:  100 -> dma buffer: 100 ms
ALSA period:  150 -> dma buffer: 100 ms
ALSA period: 2000 -> dma buffer: 100 ms

Legacy applications should be fine with this, but PW still needs to
figure out the headroom, and the min period size will no longer aid that.

If we want to enable the DeepBuffer PCMs for Linux via pipewire at some
point, we need a way to communicate the DMA behavior.

I'm not keen on staying with the static DB as I see it as a limiting
factor. If we want a PCM with a 200ms buffer (for audio playback) and a
shallower one with 40ms for calls, then we would need separate PCMs and
the period size would be constrained, so it is sort of a headache for
user space, while with the dynamic DB you have the freedom to use these
as you wish and also get the power saving of the bigger period sizes.

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-23 16:16   ` Péter Ujfalusi
@ 2026-03-24  8:58     ` Jaroslav Kysela
  2026-03-24 10:51       ` Péter Ujfalusi
  0 siblings, 1 reply; 25+ messages in thread
From: Jaroslav Kysela @ 2026-03-24  8:58 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On 3/23/26 17:16, Péter Ujfalusi wrote:
> 
> 
> On 23/03/2026 16:54, Jaroslav Kysela wrote:
>>> Or should we add a new member (carved out from the reserved section of
>>> the hw_params struct) specifically for this purpose, like dma_burst_size,
>>> which would likely be equal to fifo_size if both are filled by the
>>> driver?
>>
>> I would not change the fifo_size definition at this moment. It's not
>> directly related to the sample transfers. This value just describes the
>> additional latency for streaming.
> 
> Don't we have the delay reported for this reason, which is actually in
> use by user space?

Yes, this field was defined before the final delay reporting. It describes the 
static FIFOs in hardware. But it was not defined to be related to the hw_ptr 
granularity (transfers). SOF may set this value to indicate fixed latencies 
caused by double buffering, too.

>> The new member seems like a good way to go with more universal name like
>> 'chunk_size' (in frames) defining the transfer granularity information
>> for the PCM buffer I/O.
> 
> right, but (I think these are the main issues that blocked any progress
> on this) this is not really about the chunk_size, it is telling user
> space what is the safe distance from hw_ptr to modify samples without
> the DMA jumping over the area.

It depends on the view and definition. From the hardware view, the 
'chunk_size' may be defined as the maximal block transfer from (or to) the 
audio buffer (describing the hw_ptr granularity). So applications will be 
notified that filling the buffer with less than 'chunk_size' may introduce 
underruns. It also covers the initial transfer (when the hw_ptr moves 
immediately to 'chunk_size').

But if we agree on other name like 'burst_size' or 'batch_size', I'll be fine 
with it, too.

					Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-24  8:58     ` Jaroslav Kysela
@ 2026-03-24 10:51       ` Péter Ujfalusi
  2026-03-24 13:25         ` Péter Ujfalusi
  2026-03-24 15:48         ` Jaroslav Kysela
  0 siblings, 2 replies; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-24 10:51 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans


On 24/03/2026 10:58, Jaroslav Kysela wrote:
> On 3/23/26 17:16, Péter Ujfalusi wrote:
>>> I would not change the fifo_size definition at this moment. It's not
>>> directly related to the sample transfers. This value just describes the
>>> additional latency for streaming.
>>
>> Don't we have the delay reported for this reason, which is actually in
>> use by user space?
> 
> Yes, this field was defined before the final delay reporting. It
> describes the static FIFOs in hardware. But it was not defined to be
> related to the hw_ptr granularity (transfers). SOF may set this value to
> indicate fixed latencies caused by double buffering, too.

In SOF we use delay to report the delay that DSP processing might add.
I noticed fifo_size only when I started looking around again to give
the DMA burst issue another try.
I also looked at user space and saw no evidence of anything using it;
that's why I thought it could be re-purposed - no users, no
regression ;)
But you are right, the original definition of fifo_size is detached
from the hw_ptr.

> 
>>> The new member seems like a good way to go with more universal name like
>>> 'chunk_size' (in frames) defining the transfer granularity information
>>> for the PCM buffer I/O.
>>
>> right, but (I think these are the main issues that blocked any progress
>> on this) this is not really about the chunk_size, it is telling user
>> space what is the safe distance from hw_ptr to modify samples without
>> the DMA jumping over the area.
> 
> It depends on the view and definition. From the hardware view, the
> 'chunk_size' may be defined as the maximal block transfer from (or to)
> the audio buffer (describing the hw_ptr granularity). So applications
> will be notified, that filling buffer lower than 'chunk_size' may
> introduce underruns. It covers also the initial transfer (when hw_ptr
> moves immediately to 'chunk_size').

I'm trying to wrap my head around this; I think this is the generalized
view of it:
We have a FIFO in hardware which will be filled on start: dma_fifo_size.
After the initial burst, the DMA keeps the FIFO full in sample, 1ms,
2ms, etc. steps: dma_runtime_size.
A DMA burst happens every dma_runtime_size after the initial FIFO-fill
burst.
Something like this:
dma_fifo_size = 100ms
dma_runtime_size = 1ms
Initial burst is 100ms, then after every 1ms DMA moves 1ms ahead

dma_fifo_size = 100ms
dma_runtime_size = 96ms
Initial burst is 100ms, then after every 96ms DMA moves 96ms ahead

On start it is thus going to cause an immediate xrun if the application
only provides dma_fifo_size of data and dma_runtime_size is 1ms. Well,
it will cause an xrun right at 1ms of playback time; if dma_runtime_size
is 10ms, then after 10ms - assuming the application is not quick enough
to fill in more data right after start, which may well be the case.

During runtime, however, it is safe for the application to treat
dma_runtime_size ahead of the hw_ptr as sort of untouchable data, but
dma_fifo_size is definitely a safe distance, with some headroom.

The question is: is it enough to expose a single value, or do we
need/want to expose all the small variations of this?
I would think that a single dma_fifo_size should be OK, and apps should
be prepared for the crawling-DMA case.

> But if we agree on other name like 'burst_size' or 'batch_size', I'll be
> fine with it, too.

I think neither of these is right, unless we back them with a variable
which tells the size of the FIFO in hardware that is going to be filled.
max_hw_ptr_burst_size, max_dma_burst_size or max_burst_size - in frames?

> 
>                     Jaroslav
> 

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-24 10:51       ` Péter Ujfalusi
@ 2026-03-24 13:25         ` Péter Ujfalusi
  2026-03-24 15:48         ` Jaroslav Kysela
  1 sibling, 0 replies; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-24 13:25 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 24/03/2026 12:51, Péter Ujfalusi wrote:
> The question is: is it enough to only have single value exposed or do we
> need/want add all small variations of this to be exposed?
> I would think that a single dma_fifo_size should be OK and apps should
> prepare for the crawling DMA case.
> 
>> But if we agree on other name like 'burst_size' or 'batch_size', I'll be
>> fine with it, too.
> 
> I think neither of these are right, unless we substitute them with a
> variable which tells the size of the FIFO in hardware that is going to
> be filled.
> max_hw_ptr_burst_size or max_dma_burst_size or max_burst_size - in frames?

Something like this in the kernel, then an alsa-lib update, and then we
can introduce its use in pipewire?

diff --git a/include/uapi/sound/asound.h b/include/uapi/sound/asound.h
index d3ce75ba938a..3a1872b3e701 100644
--- a/include/uapi/sound/asound.h
+++ b/include/uapi/sound/asound.h
@@ -421,7 +421,8 @@ struct snd_pcm_hw_params {
 	unsigned int rate_den;		/* R: rate denominator */
 	snd_pcm_uframes_t fifo_size;	/* R: chip FIFO size in frames */
 	unsigned char sync[16];		/* R: synchronization ID (perfect sync - one clock source) */
-	unsigned char reserved[48];	/* reserved for future */
+	snd_pcm_uframes_t max_dma_burst_size;	/* R: maximum DMA burst size in frames */
+	unsigned char reserved[48 - sizeof(snd_pcm_uframes_t)];	/* reserved for future */
 };
 
 enum {
diff --git a/sound/core/pcm_compat.c b/sound/core/pcm_compat.c
index e71f393d3b01..462588f3527b 100644
--- a/sound/core/pcm_compat.c
+++ b/sound/core/pcm_compat.c
@@ -64,7 +64,9 @@ struct snd_pcm_hw_params32 {
 	u32 rate_num;
 	u32 rate_den;
 	u32 fifo_size;
-	unsigned char reserved[64];
+	unsigned char sync[16];
+	u32 max_dma_burst_size;
+	unsigned char reserved[44];
 };
 
 struct snd_pcm_sw_params32 {
@@ -247,9 +249,11 @@ static int snd_pcm_ioctl_hw_params_compat(struct snd_pcm_substream *substream,
 	if (!data)
 		return -ENOMEM;
 
-	/* only fifo_size (RO from userspace) is different, so just copy all */
+	/* copy common members and fix up 32-bit uframe fields explicitly */
 	if (copy_from_user(data, data32, sizeof(*data32)))
 		return -EFAULT;
+	data->fifo_size = data32->fifo_size;
+	data->max_dma_burst_size = data32->max_dma_burst_size;
 
 	if (refine) {
 		err = snd_pcm_hw_refine(substream, data);
@@ -262,7 +266,8 @@ static int snd_pcm_ioctl_hw_params_compat(struct snd_pcm_substream *substream,
 	if (err < 0)
 		return err;
 	if (copy_to_user(data32, data, sizeof(*data32)) ||
-	    put_user(data->fifo_size, &data32->fifo_size))
+	    put_user(data->fifo_size, &data32->fifo_size) ||
+	    put_user(data->max_dma_burst_size, &data32->max_dma_burst_size))
 		return -EFAULT;
 
 	if (! refine) {


-- 
Péter


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-24 10:51       ` Péter Ujfalusi
  2026-03-24 13:25         ` Péter Ujfalusi
@ 2026-03-24 15:48         ` Jaroslav Kysela
  2026-03-25 13:28           ` Péter Ujfalusi
  1 sibling, 1 reply; 25+ messages in thread
From: Jaroslav Kysela @ 2026-03-24 15:48 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On 3/24/26 11:51, Péter Ujfalusi wrote:

>> But if we agree on other name like 'burst_size' or 'batch_size', I'll be
>> fine with it, too.
> 
> I think neither of these are right, unless we substitute them with a
> variable which tells the size of the FIFO in hardware that is going to
> be filled.
> max_hw_ptr_burst_size or max_dma_burst_size or max_burst_size - in frames?

I missed the 100ms + 1ms case (it was a bit hidden in the examples).
Looking at this again, I think that the terminology should be more 
abstracted. The word DMA is not appropriate in this context. Even the word 
burst is not so good. Also, applications do not work with the hw_ptr, but 
with the avail / delay values (calculated from the head [hw_ptr] / tail 
[appl_ptr] - playback).

I think that we both mean similar thing, but the question is how to define the 
right API for it.

Basically, for playback this value should be the "minimal fill" recommendation 
to avoid underruns. So for your examples, it should be 101ms or 196ms (but in 
samples), right?

If we look in the other direction (capture), it may describe the "maximal 
sample block" which will be put into the audio buffer (hw_ptr change).

So, perhaps, the name xfer_latency may be used (the value should be in 
samples)? Applications should not go below this value.

					Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-24 15:48         ` Jaroslav Kysela
@ 2026-03-25 13:28           ` Péter Ujfalusi
  2026-03-25 14:08             ` Jaroslav Kysela
  0 siblings, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-25 13:28 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 24/03/2026 17:48, Jaroslav Kysela wrote:
> On 3/24/26 11:51, Péter Ujfalusi wrote:
> 
>>> But if we agree on other name like 'burst_size' or 'batch_size', I'll be
>>> fine with it, too.
>>
>> I think neither of these are right, unless we substitute them with a
>> variable which tells the size of the FIFO in hardware that is going to
>> be filled.
>> max_hw_ptr_burst_size or max_dma_burst_size or max_burst_size - in
>> frames?
> 
> I missed the 100ms + 1ms case (it was a bit hidden in the examples).

Sorry about that; sometimes more results in less ;)

> Looking to this again, I think that the terminology should be more
> abstracted. The DMA word is not appropriate in this contents. Even the
> burst word is not so good. Also, applications do not work with hw_ptr,
> but with the avail / delay values (calculated from the head [hw_ptr] /
> tail [appl_ptr] - playback).
> 
> I think that we both mean similar thing, but the question is how to
> define the right API for it.

Yes, I agree.

> Basically, for playback this value should be the "minimal fill"
> recommendation to avoid underruns. So for your examples, it should be
> 101ms or 196ms (but in samples) right ?
> 
> If we look to other direction (capture), it may describe the "maximal
> sample block" which will be put to the audio buffer (hw_ptr change).
> 
> So, perhaps, the xfer_latency name may be used (value should be in
> samples)? Applications should not go bellow this value.

Looking at the PW code (spa/plugins/alsa/alsa-pcm.c), I have sort of
started to appreciate the 'headroom' term.

I disagree that this is not about DMA or about bursts, since it is
precisely about that. It can be viewed as 'latency', but the latency is
reported via the delay, and the delay fluctuates over time based on how
the DSP FIFO drains and fills.

What we want to tell applications is to be careful, as the hw_ptr is not
running at a steady pace; it is jumping, bursting, using bigger steps.
If we really want to be precise, we would want to expose two numbers:
A: the size of the FIFO which will be filled initially
B: the size of the steps by which the hw_ptr will move after the initial
burst. This is also the 'distance' between these steps.

So, again the 100ms FIFO:
A=100ms
B=96ms

start: hw_ptr moves to 100ms and will stay there for 96ms
50ms: hw_ptr is still at 100ms
96ms: hw_ptr jumps to 196ms and will stay there for 96ms
...
if you want to predict where the hw_ptr will be at any given time of
Xms: hw_ptr = 100 + rounddown(X, 96) // if I'm right

A=100ms
B=1ms

start: hw_ptr moves to 100ms and will stay there for 1ms
50ms: hw_ptr is at 150ms
96ms: hw_ptr is at 196ms
...
if you want to predict where the hw_ptr will be at any given time of
Xms: hw_ptr = 100 + rounddown(X, 1) // if I'm right

The kernel cannot predict how the application is planning to manage the
buffer, but if the application knows when it will wake up, then it can
pre-calculate the amount of data it must provide to avoid an xrun.

If B is not provided, they can use B=A as a rough estimate ;)

On the capture side the hw_ptr will jump by B; A has no relevance.

A: hw_buffer_size
B: hw_buffer_refill_step_size

s/hw_buffer/offload ?

PW then probably can use these:
state->start_delay = hw_buffer_size;
state->headroom = hw_buffer_refill_step_size

and on start/xrun it might need to provide
state->start_delay + max(state->start_delay, state->headroom)
or enough for the distance at which it is planning to wake up.

Likely applications can work this out even with A alone, to keep things
simple.

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-25 13:28           ` Péter Ujfalusi
@ 2026-03-25 14:08             ` Jaroslav Kysela
  2026-03-26 12:04               ` Péter Ujfalusi
  0 siblings, 1 reply; 25+ messages in thread
From: Jaroslav Kysela @ 2026-03-25 14:08 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On 3/25/26 14:28, Péter Ujfalusi wrote:

> Looking at the PW code (spa/plugins/alsa/alsa-pcm.c) and I sort of
> started to appreciate the 'headroom' term.
> 
> I disagree that this is not about DMA or about burst since it is
> precisely about that. It can be viewed as 'latency' but the latency is
> reported via the delay and the delay fluctuates over time based on how
> the DSP FIFO drains and fills.

My point was that we should not use those terms for the structure members, 
because it's a standard prefill / data-fetch (put) granularity issue. DMA is 
just the concrete case. Describing both parameters seems like the best idea.

> PW then probably can use these:
> state->start_delay = hw_buffer_size;
> state->headroom = hw_buffer_refill_step_size

We don't use head/tail in the buffer variables for the ALSA API. The headroom 
can be named just 'update_granularity' or 'update_step' or so. Also, start 
delay is not correct either. Maybe 'prefill' or 'init_fill' says it all. Or we 
can eventually try to be more consistent and use 'init_chunk' and 'step_chunk' 
to notify applications about the data transfer characteristics from the other 
(hw) side of the audio buffer. The word chunk is not actually used in the ALSA 
API, so developers should not mix it up with other things like fifo or period.

> and on start/xrun it might need to provide
> state->start_delay + max(state->start_delay, state->headroom)
> or to the distance that it is planning to wake up.

It will end up as 2 * prefill for your examples, because start_delay > 
headroom. It should be just prefill + step.

Also, it seems that in the SOF case the queued "prefill" sample block is 
mandatory for the internal processing, otherwise an xrun will happen, or no?

				Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-25 14:08             ` Jaroslav Kysela
@ 2026-03-26 12:04               ` Péter Ujfalusi
  0 siblings, 0 replies; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-26 12:04 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Mark Brown, Liam Girdwood
  Cc: Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 25/03/2026 16:08, Jaroslav Kysela wrote:
> On 3/25/26 14:28, Péter Ujfalusi wrote:
> 
>> Looking at the PW code (spa/plugins/alsa/alsa-pcm.c) and I sort of
>> started to appreciate the 'headroom' term.
>>
>> I disagree that this is not about DMA or about burst since it is
>> precisely about that. It can be viewed as 'latency' but the latency is
>> reported via the delay and the delay fluctuates over time based on how
>> the DSP FIFO drains and fills.
> 
> My point was that we should not use those terms for the structure
> members, because it's standard prefill / data fetch (put) granularity
> issue. DMA is just the concrete use. Describing both parameters seem
> like the best idea.

OK.

>> PW then probably can use these:
>> state->start_delay = hw_buffer_size;
>> state->headroom = hw_buffer_refill_step_size
> 
> We don't use head/tail in the buffer variables for ALSA API. The
> headroom can be named just 'update_granularity' or 'update_step' or so.
> Also, start delay is not correct, too. Maybe 'prefill' or 'init_fill'
> says all. Or we can eventually try to be more consistent and use
> 'init_chunk' and 'step_chunk' to notify applications about the data
> transfer characteristics from the other (hw) side of the audio buffer.
> The chunk word is not used actually in the ALSA API, so developers
> should not mix that with other things like fifo or period.

The headroom and start_delay are pipewire internal variables.

Right, init_chunk and step_chunk it is.
I needed to roll these around in my mouth to get a taste of them, but I
think they will do.

It is better to have them in frames, right?

>> and on start/xrun it might need to provide
>> state->start_delay + max(state->start_delay, state->headroom)
>> or to the distance that it is planning to wake up.
> 
> It will end with 2 * prefill for your examples, because start_delay >
> headroom.

true

> It should be just prefill + step.

Not quite right; this works when the DSP buffer is 'large enough' - yes,
the definition for now is 'large enough', see:
We only have experience with this on SOF so far... The default PCMs
use a 4ms buffer in the DSP, and what we see is that if the PW headroom
is less than 8ms then it can xrun right away at start.
In this case the firmware steps the DMA in 1ms steps, so just 1ms into
the playback (and this is counted from when the kernel flipped the start
bit!) the DMA is already pointing at the 6th ms of data; if there is any
scheduling latency spike around this point then it is pretty much
guaranteed that an xrun will happen.
With larger DSP buffers this is less likely as the DMA is more stationary.

But this is up to the application to interpret the values the kernel
gives; I would personally make the headroom 2x init_chunk at minimum for
the start/xrun case and then use the step_chunk at runtime.
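To put numbers on the two policies, a tiny sketch in plain C (the helper
names and the '2x at minimum' rule are illustrative, not part of any
ALSA or pipewire API):

```c
/* Minimal rule discussed above: queue the initial chunk plus one step. */
unsigned long prefill_min(unsigned long init_chunk,
                          unsigned long step_chunk)
{
	return init_chunk + step_chunk;
}

/*
 * More conservative start/xrun fill: at least 2 * init_chunk to survive
 * a scheduling latency spike right after start, but never less than the
 * minimal rule.
 */
unsigned long prefill_safe(unsigned long init_chunk,
                           unsigned long step_chunk)
{
	unsigned long min = prefill_min(init_chunk, step_chunk);
	unsigned long safe = 2 * init_chunk;

	return safe > min ? safe : min;
}
```

At 48kHz with the 4ms/1ms SOF default (init_chunk = 192 frames,
step_chunk = 48 frames) the minimal rule queues 240 frames while the
conservative rule queues 384.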

> Also, it seems that in SOF case, the queued "prefill" sample block is
> mandatory for internal processing, otherwise xrun will happen, or no ?

Yes, the firmware fills the host buffer initially, but with
https://github.com/thesofproject/linux/pull/5673 this buffer is not
static, it changes based on the period size. It can be limited by the
topology.

The prefill block is not strictly mandatory; we need a ~4ms minimum
internally on the host side to be able to compensate for DSP internal
scheduling (clock scaling, power transitions, rate changes, etc.); this
is the default.
With 'DeepBuffer' on a PCM we can put the system into a lower PC state
while playing audio, as we only need the DSP and the IP to run and it
has audio data to chew through.

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-23 13:34 (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA' Péter Ujfalusi
  2026-03-23 14:54 ` Jaroslav Kysela
  2026-03-24  7:12 ` Péter Ujfalusi
@ 2026-03-30 14:27 ` Takashi Iwai
  2026-03-30 15:15   ` Péter Ujfalusi
  2 siblings, 1 reply; 25+ messages in thread
From: Takashi Iwai @ 2026-03-30 14:27 UTC (permalink / raw)
  To: Péter Ujfalusi
  Cc: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood,
	Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On Mon, 23 Mar 2026 14:34:16 +0100,
Péter Ujfalusi wrote:
> 
> Hi,
> 
> for years the discussion around jumpy, bursty DMA has been popping up
> and we always concluded that we are missing a good definition and an
> agreed way between kernel and user space to handle this.
> 
> In ALSA it is expected that the DMA (hw_ptr) moves in steady small
> steps. There is a BATCH mode which tells that the DMA cannot report
> sub-period-size position progression.
> DMA bursting fits neither of these: it moves in bigger jumps (bursts)
> and it can also report position at byte level.
> 
> Typically these systems have a DSP which consumes data in various
> chunks, making the hw_ptr jump. Over time the hw_ptr does move at the
> sampling rate, but when zooming in we can see: an initial jump of X
> frames, hw_ptr stays static for about ~X frames' worth of time, then
> the hw_ptr jumps ahead X frames again, and again, hw_ptr stops, ...
> 
> This can be a problem for user space which wants to write samples as
> close to hw_ptr as possible: if it is not aware of the bursty DMA then
> it is possible that a burst will jump over the sw_ptr, causing an
> xrun.
> 
> I was looking at what, how and where we should add this information in
> kernel and to take it in use by user space when I stumbled across the
> 'fifo_size' of hw_params struct.
> 
> It is only set by a few drivers:
> sound/arm/aaci.c
> sound/arm/pxa2xx-pcm-lib.c
> sound/soc/renesas/dma-sh7760.c
> sound/soc/starfive/jh7110_tdm.c
> sound/soc/tegra/tegra_pcm.c
> sound/soc/xtensa/xtfpga-i2s.c
> sound/usb/misc/ua101.c
> sound/x86/intel_hdmi_audio.c
> 
> Its definition is awkwardly a bit different between kernel and
> alsa-lib: in the kernel it can be in bytes or frames, but in user
> space it is always in frames (snd_pcm_hw_params_get_fifo_size).
> 
> So far I could not find evidence that this is in active use by user
> space. It is not used in alsa-utils, pipewire or pulseaudio at least,
> and a web search came back empty-handed as well.
> 
> My proposal thus is to re-use, re-define and extend the fifo_size as
> an indication that the hw_ptr _can_ jump at least fifo_size number of
> frames, so applications must take this into account when doing direct
> updates in the ALSA buffer for low latency.
> 
> Or should we add a new member (carved out from the reserved section of
> the hw_params struct) specifically for this purpose, like
> dma_burst_size, which is likely going to be equal to fifo_size if both
> are filled by the driver.
> 
> Or a new flag (SNDRV_PCM_INFO_) that PCM devices can use to indicate
> that the DMA is bursting, in which case the fifo_size holds the number
> of frames that it is expected to jump.
> But we are slowly running out of bits, and I'm not sure if it is a
> good idea to dual-use kernel-internal bits for the user ABI
> (SNDRV_PCM_INFO_DRAIN_TRIGGER, SNDRV_PCM_INFO_FIFO_IN_FRAMES).
> 
> https://github.com/thesofproject/linux/issues/5313
> https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489

Sorry to be late to the game, as I was off in the last weeks and am
slowly catching up.  So, I'm reading this thread from the beginning,
and wondering what is needed from a user-space API POV.

First off, we already have a way to report a fine-grained playback
pointer as "delay" even with jumpy hw_ptr updates, as long as the
driver supports it; e.g. a few drivers for hardware with packet-based
transfers, like USB-audio, support that mode.  Doesn't it suffice for
your need?

And if a different parameter is required and defined, how can an
application use it?  An application can read/write PCM parameters,
write PCM data (either via mmap or write/ioctl), and sleep/wake via
poll() -- basically that's all.  Would the new parameter influence the
poll wakeup behavior?  Or who controls it, and in which way?


thanks,

Takashi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-30 14:27 ` Takashi Iwai
@ 2026-03-30 15:15   ` Péter Ujfalusi
  2026-03-30 16:39     ` Takashi Iwai
  0 siblings, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-30 15:15 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood,
	Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 30/03/2026 17:27, Takashi Iwai wrote:
>> https://github.com/thesofproject/linux/issues/5313
>> https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489
> 
> Sorry to be late in the game as I was off in the last weeks, and
> slowly catching up.  So, I'm reading this thread from the beginning,
> and wondering what is needed from user-space API POV.

This discussion stretches back a decade or so, a few weeks do not matter ;)

> First off, we do already have a way to report a fine-grained playback
> pointer as "delay" even in jumpy hw_ptr updates as long as the driver
> supports it;

The delay reporting with the jumpy hw_ptr works great; most applications
on Linux use it (mplayer, mpv, vlc, pipewire, pulseaudio, etc).
Chromium's alsa_conformance ironically did not, but it has for some time
now:
https://chromium.googlesource.com/chromiumos/platform/audiotest/+/eccd8be776d45a2e3b3006d74f174ff216cb01d8%5E%21/#F0

> e.g. a few drivers for hardware with packet-based
> transfers like USB-audio support that mode.  Doesn't it suffice for
> your need?

There is no real way to express the jumpy hw_ptr: it is not really
packet mode (in some sense it is, but rather not), and the BATCH mode is
certainly not a fit.

> And, if a more different parameter is required and defined, how an
> application can use it?  An application can read/write PCM parameters,
> write PCM data (either via mmap or write/ioctl), and sleep/wakeup via
> poll() -- basically that's all.  Would the new parameter influence on
> the poll wakeup behavior?  Or who controls in which way?

The main target at the moment is pipewire on Linux; it uses mmap and a
timer (no period wakeups) to process audio in the most efficient way. It
can fall back to non-mmap and poll, but that comes with lots of
drawbacks in power consumption and CPU use.

We now have the default configuration of SOF working fine with its
jumpy DMA, which is:
a 4ms host-facing buffer inside the DSP and 1ms DMA bursts every 1ms.
This translates to:
when audio starts, before 1ms has elapsed the hw_ptr will be pointing to
the sample at 5ms (4ms has been sent to the DSP); when 1ms has elapsed
from the playback, the hw_ptr is at 6ms. You can say that it is 4ms
ahead of the playback progress viewed in the host-facing buffer of the
DSP.
But this 4ms is not the delay; the delay is a different thing, which
includes the 4ms plus the processing path within the DSP.

The issue at hand is that we need to tell applications about this so
they can work out how to manage the mmapped area.
They must provide enough data on start - 1ms is not enough as the DMA
jumps 4ms, and 4ms is likely not enough either, unless the application
can be sure it can provide more data within 1ms - when the DMA will
move 1ms ahead.

I have other examples in later mails for different configurations.

This is not really about delay, and it does not really fit into BATCH
mode either.
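A sketch of that arithmetic in plain C (illustrative names, not an ALSA
API), using frame counts at 48kHz where 1ms = 48 frames and the 4ms
initial burst = 192 frames:

```c
/*
 * Bursty hw_ptr as described above: an initial jump of init_chunk
 * frames at trigger time, then step_chunk more frames every
 * step_chunk's worth of time.  All values in frames.
 */
unsigned long hw_ptr_at(unsigned long init_chunk,
                        unsigned long step_chunk,
                        unsigned long frames_elapsed)
{
	return init_chunk + (frames_elapsed / step_chunk) * step_chunk;
}

/* Simplified: a burst that jumps past the application pointer is an xrun. */
int would_xrun(unsigned long hw_ptr, unsigned long appl_ptr)
{
	return hw_ptr > appl_ptr;
}
```

With init_chunk = 192 and step_chunk = 48, hw_ptr_at() is already 192 at
trigger and 240 after 1ms: an application that queued only 1ms (48
frames) is jumped over immediately, and one that queued exactly 4ms (192
frames) is jumped over at the first 1ms step unless it refills in time.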

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-30 15:15   ` Péter Ujfalusi
@ 2026-03-30 16:39     ` Takashi Iwai
  2026-03-31  6:00       ` Péter Ujfalusi
  0 siblings, 1 reply; 25+ messages in thread
From: Takashi Iwai @ 2026-03-30 16:39 UTC (permalink / raw)
  To: Péter Ujfalusi
  Cc: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood,
	Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On Mon, 30 Mar 2026 17:15:38 +0200,
Péter Ujfalusi wrote:
> 
> 
> 
> On 30/03/2026 17:27, Takashi Iwai wrote:
> >> https://github.com/thesofproject/linux/issues/5313
> >> https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4489
> > 
> > Sorry to be late in the game as I was off in the last weeks, and
> > slowly catching up.  So, I'm reading this thread from the beginning,
> > and wondering what is needed from user-space API POV.
> 
> This discussion is stretches back a decade or so, few weeks do not matter ;)
> 
> > First off, we do already have a way to report a fine-grained playback
> > pointer as "delay" even in jumpy hw_ptr updates as long as the driver
> > supports it;
> 
> The delay reporting with the jumpy hw_ptr works great, most application
> on Linux uses it (mplayer, mpv, vlc, pipewire, pulseaudio, etc).
> Chromium's alsa_conformance ironically did not, but it is for some times
> now:
> https://chromium.googlesource.com/chromiumos/platform/audiotest/+/eccd8be776d45a2e3b3006d74f174ff216cb01d8%5E%21/#F0
> 
> > e.g. a few drivers for hardware with packet-based
> > transfers like USB-audio support that mode.  Doesn't it suffice for
> > your need?
> 
> There is no real way to express the jumpy hw_ptr, it is not really
> packet mode (while in some sense it is, but rather not) and the BATCH
> mode certainly not a fit.
> 
> > And, if a more different parameter is required and defined, how an
> > application can use it?  An application can read/write PCM parameters,
> > write PCM data (either via mmap or write/ioctl), and sleep/wakeup via
> > poll() -- basically that's all.  Would the new parameter influence on
> > the poll wakeup behavior?  Or who controls in which way?
> 
> The main target at the moment is pipewire in Linux, it uses mmap and
> timer (no period wakeup) to process audio in the most efficient way. It
> can fall back to non mmap and poll, but that comes with lots of drawback
> in power consumption and CPU use.
> 
> We have the default configuration of SOF working now fine with it's
> jumpy-DMA, which is:
> 4ms host facing buffer inside of DSP and 1ms DMA bursts every 1ms.
> This translates:
> when audio starts, before 1ms elapsed the hw_ptr will be pointing to the
> sample at 5ms, 4ms has been sent to the DSP, when 1ms elapsed from the
> playback, the hw_ptr is at 6ms, you can say that it is 4ms ahead of the
> playback progress viewed in the host facing buffer of the DSP.
> But, this 4ms is not the delay, the delay is a different thing, which
> includes the 4ms plus processing path within the DSP.
> 
> The issues at hand is that we need to tell the applications about this
> so they can work out how to manage the mmaped area.
> They must provide enough data on start - 1ms is not enough as the DMA
> jumps 4ms, 4ms is likely not enough, unless the application can be sure
> it can provide more data with 1ms - when the DMA will move 1ms ahead.
> 
> I have other examples in later mails for different configurations.
> 
> This is not really about delay, it is not really fits into BATCH mode
> either.

OK, then I'd say that the existing fifo_size doesn't fully fit this
kind of stuff.  E.g. if a device allows different queue sizes, they
should be configurable via hw_params.


Takashi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-30 16:39     ` Takashi Iwai
@ 2026-03-31  6:00       ` Péter Ujfalusi
  2026-03-31  6:36         ` Takashi Iwai
  0 siblings, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-31  6:00 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood,
	Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 30/03/2026 19:39, Takashi Iwai wrote:
> OK, then I'd say that the existing fifo_size doesn't fit fully for
> this kind of stuff.  e.g.

We came to the same conclusion with Jaroslav, and the plan is to
introduce two new parameters in hw_params, init_chunk and step_chunk,
both in frames:
init_chunk - the size of the hw_ptr jump right when the start happens
step_chunk - the runtime jump size, which happens every step_chunk's
worth of time.

for example:
init_chunk = 100ms
step_chunk = 1ms
hw_ptr moves 100ms on start (pointing to 101ms), after 1ms of time the
hw_ptr will move 1ms ahead to 102ms, in another 1ms it again moves 1ms
to 103ms...

init_chunk = 100ms
step_chunk = 96ms
hw_ptr moves 100ms on start (pointing to 101ms), after 96ms of time the
hw_ptr will move 96ms ahead to 197ms, in another 96ms it again moves
96ms to 293ms...

Note, the first example is theoretical; with SOF the 1ms step is used
only with a 'small' DSP-side buffer:

init_chunk = 4ms
step_chunk = 1ms
hw_ptr moves 4ms on start (pointing to 5ms), after 1ms of time the
hw_ptr will move 1ms ahead to 6ms, in another 1ms it again moves 1ms to
7ms...
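The three examples can be cross-checked with a one-liner (plain C
sketch; the "pointing to" value follows the convention above where the
hw_ptr points at the (init_chunk + 1)th millisecond right after start):

```c
/*
 * "Pointing to" position in ms for the examples above: the initial
 * jump lands on init_chunk + 1, then the pointer advances by
 * step_chunk every step_chunk ms.  Illustrative only.
 */
unsigned int pointing_ms(unsigned int init_chunk_ms,
                         unsigned int step_chunk_ms,
                         unsigned int elapsed_ms)
{
	return init_chunk_ms + 1 +
	       (elapsed_ms / step_chunk_ms) * step_chunk_ms;
}
```

E.g. pointing_ms(100, 96, 96) gives 197 and pointing_ms(4, 1, 2) gives
7, matching the sequences above.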

I'm not sure if we want these to be snd_pcm_uframes_t types in
snd_pcm_hw_params, or whether they should be u32 to simplify the
shrinking of reserved:

-       unsigned char reserved[48];
+       snd_pcm_uframes_t init_chunk;	/* in frames */
+       snd_pcm_uframes_t step_chunk;	/* in frames */
+       unsigned char reserved[48 - 2 * sizeof(snd_pcm_uframes_t)];

With u32 we can simply change the reserved size to 40, which is anyway
going to be the case for snd_pcm_hw_params32{}.
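A sketch of the layout argument (only the tail of the struct, not the
real snd_pcm_hw_params; field names from the proposal above). With u32
the two new fields plus reserved[40] occupy exactly the original 48
reserved bytes on both 32-bit and 64-bit ABIs, which is what keeps the
snd_pcm_hw_params32 case trivial:

```c
#include <stdint.h>
#include <stddef.h>

/* Tail of the struct after carving the two fields out of reserved[48]. */
struct hw_params_tail {
	uint32_t init_chunk;		/* in frames */
	uint32_t step_chunk;		/* in frames */
	unsigned char reserved[40];	/* was reserved[48] */
};

/* Same total size as the original reserved[48] on any ABI. */
_Static_assert(sizeof(struct hw_params_tail) == 48,
	       "tail must stay 48 bytes");
```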

> if a device allows a different queue size, it should be configurable
> via hw_params.

I'm not sure I follow this statement. The fifo_size is driver-to-user-
space information: the driver fills it and user space ignores it ;) - I
cannot find any evidence of its use.

The init_chunk and step_chunk would be similar: the driver sets them and
user space would use them.

In SOF this will be dynamic and it will depend on the period size:
https://github.com/thesofproject/linux/pull/5673/commits/18f3ba5e42212d77019d79ec09b7057a7703d361

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31  6:00       ` Péter Ujfalusi
@ 2026-03-31  6:36         ` Takashi Iwai
  2026-03-31  9:29           ` Jaroslav Kysela
  2026-03-31 11:19           ` Péter Ujfalusi
  0 siblings, 2 replies; 25+ messages in thread
From: Takashi Iwai @ 2026-03-31  6:36 UTC (permalink / raw)
  To: Péter Ujfalusi
  Cc: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood,
	Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans

On Tue, 31 Mar 2026 08:00:56 +0200,
Péter Ujfalusi wrote:
> 
> 
> 
> On 30/03/2026 19:39, Takashi Iwai wrote:
> > OK, then I'd say that the existing fifo_size doesn't fit fully for
> > this kind of stuff.  e.g.
> 
> We came to the same conclusion with Jaroslav, and the plan is to
> introduce two new parameter in hw_params:
> init_chunk and step_chunk, both in frames.
> init_chunk - is the size of the hw_ptr jump right when the start happens
> step_chunk - is the runtime jump size which happens every step_chunk time.
> 
> for example:
> init_chunk = 100ms
> step_chunk = 1ms
> hw_ptr moves 100ms on start (pointing to 101ms), after 1ms of time the
> hw_ptr will move 1ms ahead to 102ms, in another 1ms it again moves 1ms
> to 103ms...
> 
> init_chunk = 100ms
> step_chunk = 96ms
> hw_ptr moves 100ms on start (pointing to 101ms), after 96ms of time the
> hw_ptr will move 96ms ahead to 197ms, in another 96ms it again moves 1ms
> to 293ms...

In the second example, where does the 1ms offset come from?
  hw_ptr moves 100ms on start (pointing to 101ms)

I thought that this 1ms is the step_chunk size in the first example...

> Note, the first is theoretical, with SOF 1ms step is used only with
> 'small' DSP side buffer:
> 
> init_chunk = 4ms
> step_chunk = 1ms
> hw_ptr moves 4ms on start (pointing to 5ms), after 1ms of time the
> hw_ptr will move 1ms ahead to 6ms, in another 1ms it again moves 1ms to
> 7ms...

So, init_chunk is the size to be filled up at the start, something
similar to sw_params.start_threshold, but it's rather a hardware
requirement.

And step_chunk is essentially the hw_ptr granularity?

> I'm not sure if we want these to be snd_pcm_uframes_t types in
> snd_pcm_hw_params or should be u32 simplify the shrinking of reserved..
> 
> -       unsigned char reserved[48];
> +       snd_pcm_uframes_t init_chunk;	/* in frames */
> +       snd_pcm_uframes_t step_chunk;	/* in frames */
> +       unsigned char reserved[48 - 2 * sizeof(snd_pcm_uframes_t)];
> 
> with u32 we can simply change the reserved size to 40, which is anyways
> going to be the case for the snd_pcm_hw_params32{}

In my idea it may be configurable, hence it belongs to the intervals,
so it would take two of the reserved intervals in ires[9].

> > if a device allows a different queue size, it should be configurable
> > via hw_params.
> 
> I'm not sure if I follow this statement. the fifo_size is a driver to
> user space information, driver fills it and user space ignores it ;) - I
> cannot find any evidence of it's use.

If a chip has a similar constraint but the init and step sizes are
adjustable, they should be configurable via the hw_params procedure --
that's my point.

> The init_chunk, step_chunk would be similar, the driver sets it and user
> space would use it.
> 
> In SOF this will be dynamic and it will depend on the period size:
> https://github.com/thesofproject/linux/pull/5673/commits/18f3ba5e42212d77019d79ec09b7057a7703d361

Well, even in your case the driver can implement a hw_constraint
coupling those numbers, too.  Then the application may choose the
init_chunk or step_chunk, which restricts the period size
automatically.  If the application doesn't choose those, the hw_params
engine will choose depending on the period size, and the application
can see the values after the hw_params call.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31  6:36         ` Takashi Iwai
@ 2026-03-31  9:29           ` Jaroslav Kysela
  2026-03-31 10:42             ` Kai Vehmanen
  2026-03-31 10:56             ` Péter Ujfalusi
  2026-03-31 11:19           ` Péter Ujfalusi
  1 sibling, 2 replies; 25+ messages in thread
From: Jaroslav Kysela @ 2026-03-31  9:29 UTC (permalink / raw)
  To: Takashi Iwai, Péter Ujfalusi
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans

On 3/31/26 08:36, Takashi Iwai wrote:

>>> if a device allows a different queue size, it should be configurable
>>> via hw_params.
>>
>> I'm not sure if I follow this statement. the fifo_size is a driver to
>> user space information, driver fills it and user space ignores it ;) - I
>> cannot find any evidence of it's use.
> 
> If a chip has a similar constraint but the init and step sizes are
> adjustable, they should be configurable via hw_params procedure --
> that's my point.
> 
>> The init_chunk, step_chunk would be similar, the driver sets it and user
>> space would use it.
>>
>> In SOF this will be dynamic and it will depend on the period size:
>> https://github.com/thesofproject/linux/pull/5673/commits/18f3ba5e42212d77019d79ec09b7057a7703d361
> 
> Well, so even in your case, the driver can implement the hw_constraint
> for coupling those numbers, too.  Then application may choose the
> init_chunk or step_chunk, which restricts the period size
> automatically.  If application doesn't choose those, the hw_params
> engine will choose depending on the period size, and application can
> see the values after hw_params call.

I was thinking more about this problem, and it seems that the root cause 
is that the application (pipewire) is trying to bypass the period-based 
mechanism for which we already have the handshake. It's no issue to ask for 
smaller periods from the app side to maintain the expected latency. It is also 
understandable that we need to fill at least two periods at start and keep 
this buffer timing (also counting in the system scheduling latencies).

So, perhaps, the only flag that may be added is one notifying that the 
(first) minimal period is processed immediately after start. It's also a 
common situation for other drivers with double buffering in the driver, like 
USB and maybe FireWire, right? Note that this flag won't be the BATCH flag. 
Or we can just add a field notifying how many minimal periods are queued at 
start, to be more universal (apparently SOF requires this, because the 
initial chunk of queued samples is bigger than the later chunks - so the 
first transfer will go over more periods).

We may discuss whether small periods are efficient. We already have a 
mechanism to disable period events (SNDRV_PCM_HW_PARAMS_NO_PERIOD_WAKEUP), 
and drivers usually don't do deep buffering, so they program DMA transfers 
with smaller (or equal) chunks than the period size.

It seems to me that we are trying to design another layer on top of the 
current one just to satisfy improper use of the current PCM API.

And saying this, it appears that the kernel drivers (yes, SOF) are trying 
to bypass the period constraints and make them freely customizable [1], 
instead of applying constraints based on the hardware limits, including the 
internal maintenance (CPU <-> DSP buffering). So the issue is on both sides, 
and things are failing because the standard period handshake is not honored.

					Jaroslav

[1] 
https://github.com/thesofproject/linux/pull/5673/changes#diff-d8bbc05d879b6eee2041d6fc0ee06f050be097ac05b12cfec9b35d89f66d3a84R79-R89

/*
 * When the host DMA buffer size is larger than 8ms, the firmware switches from
 * a constant fill mode to burst mode, keeping a 4ms threshold to trigger
 * transfer of approximately host DMA buffer size - 4ms after the initial burst
 * to fill the entire buffer.
 * To simplify the logic, above 20ms ALSA period size use the same size for host
 * DMA buffer, while if the ALSA period size is smaller than 20ms, then use a
 * headroom between host DMA buffer and ALSA period size.
 */
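Read literally, the sizing rule in that comment looks roughly like the
sketch below (plain C; the 4ms headroom value and the handling of the
exact 20ms boundary are my guesses from the comment text, not taken
from the SOF code):

```c
/* All sizes in ms, per the quoted comment. */
enum {
	LARGE_PERIOD_MS = 20,	/* at/above this, buffer == period size */
	HEADROOM_MS	= 4,	/* assumed headroom for small periods */
};

/* Host DMA buffer size derived from the ALSA period size. */
unsigned int host_dma_buffer_ms(unsigned int period_ms)
{
	if (period_ms >= LARGE_PERIOD_MS)
		return period_ms;
	return period_ms + HEADROOM_MS;
}
```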

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31  9:29           ` Jaroslav Kysela
@ 2026-03-31 10:42             ` Kai Vehmanen
  2026-03-31 10:56             ` Péter Ujfalusi
  1 sibling, 0 replies; 25+ messages in thread
From: Kai Vehmanen @ 2026-03-31 10:42 UTC (permalink / raw)
  To: Jaroslav Kysela
  Cc: Takashi Iwai, Péter Ujfalusi, Takashi Iwai, Mark Brown,
	Liam Girdwood, Linux-ALSA, linux-sound@vger.kernel.org,
	Kai Vehmanen, arun, wim.taymans

Hey,

On Tue, 31 Mar 2026, Jaroslav Kysela wrote:

> So, perhaps, the only flag may be added notifying that the (first) minimal
> period is processed immediately after start. It's also common situation for
> other drivers with double buffering in the driver like USB, maybe FireWire,
> right ? Note that this flag won't be the BATCH flag. Or we can just add a
> field notifying how many minimal periods are queued at start to be more
> universal (apparently SOF requires this, because the initial chunk of queued
> samples is bigger then the later chunks - so the first transfer will go over
> more periods).
> 
> We may discuss if small periods are efficient. We have already mechanism to
> disable period events (SNDRV_PCM_HW_PARAMS_NO_PERIOD_WAKEUP) and drivers don't
> do usually deep buffering, so they program DMA transfers with smaller (or
> equal) chunks than period size.
> 
> It seems to me that we are trying to design another layer on top of the
> current just to satisfy the improper current PCM API use.

btw, I think this is a bit of a chicken-and-egg problem. Hardware may have 
deep buffering capability, but it's difficult to enable via ALSA without 
breaking applications with the current interfaces (and how they are used).
More drivers might enable deep buffering if the semantics were clear.

For example, with SOF firmware the buffering behaviour is completely 
programmable: we could have a 100ms buffer between the main CPU and the 
DSP and still report an accurate delay with snd_pcm_delay(). In most cases 
the FW support is there anyway, and one could enable this just by 
modifying the DSP topology file (which is basically an alsaconf file). The 
hw_ptr will move in bursts, and will move faster than realtime at start. 
And it's important to note that, to maximize power benefits, the exact 
size and timing of the bursts may be driven by what else is going on in 
the system. The audio DSP of course knows the maximum size of a burst (as 
the DSP needs to have local memory to receive the transfer), but we can't 
guarantee that the actual burst (as seen as hw_ptr movement) is always the 
same size.

We do already honor the period size configuration set by applications, 
and SOF never allows DMA bursts to exceed the period size.

But once we increase the buffer size beyond a few milliseconds, 
applications start to break, and we see that the interpretation of the 
ALSA period semantics becomes less and less clear.

PS: The DSP code that decides on DMA reloads in SOF (this is called every 
1ms; hd->dma_buffer_size is the buffer towards the host, which can be tens 
of msecs):
   https://github.com/thesofproject/sof/blob/main/src/audio/host-zephyr.c#L570
   .. this is common code in SOF, not specific to any one vendor.

Br, Kai

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31  9:29           ` Jaroslav Kysela
  2026-03-31 10:42             ` Kai Vehmanen
@ 2026-03-31 10:56             ` Péter Ujfalusi
  2026-03-31 12:00               ` Jaroslav Kysela
  1 sibling, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-31 10:56 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans



On 31/03/2026 12:29, Jaroslav Kysela wrote:
> On 3/31/26 08:36, Takashi Iwai wrote:
> 
>>>> if a device allows a different queue size, it should be configurable
>>>> via hw_params.
>>>
>>> I'm not sure if I follow this statement. the fifo_size is a driver to
>>> user space information, driver fills it and user space ignores it ;) - I
>>> cannot find any evidence of it's use.
>>
>> If a chip has a similar constraint but the init and step sizes are
>> adjustable, they should be configurable via hw_params procedure --
>> that's my point.
>>
>>> The init_chunk, step_chunk would be similar, the driver sets it and user
>>> space would use it.
>>>
>>> In SOF this will be dynamic and it will depend on the period size:
>>> https://github.com/thesofproject/linux/pull/5673/
>>> commits/18f3ba5e42212d77019d79ec09b7057a7703d361
>>
>> Well, so even in your case, the driver can implement the hw_constraint
>> for coupling those numbers, too.  Then application may choose the
>> init_chunk or step_chunk, which restricts the period size
>> automatically.  If application doesn't choose those, the hw_params
>> engine will choose depending on the period size, and application can
>> see the values after hw_params call.
> 
> I was more thinking about this problem and it seems that the root of
> this cause is that application (pipewire) is trying to bypass the period
> based mechanism for which we already have the handshake. It's no issue
> to ask for smaller periods from the app side to maintain the expected
> latency. It is also understandable that we need to fill at least two
> periods at start and keep this buffer timing (also with counting the
> system scheduling latencies).
> 
> So, perhaps, the only flag may be added notifying that the (first)
> minimal period is processed immediately after start. It's also common
> situation for other drivers with double buffering in the driver like
> USB, maybe FireWire, right ? Note that this flag won't be the BATCH
> flag. Or we can just add a field notifying how many minimal periods are
> queued at start to be more universal (apparently SOF requires this,
> because the initial chunk of queued samples is bigger then the later
> chunks - so the first transfer will go over more periods).
> 
> We may discuss if small periods are efficient. We already have a
> mechanism to disable period events (SNDRV_PCM_HW_PARAMS_NO_PERIOD_WAKEUP)
> and drivers don't usually do deep buffering, so they program DMA
> transfers with smaller (or equal) chunks than the period size.
> 
> It seems to me that we are trying to design another layer on top of
> the current one just to satisfy the improper current PCM API use.
> 
> And saying this, it appears that the kernel drivers (yes, SOF) are
> trying to bypass the period constraints and make them freely
> customizable [1] instead of applying constraints based on the hardware
> limits including the internal maintenance (CPU <-> DSP buffering). So
> the issue is on both sides and things are failing because the standard
> period handshake is not honored.

The dynamic DeepBuffer support is not upstream for SOF, yet. Currently
we have a static DeepBuffer where the topology defines the size of the
host-facing buffer to be used in the DSP.
This is entirely a SW concept.
For this static DeepBuffer the kernel places a constraint on the
minimum period size so that it cannot be smaller (plus some headroom)
than the DeepBuffer size set by the topology file.
Pipewire uses the minimum period size internally for headroom calculation:
https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/9c42c06af00818038be494fd41632cb2d50a0df5

This works more or less, but things started to get out of control soon.
To get the most power saving benefit you would want a quite big
DeepBuffer, let's say 100ms, but for video/voice calls you don't want
such a big buffer (around 40ms should be OK), and some other use case
might need something in between.
If you run tinyplay on the DeepBuffer PCM w/o specifying a period size,
it will fail to find the minimum period size and outright fail unless
we lower the DB to around 20ms.
With 20ms you don't have much power savings in the audio playback case.
So you end up spawning several PCM devices with different DeepBuffer
sizes to accommodate the use cases, which really does not make sense.

This is the reason why I have reversed the rules and adjusted the
DeepBuffer size based on the period size.
In this case we still need to limit how large the DB can be, so it is
not true that the DeepBuffer always equals the period size; it is capped.
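
The reversed rule can be sketched as a trivial capping function (a sketch
under my own naming; the function, parameters and the cap value are
illustrative, not taken from the SOF driver):

```c
#include <assert.h>

/* Sketch of the reversed rule: the DeepBuffer follows the requested
 * period size but is capped by a platform limit (e.g. the DSP SRAM
 * carve-out). Names are invented for this illustration. */
static long deep_buffer_frames(long period_frames, long db_cap_frames)
{
        return period_frames < db_cap_frames ? period_frames : db_cap_frames;
}
```

With an assumed 100ms cap this reproduces the combinations mentioned later
in the thread: a 1000ms period gets a 100ms DeepBuffer, while a 50ms period
gets a 50ms DeepBuffer.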

With the reversed rule we cannot constrain the period size obviously.

Still, with the constrained period size in the current static DeepBuffer,
PW is free to do whatever it wants, and thus we needed to convey that it
is not safe to be closer to the hw_ptr than the minimum period size.

We don't want to allow DeepBuffer on the normal PCM; we still want it
to be available on a special PCM device. Use case:
audio plays on the DeepBuffer PCM, which allows power saving, and
notifications are played through normal PCMs - they are mixed in the
DSP and as soon as the normal PCM is closed we can immediately benefit
from the power savings.

Probably a user configurable DeepBuffer can work, but then the kernel
should limit the maximum period size?

Strictly locking the DeepBuffer to the period size creates another
obstacle for audio applications; audio servers like PW don't care about
periods, they work on the buffer with timer based updates, rewriting
samples in the buffer ahead of the hw_ptr.

> 
>                     Jaroslav
> 
> [1] https://github.com/thesofproject/linux/pull/5673/changes#diff-
> d8bbc05d879b6eee2041d6fc0ee06f050be097ac05b12cfec9b35d89f66d3a84R79-R89
> 
> /*
>  * When the host DMA buffer size is larger than 8ms, the firmware
>  * switches from a constant fill mode to burst mode, keeping a 4ms
>  * threshold to trigger a transfer of approximately (host DMA buffer
>  * size - 4ms) after the initial burst to fill the entire buffer.
>  * To simplify the logic, above 20ms ALSA period size use the same
>  * size for the host DMA buffer, while if the ALSA period size is
>  * smaller than 20ms, then use a headroom between the host DMA buffer
>  * and the ALSA period size.
>  */
> 

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31  6:36         ` Takashi Iwai
  2026-03-31  9:29           ` Jaroslav Kysela
@ 2026-03-31 11:19           ` Péter Ujfalusi
  1 sibling, 0 replies; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-31 11:19 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Takashi Iwai, Jaroslav Kysela, Mark Brown, Liam Girdwood,
	Linux-ALSA, linux-sound@vger.kernel.org, Kai Vehmanen, arun,
	wim.taymans



On 31/03/2026 09:36, Takashi Iwai wrote:
>> We came to the same conclusion with Jaroslav, and the plan is to
>> introduce two new parameter in hw_params:
>> init_chunk and step_chunk, both in frames.
>> init_chunk - is the size of the hw_ptr jump right when the start happens
>> step_chunk - is the runtime jump size which happens every step_chunk time.
>>
>> for example:
>> init_chunk = 100ms
>> step_chunk = 1ms
>> hw_ptr moves 100ms on start (pointing to 101ms), after 1ms of time the
>> hw_ptr will move 1ms ahead to 102ms, in another 1ms it again moves 1ms
>> to 103ms...
>>
>> init_chunk = 100ms
>> step_chunk = 96ms
>> hw_ptr moves 100ms on start (pointing to 101ms), after 96ms of time the
>> hw_ptr will move 96ms ahead to 197ms, in another 96ms it again moves
>> 96ms to 293ms...
> 
> In the second example, where does the 1ms offset come from?
>   hw_ptr moves 100ms on start (pointing to 101ms)

The DMA moved 100ms, so the hw_ptr is now pointing to the data after it,
which is the start of the 101st ms.
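
The hw_ptr movement in the quoted examples can be modeled with a small
pure function (a sketch; the helper name and signature are mine, not from
any ALSA header, and elapsed time is expressed in frames):

```c
#include <assert.h>

/* Sketch of the bursty hw_ptr movement: an initial jump of init_chunk
 * frames at start, then a jump of step_chunk frames every step_chunk
 * frames worth of time. 't' is the elapsed time since start, in frames.
 * This returns frames consumed; the mail counts positions 1-based, so
 * "pointing to 101ms" corresponds to 100 frames-worth consumed. */
static long bursty_hw_ptr(long t, long init_chunk, long step_chunk)
{
        return init_chunk + (t / step_chunk) * step_chunk;
}
```

With init_chunk=100 and step_chunk=96 (in ms worth of frames) the pointer
sits at 100 until 96ms have elapsed, then jumps to 196, then 292, matching
the quoted 101/197/293 positions under the 1-based convention.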

> I thought that this 1ms is the step_chunk size in the first example...
> 
>> Note, the first is theoretical, with SOF 1ms step is used only with
>> 'small' DSP side buffer:
>>
>> init_chunk = 4ms
>> step_chunk = 1ms
>> hw_ptr moves 4ms on start (pointing to 5ms), after 1ms of time the
>> hw_ptr will move 1ms ahead to 6ms, in another 1ms it again moves 1ms to
>> 7ms...
> 
> So, init_chunk is the size to be filled up at the start, something
> similar like sw_params.start_threshold, but it's rather a hardware
> requirement.

Yes and no.
Yes, since there must be an init_chunk amount in the buffer to avoid an
immediate xrun.
No, because even if the init_chunk amount is prepared, the application
must provide step_chunk data within step_chunk time; failure to do so
will also result in an xrun.

init_chunk=100ms
application needs ~5ms to produce new data after start
step_chunk=1ms
in 5ms, when the application is able to produce new data, the hw_ptr
will be at 106ms, so it must have provided at least this amount on start.

init_chunk=100ms
step_chunk=96ms
in 5ms, when the application is able to produce new data, the hw_ptr
will be at 101ms; the application can safely rewrite the data at that
location, so it is safe to provide only 100ms of data for start.
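
The safety condition in the two scenarios above can be sketched as a pure
function (a sketch under my own naming, not an ALSA API; all values are
frame counts):

```c
#include <assert.h>

/* Sketch: minimum data that must be queued at start so that a bursty
 * DMA does not overtake the application, given the initial burst
 * (init_chunk), the runtime burst (step_chunk) and the time the
 * application needs before its first refill (app_latency). Names are
 * illustrative only. */
static long min_start_fill(long init_chunk, long step_chunk, long app_latency)
{
        /* frames consumed by the hardware before the first refill */
        return init_chunk + (app_latency / step_chunk) * step_chunk;
}
```

With init_chunk=100ms, step_chunk=1ms and a 5ms application latency this
yields 105ms consumed (the hw_ptr points at the 106th ms), while with
step_chunk=96ms only the initial 100ms is needed, matching the scenarios
above.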

> And step_chunk is essentially the hw_ptr granularity?

In essence yes; we can report sub-step_chunk positions, but in practice
the DMA burst is fast.

>> I'm not sure if we want these to be snd_pcm_uframes_t types in
>> snd_pcm_hw_params or if they should be u32 to simplify the shrinking
>> of reserved..
>>
>> -       unsigned char reserved[48];
>> +       snd_pcm_uframes_t init_chunk;	/* in frames */
>> +       snd_pcm_uframes_t step_chunk;	/* in frames */
>> +       unsigned char reserved[48 - 2 * sizeof(snd_pcm_uframes_t)];
>>
>> with u32 we can simply change the reserved size to 40, which is anyways
>> going to be the case for the snd_pcm_hw_params32{}
> 
> In my idea, it may be configurable, hence it belongs to the intervals,
> so it would take two of the reserved intervals in ires[9].

I see.
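
For the layout variant quoted earlier (carving two u32 fields out of
reserved[]), the ABI-size invariant can be checked at compile time. This
is a sketch with stand-in struct names, not the real snd_pcm_hw_params:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in tails of snd_pcm_hw_params, only to illustrate that carving
 * two u32 fields out of reserved[] keeps the overall size (and thus the
 * ioctl ABI) unchanged. Struct names are invented for this sketch. */
struct hw_params_tail_old {
        unsigned char reserved[48];
};

struct hw_params_tail_new {
        uint32_t init_chunk;            /* in frames */
        uint32_t step_chunk;            /* in frames */
        unsigned char reserved[40];     /* 48 - 2 * sizeof(uint32_t) */
};

_Static_assert(sizeof(struct hw_params_tail_old) ==
               sizeof(struct hw_params_tail_new),
               "reserved[] shrink must not change the ABI size");
```

The same check would apply to the snd_pcm_hw_params32 compat layout.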

>>> if a device allows a different queue size, it should be configurable
>>> via hw_params.
>>
>> I'm not sure if I follow this statement. the fifo_size is a driver to
>> user space information, driver fills it and user space ignores it ;) - I
>> cannot find any evidence of its use.
> 
> If a chip has a similar constraint but the init and step sizes are
> adjustable, they should be configurable via hw_params procedure --
> that's my point.

The fifo_size is driver to application information, it is set by the driver.

>> The init_chunk, step_chunk would be similar, the driver sets it and user
>> space would use it.
>>
>> In SOF this will be dynamic and it will depend on the period size:
>> https://github.com/thesofproject/linux/pull/5673/commits/18f3ba5e42212d77019d79ec09b7057a7703d361
> 
> Well, so even in your case, the driver can implement the hw_constraint
> for coupling those numbers, too.  Then application may choose the
> init_chunk or step_chunk, which restricts the period size
> automatically.

We would also need a capability flag to say that this is supported on
the PCM device, right?
But, I see: the kernel places a min/max or even a step constraint on the
init_chunk and step_chunk, but the step_chunk must also be constrained
and refined in the core in correlation with the init_chunk.
Limiting the period size is something I'm not sure about, probably the
minimum size can be limited, but how to calculate it?
The upper constraint is problematic when, for example, you want some
buffering (~40ms) but still want bigger period sizes because you want a
bigger buffer and the number of periods is already limited by the
hardware.

> If application doesn't choose those, the hw_params
> engine will choose depending on the period size, and application can
> see the values after hw_params call.

Hm, I cannot constrain the period size at open time when I don't know
whether the user will set init/step_chunk; if they set it then somehow I
need to use that, but if not, use the dynamic DB size calculation and
set the interval accordingly?

I guess some of this work must be done by the core, some by the driver..

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31 10:56             ` Péter Ujfalusi
@ 2026-03-31 12:00               ` Jaroslav Kysela
  2026-03-31 14:09                 ` Péter Ujfalusi
  0 siblings, 1 reply; 25+ messages in thread
From: Jaroslav Kysela @ 2026-03-31 12:00 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans

On 3/31/26 12:56, Péter Ujfalusi wrote:

> Pipewire uses the minimum period size internally for headroom calculation:
> https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/9c42c06af00818038be494fd41632cb2d50a0df5
> 
> This works more or less, but things started to get out of control soon.

It's conditional, so it's not a general rule. Pipewire just tries to push the 
drivers to limits with hard-coded assumptions.

> To get the most power saving benefit, you would want quite big
> DeepBuffer, let's say 100ms but for video/voice calls you don't want
> this big buffer, around 40ms should be OK, some other use case might
> need something in between.
> If you run tinyplay on the DeepBuffer PCM w/o  specifying period size,
> it will fail to find the minimum and outright fails unless we lower the
> DB to around 20ms.
> With 20ms you don't have much power savings in case of audio playback.
> So, you spam out several PCM devices with different DeepBuffer to
> accommodate use cases, this is really not something that makes sense.
> 
> This is the reason why I have reversed the rules and adjusted the
> DeepBuffer size based on the period size.
> In this case we still need to limit how large the DB can be, so it is
> not true that the DeepBuffer equals to the period size, it is caped.

It seems that the discussion is heading somewhere else:

1) applications don't know what they want, thus the driver will enforce a 
configuration (e.g. a forced deep buffer)

2) ALSA PCM period mapping to the hardware which allows "deep buffering"

At the moment, nothing prevents ALSA applications from using bigger period 
sizes when they are known beforehand, like "play background music". In other 
words - power efficiency can be controlled from the app. The applications can 
also update the PCM buffer at any time, so new data (we are speaking about 
playback) can be fed at any time to make the time window for an underrun 
occurrence shorter when the driver transfers data to the hardware.

We may have those configurations (* = future when driver tells the initial 
count of queued periods):

  period | period count | init_periods  | minimal data latency |
--------+--------------+---------------+----------------------+---------
  4ms    | 4            | unsettled (1) | 4ms                  | realtime
*4ms    | 200          | 25            | 100ms                | semi-realtime
  1000ms | 2            | 1             | 1000ms               | pwr efficient

> Strictly locking the DeepBuffer with period size creates another
> obstacle for audio applications, audio servers like PW don't care about
> periods, they work on the buffer with timer based updates and rewriting
> samples in buffer ahead of the hw_ptr.

IMHO, the period size describes the expected minimal latency and applications 
can put more samples into the PCM buffer. So the whole question is how to 
suggest which data threshold is safe for the standard operation.

The timer based queuing should just help to feed data more frequently, but it 
does not mean that applications should not set the period size based on the 
requirements. Ideally, everything should be coordinated - period sizes with 
proper sample feed timing from the application side.

					Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31 12:00               ` Jaroslav Kysela
@ 2026-03-31 14:09                 ` Péter Ujfalusi
  2026-04-02 12:01                   ` Jaroslav Kysela
  0 siblings, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-03-31 14:09 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans



On 31/03/2026 15:00, Jaroslav Kysela wrote:
> On 3/31/26 12:56, Péter Ujfalusi wrote:
> 
>> Pipewire uses the minimum period size internally for headroom
>> calculation:
>> https://gitlab.freedesktop.org/pipewire/pipewire/-/
>> commit/9c42c06af00818038be494fd41632cb2d50a0df5
>>
>> This works more or less, but things started to get out of control soon.
> 
> It's conditional, so it's not a general rule. Pipewire just tries to
> push the drivers to limits with hard-coded assumptions.
> 
>> To get the most power saving benefit, you would want quite big
>> DeepBuffer, let's say 100ms but for video/voice calls you don't want
>> this big buffer, around 40ms should be OK, some other use case might
>> need something in between.
>> If you run tinyplay on the DeepBuffer PCM w/o  specifying period size,
>> it will fail to find the minimum and outright fails unless we lower the
>> DB to around 20ms.
>> With 20ms you don't have much power savings in case of audio playback.
>> So, you spam out several PCM devices with different DeepBuffer to
>> accommodate use cases, this is really not something that makes sense.
>>
>> This is the reason why I have reversed the rules and adjusted the
>> DeepBuffer size based on the period size.
>> In this case we still need to limit how large the DB can be, so it is
>> not true that the DeepBuffer equals to the period size, it is caped.
> 
> It seems that the discussion is heading somewhere else:
> 
> 1) applications don't know what they want, thus the driver will
> enforce a configuration (e.g. a forced deep buffer)
> 
> 2) ALSA PCM period mapping to the hardware which allows "deep buffering"

With SOF we don't advertise the use of the DeepBuffer enabled PCM, it is only used on consumer devices where the audio HAL knows how to use it.
The static-DB and the dynamic-DB both work fine with applications like mplayer, mpv, vlc; the only issue is with pipewire (which is not aware of it).
What I want to achieve in the end is to advertise this via UCM's low-power-hifi, but due to the bursts Pipewire can get an xrun.

> At the moment, nothing prevents ALSA applications to use the bigger
> period sizes when there are known beforehand like "play background
> music". In other words - power efficiency can be controlled from the
> app.

But with a static-DB, locking the period size to the DB size would prevent bigger period size use, and even with a bigger period size, if the DB size is 'small' then the power efficiency is out of the window.
But if we increase the static-DB size then we limit how far the minimum period size can scale down.

> The applications can also update PCM buffer at any time, so new
> data (we are speaking about playback) can be feed at any time to make
> the time window for the underrun occurrence shorter when the driver
> transfers data to the hardware.

Yes, pipewire does this; it does not care about the period size while media players do. I guess most applications which use NO_PERIOD_WAKEUP do this, they don't care about period boundaries.
They look at avail (hw_ptr, sw_ptr, delay) and manage the data in the buffer as close to the hw_ptr as possible.

> 
> We may have those configurations (* = future when driver tells the
> initial count of queued periods):
> 

>  period | period count | init_periods  | minimal data latency |
> --------+--------------+---------------+----------------------+---------
>  4ms    | 4            | unsettled (1) | 4ms                  | realtime
> *4ms    | 200          | 25            | 100ms                | semi-realtime
>  1000ms | 2            | 1             | 1000ms               | pwr efficient

SOF specific, but we limit the period size to be min 8ms to unbreak pipewire, to avoid an xrun on start and during xrun handling, and to hint to the application what is safe.
Also: we cannot scale the DeepBuffer infinitely, it is carved out from the DSP's SRAM, so it is limited.
period size=1000ms with DeepBuffer=100ms is a perfectly valid configuration, as is period size=50ms with DeepBuffer=50ms, as is period size=1000ms with a 4ms non-deepbuffer.

>> Strictly locking the DeepBuffer with period size creates another
>> obstacle for audio applications, audio servers like PW don't care about
>> periods, they work on the buffer with timer based updates and rewriting
>> samples in buffer ahead of the hw_ptr.
> 
> IMHO, the period size describes the expected minimal latency and
> applications can put more samples to the PCM buffer. So the whole
> question is to give suggestions which data threshold is safe for the
> standard operation.

Yes and no. PW allocates big period sizes yet manages really low latencies when it comes to audio.
And yes, the question is the 'headroom' that is safe for applications when dealing with something which is jumpy.

> The timer based queuing should just help to feed data more frequently,
> but it does not mean that applications should not set the period size
> based on the requirements. Ideally, everything should be coordinated -
> period sizes with proper sample feed timing from the application side.

Right, but in the normal PCM of SOF the ShallowBuffer is 4ms; we cannot set a constraint for the period size to be min and max 4ms, that would xrun right away.
And this would just bring back the issue of the static DB that we have: if it is big, then applications break if they want smaller periods.

BTW, should we still keep the dynamic DeepBuffer as a way for the kernel to do things automatically? It will allow unaware ALSA applications to gain this for free, but clever ones (like PW) need to be given information on how to deal with the jumps.

Again: the deep buffer is not in use by user space, we just think that it is a great feature to have and would benefit laptop users if they listen to music with the screen blanked.

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-03-31 14:09                 ` Péter Ujfalusi
@ 2026-04-02 12:01                   ` Jaroslav Kysela
  2026-04-07 11:59                     ` Péter Ujfalusi
  0 siblings, 1 reply; 25+ messages in thread
From: Jaroslav Kysela @ 2026-04-02 12:01 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans

On 3/31/26 16:09, Péter Ujfalusi wrote:
> 
> 
> On 31/03/2026 15:00, Jaroslav Kysela wrote:
>> On 3/31/26 12:56, Péter Ujfalusi wrote:
>>
>>> Pipewire uses the minimum period size internally for headroom
>>> calculation:
>>> https://gitlab.freedesktop.org/pipewire/pipewire/-/
>>> commit/9c42c06af00818038be494fd41632cb2d50a0df5
>>>
>>> This works more or less, but things started to get out of control soon.
>>
>> It's conditional, so it's not a general rule. Pipewire just tries to
>> push the drivers to limits with hard-coded assumptions.
>>
>>> To get the most power saving benefit, you would want quite big
>>> DeepBuffer, let's say 100ms but for video/voice calls you don't want
>>> this big buffer, around 40ms should be OK, some other use case might
>>> need something in between.
>>> If you run tinyplay on the DeepBuffer PCM w/o  specifying period size,
>>> it will fail to find the minimum and outright fails unless we lower the
>>> DB to around 20ms.
>>> With 20ms you don't have much power savings in case of audio playback.
>>> So, you spam out several PCM devices with different DeepBuffer to
>>> accommodate use cases, this is really not something that makes sense.
>>>
>>> This is the reason why I have reversed the rules and adjusted the
>>> DeepBuffer size based on the period size.
>>> In this case we still need to limit how large the DB can be, so it is
>>> not true that the DeepBuffer equals to the period size, it is caped.
>>
>> It seems that the discussion is heading somewhere else:
>>
>> 1) applications don't know what they want, thus the driver will enforce
>> a configuration (e.g. a forced deep buffer)
>>
>> 2) ALSA PCM period mapping to the hardware which allows "deep buffering"
> 
> With SOF we don't advertise the use of the DeepBuffer enabled PCM, it is only used on consumer devices where the audio HAL knows how to use it.
> The static-DB and the dynamic-DB both work fine with applications like mplayer, mpv, vlc; the only issue is with pipewire (which is not aware of it).
> What I want to achieve in the end is to advertise this via UCM's low-power-hifi, but due to the bursts Pipewire can get an xrun.
> 
>> At the moment, nothing prevents ALSA applications to use the bigger
>> period sizes when there are known beforehand like "play background
>> music". In other words - power efficiency can be controlled from the
>> app.
> 
> But with a static-DB, locking the period size to the DB size would prevent bigger period size use, and even with a bigger period size, if the DB size is 'small' then the power efficiency is out of the window.
> But if we increase the static-DB size then we limit how far the minimum period size can scale down.

It's safe to allow bigger periods from the driver which are multiple of the DB 
size.

>> The applications can also update PCM buffer at any time, so new
>> data (we are speaking about playback) can be feed at any time to make
>> the time window for the underrun occurrence shorter when the driver
>> transfers data to the hardware.
> 
> Yes, pipewire does this; it does not care about the period size while media players do. I guess most applications which use NO_PERIOD_WAKEUP do this, they don't care about period boundaries.
> They look at avail (hw_ptr, sw_ptr, delay) and manage the data in the buffer as close to the hw_ptr as possible.

But apps should be aware of which area (e.g. period) is the current one, so 
they can avoid working in that area if system task scheduling is likely to 
interrupt this realtime data feed.

Basically, originally we took the periods as the base transfer block in API 
design. The period size and the filled count of periods (playback) gave the 
latency (of course without additional FIFO in hw path - just for the CPU <-> 
HW path).

>> We may have those configurations (* = future when driver tells the
>> initial count of queued periods):
>>
> 
>>   period | period count | init_periods  | minimal data latency |
>> --------+--------------+---------------+----------------------+---------
>>   4ms    | 4            | unsettled (1) | 4ms                  | realtime
>> *4ms    | 200          | 25            | 100ms                | semi-realtime
>>   1000ms | 2            | 1             | 1000ms               | pwr efficient
> 
> SOF specific, but we limit the period size to be min 8ms to unbreak pipewire, to avoid an xrun on start and during xrun handling, and to hint to the application what is safe.

And that's the whole problem. You are trying to solve a problem caused by the 
situation that applications do not know about those constraints.

>> IMHO, the period size describes the expected minimal latency and
>> applications can put more samples to the PCM buffer. So the whole
>> question is to give suggestions which data threshold is safe for the
>> standard operation.
> 
> Yes and no. PW allocates big period sizes yet manages really low latencies when it comes to audio.
> And yes, the question is the 'headroom' that is safe for applications when dealing with something which is jumpy.

But the current PW behaviour is based on the "assumption" that the data are 
processed in small chunks from the PCM buffer. And this assumption is not 
true for SOF, while it worked perfectly for legacy HDA and all simple PCI 
(even ISA :-)) sound hardware. Also the initial transfer at the stream start 
is different. And I think that the USB / FireWire serial buses are in a 
similar situation. I saw workarounds (special settings) in PW for those 
cases, too.

>> The timer based queuing should just help to feed data more frequently,
>> but it does not mean that applications should not set the period size
>> based on the requirements. Ideally, everything should be coordinated -
>> period sizes with proper sample feed timing from the application side.
> 
> Right, but in the normal PCM of SOF the ShallowBuffer is 4ms; we cannot set a constraint for the period size to be min and max 4ms, that would xrun right away.

The xrun happens just because the application does not push data to the next 
period in time, right?

> And this would just bring back the issue of the static DB that we have: if it is big, then applications break if they want smaller periods.

We definitely need a handshake (app<->kernel) for this, but the question is 
whether just going back to honoring the period sizes properly from the user 
space applications is the right way to go or not. IMHO it may be just a 
clarification of the current mechanism.

In any case we need those extensions:

Kernel -> user space:

- give the initial transfer chunk size to user space
- give the next (step) transfer chunk size to user space

The minimal requirement for the playback data at start would be:

	'init_chunk + (2 * step_chunk)'

Note that init_chunk may be zero (legacy PCI HW).

In my proposal, step_chunk == period_size and init_chunk will be provided 
using additional value in hw_params (as count of initial periods).

User space -> kernel:

- notification that the code honors period sizes + initial periods and uses 
the period size to ask the kernel for the requested latency (keeping the 
current behavior for older binaries; and to allow drivers to do an optimal 
setup)
- request for low hw_ptr granularity (to use e.g. deep buffers) - basically 
okay, we don't care, the "BATCH" transfer mode with whole periods is okay

Example (with the "honor period size" handshake activated):

20ms latency goal, 4ms hw transfers, 12ms initial hw buffer

-> set period size to 10ms or less
      # 2 periods must be filled to avoid an initial xrun
-> set buffer size to 30ms or more
<- get 8 periods x 4ms
<- 3 initial periods (3*4ms = 12ms)

suggested initial fill = 12ms + 2 * 4ms ; 20ms total

The driver (constraints setup) should take the initial periods into account 
in this case.
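
The suggested sizing above can be sketched as a pure function (a sketch
under my own naming, following the proposal that step_chunk == period_size
and init_chunk is given as a count of initial periods):

```c
#include <assert.h>

/* Sketch of the proposed sizing rule: with step_chunk == period_size
 * and init_chunk == init_periods * period_size, the suggested initial
 * fill is init_chunk + 2 * step_chunk. Names are illustrative only. */
static long suggested_initial_fill(long period_size, long init_periods)
{
        return (init_periods + 2) * period_size;
}
```

For the example above (4ms periods, 3 initial periods) this gives
12ms + 2 * 4ms = 20ms total.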

> BTW, should we still keep the dynamic DeepBuffer as a way from the kernel to do things automatically? It will allow unaware ALSA applications to gain this for free, but it needs information to be given to clever ones (like PW) on how to deal with the jumps.

It's difficult to suggest something for this problem when the specific 
application does not give any hint to the kernel space about future API calls 
and the expected use (e.g. whether it will rewind in the future). We need to 
properly document everything related to the transfers and let applications 
choose between the "expect data change soon" and "buffering" behaviours IMHO.

And definitely, we should not do optimizations in the drivers related to a 
single app - choosing a specific period size just because pipewire does not 
work correctly (speaking about the clarified API).

				Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-04-02 12:01                   ` Jaroslav Kysela
@ 2026-04-07 11:59                     ` Péter Ujfalusi
  2026-04-07 13:50                       ` Jaroslav Kysela
  0 siblings, 1 reply; 25+ messages in thread
From: Péter Ujfalusi @ 2026-04-07 11:59 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans



On 02/04/2026 15:01, Jaroslav Kysela wrote:
>> But with a static-DB and locking t he period size to the DB size would
>> prevent bigger period size use and even with bigger period size, if
>> the DB size is 'small' then the power efficiency is out of the window.
>> But, if we increase the static-DB size then we limit the minimum
>> period size to scale down.
> 
> It's safe to allow bigger periods from the driver which are multiple of
> the DB size.

But that is the thing, this is not always true for every configuration.
4ms DB with 1ms runtime steps for example.

> But apps should be aware about the information which area (e.g. period)
> is current so they should avoid to work in this area, if system task
> scheduling is probably going to interrupt this realtime data feed.
> 
> Basically, originally we took the periods as the base transfer block in
> API design. The period size and the filled count of periods (playback)
> gave the latency (of course without additional FIFO in hw path - just
> for the CPU <-> HW path).

But then came NO_PERIOD_WAKEUP (if I recall, Android and Pipewire were
the ones pushing for this and using it).
This concept kind of superseded the period-size-as-scheduling-unit
paradigm, but it still expected the DMA to move in continuous, small
steps - preferably in steps of at most a frame.

>>> We may have those configurations (* = future when driver tells the
>>> initial count of queued periods):
>>>
>>
>>>   period | period count | init_periods  | minimal data latency |
>>> --------+--------------+---------------+----------------------+---------
>>>   4ms    | 4            | unsettled (1) | 4ms                  | realtime
>>> *4ms    | 200          | 25            | 100ms                | semi-realtime
>>>   1000ms | 2            | 1             | 1000ms               | pwr efficient
>>
>> SOF specific, but we limit the period size to be min 8ms to unbrake
>> pipewire to avoid xzrun on start and xrun handling and to hint
>> application what is safe.
> 
> And it's the whole problem. You are trying to solve a problem caused
> with the situation that applications do not know about those constraints.

What we do is not unique among the systems that have jumpy DMA; they
all constrain the min period size.
The drivers forbid user space from using something which will fail on start.

The difference here is that we want a solution which could cover all
devices and systems.
AFAIK all 'solutions' at the moment are product specific, or the device
is special cased. We could do this with PW for SOF and call it a day ;)
That does not scale...

The Nokia N9 had a PCM device with a codec which could suck up 128ms of
audio and play it from its own memory (tlv320dac33); it had kcontrols
for user space to set the FIFO thresholds and modes, and a modified
PulseAudio to understand this.

> But current PW behaviour is based on "assumption" that the data are
> processed in small chunks from the PCM buffer. And this assumption is
> not true for SOF while it worked perfectly for legacy HDA and all simple
> PCI (even ISA :-)) sound hardware. Also the initial transfer at the
> stream start is different. And I think that USB / FireWire serial buses
> are in similar situation. I saw workaround (special settings) in PW for
> those cases, too.

Yes, USB/FW audio sets SNDRV_PCM_INFO_BLOCK_TRANSFER, and PW looks at
the devices: if it sees that BLOCK_TRANSFER is set and the device is
USB/FW, then it sort of ignores it, uses a custom headroom (I think it
doubles the period size) and ignores the interrupts.

We could stamp SOF as a BLOCK_TRANSFER device (which it is not) and
extend the special casing from USB/FW to include SOF; that could work.
But then someone adds a new QC device, or an AMD device, and so on.

>>> The timer based queuing should just help to feed data more frequently,
>>> but it does not mean that applications should not set the period size
>>> based on the requirements. Ideally, everything should be coordinated -
>>> period sizes with proper sample feed timing from the application side.
>>
>> Right, but, but, in normal PCM of SOF, the ShallowBuffer is 4ms, we
>> cannot set constraint for the period size to be min and max 4ms, that
>> would xrun right away.
> 
> The xrun happens just because application do not push data to the next
> period in time, right?

This is what we have seen, yes: PW provides a minimal amount of data,
even less than a period size, and then things fail.
Interestingly, PA on the same hardware appears to work fine. I guess it
is not that aggressive?

>> And this would just bring in the issue of the static DB that we have,
>> if it is big, then applications would break if they want smaller periods.
> 
> We need definitely a handshake (app<->kernel) for this, but the question
> is, if just going back to honor the period sizes properly from the user
> space applications is a right way to go or not. IMHO it may be just a
> clarification for the current mechanism.
> 
> In any case we need those extensions:
> 
> Kernel -> user space:
> 
> - give the initial transfer chunk size to user space
> - give the next (step) transfer chunk size to user space
> 
> The minimal requirement for the playback data at start would be:
> 
>     'init_chunk + (2 * step_chunk)'
> 
> Note that init_chunk may be zero (legacy PCI HW).

and step_chunk as well.

> In my proposal, step_chunk == period_size and init_chunk will be
> provided using additional value in hw_params (as count of initial periods).

I don't think this would work without breaking user space. Locking the
period size to be the size of a step_chunk?
In SOF the default PCM device has a 1ms step_chunk, and it can have a
maximum of 256 periods (HDA BDLE constraint).

> User space -> kernel:
> 
> - notification that the code honors period sizes + initial periods and
> uses period size to ask kernel for requested latency (keeping the
> current behavior for older binaries; and to allow drivers to do optimal
> setup)
> - request for low hw_ptr granularity (to use e.g. deep buffers) -
> basically okay, we don't care, the "BATCH" transfer mode with whole
> periods is okay
> 
> Example (with the "honor period size" handshake activated):
> 
> 20ms latency goal, 4ms hw transfers, 12ms initial hw buffer
> 
> -> set period size to 10ms or less
>      # 2 periods must be filled to avoid an initial xrun
> -> set buffer size to 30ms or more
> <- get 8 periods x 4ms
> <- 3 initial periods (3*4ms = 12ms)
> 
> suggested initial fill = 12ms + 2 * 4ms ; 20ms total
> 
> The driver (constraints setup) should take account the initial periods
> in this case.

You cannot set a constraint after the hw_params has been set, that is
the reverse of how things work, no?

Also, the original issue that initiated the thread was that we had a
fixed 96ms hw transfer with a 100ms FIFO coming as a fixed preset.
This caught applications wanting 10/20/30/40/50ms period sizes off
guard...

The issue looks different from the hardware/kernel and the user space PoV.
- hardware
[A] has no buffering, or only minimal (a few frames to keep the bus fed)
[B] has a fixed size FIFO with fixed sized bursts equal to the FIFO
size - a relatively large FIFO
[C] has a fixed size FIFO with different initial and runtime bursts
[D] has a configurable FIFO, and the relation between initial and
runtime bursts can fall into [B] or [C] depending on the FIFO size; the
FIFO scales with the period size (up to some limit in most cases)

- applications
[1] uses the ALSA period as the processing unit
[2] uses NO_PERIOD_WAKEUP and ignores the period size as a processing unit.

I currently have access to setups which fall into [C]; these are SOF
based systems, and I would like to support [D].

[B] and [C] must place a constraint on the minimum period size (and they
do) to forbid period sizes smaller than the FIFO, to avoid an xrun on
start.
Both [1] and [2] types of applications work fine: [1] always provides at
least 2x period worth of data (which is bigger than the minimum size);
[2] is tricky, but they need to do the same (PW now does).
Applications are free to request as big a period as they want; the
minimum size (a constraint based on the fixed FIFO size) will guide them.

Switching the driver to [D] is no issue for applications of type [1]:
they provide 2x periods and they don't really care if the DMA jumps.
The problem is with type [2]: they only know the minimum period size,
which says nothing about how the hw_ptr will behave with bigger period
sizes.
At the moment we are not exposing a PCM device which would do this; we
need something which scales to as many devices and configs as possible.

I think the two new parameters for init_chunk and step_chunk within the
hw_params returned from the driver to user space should cover this well.

- the driver sets the min_period_size/time constraint to the lowest FIFO
size (already done)
- user space configures whatever period/buffer size it wants
- the driver sets the init/step chunks according to the configuration it
ended up with, or continues to not set them
- user space checks the chunk config: if it is 0, use the minimum period
size as guidance; if it is not 0, use the information to set up the
safety headroom.

>> BTW, should we still keep the dynamic DeepBuffer as a way from the
>> kernel to do things automatically? It will allow unaware ALSA
>> applications to gain this for free, but it needs information to be
>> given to clever ones (like PW) on how to deal with the jumps.
> 
> It's difficult to suggest something for this problem when the specific
> application does not give any hint to the kernel space about future API
> calls and expected use (if they will do rewind for example in future).
> We need properly document everything related to the transfers and let
> applications to choose between "expect data change soon" and "buffering"
> behaviour IMHO.
> 
> And definitely, we should not do optimizations related to single app -
> choose specific period size just because pipewire does not work
> correctly (speaking about the clarified API) - in the drivers.

Certainly!
We could have gone down the “We’ll handle it in a clever way.” path for
SOF, but that would be counterproductive for everyone, including us ;)

-- 
Péter


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
  2026-04-07 11:59                     ` Péter Ujfalusi
@ 2026-04-07 13:50                       ` Jaroslav Kysela
  0 siblings, 0 replies; 25+ messages in thread
From: Jaroslav Kysela @ 2026-04-07 13:50 UTC (permalink / raw)
  To: Péter Ujfalusi, Takashi Iwai
  Cc: Takashi Iwai, Mark Brown, Liam Girdwood, Linux-ALSA,
	linux-sound@vger.kernel.org, Kai Vehmanen, arun, wim.taymans

On 4/7/26 13:59, Péter Ujfalusi wrote:

>> We need definitely a handshake (app<->kernel) for this, but the question
>> is, if just going back to honor the period sizes properly from the user
>> space applications is a right way to go or not. IMHO it may be just a
>> clarification for the current mechanism.
>>
>> In any case we need those extensions:
>>
>> Kernel -> user space:
>>
>> - give the initial transfer chunk size to user space
>> - give the next (step) transfer chunk size to user space
>>
>> The minimal requirement for the playback data at start would be:
>>
>>      'init_chunk + (2 * step_chunk)'
>>
>> Note that init_chunk may be zero (legacy PCI HW).
> 
> and step_chunk as well.

step_chunk should start from one sample. But there are chunks (bursts) even 
for PCI devices, although very small.

>> In my proposal, step_chunk == period_size and init_chunk will be
>> provided using additional value in hw_params (as count of initial periods).
> 
> I don't think this would work without breaking user space. Locking the
> period size to be the size of a step_chunk?
> In SOF the default PCM device has a 1ms step_chunk, and it can have a
> maximum of 256 periods (HDA BDLE constraint).

The drivers can use a period size that is a multiple of the base
transfer (step) chunk.

>> User space -> kernel:
>>
>> - notification that the code honors period sizes + initial periods and
>> uses period size to ask kernel for requested latency (keeping the
>> current behavior for older binaries; and to allow drivers to do optimal
>> setup)
>> - request for low hw_ptr granularity (to use e.g. deep buffers) -
>> basically okay, we don't care, the "BATCH" transfer mode with whole
>> periods is okay
>>
>> Example (with the "honor period size" handshake activated):
>>
>> 20ms latency goal, 4ms hw transfers, 12ms initial hw buffer
>>
>> -> set period size to 10ms or less
>>       # 2 periods must be filled to avoid an initial xrun
>> -> set buffer size to 30ms or more
>> <- get 8 periods x 4ms
>> <- 3 initial periods (3*4ms = 12ms)
>>
>> suggested initial fill = 12ms + 2 * 4ms ; 20ms total
>>
>> The driver (constraints setup) should take account the initial periods
>> in this case.
> 
> You cannot set a constraint after the hw_params has been set, that is
> the reverse of how things work, no?

I expect to design a new constraint describing the "base step chunk" and 
"initial chunk" in samples. Thus the refine mechanism will allow the 
period size to settle based on this.

> Also, the original issue that initiated the thread was that we had a
> fixed 96ms hw transfer with a 100ms FIFO coming as a fixed preset.
> This caught applications wanting 10/20/30/40/50ms period sizes off
> guard...

100ms init = 25 periods / 96ms step = 24 periods with a 4ms period size. 
This means that we need to either

1) add a step fill (in periods), like the init fill returned from the driver 
to the app
2) or the driver (if possible) should prefer that the init chunk is a multiple 
of the step chunk - is that a hw related restriction, or just a buffer size 
judgement from the developer POV for SOF?

> The issue looks different from the hardware/kernel and the user space PoV.
> - hardware
> [A] has no buffering, or only minimal (a few frames to keep the bus fed)
> [B] has a fixed size FIFO with fixed sized bursts equal to the FIFO
> size - a relatively large FIFO
> [C] has a fixed size FIFO with different initial and runtime bursts
> [D] has a configurable FIFO, and the relation between initial and
> runtime bursts can fall into [B] or [C] depending on the FIFO size; the
> FIFO scales with the period size (up to some limit in most cases)
> 
> - applications
> [1] uses the ALSA period as the processing unit
> [2] uses NO_PERIOD_WAKEUP and ignores the period size as a processing unit.

I'm also trying to resolve the deep buffer issue. And we should not use 
NO_PERIOD_WAKEUP as a hint that applications don't handle finer transfer 
granularity, IMHO. The interface should be universal. This flag only means 
that user space is not informed when a period has been processed. Period.

> I think the two new parameters for init_chunk and step_chunk within the
> hw_params returned from the driver to user space should cover this well.

My idea was just to reuse the current mechanism to settle the transfer 
chunks, so applications can give advice on how the PCM buffer will be used 
(based on the period size and the granularity flag). See my proposal. How 
would you like to handle this in a universal way?

				Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.


end of thread, other threads:[~2026-04-07 13:50 UTC | newest]

Thread overview: 25+ messages
2026-03-23 13:34 (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA' Péter Ujfalusi
2026-03-23 14:54 ` Jaroslav Kysela
2026-03-23 16:16   ` Péter Ujfalusi
2026-03-24  8:58     ` Jaroslav Kysela
2026-03-24 10:51       ` Péter Ujfalusi
2026-03-24 13:25         ` Péter Ujfalusi
2026-03-24 15:48         ` Jaroslav Kysela
2026-03-25 13:28           ` Péter Ujfalusi
2026-03-25 14:08             ` Jaroslav Kysela
2026-03-26 12:04               ` Péter Ujfalusi
2026-03-24  7:12 ` Péter Ujfalusi
2026-03-30 14:27 ` Takashi Iwai
2026-03-30 15:15   ` Péter Ujfalusi
2026-03-30 16:39     ` Takashi Iwai
2026-03-31  6:00       ` Péter Ujfalusi
2026-03-31  6:36         ` Takashi Iwai
2026-03-31  9:29           ` Jaroslav Kysela
2026-03-31 10:42             ` Kai Vehmanen
2026-03-31 10:56             ` Péter Ujfalusi
2026-03-31 12:00               ` Jaroslav Kysela
2026-03-31 14:09                 ` Péter Ujfalusi
2026-04-02 12:01                   ` Jaroslav Kysela
2026-04-07 11:59                     ` Péter Ujfalusi
2026-04-07 13:50                       ` Jaroslav Kysela
2026-03-31 11:19           ` Péter Ujfalusi
