Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'

From: "Péter Ujfalusi" <peter.ujfalusi@linux.intel.com>
To: Jaroslav Kysela <perex@perex.cz>, Takashi Iwai <tiwai@suse.de>
Cc: Takashi Iwai <tiwai@suse.com>, Mark Brown <broonie@kernel.org>,
	Liam Girdwood <liam.r.girdwood@linux.intel.com>,
	Linux-ALSA <alsa-devel@alsa-project.org>,
	"linux-sound@vger.kernel.org" <linux-sound@vger.kernel.org>,
	Kai Vehmanen <kai.vehmanen@linux.intel.com>,
	arun@asymptotic.io, wim.taymans@gmail.com
Subject: Re: (re)use and (re)definition of snd_pcm_hw_params->fifo_size for 'jumpy DMA'
Date: Tue, 7 Apr 2026 14:59:17 +0300
Message-ID: <b97ae6b9-ef72-4dbf-bf29-9092ec3919d8@linux.intel.com>
In-Reply-To: <978bdc67-b91e-437a-bb8a-b609a4fef6d1@perex.cz>

On 02/04/2026 15:01, Jaroslav Kysela wrote:
>> But with a static-DB and locking the period size to the DB size would
>> prevent bigger period size use and even with bigger period size, if
>> the DB size is 'small' then the power efficiency is out of the window.
>> But, if we increase the static-DB size then we limit the minimum
>> period size to scale down.
> 
> It's safe to allow bigger periods from the driver which are multiple of
> the DB size.

But that is the thing, this is not always true for every configuration.
4ms DB with 1ms runtime steps for example.

> But apps should be aware about the information which area (e.g. period)
> is current so they should avoid to work in this area, if system task
> scheduling is probably going to interrupt this realtime data feed.
> 
> Basically, originally we took the periods as the base transfer block in
> API design. The period size and the filled count of periods (playback)
> gave the latency (of course without additional FIFO in hw path - just
> for the CPU <-> HW path).

But then came NO_PERIOD_WAKEUP (if I recall correctly, Android and
PipeWire were the ones to push for this and use it).
This concept kind of superseded the 'period size as scheduling unit'
paradigm, but it still expected the DMA to move in continuous and
small steps - preferably in max frame sizes.

>>> We may have those configurations (* = future when driver tells the
>>> initial count of queued periods):
>>>
>>
>>>  period | period count | init_periods  | minimal data latency |
>>> --------+--------------+---------------+----------------------+--------------
>>>  4ms    | 4            | unsettled (1) | 4ms                  | realtime
>>> *4ms    | 200          | 25            | 100ms                | semi-realtime
>>>  1000ms | 2            | 1             | 1000ms               | pwr efficient
>>
>> SOF specific, but we limit the period size to be min 8ms to unbreak
>> pipewire to avoid xrun on start and xrun handling and to hint
>> application what is safe.
> 
> And it's the whole problem. You are trying to solve a problem caused
> with the situation that applications do not know about those constraints.

What we do is not unique among the systems that do have jumpy-DMA, they
all constrain the min period size.
The drivers forbid user space to use something which will fail on start.

The difference here is that we want to have a solution which could cover
all devices and systems.
Afaik all 'solutions' atm are product specific or the device is special
cased. We could do this with PW for SOF and call it a day ;)
That does not scale...

The Nokia N9 had a PCM device with a codec (tlv320dac33) which could
suck up 128ms of audio and play it from its own memory; it had kcontrols
for user space to set the FIFO thresholds and modes, and a modified
pulseaudio to understand this.

> But current PW behaviour is based on "assumption" that the data are
> processed in small chunks from the PCM buffer. And this assumption is
> not true for SOF while it worked perfectly for legacy HDA and all simple
> PCI (even ISA :-)) sound hardware. Also the initial transfer at the
> stream start is different. And I think that USB / FireWire serial buses
> are in similar situation. I saw workaround (special settings) in PW for
> those cases, too.

Yes, USB/FW audio sets SNDRV_PCM_INFO_BLOCK_TRANSFER and PW looks at
the devices and if it sees that BLOCK_TRANSFER is set and the device is
USB/FW then it sort of ignores it, uses custom headroom (I think it
doubles the period size for it) and ignores the interrupts.

We could stamp SOF as a BLOCK_TRANSFER device (which it is not) and
extend the special casing from USB/FW to include SOF, that could work.
But then you will add a new QC device or AMD device or ...

>>> The timer based queuing should just help to feed data more frequently,
>>> but it does not mean that applications should not set the period size
>>> based on the requirements. Ideally, everything should be coordinated -
>>> period sizes with proper sample feed timing from the application side.
>>
>> Right, but, but, in normal PCM of SOF, the ShallowBuffer is 4ms, we
>> cannot set constraint for the period size to be min and max 4ms, that
>> would xrun right away.
> 
> The xrun happens just because application do not push data to the next
> period in time, right?

This is what we have seen, yes, PW provides minimal - even less than
period size - amount of data and then things fail.
Interestingly PA on the same hardware appears to work fine. I guess it
is not that aggressive?

>> And this would just bring in the issue of the static DB that we have,
>> if it is big, then applications would break if they want smaller periods.
> 
> We need definitely a handshake (app<->kernel) for this, but the question
> is, if just going back to honor the period sizes properly from the user
> space applications is a right way to go or not. IMHO it may be just a
> clarification for the current mechanism.
> 
> In any case we need those extensions:
> 
> Kernel -> user space:
> 
> - give the initial transfer chunk size to user space
> - give the next (step) transfer chunk size to user space
> 
> The minimal requirement for the playback data at start would be:
> 
>     'init_chunk + (2 * step_chunk)'
> 
> Note that init_chunk may be zero (legacy PCI HW).

and step_chunk as well.
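
A minimal sketch of the start threshold quoted above. The helper names and
the choice of frames as the unit are assumptions for illustration, this is
not an existing ALSA API:

```c
#include <stdint.h>

/* Hypothetical helper: convert milliseconds to frames at a given rate. */
static inline uint64_t ms_to_frames(uint32_t ms, uint32_t rate_hz)
{
	return (uint64_t)ms * rate_hz / 1000;
}

/*
 * Minimal playback fill before start, as proposed in the thread:
 * init_chunk + 2 * step_chunk (all values in frames).
 * Both chunks may be 0, in which case the result is 0 and the
 * application has to fall back to its usual period based logic.
 */
static inline uint64_t min_start_fill(uint64_t init_chunk, uint64_t step_chunk)
{
	return init_chunk + 2 * step_chunk;
}
```

Assuming 48 kHz, a 12ms initial chunk and 4ms steps give
576 + 2 * 192 = 960 frames, which is 20ms of audio.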

> In my proposal, step_chunk == period_size and init_chunk will be
> provided using additional value in hw_params (as count of initial periods).

I don't think this would work without breaking user space. Locking the
period size to be the size of a step_chunk?
In SOF the default PCM device has a 1ms step_chunk and it can have a
maximum of 256 periods (HDA BDLE constraint).
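
To put rough numbers on this, a sketch that assumes the period size would be
locked to the step_chunk and that the period count limit applies as is (the
function and macro names are made up for illustration):

```c
#include <stdint.h>

/* Maximum period count mentioned above (HDA BDLE constraint). */
#define MAX_PERIODS 256

/* Largest buffer (in ms) reachable if period_size is forced to step_chunk. */
static inline uint32_t max_buffer_ms_locked(uint32_t step_chunk_ms)
{
	return step_chunk_ms * MAX_PERIODS;
}
```

With the 1ms step_chunk the buffer could not grow beyond 256ms, so something
like the 1000ms 'pwr efficient' row in the table above would presumably not
be reachable.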

> User space -> kernel:
> 
> - notification that the code honors period sizes + initial periods and
> uses period size to ask kernel for requested latency (keeping the
> current behavior for older binaries; and to allow drivers to do optimal
> setup)
> - request for low hw_ptr granularity (to use e.g. deep buffers) -
> basically okay, we don't care, the "BATCH" transfer mode with whole
> periods is okay
> 
> Example (with the "honor period size" handshake activated):
> 
> 20ms latency goal, 4ms hw transfers, 12ms initial hw buffer
> 
> -> set period size to 10ms or less
>      # 2 periods must be filled to avoid an initial xrun
> -> set buffer size to 30ms or more
> <- get 8 periods x 4ms
> <- 3 initial periods (3*4ms = 12ms)
> 
> suggested initial fill = 12ms + 2 * 4ms ; 20ms total
> 
> The driver (constraints setup) should take account the initial periods
> in this case.

You cannot set a constraint after the hw_params has been set, that is
reverse of how things work, no?

Also the original issue that initiated the thread was that we had a
fixed 96ms hw transfer with a 100ms FIFO coming as a fixed preset.
This caught applications wanting 10/20/30/40/50ms period sizes off
guard...

The issue looks different from the hardware/kernel and the user space PoV.
- hardware
[A] has no buffering, or really minimal (a few frames to keep the bus fed)
[B] has a fixed size FIFO with fixed size bursts equal to the FIFO
size - relatively large FIFO
[C] has a fixed size FIFO with different initial and runtime bursts
[D] has a configurable FIFO and the relation between the initial and
runtime burst can fall into [B] or [C] depending on the FIFO size; the
FIFO is scaling with the period size (to some limit in most cases)

- applications
[1] uses ALSA period as processing unit
[2] uses NO_PERIOD_WAKEUP and ignores period size as processing unit.

I have access to setups which currently fall into [C], these are SOF
based systems, and I would like to support [D].
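
The hardware classes above can be sketched as a classification of one
concrete configuration by its initial and runtime burst. The enum, the
function and the 'small' threshold are hypothetical, only meant to make the
distinction explicit:

```c
#include <stdint.h>

enum hw_class { HW_A, HW_B, HW_C };

/*
 * Classify one configuration by how the hw_ptr moves (values in frames).
 * A [D] type device is not a class of its own here: each configuration
 * it ends up with behaves like [B] or [C].
 * 'small' is an assumed threshold below which a burst counts as
 * "a few frames to keep the bus fed".
 */
static inline enum hw_class classify(uint32_t init_chunk, uint32_t step_chunk,
				     uint32_t small)
{
	if (init_chunk <= small && step_chunk <= small)
		return HW_A;	/* practically continuous DMA */
	if (init_chunk == step_chunk)
		return HW_B;	/* same burst at start and at runtime */
	return HW_C;		/* initial burst differs from runtime burst */
}
```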

[B] and [C] must place a constraint on the minimum period size (and they
do) to forbid period sizes smaller than the FIFO and so avoid xrun on start.
Both [1] and [2] type of applications work fine: [1] always provides at
least 2x period data (which is bigger than the min_size), [2] is tricky,
but they also need to do the same (PW now does).
Applications are free to request as big a period as they want, the min
size (which is a constraint based on the fixed FIFO size) will guide them.

Switching the driver to [D] is no issue for applications of type [1],
they provide 2x periods and they don't really care if the DMA will jump.
The problem is with type [2], they only know the minimum period size,
which has nothing to do with how the hw_ptr will behave with bigger
period sizes.
At the moment we are not exposing the PCM device which would do this, we
need something which scales for as many devices and configs as possible.

I think the two new parameters, init_chunk and step_chunk, returned from
the driver to user space within hw_params should cover this well.

- driver sets the min_period_size/time constraint to the lowest FIFO size
(already done)
- user space configures whatever period/buffer size it wants.
- driver sets the init/step chunks to match the configuration it ended up
with, or continues to not set them.
- user space checks the chunk config: if it is 0 -> use the period size min
as guidance, if it is not 0 -> use the information to set up the safety
headroom.
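
The user space side of these steps could look roughly like this. It is only
a sketch: struct chunk_info and start_headroom() are hypothetical, the two
values do not exist in hw_params today, and the 2x fallback is an assumption
that mirrors what type [1] applications already do:

```c
#include <stdint.h>

/* Hypothetical view of the two proposed values, in frames; 0 = not set. */
struct chunk_info {
	uint64_t init_chunk;
	uint64_t step_chunk;
};

/*
 * Amount of data to have queued before start.
 * - chunks not reported: use the minimum period size as guidance
 *   (2x min period, an assumption)
 * - chunks reported: init_chunk + 2 * step_chunk, the start
 *   requirement quoted earlier in this mail
 */
static inline uint64_t start_headroom(const struct chunk_info *ci,
				      uint64_t period_size_min)
{
	if (!ci->init_chunk && !ci->step_chunk)
		return 2 * period_size_min;
	return ci->init_chunk + 2 * ci->step_chunk;
}
```

A driver which continues to not set the chunks keeps today's behaviour,
while one that reports them lets the application size the headroom from the
actual jumps instead of from the minimum period size.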

>> BTW, should we still keep the dynamic DeepBuffer as a way from the
>> kernel to do things automatically? It will allow unaware ALSA
>> applications to gain this for free, but it needs information to be
>> given to clever ones (like PW) on how to deal with the jumps.
> 
> It's difficult to suggest something for this problem when the specific
> application does not give any hint to the kernel space about future API
> calls and expected use (if they will do rewind for example in future).
> We need properly document everything related to the transfers and let
> applications to choose between "expect data change soon" and "buffering"
> behaviour IMHO.
> 
> And definitely, we should not do optimizations related to single app -
> choose specific period size just because pipewire does not work
> correctly (speaking about the clarified API) - in the drivers.

Certainly!
We could have gone down the “We’ll handle it in a clever way.” path for
SOF, but that would be counterproductive for everyone, including us ;)

-- 
Péter
