Suspend/resume Issue on pcm_dmix.c in alsa-lib

Alsa-Devel Archive on lore.kernel.org

* Suspend/resume Issue on pcm_dmix.c in alsa-lib
@ 2024-08-27  7:06 Chancel Liu
  2024-08-27  9:54 ` Takashi Iwai
  2024-08-27 10:49 ` Takashi Iwai
  0 siblings, 2 replies; 13+ messages in thread
From: Chancel Liu @ 2024-08-27  7:06 UTC (permalink / raw)
  To: alsa-devel@alsa-project.org, Takashi Iwai, Jaroslav Kysela; +Cc: S.J. Wang

Hi Takashi Iwai, Jaroslav Kysela

We found an issue with dmix in alsa-lib during suspend and resume. It can be easily reproduced with the following steps:

1. Run two dmix clients in parallel. (With only one client the issue does not occur.)
~# aplay xxx1.wav &
~# aplay xxx2.wav &
Here I attach the asound.conf we're using.
~# cat /etc/asound.conf
defaults.pcm.rate_converter "linear"

pcm.dmix_44100{
    type dmix
    ipc_key 5678293
    ipc_key_add_uid yes
    slave{
        pcm "hw:0,0"
        period_time 40000
        format S16_LE
        rate 44100
        }
}

pcm.asymed{
    type asym
    playback.pcm "dmix_44100"
    capture.pcm "dsnoop_44100"
}

pcm.!default{
    type plug
    route_policy "average"
    slave.pcm "asymed"
}

2. Let Linux enter suspend and then resume. (Repeat this step if the issue is not reproduced.)
3. After resume, aplay gets stuck in snd_pcm_wait(). GDB shows:
(gdb) bt
#0  0x0000fffff7da9264 in __GI___poll (fds=fds@entry=0xfffffffff480, nfds=nfds@entry=1, timeout=timeout@entry=240)
    at /usr/src/debug/glibc/2.39+git/sysdeps/unix/sysv/linux/poll.c:41
#1  0x0000fffff7edf468 in poll (__timeout=240, __nfds=1, __fds=0xfffffffff480)
#2  snd1_pcm_wait_nocheck (pcm=pcm@entry=0xaaaaaaad2cb0, timeout=240, timeout@entry=-10001) at pcm.c:2993
#3  0x0000fffff7ee54a0 in snd1_pcm_write_areas (pcm=pcm@entry=0xaaaaaaad2cb0, areas=areas@entry=0xfffffffff560, offset=<optimized out>, offset@entry=0, size=<optimized out>,
    size@entry=1768, func=func@entry=0xfffff7ef5190 <snd_pcm_plugin_write_areas>) at pcm.c:7699
#4  0x0000fffff7ef5020 in snd_pcm_plugin_writei (pcm=0xaaaaaaad2cb0, buffer=<optimized out>, size=1768) at pcm_plugin.c:354

It seems that sometimes, after suspend and resume, there is no space available for writing data into the buffer, so aplay stays stuck in snd_pcm_wait(). I checked the hw_ptr of dmix and found that it is always 0 after resume.
I don't have a solution yet, so I am turning to you for help. The alsa-lib version is v1.2.11. Could you please help check it?

Regards,
Chancel Liu
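
The stall described above can be illustrated with a minimal model of a playback ring buffer. This is only a sketch and not alsa-lib code: the struct, the function name playback_avail(), the buffer size of four periods, and the boundary value are all assumptions chosen for illustration (1764 frames is one 40000 us period at 44100 Hz). It shows why a writer that has already filled the buffer sees zero free frames for as long as hw_ptr stays at 0.

```c
#include <assert.h>

/* Simplified model of a playback ring buffer. All names are illustrative
 * and not taken from alsa-lib. Both pointers count frames and wrap at
 * `boundary`, which is assumed to be a multiple of `buffer_size`. */
typedef struct {
    unsigned long hw_ptr;      /* frames consumed by the hardware */
    unsigned long appl_ptr;    /* frames written by the application */
    unsigned long buffer_size; /* ring buffer size in frames */
    unsigned long boundary;    /* wrap point for both pointers */
} ring_model;

/* Free space available to the writer, in frames. */
long playback_avail(const ring_model *r)
{
    assert(r->buffer_size > 0 && r->buffer_size <= r->boundary);

    long avail = (long)(r->hw_ptr + r->buffer_size) - (long)r->appl_ptr;
    if (avail < 0)
        avail += (long)r->boundary;      /* appl_ptr has not wrapped yet, hw_ptr has */
    else if ((unsigned long)avail >= r->boundary)
        avail -= (long)r->boundary;      /* appl_ptr has wrapped, hw_ptr has not */
    return avail;
}
```

With hw_ptr stuck at 0 and appl_ptr equal to buffer_size, playback_avail() returns 0, so a writer waiting for free space never makes progress. As soon as hw_ptr advances by one period, one period of space becomes available again.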



* Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-08-27  7:06 Suspend/resume Issue on pcm_dmix.c in alsa-lib Chancel Liu
@ 2024-08-27  9:54 ` Takashi Iwai
  2024-08-27 10:49 ` Takashi Iwai
  1 sibling, 0 replies; 13+ messages in thread
From: Takashi Iwai @ 2024-08-27  9:54 UTC (permalink / raw)
  To: Chancel Liu; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

I tried your setup but I couldn't reproduce the issue locally with my
laptop and HD-audio device.  Possibly depending on the kernel driver?

In the case of dmix, it's a poll() against the PCM slave timer.  So
it doesn't take care of suspend/resume state unlike the real PCM
device.  OTOH, the timer device should send notification events at
suspend/resume, and it should trigger the poll wakeup, too.

Does poll() return once after the suspend/resume but then fall into a loop
due to revents being unset?  Or is it stuck, never returning at
suspend/resume?


thanks,

Takashi
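
The two cases asked about above (poll() never waking up versus poll() waking up without the wanted event) can be told apart with a small wrapper around poll(). The following is a minimal sketch on a plain pipe, not on an ALSA timer descriptor: classify_wait(), demo_pipe_wakeup(), and the wait_result names are hypothetical, and the 10 ms timeout is arbitrary.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

typedef enum {
    WAIT_TIMED_OUT,   /* poll() returned 0: no wakeup arrived at all */
    WAIT_READY,       /* poll() returned with POLLIN set */
    WAIT_OTHER_EVENT, /* poll() woke up, but without the event we asked for */
    WAIT_FAILED       /* poll() itself failed */
} wait_result;

/* Wait for input on fd and report which of the cases occurred. */
wait_result classify_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return WAIT_FAILED;
    if (n == 0)
        return WAIT_TIMED_OUT;
    return (pfd.revents & POLLIN) ? WAIT_READY : WAIT_OTHER_EVENT;
}

/* Exercise both outcomes on a pipe; returns 0 if the pipe setup worked. */
int demo_pipe_wakeup(wait_result *before, wait_result *after)
{
    int fds[2];

    assert(before && after);
    if (pipe(fds) != 0)
        return -1;
    *before = classify_wait(fds[0], 10);   /* nothing written yet */
    int ok = (write(fds[1], "x", 1) == 1);
    *after = classify_wait(fds[0], 10);    /* one byte is pending */
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Logging the result of such a wrapper around the wait in the stuck client would show whether the process is blocked inside poll() until the timeout or is spinning on wakeups that carry no usable event.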


* Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-08-27  7:06 Suspend/resume Issue on pcm_dmix.c in alsa-lib Chancel Liu
  2024-08-27  9:54 ` Takashi Iwai
@ 2024-08-27 10:49 ` Takashi Iwai
  2024-09-04  9:07   ` Chancel Liu
  1 sibling, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2024-08-27 10:49 UTC (permalink / raw)
  To: Chancel Liu; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

[ it seems that my previous post didn't go out properly, so resent;
  if you've seen already the same, please disregard ]


thanks,

Takashi


* RE: Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-08-27 10:49 ` Takashi Iwai
@ 2024-09-04  9:07   ` Chancel Liu
  2024-09-04  9:29     ` Jaroslav Kysela
  2024-09-04  9:57     ` Takashi Iwai
  0 siblings, 2 replies; 13+ messages in thread
From: Chancel Liu @ 2024-09-04  9:07 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

Hi Takashi,

Thanks for your reply and suggestions. We have finally found the root cause.
It seems to be related to both the driver and alsa-lib.

When two dmix clients run in parallel, we get two direct dmix instances.
1st dmix instance:
snd_pcm_dmix_open()
	snd_pcm_direct_initialize_slave()
		save_slave_setting()
Since the driver we are using sets the SND_PCM_INFO_RESUME flag,
dmix->spcm->info has this flag. The flag is then cleared in
dmix->shmptr->s.info.

2nd dmix instance:
snd_pcm_dmix_open()
	snd_pcm_direct_open_secondary_client()
		copy_slave_setting()
The 2nd dmix->spcm->info is copied from dmix->shmptr->s.info, so it doesn't
have this flag.

If the 1st dmix instance resumes first, it recovers the slave PCM in
snd_pcm_direct_slave_recover(). Because the 1st dmix->spcm->info has
SND_PCM_INFO_RESUME, snd_pcm_resume(direct->spcm) is called correctly to
resume the slave PCM.

However, if the 2nd dmix instance resumes first, snd_pcm_resume(direct->spcm)
is not called because its spcm->info doesn't have the SND_PCM_INFO_RESUME
flag. The 1st dmix instance then assumes someone else already did the
recovery, so it doesn't call snd_pcm_resume(direct->spcm) either. As a
result, the slave PCM fails to resume.

The SND_PCM_INFO_RESUME flag affects the dmix resume flow. In my opinion,
the first dmix instance to resume should make sure the slave PCM is recovered
properly, no matter whether it was opened first or as a secondary instance.
Do you know why the SND_PCM_INFO_RESUME flag is cleared for the secondary
opened instance? Can we make the following modification?

diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
@@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
        COPY_SLAVE(buffer_time);
        COPY_SLAVE(sample_bits);
        COPY_SLAVE(frame_bits);
-
-       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;

Regards, 
Chancel Liu
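
The flag propagation described in this message can be modelled in a few lines. This is a simplified sketch of the behaviour as reported above, not the alsa-lib source: the model_* names are hypothetical, MODEL_INFO_RESUME is a stand-in bit rather than the real SND_PCM_INFO_RESUME value, and the structs only play the roles of shmptr->s.info and spcm->info.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_INFO_RESUME 0x1u /* stand-in bit, not the real alsa-lib value */
#define MODEL_INFO_OTHER  0x2u /* any unrelated capability bit */

typedef struct { unsigned int info; } model_shared;      /* role of shmptr->s.info */
typedef struct { unsigned int spcm_info; } model_client; /* role of spcm->info */

/* First client: keeps the driver's flags locally and publishes them to
 * shared memory with the RESUME bit cleared. */
void model_open_first(model_client *c, model_shared *shm, unsigned int driver_info)
{
    assert(c && shm);
    c->spcm_info = driver_info;
    shm->info = driver_info & ~MODEL_INFO_RESUME;
}

/* Secondary client: takes its flags from shared memory. */
void model_open_secondary(model_client *c, const model_shared *shm)
{
    assert(c && shm);
    c->spcm_info = shm->info;
}

/* Would this client issue the slave resume call if it recovers first? */
bool model_would_resume_slave(const model_client *c)
{
    return (c->spcm_info & MODEL_INFO_RESUME) != 0;
}
```

In this model only the first-opened client ever answers "yes", so whether the slave resume call happens depends on which client happens to run its recovery first, which is the asymmetry reported here.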


* Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-04  9:07   ` Chancel Liu
@ 2024-09-04  9:29     ` Jaroslav Kysela
  2024-09-04 10:04       ` Takashi Iwai
  2024-09-04  9:57     ` Takashi Iwai
  1 sibling, 1 reply; 13+ messages in thread
From: Jaroslav Kysela @ 2024-09-04  9:29 UTC (permalink / raw)
  To: Chancel Liu, Takashi Iwai; +Cc: alsa-devel@alsa-project.org, S.J. Wang

On 04. 09. 24 11:07, Chancel Liu wrote:
> Hi Takashi,
> 
> Thanks for your reply and suggestions. Finally we have found the root cause.
> Seems it's related to both drivers and alsa-lib.
> 
> When two dmix clients run in parallel we get two direct dmix instances.
> 1st dmix instance:
> snd_pcm_dmix_open()
> 	snd_pcm_direct_initialize_slave()
> 		save_slave_setting()
> Since the driver we are using has SND_PCM_INFO_RESUME flag, dmix->spcm->info
> has this flag. Then this flag is cleared in dmix->shmptr->s.info.
> 		
> 2nd dmix instance:
> snd_pcm_dmix_open()
> 	snd_pcm_direct_open_secondary_client()
> 		copy_slave_setting()
> 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn' has this
> flag.
> 
> If 1st dmix instance resumes firstly it should implement recovery of slave pcm
> in snd_pcm_direct_slave_recover(). Because 1st dmix->spcm->info has
> SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called correctly to
> resume slave pcm.
> 
> However if 2nd dmix instance resumes firstly, snd_pcm_resume(direct->spcm) will
> not be called because it's spcm->info doesn't has SND_PCM_INFO_RESUME flag. The
> 1st dmix instance assumes someone else already did recovery so
> snd_pcm_resume(direct->spcm) won't be called neither. In result the slave pcm
> fails to resume.

The snd_pcm_direct_slave_recover() function should be called for both dmix 
instances. It calls snd_pcm_prepare() for the "driver" PCM, so the driver 
should recover from suspend in this case, too.

See the "some buggy drivers" comment in snd_pcm_direct_slave_recover(). It 
looks like a driver issue, the "resume" flag mangling is just a workaround.

> SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In my opinion
> the first resumed dmix instance should make sure slave pcm can be recovered
> properly no matter it's the first opened instance or secondary opened instance.
> Do you know why the secondary opened instance clear the SND_PCM_INFO_RESUME
> flag? Can we do the following modification?
> 
> diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
> @@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
>          COPY_SLAVE(buffer_time);
>          COPY_SLAVE(sample_bits);
>          COPY_SLAVE(frame_bits);
> -
> -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;

Another option is to fix the buggy drivers and remove the workaround (or make 
it configurable) from alsa-lib (revert commit 
6d1d620eadf32c6d963468ce56ff52cc3a2f32e2).

						Jaroslav

-- 
Jaroslav Kysela <perex@perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.



* Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-04  9:07   ` Chancel Liu
  2024-09-04  9:29     ` Jaroslav Kysela
@ 2024-09-04  9:57     ` Takashi Iwai
  2024-09-05  7:44       ` Chancel Liu
  1 sibling, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2024-09-04  9:57 UTC (permalink / raw)
  To: Chancel Liu; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

On Wed, 04 Sep 2024 11:07:30 +0200,
Chancel Liu wrote:
> 
> Hi Takashi,
> 
> Thanks for your reply and suggestions. Finally we have found the root cause. 
> Seems it's related to both drivers and alsa-lib.
> 
> When two dmix clients run in parallel we get two direct dmix instances.
> 1st dmix instance:
> snd_pcm_dmix_open()
> 	snd_pcm_direct_initialize_slave()
> 		save_slave_setting()
> Since the driver we are using has SND_PCM_INFO_RESUME flag, dmix->spcm->info
> has this flag. Then this flag is cleared in dmix->shmptr->s.info.
> 		
> 2nd dmix instance:
> snd_pcm_dmix_open()
> 	snd_pcm_direct_open_secondary_client()
> 		copy_slave_setting()
> 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn' has this
> flag.
> 
> If 1st dmix instance resumes firstly it should implement recovery of slave pcm
> in snd_pcm_direct_slave_recover(). Because 1st dmix->spcm->info has
> SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called correctly to
> resume slave pcm.

... and immediately stop the stream, then prepare and restart as a
usual restart.

> However if 2nd dmix instance resumes firstly, snd_pcm_resume(direct->spcm) will
> not be called because it's spcm->info doesn't has SND_PCM_INFO_RESUME flag. The
> 1st dmix instance assumes someone else already did recovery so
> snd_pcm_resume(direct->spcm) won't be called neither. In result the slave pcm
> fails to resume.

Something wrong is happening here, then.

In dmix, there is no hardware resume at all, but it's always a restart
of the stream.  The call of snd_pcm_resume() is only temporarily for
inconsistencies that can be a problem on some drivers (IIRC dmaengine
stuff).  That said, dmix does a kind of fake resume, stops and
restarts the stream cleanly on the first instance.  On the second
instance, it's already recovered, hence it bails out.

If poll() hangs on the second instance, there can be some other
problem.  Maybe the resume -> stop -> restart sequence doesn't work
with your driver well?

> SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In my opinion
> the first resumed dmix instance should make sure slave pcm can be recovered
> properly no matter it's the first opened instance or secondary opened instance.

The snd_pcm_resume() gets called no matter which instance, just the
first one who tries to recover the suspended state.  (And it's called
internally at updating the various state, not necessarily an explicit
recovery call.)

> Do you know why the secondary opened instance clear the SND_PCM_INFO_RESUME
> flag? Can we do the following modification?
> 
> diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
> @@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
>         COPY_SLAVE(buffer_time);
>         COPY_SLAVE(sample_bits);
>         COPY_SLAVE(frame_bits);
> -
> -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;

I don't think so.  Clearing the RESUME flag here is correct.
dmix doesn't support the hardware resume feature.  It does its own.
(And this flag is merely information for apps, which isn't really evaluated
except by the dmix workaround code there.)


Takashi
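
From the application's side, a common way to cope with a stream that may or may not support hardware resume is to try a resume first and fall back to a fresh prepare. The sketch below shows only that control flow. To stay self-contained it does not call alsa-lib: stream_ops, fake_stream, and recover_from_suspend() are hypothetical names, and the two callbacks merely stand in for snd_pcm_resume() and snd_pcm_prepare().

```c
#include <assert.h>
#include <errno.h>

/* Callbacks standing in for the resume and prepare calls of a stream. */
typedef struct {
    int (*resume)(void *ctx);  /* 0 on success, -EAGAIN to retry, other < 0 on failure */
    int (*prepare)(void *ctx); /* 0 on success, < 0 on failure */
    void *ctx;
} stream_ops;

/* Try to resume; retry a bounded number of times while the stream reports
 * -EAGAIN; if resume is unsupported or fails, restart via prepare. */
int recover_from_suspend(const stream_ops *ops, int max_retries)
{
    int err;

    assert(ops && ops->resume && ops->prepare);
    do {
        err = ops->resume(ops->ctx);
    } while (err == -EAGAIN && max_retries-- > 0);

    if (err < 0)
        err = ops->prepare(ops->ctx);
    return err;
}

/* Fake stream used to exercise the flow; it only counts calls. */
typedef struct {
    int resume_result; /* value every resume attempt returns */
    int resume_calls;
    int prepare_calls;
} fake_stream;

int fake_resume(void *ctx)
{
    fake_stream *s = ctx;
    s->resume_calls++;
    return s->resume_result;
}

int fake_prepare(void *ctx)
{
    fake_stream *s = ctx;
    s->prepare_calls++;
    return 0;
}
```

A stream whose resume callback returns -ENOSYS ends up being prepared exactly once, while a stream whose resume succeeds is never prepared. This matches the point made above that a layer without hardware resume has to restart the stream on its own.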


* Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-04  9:29     ` Jaroslav Kysela
@ 2024-09-04 10:04       ` Takashi Iwai
  0 siblings, 0 replies; 13+ messages in thread
From: Takashi Iwai @ 2024-09-04 10:04 UTC (permalink / raw)
  To: Jaroslav Kysela; +Cc: Chancel Liu, alsa-devel@alsa-project.org, S.J. Wang

On Wed, 04 Sep 2024 11:29:05 +0200,
Jaroslav Kysela wrote:
> 
> On 04. 09. 24 11:07, Chancel Liu wrote:
> > Hi Takashi,
> > 
> > Thanks for your reply and suggestions. Finally we have found the root cause.
> > Seems it's related to both drivers and alsa-lib.
> > 
> > When two dmix clients run in parallel we get two direct dmix instances.
> > 1st dmix instance:
> > snd_pcm_dmix_open()
> > 	snd_pcm_direct_initialize_slave()
> > 		save_slave_setting()
> > Since the driver we are using has SND_PCM_INFO_RESUME flag, dmix->spcm->info
> > has this flag. Then this flag is cleared in dmix->shmptr->s.info.
> > 		
> > 2nd dmix instance:
> > snd_pcm_dmix_open()
> > 	snd_pcm_direct_open_secondary_client()
> > 		copy_slave_setting()
> > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn' has this
> > flag.
> > 
> > If 1st dmix instance resumes firstly it should implement recovery of slave pcm
> > in snd_pcm_direct_slave_recover(). Because 1st dmix->spcm->info has
> > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called correctly to
> > resume slave pcm.
> > 
> > However if 2nd dmix instance resumes firstly, snd_pcm_resume(direct->spcm) will
> > not be called because it's spcm->info doesn't has SND_PCM_INFO_RESUME flag. The
> > 1st dmix instance assumes someone else already did recovery so
> > snd_pcm_resume(direct->spcm) won't be called neither. In result the slave pcm
> > fails to resume.
> 
> The snd_pcm_direct_slave_recover() function should be called for both
> dmix instances. It calls snd_pcm_prepare() for the "driver" PCM, so
> the driver should recover from suspend in this case, too.

IIUC, it's called.  snd_pcm_direct_check_xrun() is called in many
places to get the state synced, and this sets the PCM state to
SND_PCM_STATE_SUSPENDED when shmptr->s.recoveries changes.
Then the application calls snd_pcm_resume(), and it calls
snd_pcm_direct_slave_recover().

> See the "some buggy drivers" comment in
> snd_pcm_direct_slave_recover(). It looks like a driver issue, the
> "resume" flag mangling is just a workaround.

Sort of, yes.  The INFO_RESUME flag should have been cleared
regardless of whether we do this workaround or not, though.

> > SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In my opinion
> > the first resumed dmix instance should make sure slave pcm can be recovered
> > properly no matter it's the first opened instance or secondary opened instance.
> > Do you know why the secondary opened instance clear the SND_PCM_INFO_RESUME
> > flag? Can we do the following modification?
> > 
> > diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c
> > @@ -1183,8 +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
> >          COPY_SLAVE(buffer_time);
> >          COPY_SLAVE(sample_bits);
> >          COPY_SLAVE(frame_bits);
> > -
> > -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> 
> Another option is to fix the buggy drivers and remove the workround
> (or make it configurable) from alsa-lib (revert commit
> 6d1d620eadf32c6d963468ce56ff52cc3a2f32e2).

I'm afraid this problem is unrelated to that.

Although I wrote "buggy", it might be better phrased as
"fragile".  We haven't defined strictly how the state should
change when going from SUSPENDED or PAUSED to PREPARE.  Ideally, we could
just jump to PREPARE without clearing the state, but some devices seem
to assume that the state is cleared first.


Takashi
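
The "first one recovers, the others only catch up" behaviour discussed in this thread can be modelled with a shared counter. This is a rough sketch based only on the description above, not the actual pcm_direct.c logic: the model_* names are hypothetical, locking is omitted, and slave_restarts simply counts how often a client would stop and restart the slave PCM.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { unsigned int recoveries; } model_shm; /* role of shmptr->s.recoveries */

typedef struct {
    unsigned int recoveries; /* last shared counter value this client has seen */
    bool suspended;          /* client-visible "needs recovery" state */
    int slave_restarts;      /* how often this client restarted the slave */
} model_direct;

/* State sync: a changed shared counter means another client restarted the
 * slave since we last looked, so this client must recover as well. */
void model_check(model_direct *d, const model_shm *shm)
{
    assert(d && shm);
    if (d->recoveries != shm->recoveries)
        d->suspended = true;
}

/* Recovery: the first caller restarts the slave and bumps the shared
 * counter; later callers find the counter ahead and only catch up. */
void model_recover(model_direct *d, model_shm *shm)
{
    assert(d && shm);
    if (d->recoveries == shm->recoveries) {
        d->slave_restarts++;
        shm->recoveries++;
    }
    d->recoveries = shm->recoveries;
    d->suspended = false;
}
```

In this model the slave is restarted exactly once per suspend cycle, by whichever client calls model_recover() first. Any driver-specific step that must precede the restart therefore has to be reachable from every client, since no particular client is guaranteed to be first.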


* RE: Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-04  9:57     ` Takashi Iwai
@ 2024-09-05  7:44       ` Chancel Liu
  2024-09-05  8:10         ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Chancel Liu @ 2024-09-05  7:44 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

> >
> > Hi Takashi,
> >
> > Thanks for your reply and suggestions. Finally we have found the root cause.
> > Seems it's related to both drivers and alsa-lib.
> >
> > When two dmix clients run in parallel we get two direct dmix instances.
> > 1st dmix instance:
> > snd_pcm_dmix_open()
> >       snd_pcm_direct_initialize_slave()
> >               save_slave_setting()
> > Since the driver we are using has SND_PCM_INFO_RESUME flag,
> > dmix->spcm->info has this flag. Then this flag is cleared in
> > dmix->shmptr->s.info.
> >
> > 2nd dmix instance:
> > snd_pcm_dmix_open()
> >       snd_pcm_direct_open_secondary_client()
> >               copy_slave_setting()
> > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn'
> > has this flag.
> >
> > If 1st dmix instance resumes firstly it should implement recovery of
> > slave pcm in snd_pcm_direct_slave_recover(). Because 1st
> > dmix->spcm->info has
> > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called
> > correctly to resume slave pcm.
> 
> ... and immediately stop the stream, then prepare and restart as a usual
> restart.
> 
> > However if 2nd dmix instance resumes firstly,
> > snd_pcm_resume(direct->spcm) will not be called because it's
> > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix instance
> > assumes someone else already did recovery so
> > snd_pcm_resume(direct->spcm) won't be called neither. In result the
> > slave pcm fails to resume.
> 
> Something wrong happening here, then.
> 
> In dmix, there is no hardware resume at all, but it's always a restart of the
> stream.  The call of snd_pcm_resume() is only temporarily for inconsistencies
> that can be a problem on some drivers (IIRC dmaengine stuff).  That said,
> dmix does a kind of fake resume, stops and restarts the stream cleanly on the
> first instance.  On the second instance, it's already recovered, hence it bails
> out.
> 
> If poll() hangs on the second instance, there can be some other problem.
> Maybe the resume -> stop -> restart sequence doesn't work with your driver
> well?
> 

Our DMA driver does PAUSE on system suspend and requires RESUME on
system resume. The current problem is that snd_pcm_resume() is called by
neither the 1st instance nor the 2nd instance.

> > SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In my
> > opinion the first resumed dmix instance should make sure slave pcm can
> > be recovered properly no matter it's the first opened instance or
> > secondary opened instance.
> 
> The snd_pcm_resume() gets called no matter which instance, just the first one
> who tries to recover the suspended state.  (And it's called internally at
> updating the various state, not necessarily an explicit recovery call.)
> 

Unfortunately, if the secondary opened instance resumes first, it doesn't have
SND_PCM_INFO_RESUME, so snd_pcm_resume() is never called.

> > Do you know why the secondary opened instance clear the
> > SND_PCM_INFO_RESUME flag? Can we do the following modification?
> >
> > diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c @@ -1183,8
> > +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix,
> snd_pcm_t *spcm)
> >         COPY_SLAVE(buffer_time);
> >         COPY_SLAVE(sample_bits);
> >         COPY_SLAVE(frame_bits);
> > -
> > -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> 
> I don't think so.  The clearance of the RESUME flag here is correct.
> dmix doesn't support the hardware resume feature.  It does its own.
> (And this flag is merely a info for apps, which isn't really evaluated except for
> the code in dmix workaround there.)
> 
> 
> Takashi
> 

I think dmix should know what state the real driver is in. If the driver requires
that the app call snd_pcm_resume(), how can dmix get this information?

Many thanks for answering these questions.

Regards, 
Chancel Liu




* Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-05  7:44       ` Chancel Liu
@ 2024-09-05  8:10         ` Takashi Iwai
  2024-09-05 11:01           ` [EXT] " Chancel Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2024-09-05  8:10 UTC (permalink / raw)
  To: Chancel Liu; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

On Thu, 05 Sep 2024 09:44:10 +0200,
Chancel Liu wrote:
> 
> > >
> > > Hi Takashi,
> > >
> > > Thanks for your reply and suggestions. Finally we have found the root cause.
> > > Seems it's related to both drivers and alsa-lib.
> > >
> > > When two dmix clients run in parallel we get two direct dmix instances.
> > > 1st dmix instance:
> > > snd_pcm_dmix_open()
> > >       snd_pcm_direct_initialize_slave()
> > >               save_slave_setting()
> > > Since the driver we are using has SND_PCM_INFO_RESUME flag,
> > > dmix->spcm->info has this flag. Then this flag is cleared in
> > dmix->shmptr->s.info.
> > >
> > > 2nd dmix instance:
> > > snd_pcm_dmix_open()
> > >       snd_pcm_direct_open_secondary_client()
> > >               copy_slave_setting()
> > > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn'
> > > has this flag.
> > >
> > > If 1st dmix instance resumes firstly it should implement recovery of
> > > slave pcm in snd_pcm_direct_slave_recover(). Because 1st
> > > dmix->spcm->info has
> > > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called
> > > correctly to resume slave pcm.
> > 
> > ... and immediately stop the stream, then prepare and restart as a usual
> > restart.
> > 
> > > However if 2nd dmix instance resumes firstly,
> > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix instance
> > > assumes someone else already did recovery so
> > > snd_pcm_resume(direct->spcm) won't be called neither. In result the
> > > slave pcm fails to resume.
> > 
> > Something wrong happening here, then.
> > 
> > In dmix, there is no hardware resume at all, but it's always a restart of the
> > stream.  The call of snd_pcm_resume() is only temporarily for inconsistencies
> > that can be a problem on some drivers (IIRC dmaengine stuff).  That said,
> > dmix does a kind of fake resume, stops and restarts the stream cleanly on the
> > first instance.  On the second instance, it's already recovered, hence it bails
> > out.
> > 
> > If poll() hangs on the second instance, there can be some other problem.
> > Maybe the resume -> stop -> restart sequence doesn't work with your driver
> > well?
> > 
> 
> Our dma driver will do PAUSE in system suspend and requires doing RESUME in
> system resume. Current problem is that snd_pcm_resume() is not called by both
> 1st instance and 2nd instance.

That's weird.  Are you really testing with the latest alsa-lib code?

If the application doesn't call snd_pcm_resume(), it means that the PCM
state isn't set to SUSPENDED, so it behaves as if it were still running.

Or if you mean that snd_pcm_resume() to the slave PCM isn't called
(even though snd_pcm_resume() is called for the dmix PCM), check
whether snd_pcm_direct_slave_recover() gets called, especially at the
point:

	/* some buggy drivers require the device resumed before prepared;
	 * when a device has RESUME flag and is in SUSPENDED state, resume
	 * here but immediately drop to bring it to a sane active state.
	 */
	if (state == SND_PCM_STATE_SUSPENDED &&
	    (direct->spcm->info & SND_PCM_INFO_RESUME)) {
		snd_pcm_resume(direct->spcm);
		snd_pcm_drop(direct->spcm);
		snd_pcm_direct_timer_stop(direct);
		snd_pcm_direct_clear_timer_queue(direct);
	}

Try adding debug prints or setting a breakpoint to check whether this
code path is executed.

Also, does the issue happen with the latest 6.11-rc kernel, too?
If yes, what happens if you drop the SNDRV_PCM_INFO_RESUME bit flag on
the driver side?  Does the problem persist, or does it work?

> > > SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In my
> > > opinion the first resumed dmix instance should make sure slave pcm can
> > > be recovered properly no matter it's the first opened instance or
> > > secondary opened instance
> > .
> > 
> > The snd_pcm_resume() gets called no matter which instance, just the first one
> > who tries to recover the suspended state.  (And it's called internally at
> > updating the various state, not necessarily an explicit recovery call.)
> > 
> 
> Unfortunately if secondary opened instance resumes first it doesn't has
> SND_PCM_INFO_RESUME which causes snd_pcm_resume() never be called.

No, that's a misunderstanding.  SND_PCM_INFO_RESUME isn't exposed to the
application in the case of dmix at all; i.e. dmix doesn't support the
full resume, per se.  That's the design.  So it doesn't matter which
instance gets resumed first.

> > > Do you know why the secondary opened instance clear the
> > > SND_PCM_INFO_RESUME flag? Can we do the following modification?
> > >
> > > diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c @@ -1183,8
> > > +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix,
> > snd_pcm_t *spcm)
> > >         COPY_SLAVE(buffer_time);
> > >         COPY_SLAVE(sample_bits);
> > >         COPY_SLAVE(frame_bits);
> > > -
> > > -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> > 
> > I don't think so.  The clearance of the RESUME flag here is correct.
> > dmix doesn't support the hardware resume feature.  It does its own.
> > (And this flag is merely a info for apps, which isn't really evaluated except for
> > the code in dmix workaround there.)
> > 
> > 
> > Takashi
> > 
> 
> I think dmix should know what state the real driver is. If driver requires that
> app should do snd_pcm_resume() how can dmix get this information?

dmix already knows.  But the PCM state exposed to applications
isn't always mapped 1:1 to it.


Takashi


* RE: [EXT] Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-05  8:10         ` Takashi Iwai
@ 2024-09-05 11:01           ` Chancel Liu
  2024-09-05 13:36             ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Chancel Liu @ 2024-09-05 11:01 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

> > > > Hi Takashi,
> > > >
> > > > Thanks for your reply and suggestions. Finally we have found the root
> cause.
> > > > Seems it's related to both drivers and alsa-lib.
> > > >
> > > > When two dmix clients run in parallel we get two direct dmix instances.
> > > > 1st dmix instance:
> > > > snd_pcm_dmix_open()
> > > >       snd_pcm_direct_initialize_slave()
> > > >               save_slave_setting()
> > > > Since the driver we are using has SND_PCM_INFO_RESUME flag,
> > > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > dmix->shmptr->s.info.
> > > >
> > > > 2nd dmix instance:
> > > > snd_pcm_dmix_open()
> > > >       snd_pcm_direct_open_secondary_client()
> > > >               copy_slave_setting()
> > > > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn'
> > > > has this flag.
> > > >
> > > > If 1st dmix instance resumes firstly it should implement recovery of
> > > > slave pcm in snd_pcm_direct_slave_recover(). Because 1st
> > > > dmix->spcm->info has
> > > > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called
> > > > correctly to resume slave pcm.
> > >
> > > ... and immediately stop the stream, then prepare and restart as a usual
> > > restart.
> > >
> > > > However if 2nd dmix instance resumes firstly,
> > > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix
> instance
> > > > assumes someone else already did recovery so
> > > > snd_pcm_resume(direct->spcm) won't be called neither. In result the
> > > > slave pcm fails to resume.
> > >
> > > Something wrong happening here, then.
> > >
> > > In dmix, there is no hardware resume at all, but it's always a restart of the
> > > stream.  The call of snd_pcm_resume() is only temporarily for
> inconsistencies
> > > that can be a problem on some drivers (IIRC dmaengine stuff).  That said,
> > > dmix does a kind of fake resume, stops and restarts the stream cleanly on
> the
> > > first instance.  On the second instance, it's already recovered, hence it
> bails
> > > out.
> > >
> > > If poll() hangs on the second instance, there can be some other problem.
> > > Maybe the resume -> stop -> restart sequence doesn't work with your
> driver
> > > well?
> > >
> >
> > Our dma driver will do PAUSE in system suspend and requires doing RESUME
> in
> > system resume. Current problem is that snd_pcm_resume() is not called by
> both
> > 1st instance and 2nd instance.
> 
> That's weird.  Are you really testing with the latest alsa-lib code?
> 
> If application doesn't call snd_pcm_resume(), it means that the PCM
> state isn't set to SUSPENDED, so it pretends as if still running.
> 
> Or if you mean that snd_pcm_resume() to the slave PCM isn't called
> (even though snd_pcm_resume() is called for the dmix PCM), check
> whether snd_pcm_direct_slave_recover() gets called, especially at the
> point:
> 
>         /* some buggy drivers require the device resumed before prepared;
>          * when a device has RESUME flag and is in SUSPENDED state,
> resume
>          * here but immediately drop to bring it to a sane active state.
>          */
>         if (state == SND_PCM_STATE_SUSPENDED &&
>             (direct->spcm->info & SND_PCM_INFO_RESUME)) {
>                 snd_pcm_resume(direct->spcm);
>                 snd_pcm_drop(direct->spcm);
>                 snd_pcm_direct_timer_stop(direct);
>                 snd_pcm_direct_clear_timer_queue(direct);
>         }
> 
> Try to put debug prints or catch via breakpoint whether this code path
> is executed.
> 
> Also, does the issue happen with the latest 6.11-rc kernel, too?
> If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the driver
> side?  Does the problem persist, or it works?
> 

I'm working with kernel 6.6 and alsa-lib v1.2.11. I don't think that is very
outdated, but I will try switching to the latest versions.

I did do some debugging on this part. Please see my comments inline.

int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct)
{
	...
	
	/* [Chancel]
	 * When two dmix clients run in parallel we get two direct dmix instances.
	 * 1st dmix->spcm->info has SND_PCM_INFO_RESUME flag but 2nd dmix doesn't.
	 * Let's name 1st opened dmix "dmix1" and 2nd dmix "dmix2".
	 * After resume, both dmix1 and dmix2 enter snd_pcm_direct_slave_recover().
	 * Here we assume dmix2 is the instance that gets here first.
	 * dmix2 successfully takes the semaphore lock and dmix1 waits for it.
	 */
	 
	semerr = snd_pcm_direct_semaphore_down(direct,
					   DIRECT_IPC_SEM_CLIENT);
	...
	state = snd_pcm_state(direct->spcm);
	if (state != SND_PCM_STATE_XRUN && state != SND_PCM_STATE_SUSPENDED) {
	
	/* [Chancel]
	 * dmix2 finds the spcm state is SUSPENDED, so it does not enter here.
	 * However, when dmix1 later gets the lock and reaches this check, the spcm
	 * state has already been changed to RUNNING by dmix2.
	 * As a result, dmix1 assumes some other instance has done the recovery and
	 * returns directly.
	 * snd_pcm_resume() is not called by dmix1.
	 */
	
		/* ignore... someone else already did recovery */
		semerr = snd_pcm_direct_semaphore_up(direct,
						     DIRECT_IPC_SEM_CLIENT);
		if (semerr < 0) {
			SNDERR("SEMUP FAILED with err %d", semerr);
			return semerr;
		}

		return 0;
	}
	...

	if (state == SND_PCM_STATE_SUSPENDED &&
	    (direct->spcm->info & SND_PCM_INFO_RESUME)) {
	
	/* [Chancel]
	 * dmix2->spcm->info doesn't have SND_PCM_INFO_RESUME flag. So this condition is not met.
	 * snd_pcm_resume() is not called by dmix2.
	 */

		snd_pcm_resume(direct->spcm);
		snd_pcm_drop(direct->spcm);
		snd_pcm_direct_timer_stop(direct);
		snd_pcm_direct_clear_timer_queue(direct);
	}
	...
	ret = snd_pcm_prepare(direct->spcm);
	...
	
	/* [Chancel]
	 * dmix2 calls snd_pcm_start to set spcm state to RUNNING.
	 */
	
	ret = snd_pcm_start(direct->spcm);
	...
}

The DMA driver I'm using supports pause/resume, so I don't think dropping
SNDRV_PCM_INFO_RESUME is a good fix for this issue. Besides this driver, I also
validated on another driver whose DMA doesn't have this flag. There the issue is
gone and both instances work well with suspend/resume.
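
The flow described in the inline comments above can be modelled in a few lines
of C. This is only a sketch under stated assumptions: the struct names, the
function names and the MODEL_INFO_RESUME bit value are hypothetical stand-ins
and not alsa-lib code. Only the order of operations (the first open clears the
bit in the shared copy, the secondary open copies its slave info from the
shared copy, and recovery checks the per-client slave info) is taken from the
description in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the RESUME capability bit (arbitrary value). */
#define MODEL_INFO_RESUME (1u << 0)

/* Hypothetical model of the settings shared between dmix clients. */
struct model_shm {
	uint32_t info;
};

/* Hypothetical model of one client's view of the slave PCM. */
struct model_client {
	uint32_t spcm_info;
};

/* First open: slave info comes from the driver, and the shared copy
 * is saved with the RESUME bit cleared. */
void model_open_first(struct model_client *c, struct model_shm *shm,
		      uint32_t driver_info)
{
	c->spcm_info = driver_info;
	shm->info = driver_info & ~MODEL_INFO_RESUME;
}

/* Secondary open: slave info is copied from the shared copy. */
void model_open_secondary(struct model_client *c, const struct model_shm *shm)
{
	c->spcm_info = shm->info;
}

/* Recovery check: true if this client would resume the slave PCM. */
bool model_recover_calls_resume(const struct model_client *c,
				bool slave_suspended)
{
	return slave_suspended && (c->spcm_info & MODEL_INFO_RESUME) != 0;
}
```

With a driver that sets the bit, the first-opened client passes the recovery
check while the secondary client does not, which matches the two cases
discussed above.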

Regards, 
Chancel Liu

> > > > SND_PCM_INFO_RESUME flag has impact on the flow of dmix resume. In
> my
> > > > opinion the first resumed dmix instance should make sure slave pcm can
> > > > be recovered properly no matter it's the first opened instance or
> > > > secondary opened instance
> > > .
> > >
> > > The snd_pcm_resume() gets called no matter which instance, just the first
> one
> > > who tries to recover the suspended state.  (And it's called internally at
> > > updating the various state, not necessarily an explicit recovery call.)
> > >
> >
> > Unfortunately if secondary opened instance resumes first it doesn't has
> > SND_PCM_INFO_RESUME which causes snd_pcm_resume() never be called.
> 
> No, it's misunderstanding.  SND_PCM_INFO_RESUME isn't exposed to the
> application in the case of dmix at all; i.e. dmix doesn't support the
> full resume, per se. That's the design.  So it doesn't matter which
> instance gets resumed at first.
> 
> > > > Do you know why the secondary opened instance clear the
> > > > SND_PCM_INFO_RESUME flag? Can we do the following modification?
> > > >
> > > > diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c @@ -1183,8
> > > > +1226,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix,
> > > snd_pcm_t *spcm)
> > > >         COPY_SLAVE(buffer_time);
> > > >         COPY_SLAVE(sample_bits);
> > > >         COPY_SLAVE(frame_bits);
> > > > -
> > > > -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> > >
> > > I don't think so.  The clearance of the RESUME flag here is correct.
> > > dmix doesn't support the hardware resume feature.  It does its own.
> > > (And this flag is merely a info for apps, which isn't really evaluated except
> for
> > > the code in dmix workaround there.)
> > >
> > >
> > > Takashi
> > >
> >
> > I think dmix should know what state the real driver is. If driver requires that
> > app should do snd_pcm_resume() how can dmix get this information?
> 
> The dmix already knows.  But the PCM state exposed to applications
> isn't always tied as 1:1.
> 
> 
> Takashi


* Re: [EXT] Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-05 11:01           ` [EXT] " Chancel Liu
@ 2024-09-05 13:36             ` Takashi Iwai
  2024-09-06  6:22               ` Chancel Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2024-09-05 13:36 UTC (permalink / raw)
  To: Chancel Liu; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

On Thu, 05 Sep 2024 13:01:11 +0200,
Chancel Liu wrote:
> 
> > > > > Hi Takashi,
> > > > >
> > > > > Thanks for your reply and suggestions. Finally we have found the root
> > cause.
> > > > > Seems it's related to both drivers and alsa-lib.
> > > > >
> > > > > When two dmix clients run in parallel we get two direct dmix instances.
> > > > > 1st dmix instance:
> > > > > snd_pcm_dmix_open()
> > > > >       snd_pcm_direct_initialize_slave()
> > > > >               save_slave_setting()
> > > > > Since the driver we are using has SND_PCM_INFO_RESUME flag,
> > > > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > > dmix->shmptr->s.info.
> > > > >
> > > > > 2nd dmix instance:
> > > > > snd_pcm_dmix_open()
> > > > >       snd_pcm_direct_open_secondary_client()
> > > > >               copy_slave_setting()
> > > > > 2nd dmix->spcm->info is copied from dmix->shmptr->s.info so it doesn'
> > > > > has this flag.
> > > > >
> > > > > If 1st dmix instance resumes firstly it should implement recovery of
> > > > > slave pcm in snd_pcm_direct_slave_recover(). Because 1st
> > > > > dmix->spcm->info has
> > > > > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be called
> > > > > correctly to resume slave pcm.
> > > >
> > > > ... and immediately stop the stream, then prepare and restart as a usual
> > > > restart.
> > > >
> > > > > However if 2nd dmix instance resumes firstly,
> > > > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix
> > instance
> > > > > assumes someone else already did recovery so
> > > > > snd_pcm_resume(direct->spcm) won't be called neither. In result the
> > > > > slave pcm fails to resume.
> > > >
> > > > Something wrong happening here, then.
> > > >
> > > > In dmix, there is no hardware resume at all, but it's always a restart of the
> > > > stream.  The call of snd_pcm_resume() is only temporarily for
> > inconsistencies
> > > > that can be a problem on some drivers (IIRC dmaengine stuff).  That said,
> > > > dmix does a kind of fake resume, stops and restarts the stream cleanly on
> > the
> > > > first instance.  On the second instance, it's already recovered, hence it
> > bails
> > > > out.
> > > >
> > > > If poll() hangs on the second instance, there can be some other problem.
> > > > Maybe the resume -> stop -> restart sequence doesn't work with your
> > driver
> > > > well?
> > > >
> > >
> > > Our dma driver will do PAUSE in system suspend and requires doing RESUME
> > in
> > > system resume. Current problem is that snd_pcm_resume() is not called by
> > both
> > > 1st instance and 2nd instance.
> > 
> > That's weird.  Are you really testing with the latest alsa-lib code?
> > 
> > If application doesn't call snd_pcm_resume(), it means that the PCM
> > state isn't set to SUSPENDED, so it pretends as if still running.
> > 
> > Or if you mean that snd_pcm_resume() to the slave PCM isn't called
> > (even though snd_pcm_resume() is called for the dmix PCM), check
> > whether snd_pcm_direct_slave_recover() gets called, especially at the
> > point:
> > 
> >         /* some buggy drivers require the device resumed before prepared;
> >          * when a device has RESUME flag and is in SUSPENDED state,
> > resume
> >          * here but immediately drop to bring it to a sane active state.
> >          */
> >         if (state == SND_PCM_STATE_SUSPENDED &&
> >             (direct->spcm->info & SND_PCM_INFO_RESUME)) {
> >                 snd_pcm_resume(direct->spcm);
> >                 snd_pcm_drop(direct->spcm);
> >                 snd_pcm_direct_timer_stop(direct);
> >                 snd_pcm_direct_clear_timer_queue(direct);
> >         }
> > 
> > Try to put debug prints or catch via breakpoint whether this code path
> > is executed.
> > 
> > Also, does the issue happen with the latest 6.11-rc kernel, too?
> > If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the driver
> > side?  Does the problem persist, or it works?
> > 
> 
> I'm working on kernel 6.6 and alsa-lib v1.2.11. It's not so outdated I think and
> then I will try to switch on the latest version.
> 
> Indeed I did some debug on this part. Please see my comments inline.
> 
> int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct)
> {
> 	...
> 	
> 	/* [Chancel]
> 	 * When two dmix clients run in parallel we get two direct dmix instances.
> 	 * 1st dmix->spcm->info has SND_PCM_INFO_RESUME flag but 2nd dmix doesn't.

OK, that must be the cause.  It's because the second open copies the
saved shmptr->s.info into spcm->info at its open time while we already
dropped the INFO_RESUME bit.  All the rest of the behavior is a side
effect of this inconsistency.

I guess dropping the INFO_RESUME bit at hw_params and hw_refine should
work instead.  A totally untested fix is below.

(And I believe the drop of INFO_PAUSE should be handled similarly,
 too, instead of dropping spcm->info bit there.)


Takashi

--- a/src/pcm/pcm_direct.c
+++ b/src/pcm/pcm_direct.c
@@ -1018,6 +1018,7 @@ int snd_pcm_direct_hw_refine(snd_pcm_t *pcm, snd_pcm_hw_params_t *params)
 	}
 	dshare->timer_ticks = hw_param_interval(params, SND_PCM_HW_PARAM_PERIOD_SIZE)->max / dshare->slave_period_size;
 	params->info = dshare->shmptr->s.info;
+	params->info &= ~SND_PCM_INFO_RESUME;
 #ifdef REFINE_DEBUG
 	snd_output_puts(log, "DMIX REFINE (end):\n");
 	snd_pcm_hw_params_dump(params, log);
@@ -1031,6 +1032,7 @@ int snd_pcm_direct_hw_params(snd_pcm_t *pcm, snd_pcm_hw_params_t * params)
 	snd_pcm_direct_t *dmix = pcm->private_data;
 
 	params->info = dmix->shmptr->s.info;
+	params->info &= ~SND_PCM_INFO_RESUME;
 	params->rate_num = dmix->shmptr->s.rate;
 	params->rate_den = 1;
 	params->fifo_size = 0;
@@ -1183,8 +1185,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
 	COPY_SLAVE(buffer_time);
 	COPY_SLAVE(sample_bits);
 	COPY_SLAVE(frame_bits);
-
-	dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
 }
 
 #undef COPY_SLAVE
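
As a rough model of what the patch above is intended to do: the shared info
keeps the RESUME bit, so every client's slave view sees it, and the bit is
masked only in the value reported to the application through hw_params and
hw_refine. The names below are hypothetical and this is not alsa-lib code; it
only illustrates the idea of masking at report time instead of at save time.

```c
#include <stdint.h>

/* Hypothetical stand-ins for capability bits (arbitrary values). */
#define SKETCH_INFO_RESUME (1u << 0)
#define SKETCH_INFO_OTHER  (1u << 1)

/* Hypothetical model of the settings shared between dmix clients. */
struct sketch_shm {
	uint32_t info;
};

/* Save the driver info unmodified, so the RESUME bit is kept. */
void sketch_save_slave_setting(struct sketch_shm *shm, uint32_t driver_info)
{
	shm->info = driver_info;
}

/* A secondary client copies the shared info into its slave view. */
uint32_t sketch_copy_slave_setting(const struct sketch_shm *shm)
{
	return shm->info;
}

/* The value reported to the application hides the RESUME bit. */
uint32_t sketch_hw_params_info(const struct sketch_shm *shm)
{
	return shm->info & ~SKETCH_INFO_RESUME;
}
```

In this model a secondary client still sees the RESUME bit in its slave view,
while the application never sees it in the reported info.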


* RE: [EXT] Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-05 13:36             ` Takashi Iwai
@ 2024-09-06  6:22               ` Chancel Liu
  2024-09-06  6:31                 ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Chancel Liu @ 2024-09-06  6:22 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

> > > > > > Hi Takashi,
> > > > > >
> > > > > > Thanks for your reply and suggestions. Finally we have found
> > > > > > the root
> > > cause.
> > > > > > Seems it's related to both drivers and alsa-lib.
> > > > > >
> > > > > > When two dmix clients run in parallel we get two direct dmix
> instances.
> > > > > > 1st dmix instance:
> > > > > > snd_pcm_dmix_open()
> > > > > >       snd_pcm_direct_initialize_slave()
> > > > > >               save_slave_setting() Since the driver we are
> > > > > > using has SND_PCM_INFO_RESUME flag,
> > > > > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > > > dmix->shmptr->s.info.
> > > > > >
> > > > > > 2nd dmix instance:
> > > > > > snd_pcm_dmix_open()
> > > > > >       snd_pcm_direct_open_secondary_client()
> > > > > >               copy_slave_setting() 2nd dmix->spcm->info is
> > > > > > copied from dmix->shmptr->s.info so it doesn'
> > > > > > has this flag.
> > > > > >
> > > > > > If 1st dmix instance resumes firstly it should implement
> > > > > > recovery of slave pcm in snd_pcm_direct_slave_recover().
> > > > > > Because 1st
> > > > > > dmix->spcm->info has
> > > > > > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be
> called
> > > > > > correctly to resume slave pcm.
> > > > >
> > > > > ... and immediately stop the stream, then prepare and restart as
> > > > > a usual restart.
> > > > >
> > > > > > However if 2nd dmix instance resumes firstly,
> > > > > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > > > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix
> > > instance
> > > > > > assumes someone else already did recovery so
> > > > > > snd_pcm_resume(direct->spcm) won't be called neither. In
> > > > > > result the slave pcm fails to resume.
> > > > >
> > > > > Something wrong happening here, then.
> > > > >
> > > > > In dmix, there is no hardware resume at all, but it's always a
> > > > > restart of the stream.  The call of snd_pcm_resume() is only
> > > > > temporarily for
> > > inconsistencies
> > > > > that can be a problem on some drivers (IIRC dmaengine stuff).
> > > > > That said, dmix does a kind of fake resume, stops and restarts
> > > > > the stream cleanly on
> > > the
> > > > > first instance.  On the second instance, it's already recovered,
> > > > > hence it
> > > bails
> > > > > out.
> > > > >
> > > > > If poll() hangs on the second instance, there can be some other problem.
> > > > > Maybe the resume -> stop -> restart sequence doesn't work with
> > > > > your
> > > driver
> > > > > well?
> > > > >
> > > >
> > > > Our dma driver will do PAUSE in system suspend and requires doing
> > > > RESUME
> > > in
> > > > system resume. Current problem is that snd_pcm_resume() is not
> > > > called by
> > > both
> > > > 1st instance and 2nd instance.
> > >
> > > That's weird.  Are you really testing with the latest alsa-lib code?
> > >
> > > If application doesn't call snd_pcm_resume(), it means that the PCM
> > > state isn't set to SUSPENDED, so it pretends as if still running.
> > >
> > > Or if you mean that snd_pcm_resume() to the slave PCM isn't called
> > > (even though snd_pcm_resume() is called for the dmix PCM), check
> > > whether snd_pcm_direct_slave_recover() gets called, especially at
> > > the
> > > point:
> > >
> > >         /* some buggy drivers require the device resumed before
> prepared;
> > >          * when a device has RESUME flag and is in SUSPENDED state,
> > > resume
> > >          * here but immediately drop to bring it to a sane active state.
> > >          */
> > >         if (state == SND_PCM_STATE_SUSPENDED &&
> > >             (direct->spcm->info & SND_PCM_INFO_RESUME)) {
> > >                 snd_pcm_resume(direct->spcm);
> > >                 snd_pcm_drop(direct->spcm);
> > >                 snd_pcm_direct_timer_stop(direct);
> > >                 snd_pcm_direct_clear_timer_queue(direct);
> > >         }
> > >
> > > Try to put debug prints or catch via breakpoint whether this code
> > > path is executed.
> > >
> > > Also, does the issue happen with the latest 6.11-rc kernel, too?
> > > If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the
> > > driver side?  Does the problem persist, or it works?
> > >
> >
> > I'm working on kernel 6.6 and alsa-lib v1.2.11. It's not so outdated I
> > think and then I will try to switch on the latest version.
> >
> > Indeed I did some debug on this part. Please see my comments inline.
> >
> > int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct) {
> >       ...
> >
> >       /* [Chancel]
> >        * When two dmix clients run in parallel we get two direct dmix
> instances.
> >        * 1st dmix->spcm->info has SND_PCM_INFO_RESUME flag but 2nd
> dmix doesn't.
> 
> OK, that must be the cause.  It's because the second open copies the saved
> shmem->s.info into spcm->info at its open time while we already dropped the
> INFO_RESUME bit.  All the rest behavior are side effect of this inconsistency.
> 
> I guess dropping the INFO_RESUME bit at hw_params and hw_refine should
> work instead.  A totally untested fix is below.
> 
> (And I believe the drop of INFO_PAUSE should be handled similarly,  too,
> instead of dropping spcm->info bit there.)
> 
> 
> Takashi
> 
> --- a/src/pcm/pcm_direct.c
> +++ b/src/pcm/pcm_direct.c
> @@ -1018,6 +1018,7 @@ int snd_pcm_direct_hw_refine(snd_pcm_t *pcm,
> snd_pcm_hw_params_t *params)
>         }
>         dshare->timer_ticks = hw_param_interval(params,
> SND_PCM_HW_PARAM_PERIOD_SIZE)->max / dshare->slave_period_size;
>         params->info = dshare->shmptr->s.info;
> +       params->info &= ~SND_PCM_INFO_RESUME;
>  #ifdef REFINE_DEBUG
>         snd_output_puts(log, "DMIX REFINE (end):\n");
>         snd_pcm_hw_params_dump(params, log); @@ -1031,6 +1032,7
> @@ int snd_pcm_direct_hw_params(snd_pcm_t *pcm,
> snd_pcm_hw_params_t * params)
>         snd_pcm_direct_t *dmix = pcm->private_data;
> 
>         params->info = dmix->shmptr->s.info;
> +       params->info &= ~SND_PCM_INFO_RESUME;
>         params->rate_num = dmix->shmptr->s.rate;
>         params->rate_den = 1;
>         params->fifo_size = 0;
> @@ -1183,8 +1185,6 @@ static void save_slave_setting(snd_pcm_direct_t
> *dmix, snd_pcm_t *spcm)
>         COPY_SLAVE(buffer_time);
>         COPY_SLAVE(sample_bits);
>         COPY_SLAVE(frame_bits);
> -
> -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
>  }
> 
>  #undef COPY_SLAVE

Thanks Takashi,

This patch fixes the issue on my side. In my test, both dmix1->spcm->info and
dmix2->spcm->info have the SND_PCM_INFO_RESUME flag, and snd_pcm_resume() is
successfully called by the first resumed instance. I don't fully understand this
patch, though. Do you mean to drop SND_PCM_INFO_RESUME from dmix and keep it in
the slave pcm?

BTW, when will this patch be merged to mainline?

Regards, 
Chancel Liu


* Re: [EXT] Re: Suspend/resume Issue on pcm_dmix.c in alsa-lib
  2024-09-06  6:22               ` Chancel Liu
@ 2024-09-06  6:31                 ` Takashi Iwai
  0 siblings, 0 replies; 13+ messages in thread
From: Takashi Iwai @ 2024-09-06  6:31 UTC (permalink / raw)
  To: Chancel Liu; +Cc: alsa-devel@alsa-project.org, Jaroslav Kysela, S.J. Wang

On Fri, 06 Sep 2024 08:22:23 +0200,
Chancel Liu wrote:
> 
> > > > > > > Hi Takashi,
> > > > > > >
> > > > > > > Thanks for your reply and suggestions. Finally we have found
> > > > > > > the root
> > > > cause.
> > > > > > > Seems it's related to both drivers and alsa-lib.
> > > > > > >
> > > > > > > When two dmix clients run in parallel we get two direct dmix
> > instances.
> > > > > > > 1st dmix instance:
> > > > > > > snd_pcm_dmix_open()
> > > > > > >       snd_pcm_direct_initialize_slave()
> > > > > > >               save_slave_setting() Since the driver we are
> > > > > > > using has SND_PCM_INFO_RESUME flag,
> > > > > > > dmix->spcm->info has this flag. Then this flag is cleared in
> > > > > > dmix->shmptr->s.info.
> > > > > > >
> > > > > > > 2nd dmix instance:
> > > > > > > snd_pcm_dmix_open()
> > > > > > >       snd_pcm_direct_open_secondary_client()
> > > > > > >               copy_slave_setting() 2nd dmix->spcm->info is
> > > > > > > copied from dmix->shmptr->s.info so it doesn'
> > > > > > > has this flag.
> > > > > > >
> > > > > > > If 1st dmix instance resumes firstly it should implement
> > > > > > > recovery of slave pcm in snd_pcm_direct_slave_recover().
> > > > > > > Because 1st
> > > > > > > dmix->spcm->info has
> > > > > > > SND_PCM_INFO_RESUME，snd_pcm_resume(direct->spcm) can be
> > called
> > > > > > > correctly to resume slave pcm.
> > > > > >
> > > > > > ... and immediately stop the stream, then prepare and restart as
> > > > > > a usual restart.
> > > > > >
> > > > > > > However if 2nd dmix instance resumes firstly,
> > > > > > > snd_pcm_resume(direct->spcm) will not be called because it's
> > > > > > > spcm->info doesn't has SND_PCM_INFO_RESUME flag. The 1st dmix
> > > > instance
> > > > > > > assumes someone else already did recovery so
> > > > > > > snd_pcm_resume(direct->spcm) won't be called neither. In
> > > > > > > result the slave pcm fails to resume.
> > > > > >
> > > > > > Something wrong happening here, then.
> > > > > >
> > > > > > In dmix, there is no hardware resume at all, but it's always a
> > > > > > restart of the stream.  The call of snd_pcm_resume() is only
> > > > > > temporarily for inconsistencies that can be a problem on some
> > > > > > drivers (IIRC dmaengine stuff).  That said, dmix does a kind of
> > > > > > fake resume, stops and restarts the stream cleanly on the first
> > > > > > instance.  On the second instance, it's already recovered, hence
> > > > > > it bails out.
> > > > > >
> > > > > > If poll() hangs on the second instance, there can be some other problem.
> > > > > > Maybe the resume -> stop -> restart sequence doesn't work with
> > > > > > your driver well?
> > > > > >
> > > > >
> > > > > Our dma driver will do PAUSE in system suspend and requires doing
> > > > > RESUME in system resume. The current problem is that
> > > > > snd_pcm_resume() is called by neither the 1st instance nor the
> > > > > 2nd instance.
> > > >
> > > > That's weird.  Are you really testing with the latest alsa-lib code?
> > > >
> > > > If application doesn't call snd_pcm_resume(), it means that the PCM
> > > > state isn't set to SUSPENDED, so it pretends as if still running.
> > > >
> > > > Or if you mean that snd_pcm_resume() to the slave PCM isn't called
> > > > (even though snd_pcm_resume() is called for the dmix PCM), check
> > > > whether snd_pcm_direct_slave_recover() gets called, especially at
> > > > the point:
> > > >
> > > >         /* some buggy drivers require the device resumed before prepared;
> > > >          * when a device has RESUME flag and is in SUSPENDED state, resume
> > > >          * here but immediately drop to bring it to a sane active state.
> > > >          */
> > > >         if (state == SND_PCM_STATE_SUSPENDED &&
> > > >             (direct->spcm->info & SND_PCM_INFO_RESUME)) {
> > > >                 snd_pcm_resume(direct->spcm);
> > > >                 snd_pcm_drop(direct->spcm);
> > > >                 snd_pcm_direct_timer_stop(direct);
> > > >                 snd_pcm_direct_clear_timer_queue(direct);
> > > >         }
> > > >
> > > > Try to put debug prints or catch via breakpoint whether this code
> > > > path is executed.
> > > >
> > > > Also, does the issue happen with the latest 6.11-rc kernel, too?
> > > > If yes, what if you drop SNDRV_PCM_INFO_RESUME bit flag in the
> > > > driver side?  Does the problem persist, or it works?
> > > >
> > >
> > > I'm working on kernel 6.6 and alsa-lib v1.2.11. It's not so outdated I
> > > think and then I will try to switch on the latest version.
> > >
> > > Indeed I did some debug on this part. Please see my comments inline.
> > >
> > > int snd_pcm_direct_slave_recover(snd_pcm_direct_t *direct) {
> > >       ...
> > >
> > >       /* [Chancel]
> > >        * When two dmix clients run in parallel we get two direct dmix instances.
> > >        * 1st dmix->spcm->info has SND_PCM_INFO_RESUME flag but 2nd dmix doesn't.
> > 
> > OK, that must be the cause.  It's because the second open copies the saved
> > shmem->s.info into spcm->info at its open time while we already dropped the
> > INFO_RESUME bit.  All the rest of the behavior is a side effect of this inconsistency.
> > 
> > I guess dropping the INFO_RESUME bit at hw_params and hw_refine should
> > work instead.  A totally untested fix is below.
> > 
> > (And I believe the drop of INFO_PAUSE should be handled similarly,  too,
> > instead of dropping spcm->info bit there.)
> > 
> > 
> > Takashi
> > 
> > --- a/src/pcm/pcm_direct.c
> > +++ b/src/pcm/pcm_direct.c
> > @@ -1018,6 +1018,7 @@ int snd_pcm_direct_hw_refine(snd_pcm_t *pcm, snd_pcm_hw_params_t *params)
> >         }
> >         dshare->timer_ticks = hw_param_interval(params, SND_PCM_HW_PARAM_PERIOD_SIZE)->max / dshare->slave_period_size;
> >         params->info = dshare->shmptr->s.info;
> > +       params->info &= ~SND_PCM_INFO_RESUME;
> >  #ifdef REFINE_DEBUG
> >         snd_output_puts(log, "DMIX REFINE (end):\n");
> >         snd_pcm_hw_params_dump(params, log);
> > @@ -1031,6 +1032,7 @@ int snd_pcm_direct_hw_params(snd_pcm_t *pcm, snd_pcm_hw_params_t * params)
> >         snd_pcm_direct_t *dmix = pcm->private_data;
> > 
> >         params->info = dmix->shmptr->s.info;
> > +       params->info &= ~SND_PCM_INFO_RESUME;
> >         params->rate_num = dmix->shmptr->s.rate;
> >         params->rate_den = 1;
> >         params->fifo_size = 0;
> > @@ -1183,8 +1185,6 @@ static void save_slave_setting(snd_pcm_direct_t *dmix, snd_pcm_t *spcm)
> >         COPY_SLAVE(buffer_time);
> >         COPY_SLAVE(sample_bits);
> >         COPY_SLAVE(frame_bits);
> > -
> > -       dmix->shmptr->s.info &= ~SND_PCM_INFO_RESUME;
> >  }
> > 
> >  #undef COPY_SLAVE
> 
> Thanks Takashi,
> 
> This patch fixes the issue on my side. In my test both dmix1->spcm->info and
> dmix2->spcm->info have the SND_PCM_INFO_RESUME flag, and snd_pcm_resume() is
> successfully called by the first resumed instance. I don't understand this patch well. Do
> you mean to drop SND_PCM_INFO_RESUME from dmix and keep it in the slave pcm?

Yes.  INFO_RESUME is dropped because dmix can't do a full resume
due to the nature of its implementation.  It needs a prepare /
restart like many other drivers.  So we have to drop the info bit
exposed to the outside for apps, while keeping the slave PCM info
internally intact.
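
To illustrate the point above, here is a minimal, self-contained sketch in C.  It is not alsa-lib code: the MODEL_* constants, the model_* structs and every function name are hypothetical stand-ins chosen for illustration, and the bit values are arbitrary rather than the real SND_PCM_INFO_* values.  It only models where the RESUME bit gets masked: in the shared copy at save time (old behavior, which the second client then inherits) versus only in the value reported to apps (the approach of the patch).

```c
/* Hypothetical model, NOT alsa-lib code: bit values and names are made up. */
#define MODEL_INFO_RESUME (1u << 0)  /* stands in for SND_PCM_INFO_RESUME */
#define MODEL_INFO_OTHER  (1u << 1)  /* any unrelated info bit */

struct model_shm    { unsigned int info; };       /* like shmptr->s.info */
struct model_client { unsigned int spcm_info; };  /* like dmix->spcm->info */

/* Old behavior: the first client clears RESUME in the shared copy. */
static void model_first_open_old(struct model_shm *shm,
                                 struct model_client *c, unsigned int hw_info)
{
    c->spcm_info = hw_info;
    shm->info = hw_info & ~MODEL_INFO_RESUME;
}

/* Patched behavior: the shared copy keeps the slave info intact. */
static void model_first_open_new(struct model_shm *shm,
                                 struct model_client *c, unsigned int hw_info)
{
    c->spcm_info = hw_info;
    shm->info = hw_info;
}

/* A secondary client copies its slave info from shared memory. */
static void model_secondary_open(const struct model_shm *shm,
                                 struct model_client *c)
{
    c->spcm_info = shm->info;
}

/* Patched behavior: RESUME is masked only in what apps get to see. */
static unsigned int model_app_visible_info(const struct model_shm *shm)
{
    return shm->info & ~MODEL_INFO_RESUME;
}

/* The recover path resumes the slave only if its info has RESUME. */
static int model_would_resume_slave(const struct model_client *c)
{
    return (c->spcm_info & MODEL_INFO_RESUME) != 0;
}
```

In this model, with the old open path a secondary client ends up with model_would_resume_slave() returning 0, while with the new path both clients return 1 and model_app_visible_info() still hides the RESUME bit from apps.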

> BTW, when will this patch be merged to mainline?

Now that the test result is positive, I'm going to submit & merge it later.


thanks,

Takashi

end of thread, other threads:[~2024-09-06  6:31 UTC | newest]

Thread overview: 13+ messages
2024-08-27  7:06 Suspend/resume Issue on pcm_dmix.c in alsa-lib Chancel Liu
2024-08-27  9:54 ` Takashi Iwai
2024-08-27 10:49 ` Takashi Iwai
2024-09-04  9:07   ` Chancel Liu
2024-09-04  9:29     ` Jaroslav Kysela
2024-09-04 10:04       ` Takashi Iwai
2024-09-04  9:57     ` Takashi Iwai
2024-09-05  7:44       ` Chancel Liu
2024-09-05  8:10         ` Takashi Iwai
2024-09-05 11:01           ` [EXT] " Chancel Liu
2024-09-05 13:36             ` Takashi Iwai
2024-09-06  6:22               ` Chancel Liu
2024-09-06  6:31                 ` Takashi Iwai