Alsa-Devel Archive on lore.kernel.org
* Issue in pcm_dsnoop.c in alsa-lib
@ 2022-03-01  4:16 Shengjiu Wang
  2022-03-03 15:57 ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Shengjiu Wang @ 2022-03-01  4:16 UTC (permalink / raw)
  To: alsa-devel, Takashi Iwai, Jaroslav Kysela; +Cc: Shengjiu Wang, chancel.liu

Hi Takashi Iwai, Jaroslav Kysela

    We encountered an issue in the pcm_dsnoop use case, could you please
take a look?

    *Issue description:*
    With two instances of a dsnoop-type device running in parallel, after
suspend/resume, one of the instances hangs in memcpy because a very
large copy size is obtained.

#3 0x0000ffffa78d5098 in snd_pcm_dsnoop_sync_ptr (pcm=0xaaab06563da0)
at pcm_dsnoop.c:158
dsnoop = 0xaaab06563c20
slave_hw_ptr = 64
old_slave_hw_ptr = 533120
avail = *187651522444320*

   *Reason analysis:*
   My root-cause analysis is this: after suspend/resume, one instance
gets the SND_PCM_STATE_SUSPENDED state from the slave pcm device,
then does snd_pcm_prepare() and snd_pcm_start(), which reset
dsnoop->slave_hw_ptr and the hw_ptr of the slave pcm device, so the
state of this instance is correct.  But the other instance may never see
the SND_PCM_STATE_SUSPENDED state from the slave pcm device, because the
slave device may already have been recovered by the first instance, so
its dsnoop->slave_hw_ptr is not reset.  Since the hw_ptr of the slave pcm
device has been reset, a very large "avail" value results.

   *Solution:*
   I haven't come up with a fix for this issue; there seems to be no easy
way to let the other instance detect this case and reset its
dsnoop->slave_hw_ptr.  Could you please help?

Best regards
Wang shengjiu

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Issue in pcm_dsnoop.c in alsa-lib
@ 2022-03-04  8:35 S.J. Wang
  2022-03-04  8:44 ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: S.J. Wang @ 2022-03-04  8:35 UTC (permalink / raw)
  To: Takashi Iwai, Shengjiu Wang; +Cc: alsa-devel@alsa-project.org, Chancel Liu



> >
> > Hi Takashi Iwai, Jaroslav Kysela
> >
> >     We encountered an issue in the pcm_dsnoop use case, could you
> > please help to have a look?
> >
> >     *Issue description:*
> >     With two instances for dsnoop type device running in parallel,
> > after suspend/resume,  one of the instances will be hung in memcpy
> > because the very large copy size is obtained.
> >
> > #3 0x0000ffffa78d5098 in snd_pcm_dsnoop_sync_ptr
> (pcm=0xaaab06563da0)
> > at pcm_dsnoop.c:158 dsnoop = 0xaaab06563c20 slave_hw_ptr = 64
> > old_slave_hw_ptr = 533120 avail = *187651522444320*
> >
> >   * Reason analysis: *
> >    The root cause that I analysis is that after suspend/resume,  one
> > instance will get the SND_PCM_STATE_SUSPENDED state from slave pcm
> device,
> >   then it will do snd_pcm_prepare() and snd_pcm_start(),  which will
> > reset the dsnoop->slave_hw_ptr and the hw_ptr of slave pcm device,
> > then the state of this instance is correct.  But another instance may
> > not get the SND_PCM_STATE_SUSPENDED state from slave pcm device
> > because slave device may have been recovered by first instance,  so
> > the dsnoop->slave_hw_ptr is not reset.  but because hw_ptr of slave
> > pcm device has been reset,  so there will be a very large "avail" size.
> >
> >    *Solution:*
> >    I didn't come up with a fix for this issue,  seems there is no easy
> > way to let another instance know this case and reset the
> > dsnoop->slave_hw_ptr,  could you please help?
> 
> Could you try topic/pcm-direct-resume branch on
> 
> https://github.com/tiwai/alsa-lib
> 

Thanks, I pushed my test results to https://github.com/alsa-project/alsa-lib/issues/213
Could you please review?

Best regards
Wang shengjiu


* Re: Issue in pcm_dsnoop.c in alsa-lib
@ 2022-03-10  2:25 S.J. Wang
  2022-03-10  8:27 ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: S.J. Wang @ 2022-03-10  2:25 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Chancel Liu, alsa-devel@alsa-project.org, Shengjiu Wang

Hi

> >
> > > >
> > > > Hi Takashi Iwai, Jaroslav Kysela
> > > >
> > > >     We encountered an issue in the pcm_dsnoop use case, could you
> > > > please help to have a look?
> > > >
> > > >     *Issue description:*
> > > >     With two instances for dsnoop type device running in parallel,
> > > > after suspend/resume,  one of the instances will be hung in memcpy
> > > > because the very large copy size is obtained.
> > > >
> > > > #3 0x0000ffffa78d5098 in snd_pcm_dsnoop_sync_ptr
> > > (pcm=0xaaab06563da0)
> > > > at pcm_dsnoop.c:158 dsnoop = 0xaaab06563c20 slave_hw_ptr = 64
> > > > old_slave_hw_ptr = 533120 avail = *187651522444320*
> > > >
> > > >   * Reason analysis: *
> > > >    The root cause that I analysis is that after suspend/resume,
> > > > one instance will get the SND_PCM_STATE_SUSPENDED state from slave
> > > > pcm
> > > device,
> > > >   then it will do snd_pcm_prepare() and snd_pcm_start(),  which
> > > > will reset the dsnoop->slave_hw_ptr and the hw_ptr of slave pcm
> > > > device, then the state of this instance is correct.  But another
> > > > instance may not get the SND_PCM_STATE_SUSPENDED state from
> slave
> > > > pcm device because slave device may have been recovered by first
> > > > instance,  so the dsnoop->slave_hw_ptr is not reset.  but because
> > > > hw_ptr of slave pcm device has been reset,  so there will be a very large
> "avail" size.
> > > >
> > > >    *Solution:*
> > > >    I didn't come up with a fix for this issue,  seems there is no
> > > > easy way to let another instance know this case and reset the
> > > > dsnoop->slave_hw_ptr,  could you please help?
> > >
> > > Could you try topic/pcm-direct-resume branch on
> > >
> > > https://github.com/tiwai/alsa-lib
> > >
> >
> > Thanks,  I push my test result in
> > https://github.com/alsa-project/alsa-lib/issues/213
> > Could you please review?
> 
> Please keep the discussion on ML.
> 

I saw you have updated the origin/topic/pcm-direct-resume branch. I tested your
latest change; it is more stable than before, but I still hit the issue once after
an overnight test, so the probability is now very low.

So I suggest we also make the change below, shall we?

diff --git a/src/pcm/pcm_dsnoop.c b/src/pcm/pcm_dsnoop.c
index 729ff447b41f..cc333b3f4384 100644
--- a/src/pcm/pcm_dsnoop.c
+++ b/src/pcm/pcm_dsnoop.c
@@ -134,14 +134,21 @@ static int snd_pcm_dsnoop_sync_ptr(snd_pcm_t *pcm)
        snd_pcm_sframes_t diff;
        int err;

-       err = snd_pcm_direct_check_xrun(dsnoop, pcm);
-       if (err < 0)
-               return err;
        if (dsnoop->slowptr)
                snd_pcm_hwsync(dsnoop->spcm);
        old_slave_hw_ptr = dsnoop->slave_hw_ptr;
        snoop_timestamp(pcm);
        slave_hw_ptr = dsnoop->slave_hw_ptr;
+       /*
+        * FIXME: Move snd_pcm_direct_client_chk_xrun after reading
+        * dsnoop->spcm->hw.ptr. If snd_pcm_direct_slave_recover()
+        * in another instance runs before dsnoop->spcm->hw.ptr is
+        * read, a stale spcm->hw.ptr is used, which causes a wrong
+        * 'diff' value later.
+        */
+       err = snd_pcm_direct_check_xrun(dsnoop, pcm);
+       if (err < 0)
+               return err;
        diff = pcm_frame_diff(slave_hw_ptr, old_slave_hw_ptr, dsnoop->slave_boundary);

best regards
wang shengjiu



end of thread, other threads:[~2022-03-10  8:28 UTC | newest]

Thread overview: 8+ messages
-- links below jump to the message on this page --
2022-03-01  4:16 Issue in pcm_dsnoop.c in alsa-lib Shengjiu Wang
2022-03-03 15:57 ` Takashi Iwai
     [not found] <1646108881728133917-webhooks-bot@alsa-project.org>
2022-03-01  4:28 ` GitHub issues - opened
     [not found] <1646246543868125916-webhooks-bot@alsa-project.org>
2022-03-02 18:42 ` GitHub issues - edited
  -- strict thread matches above, loose matches on Subject: below --
2022-03-04  8:35 S.J. Wang
2022-03-04  8:44 ` Takashi Iwai
2022-03-10  2:25 S.J. Wang
2022-03-10  8:27 ` Takashi Iwai
