alsa-devel.alsa-project.org archive mirror
 help / color / mirror / Atom feed
* async between dmaengine_pcm_dma_complete and snd_pcm_release
@ 2013-10-09  7:29 Qiao Zhou
  2013-10-09  8:19 ` Lars-Peter Clausen
  0 siblings, 1 reply; 19+ messages in thread
From: Qiao Zhou @ 2013-10-09  7:29 UTC (permalink / raw)
  To: Mark Brown, lgirdwood, perex@perex.cz, tiwai@suse.de,
	alsa-devel@alsa-project.org, trinity.qiao.zhou,
	zhangfei.gao@gmail.com, Chao Xie

Hi Mark, Liam, Jaroslav, Takashi

I met an issue in which kernel panic appears in 
dmaengine_pcm_dma_complete function on a quad-core system. The 
dmaengine_pcm_dma_complete is running core0, while snd_pcm_release has 
already been executed on core1, due to in low memory stress oom killer 
kills the audio thread to release some memory.

snd_pcm_release frees the runtime parameters, and runtime is used in 
dmaengine_pcm_dma_complete, which is a callback from tasklet in 
dmaengine. In current audio driver, we can't promise that 
dmaengine_pcm_dma_complete is not executed after snd_pcm_release on 
multi cores. Maybe we should add some protection. Do you have any 
suggestion?

I have tried to apply below workaround, which can fix the panic, but I'm 
not confident it's proper. Need your comment and better suggestion.


 From d568a88e8f66ee21d44324bdfb48d2a3106cf0d1 Mon Sep 17 00:00:00 2001
From: Qiao Zhou <zhouqiao@marvell.com>
Date: Wed, 9 Oct 2013 15:24:29 +0800
Subject: [PATCH] ASoC: soc-dmaengine: add mutex to protect runtime param

under SMP arch, the dmaengine_pcm_dma_complete callback has the
chance to run on one cpu while at the same time, the substream is
released on another cpu. thus it may access param which is already
freed. we need to add mutes to protect such access, and check PCM
availability before using it.

Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
---
  sound/soc/soc-dmaengine-pcm.c |   11 ++++++++++-
  1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/sound/soc/soc-dmaengine-pcm.c b/sound/soc/soc-dmaengine-pcm.c
index 111b7d9..5917029 100644
--- a/sound/soc/soc-dmaengine-pcm.c
+++ b/sound/soc/soc-dmaengine-pcm.c
@@ -125,13 +125,22 @@ EXPORT_SYMBOL_GPL(snd_hwparams_to_dma_slave_config);
  static void dmaengine_pcm_dma_complete(void *arg)
  {
  	struct snd_pcm_substream *substream = arg;
-	struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
+	struct dmaengine_pcm_runtime_data *prtd;
+
+	mutex_lock(&substream->pcm->open_mutex);
+	if (!substream || !substream->runtime) {
+		mutex_unlock(&substream->pcm->open_mutex);
+		return;
+	}
+
+	prtd = substream_to_prtd(substream);

  	prtd->pos += snd_pcm_lib_period_bytes(substream);
  	if (prtd->pos >= snd_pcm_lib_buffer_bytes(substream))
  		prtd->pos = 0;

  	snd_pcm_period_elapsed(substream);
+	mutex_unlock(&substream->pcm->open_mutex);
  }

  static int dmaengine_pcm_prepare_and_submit(struct snd_pcm_substream 
*substream)
-- 
1.7.0.4


-- 

Best Regards
Qiao

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09  7:29 async between dmaengine_pcm_dma_complete and snd_pcm_release Qiao Zhou
@ 2013-10-09  8:19 ` Lars-Peter Clausen
  2013-10-09  8:30   ` Lars-Peter Clausen
  0 siblings, 1 reply; 19+ messages in thread
From: Lars-Peter Clausen @ 2013-10-09  8:19 UTC (permalink / raw)
  To: Qiao Zhou
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, Mark Brown, lgirdwood,
	zhangfei.gao@gmail.com, trinity.qiao.zhou, Chao Xie

On 10/09/2013 09:29 AM, Qiao Zhou wrote:
> Hi Mark, Liam, Jaroslav, Takashi
> 
> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
> function on a quad-core system. The dmaengine_pcm_dma_complete is running
> core0, while snd_pcm_release has already been executed on core1, due to in
> low memory stress oom killer kills the audio thread to release some memory.
> 
> snd_pcm_release frees the runtime parameters, and runtime is used in
> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
> In current audio driver, we can't promise that dmaengine_pcm_dma_complete is
> not executed after snd_pcm_release on multi cores. Maybe we should add some
> protection. Do you have any suggestion?
> 
> I have tried to apply below workaround, which can fix the panic, but I'm not
> confident it's proper. Need your comment and better suggestion.

I think this is a general problem with your dmaengine driver, nothing audio
specific. If the callback is able to run after dmaengine_terminate_all() has
returned successfully there is a bug in the dmaengine driver. You need to
make sure that none of the callbacks is called after terminate_all() has
finished and you probably also have to make sure that the tasklet has
completed, if it is running at the same time as the call to
dmaengine_terminate_all().

- Lars


> 
> 
> From d568a88e8f66ee21d44324bdfb48d2a3106cf0d1 Mon Sep 17 00:00:00 2001
> From: Qiao Zhou <zhouqiao@marvell.com>
> Date: Wed, 9 Oct 2013 15:24:29 +0800
> Subject: [PATCH] ASoC: soc-dmaengine: add mutex to protect runtime param
> 
> under SMP arch, the dmaengine_pcm_dma_complete callback has the
> chance to run on one cpu while at the same time, the substream is
> released on another cpu. thus it may access param which is already
> freed. we need to add mutes to protect such access, and check PCM
> availability before using it.
> 
> Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
> ---
>  sound/soc/soc-dmaengine-pcm.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/sound/soc/soc-dmaengine-pcm.c b/sound/soc/soc-dmaengine-pcm.c
> index 111b7d9..5917029 100644
> --- a/sound/soc/soc-dmaengine-pcm.c
> +++ b/sound/soc/soc-dmaengine-pcm.c
> @@ -125,13 +125,22 @@ EXPORT_SYMBOL_GPL(snd_hwparams_to_dma_slave_config);
>  static void dmaengine_pcm_dma_complete(void *arg)
>  {
>      struct snd_pcm_substream *substream = arg;
> -    struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
> +    struct dmaengine_pcm_runtime_data *prtd;
> +
> +    mutex_lock(&substream->pcm->open_mutex);
> +    if (!substream || !substream->runtime) {
> +        mutex_unlock(&substream->pcm->open_mutex);
> +        return;
> +    }
> +
> +    prtd = substream_to_prtd(substream);
> 
>      prtd->pos += snd_pcm_lib_period_bytes(substream);
>      if (prtd->pos >= snd_pcm_lib_buffer_bytes(substream))
>          prtd->pos = 0;
> 
>      snd_pcm_period_elapsed(substream);
> +    mutex_unlock(&substream->pcm->open_mutex);
>  }
> 
>  static int dmaengine_pcm_prepare_and_submit(struct snd_pcm_substream
> *substream)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09  8:19 ` Lars-Peter Clausen
@ 2013-10-09  8:30   ` Lars-Peter Clausen
  2013-10-09 10:23     ` Qiao Zhou
  2013-10-09 10:53     ` Takashi Iwai
  0 siblings, 2 replies; 19+ messages in thread
From: Lars-Peter Clausen @ 2013-10-09  8:30 UTC (permalink / raw)
  To: Qiao Zhou
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood, Mark Brown,
	zhangfei.gao@gmail.com, trinity.qiao.zhou, Chao Xie

On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>> Hi Mark, Liam, Jaroslav, Takashi
>>
>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
>> core0, while snd_pcm_release has already been executed on core1, due to in
>> low memory stress oom killer kills the audio thread to release some memory.
>>
>> snd_pcm_release frees the runtime parameters, and runtime is used in
>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>> In current audio driver, we can't promise that dmaengine_pcm_dma_complete is
>> not executed after snd_pcm_release on multi cores. Maybe we should add some
>> protection. Do you have any suggestion?
>>
>> I have tried to apply below workaround, which can fix the panic, but I'm not
>> confident it's proper. Need your comment and better suggestion.
> 
> I think this is a general problem with your dmaengine driver, nothing audio
> specific. If the callback is able to run after dmaengine_terminate_all() has
> returned successfully there is a bug in the dmaengine driver. You need to
> make sure that none of the callbacks is called after terminate_all() has
> finished and you probably also have to make sure that the tasklet has
> completed, if it is running at the same time as the call to
> dmaengine_terminate_all().

On the other hand that last part could get tricky as the
dmaengine_terminate_all() might be call from within the callback.

> 
> - Lars
> 
> 
>>
>>
>> From d568a88e8f66ee21d44324bdfb48d2a3106cf0d1 Mon Sep 17 00:00:00 2001
>> From: Qiao Zhou <zhouqiao@marvell.com>
>> Date: Wed, 9 Oct 2013 15:24:29 +0800
>> Subject: [PATCH] ASoC: soc-dmaengine: add mutex to protect runtime param
>>
>> under SMP arch, the dmaengine_pcm_dma_complete callback has the
>> chance to run on one cpu while at the same time, the substream is
>> released on another cpu. thus it may access param which is already
>> freed. we need to add mutes to protect such access, and check PCM
>> availability before using it.
>>
>> Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
>> ---
>>  sound/soc/soc-dmaengine-pcm.c |   11 ++++++++++-
>>  1 files changed, 10 insertions(+), 1 deletions(-)
>>
>> diff --git a/sound/soc/soc-dmaengine-pcm.c b/sound/soc/soc-dmaengine-pcm.c
>> index 111b7d9..5917029 100644
>> --- a/sound/soc/soc-dmaengine-pcm.c
>> +++ b/sound/soc/soc-dmaengine-pcm.c
>> @@ -125,13 +125,22 @@ EXPORT_SYMBOL_GPL(snd_hwparams_to_dma_slave_config);
>>  static void dmaengine_pcm_dma_complete(void *arg)
>>  {
>>      struct snd_pcm_substream *substream = arg;
>> -    struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
>> +    struct dmaengine_pcm_runtime_data *prtd;
>> +
>> +    mutex_lock(&substream->pcm->open_mutex);
>> +    if (!substream || !substream->runtime) {
>> +        mutex_unlock(&substream->pcm->open_mutex);
>> +        return;
>> +    }
>> +
>> +    prtd = substream_to_prtd(substream);
>>
>>      prtd->pos += snd_pcm_lib_period_bytes(substream);
>>      if (prtd->pos >= snd_pcm_lib_buffer_bytes(substream))
>>          prtd->pos = 0;
>>
>>      snd_pcm_period_elapsed(substream);
>> +    mutex_unlock(&substream->pcm->open_mutex);
>>  }
>>
>>  static int dmaengine_pcm_prepare_and_submit(struct snd_pcm_substream
>> *substream)
> 
> _______________________________________________
> Alsa-devel mailing list
> Alsa-devel@alsa-project.org
> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09  8:30   ` Lars-Peter Clausen
@ 2013-10-09 10:23     ` Qiao Zhou
  2013-10-09 11:00       ` Lars-Peter Clausen
  2013-10-09 10:53     ` Takashi Iwai
  1 sibling, 1 reply; 19+ messages in thread
From: Qiao Zhou @ 2013-10-09 10:23 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Mark Brown, zhangfei.gao@gmail.com, trinity.qiao.zhou@gmail.com,
	Chao Xie

On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>>> Hi Mark, Liam, Jaroslav, Takashi
>>>
>>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
>>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
>>> core0, while snd_pcm_release has already been executed on core1, due to in
>>> low memory stress oom killer kills the audio thread to release some memory.
>>>
>>> snd_pcm_release frees the runtime parameters, and runtime is used in
>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>>> In current audio driver, we can't promise that dmaengine_pcm_dma_complete is
>>> not executed after snd_pcm_release on multi cores. Maybe we should add some
>>> protection. Do you have any suggestion?
>>>
>>> I have tried to apply below workaround, which can fix the panic, but I'm not
>>> confident it's proper. Need your comment and better suggestion.
>>
>> I think this is a general problem with your dmaengine driver, nothing audio
>> specific. If the callback is able to run after dmaengine_terminate_all() has
>> returned successfully there is a bug in the dmaengine driver. You need to
The terminate_all runs after callback, and they run just very close on 
different cores. should soc-dmaengine add such protection anyway?
>> make sure that none of the callbacks is called after terminate_all() has
>> finished and you probably also have to make sure that the tasklet has
>> completed, if it is running at the same time as the call to
>> dmaengine_terminate_all().
In case the callback is executed no later than terminate_all on 
different cores, then we have to wait until the callback finishes. 
right? anything better method?
>
> On the other hand that last part could get tricky as the
> dmaengine_terminate_all() might be call from within the callback.
It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>
>>
>> - Lars
>>
>>
>>>
>>>
>>>  From d568a88e8f66ee21d44324bdfb48d2a3106cf0d1 Mon Sep 17 00:00:00 2001
>>> From: Qiao Zhou <zhouqiao@marvell.com>
>>> Date: Wed, 9 Oct 2013 15:24:29 +0800
>>> Subject: [PATCH] ASoC: soc-dmaengine: add mutex to protect runtime param
>>>
>>> under SMP arch, the dmaengine_pcm_dma_complete callback has the
>>> chance to run on one cpu while at the same time, the substream is
>>> released on another cpu. thus it may access param which is already
>>> freed. we need to add mutes to protect such access, and check PCM
>>> availability before using it.
>>>
>>> Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
>>> ---
>>>   sound/soc/soc-dmaengine-pcm.c |   11 ++++++++++-
>>>   1 files changed, 10 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/sound/soc/soc-dmaengine-pcm.c b/sound/soc/soc-dmaengine-pcm.c
>>> index 111b7d9..5917029 100644
>>> --- a/sound/soc/soc-dmaengine-pcm.c
>>> +++ b/sound/soc/soc-dmaengine-pcm.c
>>> @@ -125,13 +125,22 @@ EXPORT_SYMBOL_GPL(snd_hwparams_to_dma_slave_config);
>>>   static void dmaengine_pcm_dma_complete(void *arg)
>>>   {
>>>       struct snd_pcm_substream *substream = arg;
>>> -    struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
>>> +    struct dmaengine_pcm_runtime_data *prtd;
>>> +
>>> +    mutex_lock(&substream->pcm->open_mutex);
>>> +    if (!substream || !substream->runtime) {
>>> +        mutex_unlock(&substream->pcm->open_mutex);
>>> +        return;
>>> +    }
>>> +
>>> +    prtd = substream_to_prtd(substream);
>>>
>>>       prtd->pos += snd_pcm_lib_period_bytes(substream);
>>>       if (prtd->pos >= snd_pcm_lib_buffer_bytes(substream))
>>>           prtd->pos = 0;
>>>
>>>       snd_pcm_period_elapsed(substream);
>>> +    mutex_unlock(&substream->pcm->open_mutex);
>>>   }
>>>
>>>   static int dmaengine_pcm_prepare_and_submit(struct snd_pcm_substream
>>> *substream)
>>
>> _______________________________________________
>> Alsa-devel mailing list
>> Alsa-devel@alsa-project.org
>> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
>>
>


-- 

Best Regards
Qiao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09  8:30   ` Lars-Peter Clausen
  2013-10-09 10:23     ` Qiao Zhou
@ 2013-10-09 10:53     ` Takashi Iwai
  2013-10-10  1:08       ` Qiao Zhou
  1 sibling, 1 reply; 19+ messages in thread
From: Takashi Iwai @ 2013-10-09 10:53 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: alsa-devel@alsa-project.org, lgirdwood, Qiao Zhou, Mark Brown,
	zhangfei.gao@gmail.com, trinity.qiao.zhou, Chao Xie

At Wed, 09 Oct 2013 10:30:14 +0200,
Lars-Peter Clausen wrote:
> 
> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
> > On 10/09/2013 09:29 AM, Qiao Zhou wrote:
> >> Hi Mark, Liam, Jaroslav, Takashi
> >>
> >> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
> >> function on a quad-core system. The dmaengine_pcm_dma_complete is running
> >> core0, while snd_pcm_release has already been executed on core1, due to in
> >> low memory stress oom killer kills the audio thread to release some memory.
> >>
> >> snd_pcm_release frees the runtime parameters, and runtime is used in
> >> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
> >> In current audio driver, we can't promise that dmaengine_pcm_dma_complete is
> >> not executed after snd_pcm_release on multi cores. Maybe we should add some
> >> protection. Do you have any suggestion?
> >>
> >> I have tried to apply below workaround, which can fix the panic, but I'm not
> >> confident it's proper. Need your comment and better suggestion.
> > 
> > I think this is a general problem with your dmaengine driver, nothing audio
> > specific. If the callback is able to run after dmaengine_terminate_all() has
> > returned successfully there is a bug in the dmaengine driver. You need to
> > make sure that none of the callbacks is called after terminate_all() has
> > finished and you probably also have to make sure that the tasklet has
> > completed, if it is running at the same time as the call to
> > dmaengine_terminate_all().
> 
> On the other hand that last part could get tricky as the
> dmaengine_terminate_all() might be call from within the callback.

Basically you'd need a sync for the termination (in this case
something like tasklet_kill()) to be called at hw_free and prepare
where you can do schedule.  dmaengine_terminate_all() can't sync for
termination by itself, as it's called in the trigger PCM callback
inside a spinlock.


Takashi

> 
> > 
> > - Lars
> > 
> > 
> >>
> >>
> >> From d568a88e8f66ee21d44324bdfb48d2a3106cf0d1 Mon Sep 17 00:00:00 2001
> >> From: Qiao Zhou <zhouqiao@marvell.com>
> >> Date: Wed, 9 Oct 2013 15:24:29 +0800
> >> Subject: [PATCH] ASoC: soc-dmaengine: add mutex to protect runtime param
> >>
> >> under SMP arch, the dmaengine_pcm_dma_complete callback has the
> >> chance to run on one cpu while at the same time, the substream is
> >> released on another cpu. thus it may access param which is already
> >> freed. we need to add mutes to protect such access, and check PCM
> >> availability before using it.
> >>
> >> Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
> >> ---
> >>  sound/soc/soc-dmaengine-pcm.c |   11 ++++++++++-
> >>  1 files changed, 10 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/sound/soc/soc-dmaengine-pcm.c b/sound/soc/soc-dmaengine-pcm.c
> >> index 111b7d9..5917029 100644
> >> --- a/sound/soc/soc-dmaengine-pcm.c
> >> +++ b/sound/soc/soc-dmaengine-pcm.c
> >> @@ -125,13 +125,22 @@ EXPORT_SYMBOL_GPL(snd_hwparams_to_dma_slave_config);
> >>  static void dmaengine_pcm_dma_complete(void *arg)
> >>  {
> >>      struct snd_pcm_substream *substream = arg;
> >> -    struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
> >> +    struct dmaengine_pcm_runtime_data *prtd;
> >> +
> >> +    mutex_lock(&substream->pcm->open_mutex);
> >> +    if (!substream || !substream->runtime) {
> >> +        mutex_unlock(&substream->pcm->open_mutex);
> >> +        return;
> >> +    }
> >> +
> >> +    prtd = substream_to_prtd(substream);
> >>
> >>      prtd->pos += snd_pcm_lib_period_bytes(substream);
> >>      if (prtd->pos >= snd_pcm_lib_buffer_bytes(substream))
> >>          prtd->pos = 0;
> >>
> >>      snd_pcm_period_elapsed(substream);
> >> +    mutex_unlock(&substream->pcm->open_mutex);
> >>  }
> >>
> >>  static int dmaengine_pcm_prepare_and_submit(struct snd_pcm_substream
> >> *substream)
> > 
> > _______________________________________________
> > Alsa-devel mailing list
> > Alsa-devel@alsa-project.org
> > http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
> > 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09 10:23     ` Qiao Zhou
@ 2013-10-09 11:00       ` Lars-Peter Clausen
  2013-10-10  1:05         ` Qiao Zhou
  2013-10-10  2:54         ` Vinod Koul
  0 siblings, 2 replies; 19+ messages in thread
From: Lars-Peter Clausen @ 2013-10-09 11:00 UTC (permalink / raw)
  To: Qiao Zhou
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Vinod Koul, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

Added Vinod to Cc.

On 10/09/2013 12:23 PM, Qiao Zhou wrote:
> On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
>> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
>>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>>>> Hi Mark, Liam, Jaroslav, Takashi
>>>>
>>>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
>>>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
>>>> core0, while snd_pcm_release has already been executed on core1, due to in
>>>> low memory stress oom killer kills the audio thread to release some memory.
>>>>
>>>> snd_pcm_release frees the runtime parameters, and runtime is used in
>>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>>>> In current audio driver, we can't promise that
>>>> dmaengine_pcm_dma_complete is
>>>> not executed after snd_pcm_release on multi cores. Maybe we should add some
>>>> protection. Do you have any suggestion?
>>>>
>>>> I have tried to apply below workaround, which can fix the panic, but I'm
>>>> not
>>>> confident it's proper. Need your comment and better suggestion.
>>>
>>> I think this is a general problem with your dmaengine driver, nothing audio
>>> specific. If the callback is able to run after dmaengine_terminate_all() has
>>> returned successfully there is a bug in the dmaengine driver. You need to
> The terminate_all runs after callback, and they run just very close on
> different cores. should soc-dmaengine add such protection anyway?

The problem is that if there is a race, that the callback races against the
freeing of the prtd, then there is also the chance that the callback races
against the freeing of the substream. So in that case, e.g. with your patch,
you'd try to lock a mutex for which the memory already has been freed. So we
need a way to synchronize against the callbacks, i.e. makes sure that non of
the callbacks are running anymore at a given point. And only after that
point we are allowed to free the memory that is referenced in the callback.

>>> make sure that none of the callbacks is called after terminate_all() has
>>> finished and you probably also have to make sure that the tasklet has
>>> completed, if it is running at the same time as the call to
>>> dmaengine_terminate_all().
> In case the callback is executed no later than terminate_all on different
> cores, then we have to wait until the callback finishes. right? anything
> better method?
>>
>> On the other hand that last part could get tricky as the
>> dmaengine_terminate_all() might be call from within the callback.
> It's tricky indeed in case xrun happens. we should avoid possible deadlock.

I think we'll eventually need to versions of dmaengine_terminate_all(). A
sync version which makes sure that the tasklet has finished and a non-sync
version that only makes sure that no new callbacks are started. I think the
sync version should be the default with an optional async version which must
be used, if it can run from within the callback. So we'd call the async
version in the pcm_trigger callback and the sync version in the pcm_close
callback.

- Lars

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09 11:00       ` Lars-Peter Clausen
@ 2013-10-10  1:05         ` Qiao Zhou
  2013-10-10  2:56           ` Vinod Koul
  2013-10-10  2:54         ` Vinod Koul
  1 sibling, 1 reply; 19+ messages in thread
From: Qiao Zhou @ 2013-10-10  1:05 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Vinod Koul, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On 10/09/2013 07:00 PM, Lars-Peter Clausen wrote:
> Added Vinod to Cc.
>
> On 10/09/2013 12:23 PM, Qiao Zhou wrote:
>> On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
>>> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
>>>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>>>>> Hi Mark, Liam, Jaroslav, Takashi
>>>>>
>>>>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
>>>>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
>>>>> core0, while snd_pcm_release has already been executed on core1, due to in
>>>>> low memory stress oom killer kills the audio thread to release some memory.
>>>>>
>>>>> snd_pcm_release frees the runtime parameters, and runtime is used in
>>>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>>>>> In current audio driver, we can't promise that
>>>>> dmaengine_pcm_dma_complete is
>>>>> not executed after snd_pcm_release on multi cores. Maybe we should add some
>>>>> protection. Do you have any suggestion?
>>>>>
>>>>> I have tried to apply below workaround, which can fix the panic, but I'm
>>>>> not
>>>>> confident it's proper. Need your comment and better suggestion.
>>>>
>>>> I think this is a general problem with your dmaengine driver, nothing audio
>>>> specific. If the callback is able to run after dmaengine_terminate_all() has
>>>> returned successfully there is a bug in the dmaengine driver. You need to
>> The terminate_all runs after callback, and they run just very close on
>> different cores. should soc-dmaengine add such protection anyway?
>
> The problem is that if there is a race, that the callback races against the
> freeing of the prtd, then there is also the chance that the callback races
> against the freeing of the substream. So in that case, e.g. with your patch,
> you'd try to lock a mutex for which the memory already has been freed. So we
> need a way to synchronize against the callbacks, i.e. makes sure that non of
> the callbacks are running anymore at a given point. And only after that
> point we are allowed to free the memory that is referenced in the callback.
Indeed there is change that the callback races against the freeing of 
the substream, in case the driver load/unload dynamically.
>
>>>> make sure that none of the callbacks is called after terminate_all() has
>>>> finished and you probably also have to make sure that the tasklet has
>>>> completed, if it is running at the same time as the call to
>>>> dmaengine_terminate_all().
>> In case the callback is executed no later than terminate_all on different
>> cores, then we have to wait until the callback finishes. right? anything
>> better method?
>>>
>>> On the other hand that last part could get tricky as the
>>> dmaengine_terminate_all() might be call from within the callback.
>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>
> I think we'll eventually need to versions of dmaengine_terminate_all(). A
> sync version which makes sure that the tasklet has finished and a non-sync
> version that only makes sure that no new callbacks are started. I think the
> sync version should be the default with an optional async version which must
> be used, if it can run from within the callback. So we'd call the async
> version in the pcm_trigger callback and the sync version in the pcm_close
> callback.
In our current dmaengine driver, the dma interrupt is disabled in 
terminate_all, so there is no new callback after it. This is the async 
version. Takashi also mentions the requirement for such sync version. 
I'll investigate the sync version more. thanks a lot.
>
> - Lars
>


-- 

Best Regards
Qiao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09 10:53     ` Takashi Iwai
@ 2013-10-10  1:08       ` Qiao Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Qiao Zhou @ 2013-10-10  1:08 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel@alsa-project.org, Lars-Peter Clausen,
	lgirdwood@gmail.com, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On 10/09/2013 06:53 PM, Takashi Iwai wrote:
> At Wed, 09 Oct 2013 10:30:14 +0200,
> Lars-Peter Clausen wrote:
>>
>> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
>>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>>>> Hi Mark, Liam, Jaroslav, Takashi
>>>>
>>>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
>>>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
>>>> core0, while snd_pcm_release has already been executed on core1, due to in
>>>> low memory stress oom killer kills the audio thread to release some memory.
>>>>
>>>> snd_pcm_release frees the runtime parameters, and runtime is used in
>>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>>>> In current audio driver, we can't promise that dmaengine_pcm_dma_complete is
>>>> not executed after snd_pcm_release on multi cores. Maybe we should add some
>>>> protection. Do you have any suggestion?
>>>>
>>>> I have tried to apply below workaround, which can fix the panic, but I'm not
>>>> confident it's proper. Need your comment and better suggestion.
>>>
>>> I think this is a general problem with your dmaengine driver, nothing audio
>>> specific. If the callback is able to run after dmaengine_terminate_all() has
>>> returned successfully there is a bug in the dmaengine driver. You need to
>>> make sure that none of the callbacks is called after terminate_all() has
>>> finished and you probably also have to make sure that the tasklet has
>>> completed, if it is running at the same time as the call to
>>> dmaengine_terminate_all().
>>
>> On the other hand that last part could get tricky as the
>> dmaengine_terminate_all() might be call from within the callback.
>
> Basically you'd need a sync for the termination (in this case
> something like tasklet_kill()) to be called at hw_free and prepare
> where you can do schedule.  dmaengine_terminate_all() can't sync for
> termination by itself, as it's called in the trigger PCM callback
> inside a spinlock.
Yes, I tried tasklet_kill() in terminate_all() but it didn't help due to 
inside a spinlock. I'll try to implement in a better place as you 
suggest. thanks.
>
>
> Takashi
>
>>
>>>
>>> - Lars
>>>
>>>
>>>>
>>>>
>>>>  From d568a88e8f66ee21d44324bdfb48d2a3106cf0d1 Mon Sep 17 00:00:00 2001
>>>> From: Qiao Zhou <zhouqiao@marvell.com>
>>>> Date: Wed, 9 Oct 2013 15:24:29 +0800
>>>> Subject: [PATCH] ASoC: soc-dmaengine: add mutex to protect runtime param
>>>>
>>>> under SMP arch, the dmaengine_pcm_dma_complete callback has the
>>>> chance to run on one cpu while at the same time, the substream is
>>>> released on another cpu. thus it may access param which is already
>>>> freed. we need to add mutes to protect such access, and check PCM
>>>> availability before using it.
>>>>
>>>> Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
>>>> ---
>>>>   sound/soc/soc-dmaengine-pcm.c |   11 ++++++++++-
>>>>   1 files changed, 10 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/sound/soc/soc-dmaengine-pcm.c b/sound/soc/soc-dmaengine-pcm.c
>>>> index 111b7d9..5917029 100644
>>>> --- a/sound/soc/soc-dmaengine-pcm.c
>>>> +++ b/sound/soc/soc-dmaengine-pcm.c
>>>> @@ -125,13 +125,22 @@ EXPORT_SYMBOL_GPL(snd_hwparams_to_dma_slave_config);
>>>>   static void dmaengine_pcm_dma_complete(void *arg)
>>>>   {
>>>>       struct snd_pcm_substream *substream = arg;
>>>> -    struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream);
>>>> +    struct dmaengine_pcm_runtime_data *prtd;
>>>> +
>>>> +    mutex_lock(&substream->pcm->open_mutex);
>>>> +    if (!substream || !substream->runtime) {
>>>> +        mutex_unlock(&substream->pcm->open_mutex);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    prtd = substream_to_prtd(substream);
>>>>
>>>>       prtd->pos += snd_pcm_lib_period_bytes(substream);
>>>>       if (prtd->pos >= snd_pcm_lib_buffer_bytes(substream))
>>>>           prtd->pos = 0;
>>>>
>>>>       snd_pcm_period_elapsed(substream);
>>>> +    mutex_unlock(&substream->pcm->open_mutex);
>>>>   }
>>>>
>>>>   static int dmaengine_pcm_prepare_and_submit(struct snd_pcm_substream
>>>> *substream)
>>>
>>> _______________________________________________
>>> Alsa-devel mailing list
>>> Alsa-devel@alsa-project.org
>>> http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
>>>
>>


-- 

Best Regards
Qiao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-09 11:00       ` Lars-Peter Clausen
  2013-10-10  1:05         ` Qiao Zhou
@ 2013-10-10  2:54         ` Vinod Koul
  2013-10-10  5:50           ` Qiao Zhou
  2013-10-10  7:46           ` Lars-Peter Clausen
  1 sibling, 2 replies; 19+ messages in thread
From: Vinod Koul @ 2013-10-10  2:54 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Qiao Zhou, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On Wed, Oct 09, 2013 at 01:00:08PM +0200, Lars-Peter Clausen wrote:
> Added Vinod to Cc.
> 
> On 10/09/2013 12:23 PM, Qiao Zhou wrote:
> > On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
> >> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
> >>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
> >>>> Hi Mark, Liam, Jaroslav, Takashi
> >>>>
> >>>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
> >>>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
> >>>> core0, while snd_pcm_release has already been executed on core1, due to in
> >>>> low memory stress oom killer kills the audio thread to release some memory.
> >>>>
> >>>> snd_pcm_release frees the runtime parameters, and runtime is used in
> >>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
> >>>> In current audio driver, we can't promise that
> >>>> dmaengine_pcm_dma_complete is
> >>>> not executed after snd_pcm_release on multi cores. Maybe we should add some
> >>>> protection. Do you have any suggestion?
> >>>>
> >>>> I have tried to apply below workaround, which can fix the panic, but I'm
> >>>> not
> >>>> confident it's proper. Need your comment and better suggestion.
> >>>
> >>> I think this is a general problem with your dmaengine driver, nothing audio
> >>> specific. If the callback is able to run after dmaengine_terminate_all() has
> >>> returned successfully there is a bug in the dmaengine driver. You need to
> > The terminate_all runs after callback, and they run just very close on
> > different cores. should soc-dmaengine add such protection anyway?
> 
> The problem is that if there is a race, that the callback races against the
> freeing of the prtd, then there is also the chance that the callback races
> against the freeing of the substream. So in that case, e.g. with your patch,
> you'd try to lock a mutex for which the memory already has been freed. So we
> need a way to synchronize against the callbacks, i.e. makes sure that non of
> the callbacks are running anymore at a given point. And only after that
> point we are allowed to free the memory that is referenced in the callback.
Okay reading thru the mail series and code:

Since we are using cyclic dma here, we will get callback based on periods. So
it is a very common case that you terminate and the callback is invoked.

Now callback can be invoked by
1) the thread terminating audio, in TRIGGER_STOP
2) in the callback context, you invoked callback which would then go and call
the period_elapsed ultimately leading to TRIGGER_STOP (xrun)

We need to take care of these conditions:

1. In dma driver, once terminate_all in invoked, grab the lock, disable the
tasklet, pause/stop the dmaengine remove all the descriptors from the lists.
This ensures that dmaengine doesnt trigger anything new. And if it does we dont
call into client

2. If we get an interrupt or tasklet invoked after this, then it is the
resposiblity of dma driver to clear interrupt and return

3. While you have invoked the terminate_all you might get a callback, in that
case the substream is still valid (you are still in TRIGGER_STOP). There should
be no harm in calling period_elapsed, but it would be good if we detect that and
return from here.

4. My only worry is that during callback we drop the locks held, so callback can
be running on different CPU while you process the terminate all. This is very
racy and possibly the issue being seen in this thread. This gets complicated by
that fact that xrun would invoke the stop thus terminate_all.

> >> On the other hand that last part could get tricky as the
> >> dmaengine_terminate_all() might be call from within the callback.
> > It's tricky indeed in case xrun happens. we should avoid possible deadlock.
> 
> I think we'll eventually need to versions of dmaengine_terminate_all(). A
> sync version which makes sure that the tasklet has finished and a non-sync
> version that only makes sure that no new callbacks are started. I think the
> sync version should be the default with an optional async version which must
> be used, if it can run from within the callback. So we'd call the async
> version in the pcm_trigger callback and the sync version in the pcm_close
> callback.
Yes this can be done. We can name this disable_callback cmd. The cmd will tell
dma driver to disable all callback on the channel. This can be invoked from the
TRIGEGR_STOP and then terminate_all in the free

Which dma driver are you guys using in this? I will send a patch for the core
and pcm layer. Someone need to test on actual hardware with driver fix :)

-- 
~Vinod

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10  1:05         ` Qiao Zhou
@ 2013-10-10  2:56           ` Vinod Koul
  2013-10-10  5:54             ` Qiao Zhou
  0 siblings, 1 reply; 19+ messages in thread
From: Vinod Koul @ 2013-10-10  2:56 UTC (permalink / raw)
  To: Qiao Zhou
  Cc: alsa-devel@alsa-project.org, Lars-Peter Clausen, tiwai@suse.de,
	lgirdwood@gmail.com, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On Thu, Oct 10, 2013 at 09:05:02AM +0800, Qiao Zhou wrote:
> On 10/09/2013 07:00 PM, Lars-Peter Clausen wrote:
> >I think we'll eventually need to versions of dmaengine_terminate_all(). A
> >sync version which makes sure that the tasklet has finished and a non-sync
> >version that only makes sure that no new callbacks are started. I think the
> >sync version should be the default with an optional async version which must
> >be used, if it can run from within the callback. So we'd call the async
> >version in the pcm_trigger callback and the sync version in the pcm_close
> >callback.
> In our current dmaengine driver, the dma interrupt is disabled in
> terminate_all, so there is no new callback after it. This is the
> async version. Takashi also mentions the requirement for such sync
> version. I'll investigate the sync version more. thanks a lot.
Your issue seems to be more on the case callback has been onvoked while the
terminate_all in processing and after that case when sound core freed the
pointers. If you get a new callback after the terminate_all then taht would only
be a driver bug!

--
~Vinod

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10  2:54         ` Vinod Koul
@ 2013-10-10  5:50           ` Qiao Zhou
  2013-10-10 15:47             ` Vinod Koul
  2013-10-10  7:46           ` Lars-Peter Clausen
  1 sibling, 1 reply; 19+ messages in thread
From: Qiao Zhou @ 2013-10-10  5:50 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel@alsa-project.org, Lars-Peter Clausen, tiwai@suse.de,
	lgirdwood@gmail.com, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On 10/10/2013 10:54 AM, Vinod Koul wrote:
> On Wed, Oct 09, 2013 at 01:00:08PM +0200, Lars-Peter Clausen wrote:
>> Added Vinod to Cc.
>>
>> On 10/09/2013 12:23 PM, Qiao Zhou wrote:
>>> On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
>>>> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
>>>>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>>>>>> Hi Mark, Liam, Jaroslav, Takashi
>>>>>>
>>>>>> I met an issue in which kernel panic appears in dmaengine_pcm_dma_complete
>>>>>> function on a quad-core system. The dmaengine_pcm_dma_complete is running
>>>>>> core0, while snd_pcm_release has already been executed on core1, due to in
>>>>>> low memory stress oom killer kills the audio thread to release some memory.
>>>>>>
>>>>>> snd_pcm_release frees the runtime parameters, and runtime is used in
>>>>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>>>>>> In current audio driver, we can't promise that
>>>>>> dmaengine_pcm_dma_complete is
>>>>>> not executed after snd_pcm_release on multi cores. Maybe we should add some
>>>>>> protection. Do you have any suggestion?
>>>>>>
>>>>>> I have tried to apply below workaround, which can fix the panic, but I'm
>>>>>> not
>>>>>> confident it's proper. Need your comment and better suggestion.
>>>>>
>>>>> I think this is a general problem with your dmaengine driver, nothing audio
>>>>> specific. If the callback is able to run after dmaengine_terminate_all() has
>>>>> returned successfully there is a bug in the dmaengine driver. You need to
>>> The terminate_all runs after callback, and they run just very close on
>>> different cores. should soc-dmaengine add such protection anyway?
>>
>> The problem is that if there is a race, that the callback races against the
>> freeing of the prtd, then there is also the chance that the callback races
>> against the freeing of the substream. So in that case, e.g. with your patch,
>> you'd try to lock a mutex for which the memory already has been freed. So we
>> need a way to synchronize against the callbacks, i.e. makes sure that non of
>> the callbacks are running anymore at a given point. And only after that
>> point we are allowed to free the memory that is referenced in the callback.
> Okay reading thru the mail series and code:
>
> Since we are using cyclic dma here, we will get callback based on periods. So
> it is a very common case that you terminate and the callback is invoked.
>
> Now callback can be invoked by
> 1) the thread terminating audio, in TRIGGER_STOP
> 2) in the callback context, you invoked callback which would then go and call
> the period_elapsed ultimately leading to TRIGGER_STOP (xrun)
>
> We need to take care of these conditions:
>
> 1. In dma driver, once terminate_all in invoked, grab the lock, disable the
> tasklet, pause/stop the dmaengine remove all the descriptors from the lists.
> This ensures that dmaengine doesnt trigger anything new. And if it does we dont
> call into client
what lock do you refer to? is it "snd_pcm_stream_lock" or a new one in 
dma driver?
>
> 2. If we get an interrupt or tasklet invoked after this, then it is the
> resposiblity of dma driver to clear interrupt and return
>
> 3. While you have invoked the terminate_all you might get a callback, in that
> case the substream is still valid (you are still in TRIGGER_STOP). There should
> be no harm in calling period_elapsed, but it would be good if we detect that and
> return from here.
>
> 4. My only worry is that during callback we drop the locks held, so callback can
> be running on different CPU while you process the terminate all. This is very
> racy and possibly the issue being seen in this thread. This gets complicated by
> that fact that xrun would invoke the stop thus terminate_all.
The timing is very racy. we have two platforms, of which the only 
difference is that one is 2 * a9 cpu, and the other is 4 * a7 cpu. all 
other components and peripherals are the same. The result is we can't 
reproduce the panic issue after more than 4 days stress test on 2-cpu 
platform, but can reproduce the issue in ~10 hours level on the 4-cpu 
platform.
>
>>>> On the other hand that last part could get tricky as the
>>>> dmaengine_terminate_all() might be call from within the callback.
>>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>>
>> I think we'll eventually need to versions of dmaengine_terminate_all(). A
>> sync version which makes sure that the tasklet has finished and a non-sync
>> version that only makes sure that no new callbacks are started. I think the
>> sync version should be the default with an optional async version which must
>> be used, if it can run from within the callback. So we'd call the async
>> version in the pcm_trigger callback and the sync version in the pcm_close
>> callback.
> Yes this can be done. We can name this disable_callback cmd. The cmd will tell
> dma driver to disable all callback on the channel. This can be invoked from the
> TRIGEGR_STOP and then terminate_all in the free
>
> Which dma driver are you guys using in this? I will send a patch for the core
> and pcm layer. Someone need to test on actual hardware with driver fix :)
>
I'm using the mmp_tdma driver under /drivers/dma/, and I can test the 
patch on our 4-cpu platform. thanks.

-- 

Best Regards
Qiao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10  2:56           ` Vinod Koul
@ 2013-10-10  5:54             ` Qiao Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Qiao Zhou @ 2013-10-10  5:54 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel@alsa-project.org, Lars-Peter Clausen, tiwai@suse.de,
	lgirdwood@gmail.com, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On 10/10/2013 10:56 AM, Vinod Koul wrote:
> On Thu, Oct 10, 2013 at 09:05:02AM +0800, Qiao Zhou wrote:
>> On 10/09/2013 07:00 PM, Lars-Peter Clausen wrote:
>>> I think we'll eventually need to versions of dmaengine_terminate_all(). A
>>> sync version which makes sure that the tasklet has finished and a non-sync
>>> version that only makes sure that no new callbacks are started. I think the
>>> sync version should be the default with an optional async version which must
>>> be used, if it can run from within the callback. So we'd call the async
>>> version in the pcm_trigger callback and the sync version in the pcm_close
>>> callback.
>> In our current dmaengine driver, the dma interrupt is disabled in
>> terminate_all, so there is no new callback after it. This is the
>> async version. Takashi also mentions the requirement for such sync
>> version. I'll investigate the sync version more. thanks a lot.
> Your issue seems to be more on the case callback has been onvoked while the
> terminate_all in processing and after that case when sound core freed the
> pointers. If you get a new callback after the terminate_all then taht would only
> be a driver bug!
Agree. No new callback should be invoked after terminate_all.
>
> --
> ~Vinod
>


-- 

Best Regards
Qiao

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10  2:54         ` Vinod Koul
  2013-10-10  5:50           ` Qiao Zhou
@ 2013-10-10  7:46           ` Lars-Peter Clausen
  2013-10-10 16:10             ` Vinod Koul
  1 sibling, 1 reply; 19+ messages in thread
From: Lars-Peter Clausen @ 2013-10-10  7:46 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Qiao Zhou, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

[...]
> 
>>>> On the other hand that last part could get tricky as the
>>>> dmaengine_terminate_all() might be call from within the callback.
>>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>>
>> I think we'll eventually need to versions of dmaengine_terminate_all(). A
>> sync version which makes sure that the tasklet has finished and a non-sync
>> version that only makes sure that no new callbacks are started. I think the
>> sync version should be the default with an optional async version which must
>> be used, if it can run from within the callback. So we'd call the async
>> version in the pcm_trigger callback and the sync version in the pcm_close
>> callback.
> Yes this can be done. We can name this disable_callback cmd. The cmd will tell
> dma driver to disable all callback on the channel. This can be invoked from the
> TRIGEGR_STOP and then terminate_all in the free
> 

I think we should make it the default behavior of dmaengine_terminate_all()
to wait for the tasklet to finish. Since this is what almost always want,
except in this case where you might end up calling dmaeinge_terminate_all()
from within the callback. Internally this can be implemented as two separate
commands. So leave the DMA_TERMINATE_ALL as it is and add a new
DMA_SYNC_CALLBACKS (or whatever it will be named) command. This command will
internally call tasklet_kill(). Then we have two new functions
dmaengine_terminate_all_async() which will just issue the DMA_TERMINATE_ALL
command, the other function is dmaengine_sync_callbacks() which will issue
the DMA_SYNC_CALLBACKS command. dmaengine_terminate_all() will then first
call dmaengine_terminate_all_async() and then dmaengine_sync_callbacks().
The ALSA code would have to be updated first to call
dmaengine_terminate_all_async() for TRIGGER_STOP and
dmaengine_sync_callbacks() on the pcm_close path.

- Lars

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10  5:50           ` Qiao Zhou
@ 2013-10-10 15:47             ` Vinod Koul
  2013-11-05  8:55               ` Qiao Zhou
  0 siblings, 1 reply; 19+ messages in thread
From: Vinod Koul @ 2013-10-10 15:47 UTC (permalink / raw)
  To: Qiao Zhou
  Cc: alsa-devel@alsa-project.org, Lars-Peter Clausen, tiwai@suse.de,
	lgirdwood@gmail.com, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On Thu, Oct 10, 2013 at 01:50:54PM +0800, Qiao Zhou wrote:
> >1. In dma driver, once terminate_all in invoked, grab the lock, disable the
> >tasklet, pause/stop the dmaengine remove all the descriptors from the lists.
> >This ensures that dmaengine doesnt trigger anything new. And if it does we dont
> >call into client
> what lock do you refer to? is it "snd_pcm_stream_lock" or a new one
> in dma driver?
We are ref dma driver so it would be a dma driver lock.

> >2. If we get an interrupt or tasklet invoked after this, then it is the
> >resposiblity of dma driver to clear interrupt and return
> >
> >3. While you have invoked the terminate_all you might get a callback, in that
> >case the substream is still valid (you are still in TRIGGER_STOP). There should
> >be no harm in calling period_elapsed, but it would be good if we detect that and
> >return from here.
> >
> >4. My only worry is that during callback we drop the locks held, so callback can
> >be running on different CPU while you process the terminate all. This is very
> >racy and possibly the issue being seen in this thread. This gets complicated by
> >that fact that xrun would invoke the stop thus terminate_all.
> The timing is very racy. we have two platforms, of which the only
> difference is that one is 2 * a9 cpu, and the other is 4 * a7 cpu.
> all other components and peripherals are the same. The result is we
> can't reproduce the panic issue after more than 4 days stress test
> on 2-cpu platform, but can reproduce the issue in ~10 hours level on
> the 4-cpu platform.
The only reason I see if dma driver is NOT buggy is that you dma driver already
invoked callback and on different CPU you decided to terminate the audio and
call terminate_all

> >>>>On the other hand that last part could get tricky as the
> >>>>dmaengine_terminate_all() might be call from within the callback.
> >>>It's tricky indeed in case xrun happens. we should avoid possible deadlock.
> >>
> >>I think we'll eventually need to versions of dmaengine_terminate_all(). A
> >>sync version which makes sure that the tasklet has finished and a non-sync
> >>version that only makes sure that no new callbacks are started. I think the
> >>sync version should be the default with an optional async version which must
> >>be used, if it can run from within the callback. So we'd call the async
> >>version in the pcm_trigger callback and the sync version in the pcm_close
> >>callback.
> >Yes this can be done. We can name this disable_callback cmd. The cmd will tell
> >dma driver to disable all callback on the channel. This can be invoked from the
> >TRIGEGR_STOP and then terminate_all in the free
> >
> >Which dma driver are you guys using in this? I will send a patch for the core
> >and pcm layer. Someone need to test on actual hardware with driver fix :)
> >
> I'm using the mmp_tdma driver under /drivers/dma/, and I can test
> the patch on our 4-cpu platform. thanks.
Ok, let me check the driver and also see what needs to be done for this

--
~Vinod

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10  7:46           ` Lars-Peter Clausen
@ 2013-10-10 16:10             ` Vinod Koul
  2013-10-10 17:53               ` Lars-Peter Clausen
  0 siblings, 1 reply; 19+ messages in thread
From: Vinod Koul @ 2013-10-10 16:10 UTC (permalink / raw)
  To: Lars-Peter Clausen, Dan J Williams
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Qiao Zhou, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On Thu, Oct 10, 2013 at 09:46:34AM +0200, Lars-Peter Clausen wrote:

+ Dan

> [...]
> > 
> >>>> On the other hand that last part could get tricky as the
> >>>> dmaengine_terminate_all() might be call from within the callback.
> >>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
> >>
> >> I think we'll eventually need to versions of dmaengine_terminate_all(). A
> >> sync version which makes sure that the tasklet has finished and a non-sync
> >> version that only makes sure that no new callbacks are started. I think the
> >> sync version should be the default with an optional async version which must
> >> be used, if it can run from within the callback. So we'd call the async
> >> version in the pcm_trigger callback and the sync version in the pcm_close
> >> callback.
> > Yes this can be done. We can name this disable_callback cmd. The cmd will tell
> > dma driver to disable all callback on the channel. This can be invoked from the
> > TRIGEGR_STOP and then terminate_all in the free
> > 
> 
> I think we should make it the default behavior of dmaengine_terminate_all()
> to wait for the tasklet to finish.
No we cant since terminate can get invoked from callback itself!

> Since this is what almost always want,
> except in this case where you might end up calling dmaeinge_terminate_all()
> from within the callback. Internally this can be implemented as two separate
> commands. So leave the DMA_TERMINATE_ALL as it is and add a new
> DMA_SYNC_CALLBACKS (or whatever it will be named) command. This command will
> internally call tasklet_kill().
Implementations will have tasklet for dmaengine and not per channel. So
tasklet_kill is not option!

>Then we have two new functions
> dmaengine_terminate_all_async() which will just issue the DMA_TERMINATE_ALL
> command, the other function is dmaengine_sync_callbacks() which will issue
> the DMA_SYNC_CALLBACKS command. dmaengine_terminate_all() will then first
> call dmaengine_terminate_all_async() and then dmaengine_sync_callbacks().
> The ALSA code would have to be updated first to call
> dmaengine_terminate_all_async() for TRIGGER_STOP and
> dmaengine_sync_callbacks() on the pcm_close path.
In order to avoid the complication for getting called from the callback itself,
I was thinking that we tell dma driver explictly that from now on dont invoke
the callbacks (we might be in callback context), then we return. DMA engine will
not issue any new callbacks. And also give chance to existing callback to
return.
Then the client can safely call terminate_all() to abort the channel from non
callback context!

The only issue here is that we get called from callback context as well as
thread of client and all this cna happen while callback is executing.

This make API similistic IMO.

--
~Vinod

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10 16:10             ` Vinod Koul
@ 2013-10-10 17:53               ` Lars-Peter Clausen
  2013-10-13 15:24                 ` Vinod Koul
  0 siblings, 1 reply; 19+ messages in thread
From: Lars-Peter Clausen @ 2013-10-10 17:53 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Qiao Zhou, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Dan J Williams, Chao Xie

On 10/10/2013 06:10 PM, Vinod Koul wrote:
> On Thu, Oct 10, 2013 at 09:46:34AM +0200, Lars-Peter Clausen wrote:
>
> + Dan
>
>> [...]
>>>
>>>>>> On the other hand that last part could get tricky as the
>>>>>> dmaengine_terminate_all() might be call from within the callback.
>>>>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>>>>
>>>> I think we'll eventually need to versions of dmaengine_terminate_all(). A
>>>> sync version which makes sure that the tasklet has finished and a non-sync
>>>> version that only makes sure that no new callbacks are started. I think the
>>>> sync version should be the default with an optional async version which must
>>>> be used, if it can run from within the callback. So we'd call the async
>>>> version in the pcm_trigger callback and the sync version in the pcm_close
>>>> callback.
>>> Yes this can be done. We can name this disable_callback cmd. The cmd will tell
>>> dma driver to disable all callback on the channel. This can be invoked from the
>>> TRIGEGR_STOP and then terminate_all in the free
>>>
>>
>> I think we should make it the default behavior of dmaengine_terminate_all()
>> to wait for the tasklet to finish.
> No we cant since terminate can get invoked from callback itself!

That's why I'm suggesting to introduce a dmaenigne_terminate_all_async() which 
needs to be called in case

>
>> Since this is what almost always want,
>> except in this case where you might end up calling dmaeinge_terminate_all()
>> from within the callback. Internally this can be implemented as two separate
>> commands. So leave the DMA_TERMINATE_ALL as it is and add a new
>> DMA_SYNC_CALLBACKS (or whatever it will be named) command. This command will
>> internally call tasklet_kill().
> Implementations will have tasklet for dmaengine and not per channel. So
> tasklet_kill is not option!

Most drivers actually have the tasklet per channel.

>
>> Then we have two new functions
>> dmaengine_terminate_all_async() which will just issue the DMA_TERMINATE_ALL
>> command, the other function is dmaengine_sync_callbacks() which will issue
>> the DMA_SYNC_CALLBACKS command. dmaengine_terminate_all() will then first
>> call dmaengine_terminate_all_async() and then dmaengine_sync_callbacks().
>> The ALSA code would have to be updated first to call
>> dmaengine_terminate_all_async() for TRIGGER_STOP and
>> dmaengine_sync_callbacks() on the pcm_close path.
> In order to avoid the complication for getting called from the callback itself,
> I was thinking that we tell dma driver explictly that from now on dont invoke
> the callbacks (we might be in callback context), then we return. DMA engine will
> not issue any new callbacks. And also give chance to existing callback to
> return.
> Then the client can safely call terminate_all() to abort the channel from non
> callback context!

But you'll gain nothing by that. The race condition still exists. You need a 
mechanism to to synchronize the execution of the callback against 
dmaengine_terminate_all(). One simple way for drivers with a per channel 
tasklet is to use tasklet_kill() for this. But other possibilities also exists 
as well.

- Lars

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10 17:53               ` Lars-Peter Clausen
@ 2013-10-13 15:24                 ` Vinod Koul
  2013-10-13 16:57                   ` Lars-Peter Clausen
  0 siblings, 1 reply; 19+ messages in thread
From: Vinod Koul @ 2013-10-13 15:24 UTC (permalink / raw)
  To: Lars-Peter Clausen
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Qiao Zhou, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Dan J Williams, Chao Xie

On Thu, Oct 10, 2013 at 07:53:33PM +0200, Lars-Peter Clausen wrote:
> On 10/10/2013 06:10 PM, Vinod Koul wrote:
> >On Thu, Oct 10, 2013 at 09:46:34AM +0200, Lars-Peter Clausen wrote:
> >
> >+ Dan
> >
> >>[...]
> >>>
> >>>>>>On the other hand that last part could get tricky as the
> >>>>>>dmaengine_terminate_all() might be call from within the callback.
> >>>>>It's tricky indeed in case xrun happens. we should avoid possible deadlock.
> >>>>
> >>>>I think we'll eventually need to versions of dmaengine_terminate_all(). A
> >>>>sync version which makes sure that the tasklet has finished and a non-sync
> >>>>version that only makes sure that no new callbacks are started. I think the
> >>>>sync version should be the default with an optional async version which must
> >>>>be used, if it can run from within the callback. So we'd call the async
> >>>>version in the pcm_trigger callback and the sync version in the pcm_close
> >>>>callback.
> >>>Yes this can be done. We can name this disable_callback cmd. The cmd will tell
> >>>dma driver to disable all callback on the channel. This can be invoked from the
> >>>TRIGEGR_STOP and then terminate_all in the free
> >>>
> >>
> >>I think we should make it the default behavior of dmaengine_terminate_all()
> >>to wait for the tasklet to finish.
> >No we cant since terminate can get invoked from callback itself!
> 
> That's why I'm suggesting to introduce a
> dmaenigne_terminate_all_async() which needs to be called in case
> >>Since this is what almost always want,
> >>except in this case where you might end up calling dmaeinge_terminate_all()
> >>from within the callback. Internally this can be implemented as two separate
> >>commands. So leave the DMA_TERMINATE_ALL as it is and add a new
> >>DMA_SYNC_CALLBACKS (or whatever it will be named) command. This command will
> >>internally call tasklet_kill().
> >Implementations will have tasklet for dmaengine and not per channel. So
> >tasklet_kill is not option!
> 
> Most drivers actually have the tasklet per channel.
But NOT all, so thats why we cant make this as requirement. I saw quite a few
driver had common taklet. So this cant be ignored!

> >
> >>Then we have two new functions
> >>dmaengine_terminate_all_async() which will just issue the DMA_TERMINATE_ALL
> >>command, the other function is dmaengine_sync_callbacks() which will issue
> >>the DMA_SYNC_CALLBACKS command. dmaengine_terminate_all() will then first
> >>call dmaengine_terminate_all_async() and then dmaengine_sync_callbacks().
> >>The ALSA code would have to be updated first to call
> >>dmaengine_terminate_all_async() for TRIGGER_STOP and
> >>dmaengine_sync_callbacks() on the pcm_close path.
> >In order to avoid the complication for getting called from the callback itself,
> >I was thinking that we tell dma driver explictly that from now on dont invoke
> >the callbacks (we might be in callback context), then we return. DMA engine will
> >not issue any new callbacks. And also give chance to existing callback to
> >return.
> >Then the client can safely call terminate_all() to abort the channel from non
> >callback context!
> 
> But you'll gain nothing by that. The race condition still exists.
> You need a mechanism to to synchronize the execution of the callback
> against dmaengine_terminate_all(). One simple way for drivers with a
> per channel tasklet is to use tasklet_kill() for this. But other
> possibilities also exists as well.
No we are elimnating it. When you tell driver that you dont want it to invoke
the callback, you give current one which maybe from callback context a chance to
complete. So from that point no invoking callback on that channel. Disabling the channel
interrupts will help as well.

The terminate_all is gauranteed to be invoked from a non callback context. So
driver cna handle it easily

~Vinod

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-13 15:24                 ` Vinod Koul
@ 2013-10-13 16:57                   ` Lars-Peter Clausen
  0 siblings, 0 replies; 19+ messages in thread
From: Lars-Peter Clausen @ 2013-10-13 16:57 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de, lgirdwood@gmail.com,
	Qiao Zhou, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Dan J Williams, Chao Xie

On 10/13/2013 05:24 PM, Vinod Koul wrote:
> On Thu, Oct 10, 2013 at 07:53:33PM +0200, Lars-Peter Clausen wrote:
>> On 10/10/2013 06:10 PM, Vinod Koul wrote:
>>> On Thu, Oct 10, 2013 at 09:46:34AM +0200, Lars-Peter Clausen wrote:
>>>
>>> + Dan
>>>
>>>> [...]
>>>>>
>>>>>>>> On the other hand that last part could get tricky as the
>>>>>>>> dmaengine_terminate_all() might be call from within the callback.
>>>>>>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>>>>>>
>>>>>> I think we'll eventually need to versions of dmaengine_terminate_all(). A
>>>>>> sync version which makes sure that the tasklet has finished and a non-sync
>>>>>> version that only makes sure that no new callbacks are started. I think the
>>>>>> sync version should be the default with an optional async version which must
>>>>>> be used, if it can run from within the callback. So we'd call the async
>>>>>> version in the pcm_trigger callback and the sync version in the pcm_close
>>>>>> callback.
>>>>> Yes this can be done. We can name this disable_callback cmd. The cmd will tell
>>>>> dma driver to disable all callback on the channel. This can be invoked from the
>>>>> TRIGEGR_STOP and then terminate_all in the free
>>>>>
>>>>
>>>> I think we should make it the default behavior of dmaengine_terminate_all()
>>>> to wait for the tasklet to finish.
>>> No we cant since terminate can get invoked from callback itself!
>>
>> That's why I'm suggesting to introduce a
>> dmaenigne_terminate_all_async() which needs to be called in case
>>>> Since this is what almost always want,
>>>> except in this case where you might end up calling dmaeinge_terminate_all()
>>> >from within the callback. Internally this can be implemented as two separate
>>>> commands. So leave the DMA_TERMINATE_ALL as it is and add a new
>>>> DMA_SYNC_CALLBACKS (or whatever it will be named) command. This command will
>>>> internally call tasklet_kill().
>>> Implementations will have tasklet for dmaengine and not per channel. So
>>> tasklet_kill is not option!
>>
>> Most drivers actually have the tasklet per channel.
> But NOT all, so thats why we cant make this as requirement. I saw quite a few
> driver had common taklet. So this cant be ignored!
>
>>>
>>>> Then we have two new functions
>>>> dmaengine_terminate_all_async() which will just issue the DMA_TERMINATE_ALL
>>>> command, the other function is dmaengine_sync_callbacks() which will issue
>>>> the DMA_SYNC_CALLBACKS command. dmaengine_terminate_all() will then first
>>>> call dmaengine_terminate_all_async() and then dmaengine_sync_callbacks().
>>>> The ALSA code would have to be updated first to call
>>>> dmaengine_terminate_all_async() for TRIGGER_STOP and
>>>> dmaengine_sync_callbacks() on the pcm_close path.
>>> In order to avoid the complication for getting called from the callback itself,
>>> I was thinking that we tell dma driver explictly that from now on dont invoke
>>> the callbacks (we might be in callback context), then we return. DMA engine will
>>> not issue any new callbacks. And also give chance to existing callback to
>>> return.
>>> Then the client can safely call terminate_all() to abort the channel from non
>>> callback context!
>>
>> But you'll gain nothing by that. The race condition still exists.
>> You need a mechanism to to synchronize the execution of the callback
>> against dmaengine_terminate_all(). One simple way for drivers with a
>> per channel tasklet is to use tasklet_kill() for this. But other
>> possibilities also exists as well.
> No we are elimnating it. When you tell driver that you dont want it to invoke
> the callback, you give current one which maybe from callback context a chance to
> complete. So from that point no invoking callback on that channel. Disabling the channel
> interrupts will help as well.
>
> The terminate_all is gauranteed to be invoked from a non callback context. So
> driver cna handle it easily

The race we are seeing here is unrelated to terminate_all() being called from 
the callback context. The race we see here is snd_pcm_release_substream() 
racing against the callback.

The code in snd_pcm_release_substream() looks like this:

     ...
     snd_pcm_drop(substream);
     if (substream->hw_opened) {
         if (substream->ops->hw_free != NULL)
             substream->ops->hw_free(substream);
         substream->ops->close(substream);
         substream->hw_opened = 0;
     }
     ...

snd_pcm_drop() will cause the substream to stop if it is currently active. So 
this is where we call terminate_all(). In the close callback, which is called a 
few lines down, we free the data which is accessed in the callback. The race we 
see is this data being accessed in the callback after it has been freed. So the 
callback is running concurrently against this sequence in 
snd_pcm_release_substream(). Moving the position where we call terminate_all() 
in the sequence will not change anything. The callback will still be running 
concurrently against this sequence and the memory will still be used after it 
has been freed. There needs to be explicit synchronization between the callback 
and terminate_all() to solve this.

- Lars

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: async between dmaengine_pcm_dma_complete and snd_pcm_release
  2013-10-10 15:47             ` Vinod Koul
@ 2013-11-05  8:55               ` Qiao Zhou
  0 siblings, 0 replies; 19+ messages in thread
From: Qiao Zhou @ 2013-11-05  8:55 UTC (permalink / raw)
  To: Vinod Koul
  Cc: alsa-devel@alsa-project.org, Lars-Peter Clausen, tiwai@suse.de,
	lgirdwood@gmail.com, Mark Brown, zhangfei.gao@gmail.com,
	trinity.qiao.zhou@gmail.com, Chao Xie

On 10/10/2013 11:47 PM, Vinod Koul wrote:
> On Thu, Oct 10, 2013 at 01:50:54PM +0800, Qiao Zhou wrote:
>>> 1. In dma driver, once terminate_all in invoked, grab the lock, disable the
>>> tasklet, pause/stop the dmaengine remove all the descriptors from the lists.
>>> This ensures that dmaengine doesnt trigger anything new. And if it does we dont
>>> call into client
>> what lock do you refer to? is it "snd_pcm_stream_lock" or a new one
>> in dma driver?
> We are ref dma driver so it would be a dma driver lock.
>
>>> 2. If we get an interrupt or tasklet invoked after this, then it is the
>>> resposiblity of dma driver to clear interrupt and return
>>>
>>> 3. While you have invoked the terminate_all you might get a callback, in that
>>> case the substream is still valid (you are still in TRIGGER_STOP). There should
>>> be no harm in calling period_elapsed, but it would be good if we detect that and
>>> return from here.
>>>
>>> 4. My only worry is that during callback we drop the locks held, so callback can
>>> be running on different CPU while you process the terminate all. This is very
>>> racy and possibly the issue being seen in this thread. This gets complicated by
>>> that fact that xrun would invoke the stop thus terminate_all.
>> The timing is very racy. we have two platforms, of which the only
>> difference is that one is 2 * a9 cpu, and the other is 4 * a7 cpu.
>> all other components and peripherals are the same. The result is we
>> can't reproduce the panic issue after more than 4 days stress test
>> on 2-cpu platform, but can reproduce the issue in ~10 hours level on
>> the 4-cpu platform.
> The only reason I see if dma driver is NOT buggy is that you dma driver already
> invoked callback and on different CPU you decided to terminate the audio and
> call terminate_all
>
>>>>>> On the other hand that last part could get tricky as the
>>>>>> dmaengine_terminate_all() might be call from within the callback.
>>>>> It's tricky indeed in case xrun happens. we should avoid possible deadlock.
>>>>
>>>> I think we'll eventually need to versions of dmaengine_terminate_all(). A
>>>> sync version which makes sure that the tasklet has finished and a non-sync
>>>> version that only makes sure that no new callbacks are started. I think the
>>>> sync version should be the default with an optional async version which must
>>>> be used, if it can run from within the callback. So we'd call the async
>>>> version in the pcm_trigger callback and the sync version in the pcm_close
>>>> callback.
>>> Yes this can be done. We can name this disable_callback cmd. The cmd will tell
>>> dma driver to disable all callback on the channel. This can be invoked from the
>>> TRIGEGR_STOP and then terminate_all in the free
>>>
>>> Which dma driver are you guys using in this? I will send a patch for the core
>>> and pcm layer. Someone need to test on actual hardware with driver fix :)
>>>
>> I'm using the mmp_tdma driver under /drivers/dma/, and I can test
>> the patch on our 4-cpu platform. thanks.
> Ok, let me check the driver and also see what needs to be done for this
>
> --
> ~Vinod
>
Hi Vinod,

Do you have any finding?

-- 

Best Regards
Qiao

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-11-05  8:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-09  7:29 async between dmaengine_pcm_dma_complete and snd_pcm_release Qiao Zhou
2013-10-09  8:19 ` Lars-Peter Clausen
2013-10-09  8:30   ` Lars-Peter Clausen
2013-10-09 10:23     ` Qiao Zhou
2013-10-09 11:00       ` Lars-Peter Clausen
2013-10-10  1:05         ` Qiao Zhou
2013-10-10  2:56           ` Vinod Koul
2013-10-10  5:54             ` Qiao Zhou
2013-10-10  2:54         ` Vinod Koul
2013-10-10  5:50           ` Qiao Zhou
2013-10-10 15:47             ` Vinod Koul
2013-11-05  8:55               ` Qiao Zhou
2013-10-10  7:46           ` Lars-Peter Clausen
2013-10-10 16:10             ` Vinod Koul
2013-10-10 17:53               ` Lars-Peter Clausen
2013-10-13 15:24                 ` Vinod Koul
2013-10-13 16:57                   ` Lars-Peter Clausen
2013-10-09 10:53     ` Takashi Iwai
2013-10-10  1:08       ` Qiao Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).