* hda codec unbind refcount hang
@ 2022-09-09 15:45 Ville Syrjälä
  2022-09-09 15:59 ` Takashi Iwai
  0 siblings, 1 reply; 5+ messages in thread
From: Ville Syrjälä @ 2022-09-09 15:45 UTC
  To: Takashi Iwai; +Cc: alsa-devel

Hi Takashi,

commit 7206998f578d ("ALSA: hda: Fix potential deadlock at codec
unbinding") introduced a problem on at least one of my older machines.

The problem happens when hda_codec_driver_remove() encounters a
codec without any pcms (and thus the refcount is 1) and tries to
call refcount_dec(). Turns out refcount_dec() doesn't like to be
used for dropping the refcount to 0, and instead it spews a warning
and does its saturate thing. The subsequent wait_event() is then
permanently stuck waiting on the saturated refcount.

I've definitely seen the same kind of pattern used elsewhere
in the kernel as well, so the fact that refcount_t can't be used
to implement it is a bit of a surprise to me. I guess most other
places still use atomic_t instead.

-- 
Ville Syrjälä
Intel
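
The failure described above can be reproduced in a small userspace model. This is only a sketch: every `model_` name is hypothetical, C11 atomics stand in for the kernel's `refcount_t`, and the saturation value is assumed to mirror the kernel's `REFCOUNT_SATURATED`. It shows that decrementing from 1 with a `refcount_dec()`-style helper leaves the counter saturated rather than at 0, so a wait on `!refcount_read()` can never be satisfied.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; all model_ names are hypothetical. */
#define MODEL_REFCOUNT_SATURATED (INT_MIN / 2)

typedef struct { atomic_int refs; } model_refcount_t;

static void model_refcount_set(model_refcount_t *r, int n)
{
	atomic_store(&r->refs, n);
}

static int model_refcount_read(model_refcount_t *r)
{
	return atomic_load(&r->refs);
}

/*
 * Models refcount_dec(): the caller promises it is NOT dropping the
 * last reference. If the old value was 1 or less, the helper warns and
 * pins the counter at a saturated value instead of leaving it at 0.
 */
static void model_refcount_dec(model_refcount_t *r)
{
	int old = atomic_fetch_sub(&r->refs, 1);

	if (old <= 1) {
		fprintf(stderr, "model: refcount_dec hit 0, saturating\n");
		atomic_store(&r->refs, MODEL_REFCOUNT_SATURATED);
	}
}

/* True if the wait condition !refcount_read(ref) holds right now. */
static bool model_wait_condition_met(model_refcount_t *r)
{
	return model_refcount_read(r) == 0;
}

/* A codec without PCMs starts at 1; after the dec the wait can never end. */
static void model_check_codec_without_pcms(void)
{
	model_refcount_t pcm_ref;

	model_refcount_set(&pcm_ref, 1);
	model_refcount_dec(&pcm_ref);
	assert(!model_wait_condition_met(&pcm_ref));
}
```

Calling `model_check_codec_without_pcms()` from a test program prints the warning line and passes its assertion, matching the hang in the report: nobody else holds a reference, so nothing will ever bring the saturated counter to 0.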


* Re: hda codec unbind refcount hang
  2022-09-09 15:45 hda codec unbind refcount hang Ville Syrjälä
@ 2022-09-09 15:59 ` Takashi Iwai
  2022-09-09 19:39   ` Ville Syrjälä
  0 siblings, 1 reply; 5+ messages in thread
From: Takashi Iwai @ 2022-09-09 15:59 UTC
  To: Ville Syrjälä; +Cc: alsa-devel

On Fri, 09 Sep 2022 17:45:25 +0200,
Ville Syrjälä wrote:
> 
> Hi Takashi,
> 
> commit 7206998f578d ("ALSA: hda: Fix potential deadlock at codec
> unbinding") introduced a problem on at least one of my older machines.
> 
> The problem happens when hda_codec_driver_remove() encounters a
> codec without any pcms (and thus the refcount is 1) and tries to
> call refcount_dec(). Turns out refcount_dec() doesn't like to be
> used for dropping the refcount to 0, and instead it spews a warning
> and does its saturate thing. The subsequent wait_event() is then
> permanently stuck waiting on the saturated refcount.
> 
> I've definitely seen the same kind of pattern used elsewhere
> in the kernel as well, so the fact that refcount_t can't be used
> to implement it is a bit of a surprise to me. I guess most other
> places still use atomic_t instead.

Does the patch below work around it?  It seems to be a subtle
difference between refcount_dec() and refcount_dec_and_test().


thanks,

Takashi

-- 8< --
--- a/sound/pci/hda/hda_bind.c
+++ b/sound/pci/hda/hda_bind.c
@@ -157,10 +157,11 @@ static int hda_codec_driver_remove(struct device *dev)
 		return codec->bus->core.ext_ops->hdev_detach(&codec->core);
 	}
 
-	refcount_dec(&codec->pcm_ref);
-	snd_hda_codec_disconnect_pcms(codec);
-	snd_hda_jack_tbl_disconnect(codec);
-	wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
+	if (!refcount_dec_and_test(&codec->pcm_ref)) {
+		snd_hda_codec_disconnect_pcms(codec);
+		snd_hda_jack_tbl_disconnect(codec);
+		wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
+	}
 	snd_power_sync_ref(codec->bus->card);
 
 	if (codec->patch_ops.free)
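
The "subtle difference" the patch relies on can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel implementation: the `model_` names are hypothetical, C11 atomics replace `refcount_t`, and the underflow warning path of the real helper is left out. It shows that a `refcount_dec_and_test()`-style helper may legitimately take the counter from 1 to 0 and reports that fact, which lets the caller skip the wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only; all model_ names are hypothetical. */
typedef struct { atomic_int refs; } model_refcount_t;

/*
 * Models refcount_dec_and_test(): dropping the last reference is allowed.
 * Returns true only when this call took the counter from 1 to 0.
 * (The underflow warning/saturation path is omitted in this sketch.)
 */
static bool model_refcount_dec_and_test(model_refcount_t *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}

/*
 * Models the decision in the patched remove path: returns true when the
 * caller would still have to sleep in wait_event() until the remaining
 * PCM references are dropped.
 */
static bool model_remove_needs_wait(int initial_refs)
{
	model_refcount_t pcm_ref;

	/* Per the report, a codec with no PCMs has a count of 1. */
	assert(initial_refs >= 1);
	atomic_init(&pcm_ref.refs, initial_refs);
	return !model_refcount_dec_and_test(&pcm_ref);
}
```

With an initial count of 1 (no PCMs), `model_remove_needs_wait(1)` is false and the wait is skipped; with any larger count it is true and the wait proceeds as before.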


* Re: hda codec unbind refcount hang
  2022-09-09 15:59 ` Takashi Iwai
@ 2022-09-09 19:39   ` Ville Syrjälä
  2022-09-10 10:22     ` Takashi Iwai
  0 siblings, 1 reply; 5+ messages in thread
From: Ville Syrjälä @ 2022-09-09 19:39 UTC
  To: Takashi Iwai; +Cc: alsa-devel

On Fri, Sep 09, 2022 at 05:59:47PM +0200, Takashi Iwai wrote:
> Does the patch below work around it?  It seems to be a subtle
> difference between refcount_dec() and refcount_dec_and_test().

Aye, this works.

Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

-- 
Ville Syrjälä
Intel


* Re: hda codec unbind refcount hang
  2022-09-09 19:39   ` Ville Syrjälä
@ 2022-09-10 10:22     ` Takashi Iwai
  2022-09-10 11:54       ` Ville Syrjälä
  0 siblings, 1 reply; 5+ messages in thread
From: Takashi Iwai @ 2022-09-10 10:22 UTC
  To: Ville Syrjälä; +Cc: alsa-devel

On Fri, 09 Sep 2022 21:39:19 +0200,
Ville Syrjälä wrote:
> 
> Aye, this works.
> 
> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Good to hear.

I think the version below is slightly safer, since it ensures the other
*_disconnect() calls are always made.

Could you give it a try again?  Once it's confirmed to work, I'll
re-submit and merge it into my tree.


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: hda: Fix hang at HD-audio codec unbinding due to refcount saturation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We fixed the potential deadlock at dynamic unbinding of the HD-audio
codec in commit 7206998f578d ("ALSA: hda: Fix potential deadlock
at codec unbinding"), but ironically, this caused another potential
deadlock.  The current code calls refcount_dec() and then waits with
wait_event() for the refcount to drop to 0.  This works fine when PCMs
are assigned and there is actually a pending reference to wait for.

Meanwhile, when no PCM is assigned, the refcount_dec() call itself
was supposed to drop the count to zero -- alas, it doesn't in reality;
refcount_dec() complains with a kernel warning and saturates the
counter instead of dropping it to 0, due to the nature of the
refcount_dec() implementation.  The wait_event() condition then never
becomes true and the code gets stuck there.

To avoid the problem, call refcount_dec_and_test() and skip the
sync-wait if the count has already reached zero.

The patch also does a slight code reshuffling to make sure the other
disconnect calls are invoked before the sync-wait.

Fixes: 7206998f578d ("ALSA: hda: Fix potential deadlock at codec unbinding")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YxtflWQnslMHVlU7@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_bind.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/pci/hda/hda_bind.c b/sound/pci/hda/hda_bind.c
index cae9a975cbcc..1a868dd9dc4b 100644
--- a/sound/pci/hda/hda_bind.c
+++ b/sound/pci/hda/hda_bind.c
@@ -157,10 +157,10 @@ static int hda_codec_driver_remove(struct device *dev)
 		return codec->bus->core.ext_ops->hdev_detach(&codec->core);
 	}
 
-	refcount_dec(&codec->pcm_ref);
 	snd_hda_codec_disconnect_pcms(codec);
 	snd_hda_jack_tbl_disconnect(codec);
-	wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
+	if (!refcount_dec_and_test(&codec->pcm_ref))
+		wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
 	snd_power_sync_ref(codec->bus->card);
 
 	if (codec->patch_ops.free)
-- 
2.35.3
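
The difference between the first proposal and this resubmitted patch is only in ordering, which a single-threaded model can make visible. This is a sketch under assumptions: the `model_` names and the boolean flags are hypothetical stand-ins for `snd_hda_codec_disconnect_pcms()`, `snd_hda_jack_tbl_disconnect()` and `wait_event()`, and a plain `int` replaces the refcount since no concurrency is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all model_ names are hypothetical. */
struct model_codec {
	int pcm_ref;             /* plain int: single-threaded model */
	bool pcms_disconnected;  /* stands in for snd_hda_codec_disconnect_pcms() */
	bool jacks_disconnected; /* stands in for snd_hda_jack_tbl_disconnect() */
	bool waited;             /* stands in for reaching wait_event() */
};

static struct model_codec model_codec_init(int refs)
{
	struct model_codec c = { .pcm_ref = refs };

	assert(refs >= 1); /* a codec with no PCMs starts at 1 */
	return c;
}

/* Returns true when this decrement took the count to 0. */
static bool model_dec_and_test(int *ref)
{
	return --(*ref) == 0;
}

/* First proposal: the disconnect calls sit inside the "not yet zero" branch. */
static void model_remove_v1(struct model_codec *c)
{
	if (!model_dec_and_test(&c->pcm_ref)) {
		c->pcms_disconnected = true;
		c->jacks_disconnected = true;
		c->waited = true;
	}
}

/* Resubmitted patch: disconnect unconditionally, then wait only if needed. */
static void model_remove_v2(struct model_codec *c)
{
	c->pcms_disconnected = true;
	c->jacks_disconnected = true;
	if (!model_dec_and_test(&c->pcm_ref))
		c->waited = true;
}
```

For a codec that starts at a count of 1, `model_remove_v1()` skips both disconnect steps along with the wait, while `model_remove_v2()` still performs them and skips only the wait.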



* Re: hda codec unbind refcount hang
  2022-09-10 10:22     ` Takashi Iwai
@ 2022-09-10 11:54       ` Ville Syrjälä
  0 siblings, 0 replies; 5+ messages in thread
From: Ville Syrjälä @ 2022-09-10 11:54 UTC
  To: Takashi Iwai; +Cc: alsa-devel

On Sat, Sep 10, 2022 at 12:22:04PM +0200, Takashi Iwai wrote:
> Good to hear.
> 
> I think the version below is slightly safer, since it ensures the other
> *_disconnect() calls are always made.
> 
> Could you give it a try again?  Once it's confirmed to work, I'll
> re-submit and merge it into my tree.

This works too. Thanks

Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

-- 
Ville Syrjälä
Intel
