All of lore.kernel.org
 help / color / mirror / Atom feed
From: Takashi Iwai <tiwai@suse.de>
To: imre.deak@intel.com
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	alsa-devel <alsa-devel@alsa-project.org>
Subject: Re: snd_hda_intel/snd_hda_codec_hdmi module load/unload race
Date: Thu, 15 Dec 2016 12:32:08 +0100	[thread overview]
Message-ID: <s5hr359bf7r.wl-tiwai@suse.de> (raw)
In-Reply-To: <1481749250.8172.5.camel@intel.com>

On Wed, 14 Dec 2016 22:00:50 +0100,
Imre Deak wrote:
> 
> Hi,
> 
> I got the trace below while trying to unload (unbind) snd_hda_intel, while
> its still loading the HDMI codec driver. IIUC what happens is:
> 
> Task1                                    Task2                                    Task3
> modprobe snd_hda_intel
>   schedule(azx_probe_work)
> unbind snd_hda_intel via sysfs
>   device_release_driver()
>     device_lock(snd_hda_intel)
>     azx_remove()
>       cancel_work_sync(azx_probe_work)
>                                          azx_probe_work()
>                                            request_module(snd-hda-codec-hdmi)
>                                                                                   hdmi_driver_init()
>                                                                                     __driver_attach()
>                                                                                       device_lock(snd_hda_intel)
> 
> Deadlock, since azx_probe_work() will never finish and the snd_hda_intel device
> lock will never get released.

This is indeed nasty.  The deadlock happens when the driver core takes
the parent's device lock.

static int __driver_attach(struct device *dev, void *data)
{
....
	if (dev->parent)	/* Needed for USB */
		device_lock(dev->parent);
	device_lock(dev);
	if (!dev->driver)
		driver_probe_device(drv, dev);

I vaguely remember of some other issue due to the device_lock of the
parent device.  And, I guess a similar deadlock may happen not only
with HD-audio driver but also in general with every driver using async
probe.

Greg, any good way to avoid such a deadlock?  Can we make the parent
device lock conditional somehow?

(BTW, an easy fix particularly for HD-audio is something like:

--- a/sound/hda/hdac_device.c
+++ b/sound/hda/hdac_device.c
@@ -44,7 +44,7 @@ int snd_hdac_device_init(struct hdac_device *codec, struct hdac_bus *bus,
 
 	dev = &codec->dev;
 	device_initialize(dev);
-	dev->parent = bus->dev;
+	// dev->parent = bus->dev;
 	dev->bus = &snd_hda_bus_type;
 	dev->release = default_release;
 	dev->groups = hdac_dev_attr_groups;

... but it's obviously ugly :)


thanks,

Takashi


> 
> --Imre
> 
> [  246.805268] INFO: task drv_module_relo:1684 blocked for more than 120 seconds.
> [  246.805834]       Tainted: G        W       4.9.0-CI-CI_DRM_1946+ #158
> [  246.806284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  246.806779] drv_module_relo D    0  1684   1597 0x00000000
> [  246.807171]  ffff8802703cbac0 ffff8802643926c0 000000000000479d ffff8802713b0040
> [  246.807706]  ffff88027fc98118 ffffc90003cffa58 ffffffff8181d429 0000000000000046
> [  246.808217]  ffffffff827d2470 0000000000000001 ffff88027fc98118 ffff88027fc98100
> [  246.808739] Call Trace:
> [  246.808945]  [<ffffffff8181d429>] ? __schedule+0x2c9/0xa30
> [  246.809281]  [<ffffffff8181dbcb>] schedule+0x3b/0x90
> [  246.809618]  [<ffffffff8182227b>] schedule_timeout+0x25b/0x4c0
> [  246.810021]  [<ffffffff810d6c9f>] ? mark_held_locks+0x6f/0xa0
> [  246.810370]  [<ffffffff81823987>] ? _raw_spin_unlock_irq+0x27/0x50
> [  246.810759]  [<ffffffff810d6df2>] ? trace_hardirqs_on_caller+0x122/0x1b0
> [  246.811963]  [<ffffffff8181e730>] wait_for_common+0xf0/0x160
> [  246.812463]  [<ffffffff810af5d0>] ? wake_up_q+0x70/0x70
> [  246.812795]  [<ffffffff8181e7b8>] wait_for_completion+0x18/0x20
> [  246.813389]  [<ffffffff8109b9e3>] flush_work+0x203/0x2e0
> [  246.813714]  [<ffffffff8109ba86>] ? flush_work+0x2a6/0x2e0
> [  246.814088]  [<ffffffff81099410>] ? flush_workqueue_prep_pwqs+0x220/0x220
> [  246.814790]  [<ffffffff8109d290>] __cancel_work_timer+0x110/0x1d0
> [  246.815346]  [<ffffffff810d6df2>] ? trace_hardirqs_on_caller+0x122/0x1b0
> [  246.815742]  [<ffffffff810d6e8d>] ? trace_hardirqs_on+0xd/0x10
> [  246.816096]  [<ffffffff8109d35b>] cancel_work_sync+0xb/0x10
> [  246.816566]  [<ffffffffa00d4fa4>] azx_remove+0x24/0x30 [snd_hda_intel]
> [  246.816956]  [<ffffffff81485864>] pci_device_remove+0x34/0xb0
> [  246.817373]  [<ffffffff81584cfc>] __device_release_driver+0x9c/0x150
> [  246.817717]  [<ffffffff81584dd0>] device_release_driver+0x20/0x30
> [  246.818077]  [<ffffffff815836aa>] unbind_store+0x10a/0x150
> [  246.818549]  [<ffffffff81582d70>] drv_attr_store+0x20/0x30
> [  246.818885]  [<ffffffff812775a0>] sysfs_kf_write+0x40/0x50
> [  246.819290]  [<ffffffff812768e0>] kernfs_fop_write+0x130/0x1b0
> [  246.819639]  [<ffffffff811f6383>] __vfs_write+0x23/0x120
> [  246.819963]  [<ffffffff810f48c0>] ? rcu_read_lock_sched_held+0x70/0x80
> [  246.820476]  [<ffffffff810f4caa>] ? rcu_sync_lockdep_assert+0x2a/0x50
> [  246.820859]  [<ffffffff811f9a68>] ? __sb_start_write+0xf8/0x200
> [  246.821292]  [<ffffffff811f7129>] ? vfs_write+0x169/0x1b0
> [  246.821597]  [<ffffffff811f7073>] vfs_write+0xb3/0x1b0
> [  246.821907]  [<ffffffff810d6df2>] ? trace_hardirqs_on_caller+0x122/0x1b0
> [  246.822405]  [<ffffffff811f83b4>] SyS_write+0x44/0xb0
> [  246.822684]  [<ffffffff8182412e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [  246.823064] 
>                Showing all locks held in the system:
> [  246.823568] 2 locks held by kworker/u8:0/6:
> [  246.823817]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.824400]  #1:  ((&sub_info->work)){+.+.+.}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.824940] 2 locks held by khungtaskd/34:
> [  246.825210]  #0:  (rcu_read_lock){......}, at: [<ffffffff811329cc>] watchdog+0x9c/0x600
> [  246.825726]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810d536d>] debug_show_all_locks+0x3d/0x1a0
> [  246.826303] 2 locks held by kworker/3:1/50:
> [  246.826535]  #0:  ("events"){++++.+}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.827104]  #1:  ((&hda->probe_work)){+.+.+.}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.827706] 5 locks held by drv_module_relo/1684:
> [  246.827963]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff811f7129>] vfs_write+0x169/0x1b0
> [  246.828507]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff812768ac>] kernfs_fop_write+0xfc/0x1b0
> [  246.829083]  #2:  (s_active#186){++++.+}, at: [<ffffffff812768b4>] kernfs_fop_write+0x104/0x1b0
> [  246.829663]  #3:  (&dev->mutex){......}, at: [<ffffffff8158369a>] unbind_store+0xfa/0x150
> [  246.830204]  #4:  (&dev->mutex){......}, at: [<ffffffff81584dc8>] device_release_driver+0x18/0x30
> [  246.830772] 1 lock held by modprobe/1685:
> [  246.830992]  #0:  (&dev->mutex){......}, at: [<ffffffff8158561a>] __driver_attach+0x5a/0xe0
> [  246.831575] 2 locks held by bash/1727:
> [  246.831782]  #0:  (&tty->ldisc_sem){++++.+}, at: [<ffffffff81822d3d>] ldsem_down_read+0x2d/0x40
> [  246.832344]  #1:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff81506529>] n_tty_read+0xa9/0x880
> 
> [  246.833050] =============================================
> 
> [  246.833436] INFO: task modprobe:1685 blocked for more than 120 seconds.
> [  246.833829]       Tainted: G        W       4.9.0-CI-CI_DRM_1946+ #158
> [  246.834221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  246.834680] modprobe        D    0  1685      6 0x00000000
> [  246.835029]  ffff8802703c9a40 ffff8802760e4d40 00000000000091ba ffff8802559e26c0
> [  246.835522]  ffff88027fd18118 ffffc900032bbb70 ffffffff8181d429 0000000000000000
> [  246.835967]  ffffc900032bbb98 0000000000000292 ffff88027fd18118 ffff88027fd18100
> [  246.836462] Call Trace:
> [  246.836628]  [<ffffffff8181d429>] ? __schedule+0x2c9/0xa30
> [  246.836924]  [<ffffffff8181dbcb>] schedule+0x3b/0x90
> [  246.837243]  [<ffffffff8181e043>] schedule_preempt_disabled+0x13/0x20
> [  246.837629]  [<ffffffff8181ecbc>] mutex_lock_nested+0x16c/0x400
> [  246.837946]  [<ffffffff8158561a>] ? __driver_attach+0x5a/0xe0
> [  246.838298]  [<ffffffff815855c0>] ? driver_probe_device+0x430/0x430
> [  246.838676]  [<ffffffff8158561a>] __driver_attach+0x5a/0xe0
> [  246.839008]  [<ffffffff815855c0>] ? driver_probe_device+0x430/0x430
> [  246.839354]  [<ffffffff81582fc1>] bus_for_each_dev+0x61/0xa0
> [  246.839704]  [<ffffffff81584b89>] driver_attach+0x19/0x20
> [  246.840029]  [<ffffffff815846cb>] bus_add_driver+0x1fb/0x270
> [  246.840346]  [<ffffffffa014a000>] ? 0xffffffffa014a000
> [  246.840645]  [<ffffffff8158629b>] driver_register+0x5b/0xd0
> [  246.840977]  [<ffffffffa048f0d5>] __hda_codec_driver_register+0x55/0x60 [snd_hda_codec]
> [  246.841417]  [<ffffffffa014a01e>] hdmi_driver_init+0x1e/0x20 [snd_hda_codec_hdmi]
> [  246.841861]  [<ffffffff81000428>] do_one_initcall+0x38/0x150
> [  246.842177]  [<ffffffff810f48c0>] ? rcu_read_lock_sched_held+0x70/0x80
> [  246.842566]  [<ffffffff811d7804>] ? kmem_cache_alloc_trace+0x274/0x2e0
> [  246.842951]  [<ffffffff81176161>] ? do_init_module+0x22/0x1d1
> [  246.843307]  [<ffffffff81176199>] do_init_module+0x5a/0x1d1
> [  246.843617]  [<ffffffff8111cb15>] load_module+0x2275/0x27d0
> [  246.843951]  [<ffffffff81119e60>] ? symbol_put_addr+0x60/0x60
> [  246.844306]  [<ffffffff811fd7d5>] ? kernel_read_file+0x105/0x190
> [  246.844637]  [<ffffffff8111d29c>] SyS_finit_module+0xbc/0xf0
> [  246.844979]  [<ffffffff8182412e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [  246.845368] 
>                Showing all locks held in the system:
> [  246.845745] 2 locks held by kworker/u8:0/6:
> [  246.845974]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.846553]  #1:  ((&sub_info->work)){+.+.+.}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.847133] 2 locks held by khungtaskd/34:
> [  246.847399]  #0:  (rcu_read_lock){......}, at: [<ffffffff811329cc>] watchdog+0x9c/0x600
> [  246.847913]  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810d536d>] debug_show_all_locks+0x3d/0x1a0
> [  246.848488] 2 locks held by kworker/3:1/50:
> [  246.848747]  #0:  ("events"){++++.+}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.849286]  #1:  ((&hda->probe_work)){+.+.+.}, at: [<ffffffff8109c1f6>] process_one_work+0x166/0x6b0
> [  246.849875] 1 lock held by in:imklog/491:
> [  246.850094]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff8121854e>] __fdget_pos+0x3e/0x50
> [  246.850657] 5 locks held by drv_module_relo/1684:
> [  246.850931]  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffff811f7129>] vfs_write+0x169/0x1b0
> [  246.851474]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff812768ac>] kernfs_fop_write+0xfc/0x1b0
> [  246.852013]  #2:  (s_active#186){++++.+}, at: [<ffffffff812768b4>] kernfs_fop_write+0x104/0x1b0
> [  246.852577]  #3:  (&dev->mutex){......}, at: [<ffffffff8158369a>] unbind_store+0xfa/0x150
> [  246.853103]  #4:  (&dev->mutex){......}, at: [<ffffffff81584dc8>] device_release_driver+0x18/0x30
> [  246.853714] 1 lock held by modprobe/1685:
> [  246.853934]  #0:  (&dev->mutex){......}, at: [<ffffffff8158561a>] __driver_attach+0x5a/0xe0
> [  246.854473] 2 locks held by bash/1727:
> [  246.854713]  #0:  (&tty->ldisc_sem){++++.+}, at: [<ffffffff81822d3d>] ldsem_down_read+0x2d/0x40
> [  246.855272]  #1:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff81506529>] n_tty_read+0xa9/0x880
> 
> Stack trace of kworker/50:
> [<ffffffff810979e3>] call_usermodehelper_exec+0x133/0x1d0
> [<ffffffff81097c0f>] __request_module+0x18f/0x360
> [<ffffffffa048f182>] request_codec_module+0x52/0x60 [snd_hda_codec]
> [<ffffffffa048f21d>] snd_hda_codec_configure+0x5d/0x170 [snd_hda_codec]
> [<ffffffffa0499693>] azx_codec_configure+0x23/0x40 [snd_hda_codec]
> [<ffffffffa00d6700>] azx_probe_work+0x5d0/0x8d0 [snd_hda_intel]
> [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
> [<ffffffff8109c789>] worker_thread+0x49/0x490
> [<ffffffff810a2aab>] kthread+0xeb/0x110
> [<ffffffff818243a7>] ret_from_fork+0x27/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 

  reply	other threads:[~2016-12-15 11:32 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-14 21:00 snd_hda_intel/snd_hda_codec_hdmi module load/unload race Imre Deak
2016-12-15 11:32 ` Takashi Iwai [this message]
2016-12-16 16:40   ` Greg Kroah-Hartman
2017-01-02 11:00     ` Takashi Iwai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=s5hr359bf7r.wl-tiwai@suse.de \
    --to=tiwai@suse.de \
    --cc=alsa-devel@alsa-project.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=imre.deak@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.