Possible race at user mode helper in request_firmware

All of lore.kernel.org
 help / color / mirror / Atom feed

From: clingutla@codeaurora.org
To: ming.lei@canonical.com, linux-kernel@vger.kernel.org
Cc: gregkh@linuxfoundation.org, linux-arm-msm@vger.kernel.org
Subject: Possible race at user mode helper in request_firmware
Date: Mon, 14 Mar 2016 17:31:08 +0530	[thread overview]
Message-ID: <69c9d897d4c1703fb5efe6d2d9dd9f61@codeaurora.org> (raw)

Hello,

I see possible race in _request_firmware_load function, on which I 
wanted to take your opinion.

When system is going to low power mode, device_cache_fw_images() is 
called from pm notifier which schedules async workers to cache devices 
firmware.
If more than 2 async requests falls under user helper mode, then there 
is a use after free of "firmware" directory kobject.
The race will hit, when one asynchronous worker calls device_del() while 
other asynchronous worker calls get_device_parent() in
_request_firmware_load().

After loading firmware, first worker calls device_del(), and "firmware" 
directory ref count is 1, so cleanup_glue_dir() calls kobject_put() with 
gdp_mutex lock held.
Meanwhile, second async worker calls device_add(), and 
get_device_parent() is still able to find the firmware directory kobject 
in ksets after getting  gdp_mutex.
This kobject gets added as parent to the second device kobject.
By the time of accessing the parent kobject, its ref count was 0 and 
prints below warning stack followed by crash.

[21147.455038] ------------[ cut here ]------------
[21147.458910] WARNING: at <...>/kernel/include/linux/kref.h:47 
kobject_get+0x50/0x68()
[21147.481199] Modules linked in: core_ctl(PO) qdrbg_module(O) 
qcrypto_module(O)
[21147.481256] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 
3.10.84-g0957845-00427-g56a05c2 #1
[21147.481284] Workqueue: events_unbound async_run_entry_fn
[21147.481296] Call trace:
[21147.481315] [<ffffffc000206b60>] dump_backtrace+0x0/0x270
[21147.481331] [<ffffffc000206de0>] show_stack+0x10/0x1c
[21147.481353] [<ffffffc000d569ec>] dump_stack+0x1c/0x28
[21147.481370] [<ffffffc00021cc18>] warn_slowpath_common+0x74/0x9c
[21147.481383] [<ffffffc00021ce4c>] warn_slowpath_null+0x14/0x20
[21147.481397] [<ffffffc000456b10>] kobject_get+0x4c/0x68
[21147.481416] [<ffffffc000609f64>] get_device_parent+0xa0/0x194
[21147.481430] [<ffffffc00060a3f4>] device_add+0x100/0x600
[21147.481446] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
[21147.481459] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
[21147.481473] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
[21147.481490] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
[21147.481505] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
[21147.481521] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
[21147.481535] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
[21147.481553] [<ffffffc00023e500>] kthread+0xac/0xb8
[21147.481563] ---[ end trace dabc98ea48b8ba59 ]---

[21147.486436] Unable to handle kernel paging request at virtual address 
0x6b6b6b6b6b6b6bcb
[21147.493816] pgd = ffffffc0a982d000
[21147.498337] [6b6b6b6b6b6b6bcb] *pgd=0000000000000000
[21147.502287] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[21147.507826] Modules linked in: core_ctl(PO) qdrbg_module(O) 
qcrypto_module(O)
[21147.514951] CPU: 2 PID: 23935 Comm: kworker/u16:8 Tainted: P W O 
3.10.84-g0957845-00427-g56a05c2 #1
[21147.524686] Workqueue: events_unbound async_run_entry_fn
[21147.529965] task: ffffffc008138000 ti: ffffffc007880000 task.ti: 
ffffffc007880000
[21147.537438] PC is at sysfs_create_dir+0x34/0xd4
[21147.541949] LR is at kobject_add_internal+0x120/0x24c
[21147.546979] pc : [<ffffffc000361fa8>] lr : [<ffffffc000456f5c>] 
pstate: 80000145
[21147.996295] Process kworker/u16:8 (pid: 23935, stack limit = 
0xffffffc007880058)
[21148.003673] Call trace:
[21148.006117] [<ffffffc000361fa8>] sysfs_create_dir+0x34/0xd4
[21148.011663] [<ffffffc000456f58>] kobject_add_internal+0x11c/0x24c
[21148.017737] [<ffffffc0004573ec>] kobject_add+0xc8/0xe4
[21148.022863] [<ffffffc00060a410>] device_add+0x11c/0x600
[21148.028069] [<ffffffc00061c390>] _request_firmware+0x8b4/0xa80
[21148.033886] [<ffffffc00061c58c>] request_firmware+0x30/0x3c
[21148.039439] [<ffffffc00061c5ec>] cache_firmware+0x54/0xb0
[21148.044822] [<ffffffc00061c65c>] __async_dev_cache_fw_image+0x14/0x54
[21148.051249] [<ffffffc000243794>] async_run_entry_fn+0x6c/0x12c
[21148.057064] [<ffffffc000237e88>] process_one_work+0x264/0x3dc
[21148.062792] [<ffffffc0002392c0>] worker_thread+0x1f0/0x340
[21148.068263] [<ffffffc00023e500>] kthread+0xac/0xb8
[21148.073039] Code: b5000094 14000026 d000ac54 9125a294 (7940c281)
[21148.080312] ---[ end trace dabc98ea48b8ba5a ]---

Below is the callstack where the parent is freed from the slub_debug 
info.
0xFFFFFFC000608D74  \\vmlinux\base/core\class_dir_release+0x0C
0xFFFFFFC0002FE6BC \\vmlinux\slub\free_debug_processing\fail+0x114
0xFFFFFFC0002FE7A0 \\vmlinux\slub\__slab_free+0x44
0xFFFFFFC0002FEE60 \\vmlinux\slub\kfree+0x1F8
0xFFFFFFC000608D70 \\vmlinux\base/core\class_dir_release+0x8
0xFFFFFFC000456D70 \\vmlinux\kobject\kobject_release+0x134
0xFFFFFFC000456E10 \\vmlinux\kobject\kobject_put+0x58
0xFFFFFFC000456C28 \\vmlinux\kobject\kobject_del+0x64
0xFFFFFFC000609D78 \\vmlinux\base/core\device_del+0x150
0xFFFFFFC00061C524 \\vmlinux\firmware_class\_request_firmware\out+0x71C
0xFFFFFFC00061C58C \\vmlinux\firmware_class\request_firmware+0x30
0xFFFFFFC00061C5EC \\vmlinux\firmware_class\cache_firmware+0x54
0xFFFFFFC00061C65C 
\\vmlinux\firmware_class\__async_dev_cache_fw_image+0x14
0xFFFFFFC000243794 \\vmlinux\async\async_run_entry_fn+0x6C
0xFFFFFFC000237E88 \\vmlinux\workqueue\process_one_work+0x264
0xFFFFFFC0002392C0 \\vmlinux\workqueue\worker_thread\recheck+0x1A0
0xFFFFFFC00023E500 \\vmlinux\kthread\kthread+0xAC

For similar type of issue, I see there is an existing fix: "sysfs: 
driver core: Fix glue dir race condition by gdp_mutex".
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/base/core.c?id=e4a60d139060975eb956717e4f63ae348d4d8cc5


Still I am able to reproduce the issue. I have verified this race on 
kernels: 3.10, 3.18 and 4.4.

I followed below procedure to reproduce the issue:
      1. Enable "CONFIG_DEBUG_KOBJECT_RELEASE"
      2. Use test_firmware, modified the test for async calls
      3. Replace WARN_ON with BUG_ON in kref_get().
I ran below script from shell:
     count=0
     while [ 1 ]
     do
             echo 3 > /sys/devices/virtual/misc/test_firmware/trigger_ufw
             if [ $? -ne 0 ]; then
                     print "Exiting.. "
                     exit 1
             fi
             count=$(($count +1))
             echo "count $count"
      done

With below patch, I could reproduce the issue in the 2nd iteration.

-------------------------8<----------------------------------------------

To reproduce the race condition, use firmware test and
CONFIG_DEBUG_KOBJECT_RELEASE=y, which delays kobject
releases. So that we can catch the bug easily.

Enabled below flags in kernel config file:

+CONFIG_DEBUG_KOBJECT_RELEASE=y
+CONFIG_TEST_FIRMWARE=y


diff --git a/include/linux/kref.h b/include/linux/kref.h
index 484604d..727fb24 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -44,7 +44,7 @@ static inline void kref_get(struct kref *kref)
       * condition when this kref is freeing by some other thread right
now.
       * In this case one should use kref_get_unless_zero()
       */
-    WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
+    BUG_ON(atomic_inc_return(&kref->refcount) < 2);
  }

  /**
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
old mode 100644
new mode 100755
index 86374c1..14c9598
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -18,6 +18,7 @@
  #include <linux/miscdevice.h>
  #include <linux/slab.h>
  #include <linux/uaccess.h>
+#include <linux/async.h>

+
+static ASYNC_DOMAIN_EXCLUSIVE(ufw_domain);
+
+static void __async_ufw_work(void *name,
+                                       async_cookie_t cookie)
+{
+    const struct firmware *fw;
+    const char *fw_name = name;
+    int ret;
+
+    pr_err("requesting %s\n", fw_name);
+
+    ret = request_firmware(&fw, fw_name, NULL);
+    pr_err("loaded: %zu\n", fw ? fw->size : 0);
+        if (!ret)
+                kfree(fw);
+
+}
+
+static char *name[] = { "fw1.bin", "fw2.bin"};
+
+static ssize_t trigger_ufw_store(struct device *dev,
+        struct device_attribute *attr,
+        const char *buf, size_t count)
+{
+    int rc, i, c=0;
+    u32 start;
+    char *fw_name;
+
+    rc = kstrtou32(buf, 0, &start);
+    if (rc){
+        pr_err("Invalid option\n");
+        return rc;
+    }
+    pr_err(" no of iterations %d\n", start);
+
+    for (i=0; i< start; i++)
+    {
+        if (c >= 2)
+            c = 0;
+        fw_name = name[c++];
+        async_schedule_domain(__async_ufw_work, (void *)fw_name, 
&ufw_domain);
+    }
+    async_synchronize_full_domain(&ufw_domain);
+    return count;
+}

  static DEVICE_ATTR_WO(trigger_request);
+static DEVICE_ATTR_WO(trigger_ufw);

  static int __init test_firmware_init(void)
  {
@@ -92,6 +140,12 @@ static int __init test_firmware_init(void)
          goto dereg;
      }

+    rc = device_create_file(test_fw_misc_device.this_device,
+                             &dev_attr_trigger_ufw);
+    if (rc) {
+        pr_err("could not create sysfs interface: %d\n", rc);
+        goto dereg;
+    }
      pr_warn("interface ready\n");

      return 0;
@@ -106,7 +160,9 @@ static void __exit test_firmware_exit(void)
  {
      release_firmware(test_firmware);
      device_remove_file(test_fw_misc_device.this_device,
-               &dev_attr_trigger_request);
+            &dev_attr_trigger_request);
+    device_remove_file(test_fw_misc_device.this_device,
+            &dev_attr_trigger_ufw);
      misc_deregister(&test_fw_misc_device);
      pr_warn("removed interface\n");
  }

----------------------------8<--------------------------------------------

Thanks and regards,
Chandrasekhar L.
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of Code Aurora Forum,
  a Linux Foundation Collaborative Project.

next             reply	other threads:[~2016-03-14 12:01 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-14 12:01 clingutla [this message]
     [not found] <56E2768C.1050000@codeaurora.com>
2016-03-15 15:09 ` Possible race at user mode helper in request_firmware Ming Lei
2016-03-16  2:06   ` Ming Lei
2016-03-16  7:02     ` Chandra Sekhar Lingutla
2016-03-16  8:13       ` Ming Lei
2016-03-16  8:42       ` Ming Lei
2016-03-16  8:48         ` Chandra Sekhar Lingutla
2016-03-16  8:58           ` Ming Lei
2016-03-16  9:04             ` Chandra Sekhar Lingutla
2016-03-16  9:51               ` Chandra Sekhar Lingutla
2016-03-16 10:25                 ` Ming Lei
2016-03-17  5:50                   ` Chandra Sekhar Lingutla

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:484604d dfblob:727fb24 dfblob:86374c1 dfblob:14c9598 )
 OR (
bs:"Possible race at user mode helper in request_firmware" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=69c9d897d4c1703fb5efe6d2d9dd9f61@codeaurora.org \
    --to=clingutla@codeaurora.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@canonical.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.