public inbox for linux-kernel@vger.kernel.org
* rmmod thermal, unable to handle kernel NULL pointer dereference
@ 2014-05-21  5:57 Kui Zhang
  2014-05-21  8:22 ` Aaron Lu
  0 siblings, 1 reply; 5+ messages in thread
From: Kui Zhang @ 2014-05-21  5:57 UTC (permalink / raw)
  To: aaron.lu, linux-kernel@vger.kernel.org

Hello,

I get the following error when I run rmmod thermal.

rmmod  thermal
Killed

[ 1207.313060] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1207.313460] IP: [<ffffffff816bb906>] _raw_spin_lock_irq+0x6/0x30
[ 1207.313858] PGD 0
[ 1207.314256] Oops: 0002 [#1] SMP
[ 1207.314658] Modules linked in: thermal(-) ntfs vfat msdos fat cpuid
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT xt_CHECKSUM
iptable_mangle xt_tcpudp bridge stp target_core_mod configfs
ebtable_nat ebtables drbd lru_cache scsi_transport_iscsi xt_conntrack
ip6table_filter ip6_tables iptable_filter ip_tables nf_conntrack_ipv4
nf_defrag_ipv4 xt_state x_tables nf_conntrack dm_multipath dm_mod arc4
tifm_7xx1 coretemp tifm_core hwmon iwl3945 iwlegacy mac80211 cfg80211
tpm_infineon tpm fuse snd_hda_intel snd_hda_controller
snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec snd_hwdep
snd_pcm snd_timer snd soundcore usb_storage raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor xor async_tx
raid6_pq raid1 raid0 multipath linear md_mod sky2 firewire_ohci sr_mod
cdrom firewire_core crc_itu_t uhci_hcd [last unloaded: thermal]
[ 1207.316991] CPU: 1 PID: 22792 Comm: rmmod Tainted: G        W 3.15.0-rc5+ #2
[ 1207.316991] task: ffff88006e712100 ti: ffff880075c28000 task.ti: ffff880075c28000
[ 1207.316991] RIP: 0010:[<ffffffff816bb906>]  [<ffffffff816bb906>] _raw_spin_lock_irq+0x6/0x30
[ 1207.316991] RSP: 0018:ffff880075c29da8  EFLAGS: 00010006
[ 1207.316991] RAX: 0000000000000100 RBX: ffff88006e414990 RCX: 0000000000000002
[ 1207.316991] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[ 1207.316991] RBP: 0000000000000001 R08: 0000000000016280 R09: ffff8800c7b16280
[ 1207.316991] R10: ffffffff8135a910 R11: ffffea0003062240 R12: 0000000000000002
[ 1207.316991] R13: 0000000000000001 R14: ffff88006e415e00 R15: 0000000000000000
[ 1207.316991] FS:  00007f265711f740(0000) GS:ffff8800c7b00000(0000) knlGS:0000000000000000
[ 1207.316991] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1207.316991] CR2: 0000000000000000 CR3: 000000008aecd000 CR4: 00000000000007e0
[ 1207.316991] Stack:
[ 1207.316991]  ffffffff8109152b 00ff8800c2e2d670 ffff88006e414a00 ffff88006e415e00
[ 1207.316991]  ffff880075c29e90 ffff88006e415e70 ffff88006e415e20 0000000000000002
[ 1207.316991]  ffff880075c29e20 ffffffff810917b7 ffff880075c29e28 0000000000000000
[ 1207.316991] Call Trace:
[ 1207.316991]  [<ffffffff8109152b>] ? flush_workqueue_prep_pwqs+0x5b/0x180
[ 1207.316991]  [<ffffffff810917b7>] ? flush_workqueue+0x167/0x5a0
[ 1207.316991]  [<ffffffff81350000>] ? fbcon_switch+0x570/0x580
[ 1207.316991]  [<ffffffff81350000>] ? fbcon_switch+0x570/0x580
[ 1207.316991]  [<ffffffffa0506a38>] ? acpi_thermal_remove+0x23/0x94 [thermal]
[ 1207.316991]  [<ffffffff8135ccfd>] ? acpi_device_remove+0x74/0x91
[ 1207.316991]  [<ffffffff8150df65>] ? __device_release_driver+0x75/0xf0
[ 1207.316991]  [<ffffffff8150e690>] ? driver_detach+0xa0/0xb0
[ 1207.316991]  [<ffffffff8150dc13>] ? bus_remove_driver+0x43/0xa0
[ 1207.316991]  [<ffffffff810da6db>] ? SyS_delete_module+0x11b/0x1a0
[ 1207.316991]  [<ffffffff816bc600>] ? tracesys+0x7e/0xe6
[ 1207.316991]  [<ffffffff816bc663>] ? tracesys+0xe1/0xe6
[ 1207.316991] Code: 0f 1f 44 00 00 8d 8a 00 01 00 00 89 d0 f0 66 0f
b1 0f 66 39 d0 75 e5 b8 01 00 00 00 c3 0f 1f 84 00 00 00 00 00 fa b8
00 01 00 00 <f0> 66 0f c1 07 0f b6 d4 38 c2 75 03 c3 f3 90 0f b6 07 38
d0 75
[ 1207.316991] RIP  [<ffffffff816bb906>] _raw_spin_lock_irq+0x6/0x30
[ 1207.316991]  RSP <ffff880075c29da8>
[ 1207.316991] CR2: 0000000000000000
[ 1207.316991] ---[ end trace 45b9cde975015a9d ]---


When I try again:

rmmod thermal
rmmod: ERROR: ../libkmod/libkmod-module.c:764 kmod_module_remove_module() could not remove 'thermal': Device or resource busy
rmmod: ERROR: could not remove module thermal: Device or resource busy


The problem seems to have started after commit
a59ffb2062df3a5c346dbed931fa1e587fd0f0f3.


Thanks
Kui.Z
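
Note on reading the trace: the value after "Oops:" (0002 in the report
above) is the x86 page-fault error code. The sketch below decodes its low
bits in user-space C. The struct and function names are hypothetical and
are not taken from the kernel source. For 0002 it describes a kernel-mode
write to a not-present page, which is consistent with CR2 and RDI (the
first-argument register in the x86-64 calling convention) both being zero
in the register dump.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (hypothetical helper type). */
struct pf_error {
	bool protection;   /* bit 0: 0 = page not present, 1 = protection fault */
	bool write;        /* bit 1: 0 = read access, 1 = write access */
	bool user;         /* bit 2: 0 = kernel mode, 1 = user mode */
	bool reserved_bit; /* bit 3: reserved bit set in a paging entry */
	bool instr_fetch;  /* bit 4: fault happened on an instruction fetch */
};

/* Split the numeric error code into its individual flags. */
struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e = {
		.protection   = (code & 0x01) != 0,
		.write        = (code & 0x02) != 0,
		.user         = (code & 0x04) != 0,
		.reserved_bit = (code & 0x08) != 0,
		.instr_fetch  = (code & 0x10) != 0,
	};
	return e;
}

/* Write a one-line human-readable summary of the fault into buf. */
void describe_pf_error(struct pf_error e, char *buf, size_t len)
{
	const char *mode = e.user ? "user" : "kernel";
	const char *access = e.instr_fetch ? "instruction fetch from" :
			     e.write ? "write to" : "read from";
	const char *page = e.protection ? "a protected" : "a not-present";

	snprintf(buf, len, "%s-mode %s %s page", mode, access, page);
}
```

Passing 0x0002 to decode_pf_error() and the result to describe_pf_error()
yields the summary for this report.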


* Re: rmmod thermal, unable to handle kernel NULL pointer dereference
  2014-05-21  5:57 rmmod thermal, unable to handle kernel NULL pointer dereference Kui Zhang
@ 2014-05-21  8:22 ` Aaron Lu
  2014-05-21  8:33   ` [PATCH] thermal: hwmon: Make the check for critical temp valid consistent Aaron Lu
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Aaron Lu @ 2014-05-21  8:22 UTC (permalink / raw)
  To: Kui Zhang
  Cc: linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
	Rafael J. Wysocki, Zhang Rui

On 05/21/2014 01:57 PM, Kui Zhang wrote:
> Hello,
> 
> I get the following error when I run rmmod thermal.
> 
> rmmod  thermal
> Killed

Thanks for the report. Here is a fix patch that should solve this
problem.

From: Aaron Lu <aaron.lu@intel.com>
Date: Wed, 21 May 2014 16:00:42 +0800
Subject: [PATCH] ACPI / thermal: fix workqueue destroy order

When the thermal module is removed, the workqueue acpi_thermal_pm_queue
must be destroyed only after the ACPI driver's remove callback has run,
because that callback flushes the workqueue. Destroying the workqueue
first results in a NULL pointer dereference.

Reported-by: Kui Zhang <kuizhang@gmail.com>
Reference: http://www.spinics.net/lists/kernel/msg1747251.html
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/acpi/thermal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index c1e31a41f949..25bbc55dca89 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -1278,8 +1278,8 @@ static int __init acpi_thermal_init(void)
 
 static void __exit acpi_thermal_exit(void)
 {
-	destroy_workqueue(acpi_thermal_pm_queue);
 	acpi_bus_unregister_driver(&acpi_thermal_driver);
+	destroy_workqueue(acpi_thermal_pm_queue);
 
 	return;
 }
-- 
1.9.0
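
The ordering issue can be illustrated outside the kernel. The following
is a minimal user-space sketch, not the driver code: struct fake_wq and
all fake_* and exit_* functions are hypothetical stand-ins, and the real
failure is a NULL pointer dereference rather than an error return. It
only models the point made in the changelog, namely that the remove
callback flushes the queue and so must run before the queue is destroyed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a workqueue. */
struct fake_wq {
	bool alive;   /* false once the queue has been destroyed */
	int pending;  /* number of queued work items */
};

/* Stand-in for a flush: only legal while the queue still exists. */
int fake_flush(struct fake_wq *wq)
{
	if (!wq->alive)
		return -1;   /* models the crash seen in the report */
	wq->pending = 0;
	return 0;
}

/* Stand-in for destroying the queue. */
void fake_destroy(struct fake_wq *wq)
{
	wq->alive = false;
}

/* Stand-in for the driver's remove callback, which flushes the queue. */
int fake_driver_remove(struct fake_wq *wq)
{
	return fake_flush(wq);
}

/* Order before the patch: destroy first, then unregister the driver. */
int exit_buggy(struct fake_wq *wq)
{
	fake_destroy(wq);
	return fake_driver_remove(wq);
}

/* Order after the patch: unregister the driver first, then destroy. */
int exit_fixed(struct fake_wq *wq)
{
	int ret = fake_driver_remove(wq);

	fake_destroy(wq);
	return ret;
}
```

In this model exit_buggy() fails because the flush runs on a destroyed
queue, while exit_fixed() flushes successfully and then tears the queue
down.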



* [PATCH] thermal: hwmon: Make the check for critical temp valid consistent
  2014-05-21  8:22 ` Aaron Lu
@ 2014-05-21  8:33   ` Aaron Lu
  2014-05-21 15:38   ` rmmod thermal, unable to handle kernel NULL pointer dereference Kui Zhang
  2014-05-26 12:53   ` Rafael J. Wysocki
  2 siblings, 0 replies; 5+ messages in thread
From: Aaron Lu @ 2014-05-21  8:33 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Kui Zhang, linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
	Rafael J. Wysocki, Linux-pm mailing list

On 05/21/2014 04:22 PM, Aaron Lu wrote:
> On 05/21/2014 01:57 PM, Kui Zhang wrote:
>> Hello,
>>
>> I get the following error when I run rmmod thermal.
>>
>> rmmod  thermal
>> Killed

While dealing with this problem, I found another problem that also
results in a kernel crash on thermal module removal:


From: Aaron Lu <aaron.lu@intel.com>
Date: Wed, 21 May 2014 16:05:38 +0800
Subject: [PATCH] thermal: hwmon: Make the check for critical temp valid consistent

thermal_add_hwmon_sysfs() creates the temp_crit attribute file only when
tz->ops->get_crit_temp && !tz->ops->get_crit_temp(tz, &temp) holds, but
thermal_remove_hwmon_sysfs() removes it whenever tz->ops->get_crit_temp
exists. Some ACPI thermal zones don't have a valid critical trip point,
so the remove path tries to remove a non-existent device file on thermal
module unload. Use the same check in both places.

Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/thermal/thermal_hwmon.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index fdb07199d9c2..1967bee4f076 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -140,6 +140,12 @@ thermal_hwmon_lookup_temp(const struct thermal_hwmon_device *hwmon,
 	return NULL;
 }
 
+static bool thermal_zone_crit_temp_valid(struct thermal_zone_device *tz)
+{
+	unsigned long temp;
+	return tz->ops->get_crit_temp && !tz->ops->get_crit_temp(tz, &temp);
+}
+
 int thermal_add_hwmon_sysfs(struct thermal_zone_device *tz)
 {
 	struct thermal_hwmon_device *hwmon;
@@ -189,21 +195,18 @@ int thermal_add_hwmon_sysfs(struct thermal_zone_device *tz)
 	if (result)
 		goto free_temp_mem;
 
-	if (tz->ops->get_crit_temp) {
-		unsigned long temperature;
-		if (!tz->ops->get_crit_temp(tz, &temperature)) {
-			snprintf(temp->temp_crit.name,
-				 sizeof(temp->temp_crit.name),
+	if (thermal_zone_crit_temp_valid(tz)) {
+		snprintf(temp->temp_crit.name,
+				sizeof(temp->temp_crit.name),
 				"temp%d_crit", hwmon->count);
-			temp->temp_crit.attr.attr.name = temp->temp_crit.name;
-			temp->temp_crit.attr.attr.mode = 0444;
-			temp->temp_crit.attr.show = temp_crit_show;
-			sysfs_attr_init(&temp->temp_crit.attr.attr);
-			result = device_create_file(hwmon->device,
-						    &temp->temp_crit.attr);
-			if (result)
-				goto unregister_input;
-		}
+		temp->temp_crit.attr.attr.name = temp->temp_crit.name;
+		temp->temp_crit.attr.attr.mode = 0444;
+		temp->temp_crit.attr.show = temp_crit_show;
+		sysfs_attr_init(&temp->temp_crit.attr.attr);
+		result = device_create_file(hwmon->device,
+					    &temp->temp_crit.attr);
+		if (result)
+			goto unregister_input;
 	}
 
 	mutex_lock(&thermal_hwmon_list_lock);
@@ -250,7 +253,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
 	}
 
 	device_remove_file(hwmon->device, &temp->temp_input.attr);
-	if (tz->ops->get_crit_temp)
+	if (thermal_zone_crit_temp_valid(tz))
 		device_remove_file(hwmon->device, &temp->temp_crit.attr);
 
 	mutex_lock(&thermal_hwmon_list_lock);
-- 
1.9.0
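
The mismatch between the create and remove checks can be modelled in
user space. This is a minimal sketch under stated assumptions, not the
thermal_hwmon code: struct fake_zone, the bogus_removals counter and all
function names are hypothetical, and the callback simply returns an
error to stand for a zone without a valid critical trip point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thermal zone and its attribute file. */
struct fake_zone {
	int (*get_crit_temp)(struct fake_zone *tz, unsigned long *temp);
	bool crit_file_exists;  /* models the temp_crit attribute file */
	int bogus_removals;     /* removals of a file that was never created */
};

/* Callback for a zone that has no valid critical trip point. */
int no_valid_crit(struct fake_zone *tz, unsigned long *temp)
{
	(void)tz;
	(void)temp;
	return -1;
}

/* Same condition as the helper added by the patch. */
bool crit_temp_valid(struct fake_zone *tz)
{
	unsigned long temp;

	return tz->get_crit_temp && !tz->get_crit_temp(tz, &temp);
}

/* Create path: the file is created only for a valid critical temp. */
void add_sysfs(struct fake_zone *tz)
{
	if (crit_temp_valid(tz))
		tz->crit_file_exists = true;
}

/* Count attempts to remove a file that does not exist. */
void remove_file(struct fake_zone *tz)
{
	if (!tz->crit_file_exists)
		tz->bogus_removals++;
	tz->crit_file_exists = false;
}

/* Remove path before the patch: only checks that the callback exists. */
void remove_sysfs_old(struct fake_zone *tz)
{
	if (tz->get_crit_temp)
		remove_file(tz);
}

/* Remove path after the patch: uses the same check as the create path. */
void remove_sysfs_new(struct fake_zone *tz)
{
	if (crit_temp_valid(tz))
		remove_file(tz);
}
```

For a zone whose callback reports an error, add_sysfs() creates nothing;
remove_sysfs_old() then records one bogus removal, and remove_sysfs_new()
records none.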



* Re: rmmod thermal, unable to handle kernel NULL pointer dereference
  2014-05-21  8:22 ` Aaron Lu
  2014-05-21  8:33   ` [PATCH] thermal: hwmon: Make the check for critical temp valid consistent Aaron Lu
@ 2014-05-21 15:38   ` Kui Zhang
  2014-05-26 12:53   ` Rafael J. Wysocki
  2 siblings, 0 replies; 5+ messages in thread
From: Kui Zhang @ 2014-05-21 15:38 UTC (permalink / raw)
  To: Aaron Lu
  Cc: linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
	Rafael J. Wysocki, Zhang Rui

Thanks, that fixed it.


* Re: rmmod thermal, unable to handle kernel NULL pointer dereference
  2014-05-21  8:22 ` Aaron Lu
  2014-05-21  8:33   ` [PATCH] thermal: hwmon: Make the check for critical temp valid consistent Aaron Lu
  2014-05-21 15:38   ` rmmod thermal, unable to handle kernel NULL pointer dereference Kui Zhang
@ 2014-05-26 12:53   ` Rafael J. Wysocki
  2 siblings, 0 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2014-05-26 12:53 UTC (permalink / raw)
  To: Aaron Lu
  Cc: Kui Zhang, linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
	Zhang Rui

On Wednesday, May 21, 2014 04:22:58 PM Aaron Lu wrote:
> On 05/21/2014 01:57 PM, Kui Zhang wrote:
> > Hello,
> > 
> > I get the following error when I run rmmod thermal.
> > 
> > rmmod  thermal
> > Killed
> 
> Thanks for the report. Here is a fix patch that should solve this
> problem.
> 
> From: Aaron Lu <aaron.lu@intel.com>
> Date: Wed, 21 May 2014 16:00:42 +0800
> Subject: [PATCH] ACPI / thermal: fix workqueue destroy order
> 
> When the thermal module is removed, the workqueue acpi_thermal_pm_queue
> must be destroyed only after the ACPI driver's remove callback has run,
> because that callback flushes the workqueue. Destroying the workqueue
> first results in a NULL pointer dereference.
> 
> Reported-by: Kui Zhang <kuizhang@gmail.com>
> Reference: http://www.spinics.net/lists/kernel/msg1747251.html
> Cc: All applicable <stable@vger.kernel.org>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>

I'm going to push this as a fix for 3.15, thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
