* rmmod thermal, unable to handle kernel NULL pointer dereference
From: Kui Zhang @ 2014-05-21 5:57 UTC
To: aaron.lu, linux-kernel@vger.kernel.org
Hello,
I get the following error when I rmmod the thermal module:
rmmod thermal
Killed
[ 1207.313060] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1207.313460] IP: [<ffffffff816bb906>] _raw_spin_lock_irq+0x6/0x30
[ 1207.313858] PGD 0
[ 1207.314256] Oops: 0002 [#1] SMP
[ 1207.314658] Modules linked in: thermal(-) ntfs vfat msdos fat cpuid ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp target_core_mod configfs ebtable_nat ebtables drbd lru_cache scsi_transport_iscsi xt_conntrack ip6table_filter ip6_tables iptable_filter ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state x_tables nf_conntrack dm_multipath dm_mod arc4 tifm_7xx1 coretemp tifm_core hwmon iwl3945 iwlegacy mac80211 cfg80211 tpm_infineon tpm fuse snd_hda_intel snd_hda_controller snd_hda_codec_idt snd_hda_codec_generic snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore usb_storage raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod sky2 firewire_ohci sr_mod cdrom firewire_core crc_itu_t uhci_hcd [last unloaded: thermal]
[ 1207.316991] CPU: 1 PID: 22792 Comm: rmmod Tainted: G W 3.15.0-rc5+ #2
[ 1207.316991] task: ffff88006e712100 ti: ffff880075c28000 task.ti: ffff880075c28000
[ 1207.316991] RIP: 0010:[<ffffffff816bb906>] [<ffffffff816bb906>] _raw_spin_lock_irq+0x6/0x30
[ 1207.316991] RSP: 0018:ffff880075c29da8 EFLAGS: 00010006
[ 1207.316991] RAX: 0000000000000100 RBX: ffff88006e414990 RCX: 0000000000000002
[ 1207.316991] RDX: 0000000000000002 RSI: 0000000000000001 RDI: 0000000000000000
[ 1207.316991] RBP: 0000000000000001 R08: 0000000000016280 R09: ffff8800c7b16280
[ 1207.316991] R10: ffffffff8135a910 R11: ffffea0003062240 R12: 0000000000000002
[ 1207.316991] R13: 0000000000000001 R14: ffff88006e415e00 R15: 0000000000000000
[ 1207.316991] FS: 00007f265711f740(0000) GS:ffff8800c7b00000(0000) knlGS:0000000000000000
[ 1207.316991] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1207.316991] CR2: 0000000000000000 CR3: 000000008aecd000 CR4: 00000000000007e0
[ 1207.316991] Stack:
[ 1207.316991] ffffffff8109152b 00ff8800c2e2d670 ffff88006e414a00 ffff88006e415e00
[ 1207.316991] ffff880075c29e90 ffff88006e415e70 ffff88006e415e20 0000000000000002
[ 1207.316991] ffff880075c29e20 ffffffff810917b7 ffff880075c29e28 0000000000000000
[ 1207.316991] Call Trace:
[ 1207.316991] [<ffffffff8109152b>] ? flush_workqueue_prep_pwqs+0x5b/0x180
[ 1207.316991] [<ffffffff810917b7>] ? flush_workqueue+0x167/0x5a0
[ 1207.316991] [<ffffffff81350000>] ? fbcon_switch+0x570/0x580
[ 1207.316991] [<ffffffff81350000>] ? fbcon_switch+0x570/0x580
[ 1207.316991] [<ffffffffa0506a38>] ? acpi_thermal_remove+0x23/0x94 [thermal]
[ 1207.316991] [<ffffffff8135ccfd>] ? acpi_device_remove+0x74/0x91
[ 1207.316991] [<ffffffff8150df65>] ? __device_release_driver+0x75/0xf0
[ 1207.316991] [<ffffffff8150e690>] ? driver_detach+0xa0/0xb0
[ 1207.316991] [<ffffffff8150dc13>] ? bus_remove_driver+0x43/0xa0
[ 1207.316991] [<ffffffff810da6db>] ? SyS_delete_module+0x11b/0x1a0
[ 1207.316991] [<ffffffff816bc600>] ? tracesys+0x7e/0xe6
[ 1207.316991] [<ffffffff816bc663>] ? tracesys+0xe1/0xe6
[ 1207.316991] Code: 0f 1f 44 00 00 8d 8a 00 01 00 00 89 d0 f0 66 0f b1 0f 66 39 d0 75 e5 b8 01 00 00 00 c3 0f 1f 84 00 00 00 00 00 fa b8 00 01 00 00 <f0> 66 0f c1 07 0f b6 d4 38 c2 75 03 c3 f3 90 0f b6 07 38 d0 75
[ 1207.316991] RIP [<ffffffff816bb906>] _raw_spin_lock_irq+0x6/0x30
[ 1207.316991] RSP <ffff880075c29da8>
[ 1207.316991] CR2: 0000000000000000
[ 1207.316991] ---[ end trace 45b9cde975015a9d ]---
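A note for readers decoding the oops above: the number after "Oops:" is the x86 page-fault error code. The following is a minimal userspace sketch that decodes its three low bits. The enum, struct and function names are made up for illustration and are not kernel APIs. For 0002 it reports a write to a not-present page from kernel mode, which is consistent with CR2 and RDI both being 0 and with the fault hitting the locked instruction at the start of _raw_spin_lock_irq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (enum names are illustrative). */
enum {
	PF_BIT_PRESENT = 1u << 0, /* 0: page not present, 1: protection violation */
	PF_BIT_WRITE   = 1u << 1, /* 0: read access,      1: write access */
	PF_BIT_USER    = 1u << 2, /* 0: kernel mode,      1: user mode */
};

struct pf_info {
	bool present;
	bool write;
	bool user;
};

/* Split the error code into the three flags above. */
static struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info;

	info.present = (code & PF_BIT_PRESENT) != 0;
	info.write = (code & PF_BIT_WRITE) != 0;
	info.user = (code & PF_BIT_USER) != 0;
	return info;
}

/* Print a one-line, human-readable summary of the error code. */
static void print_pf_info(FILE *out, unsigned int code)
{
	struct pf_info info = decode_pf_error(code);

	assert(out != NULL);
	fprintf(out, "%04x: %s, %s access, %s mode\n", code,
		info.present ? "protection violation" : "page not present",
		info.write ? "write" : "read",
		info.user ? "user" : "kernel");
}
```

Calling decode_pf_error(0x0002) gives present = false, write = true, user = false.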
When I try again:
rmmod thermal
rmmod: ERROR: ../libkmod/libkmod-module.c:764 kmod_module_remove_module() could not remove 'thermal': Device or resource busy
rmmod: ERROR: could not remove module thermal: Device or resource busy
The problem seems to have started after commit a59ffb2062df3a5c346dbed931fa1e587fd0f0f3.
Thanks
Kui.Z
* Re: rmmod thermal, unable to handle kernel NULL pointer dereference
From: Aaron Lu @ 2014-05-21 8:22 UTC
To: Kui Zhang
Cc: linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
Rafael J. Wysocki, Zhang Rui
On 05/21/2014 01:57 PM, Kui Zhang wrote:
> Hello,
>
> I get following error when rmmod thermal.
>
> rmmod thermal
> Killed
Thanks for the report. Here is a patch that should fix this problem.
From: Aaron Lu <aaron.lu@intel.com>
Date: Wed, 21 May 2014 16:00:42 +0800
Subject: [PATCH] ACPI / thermal: fix workqueue destroy order
When the thermal module is removed, the workqueue acpi_thermal_pm_queue
must be destroyed only after the ACPI driver's remove callback has run,
because that callback flushes the workqueue. Destroying it first results
in a NULL pointer dereference during the flush.
Reported-by: Kui Zhang <kuizhang@gmail.com>
Reference: http://www.spinics.net/lists/kernel/msg1747251.html
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
drivers/acpi/thermal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index c1e31a41f949..25bbc55dca89 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -1278,8 +1278,8 @@ static int __init acpi_thermal_init(void)
 
 static void __exit acpi_thermal_exit(void)
 {
-	destroy_workqueue(acpi_thermal_pm_queue);
 	acpi_bus_unregister_driver(&acpi_thermal_driver);
+	destroy_workqueue(acpi_thermal_pm_queue);
 
 	return;
 }
--
1.9.0
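The ordering problem fixed by this patch can be modelled in userspace. The sketch below is not kernel code: fake_wq, fake_flush, fake_destroy and fake_driver_remove are hypothetical stand-ins for the workqueue, flush_workqueue(), destroy_workqueue() and the driver's remove callback. A flush on an already destroyed queue is reported as a failure instead of crashing, so the two exit orders can be compared safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a workqueue: alive until destroyed, with pending work. */
struct fake_wq {
	bool alive;
	int pending;
};

/* Returns false when the queue is already gone (the oops case in the kernel). */
static bool fake_flush(struct fake_wq *wq)
{
	assert(wq != NULL);
	if (!wq->alive)
		return false;
	wq->pending = 0;
	return true;
}

static void fake_destroy(struct fake_wq *wq)
{
	assert(wq != NULL);
	wq->alive = false;
}

/* Models the driver's remove callback, which flushes the queue. */
static bool fake_driver_remove(struct fake_wq *wq)
{
	return fake_flush(wq);
}

/* Order before the patch: destroy first, then unregister the driver. */
static bool exit_old_order(struct fake_wq *wq)
{
	fake_destroy(wq);
	return fake_driver_remove(wq);
}

/* Order after the patch: unregister the driver first, then destroy. */
static bool exit_new_order(struct fake_wq *wq)
{
	bool ok = fake_driver_remove(wq);

	fake_destroy(wq);
	return ok;
}
```

In this model exit_old_order() fails because the flush sees a dead queue, while exit_new_order() flushes the pending work and then tears the queue down.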
* [PATCH] thermal: hwmon: Make the check for critical temp valid consistent
From: Aaron Lu @ 2014-05-21 8:33 UTC
To: Zhang Rui
Cc: Kui Zhang, linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
Rafael J. Wysocki, Linux-pm mailing list
On 05/21/2014 04:22 PM, Aaron Lu wrote:
> On 05/21/2014 01:57 PM, Kui Zhang wrote:
>> Hello,
>>
>> I get following error when rmmod thermal.
>>
>> rmmod thermal
>> Killed
While dealing with this problem, I found another problem that also
results in a kernel crash on thermal module removal:
From: Aaron Lu <aaron.lu@intel.com>
Date: Wed, 21 May 2014 16:05:38 +0800
Subject: [PATCH] thermal: hwmon: Make the check for critical temp valid consistent
We use tz->ops->get_crit_temp && !tz->ops->get_crit_temp(tz, temp) to
decide whether to create the temp_crit attribute file, but we only check
that tz->ops->get_crit_temp exists to decide whether to remove it. Some
ACPI thermal zones don't have a valid critical trip point, and for those
we end up removing a non-existent device file on thermal module unload.
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
drivers/thermal/thermal_hwmon.c | 33 ++++++++++++++++++---------------
1 file changed, 18 insertions(+), 15 deletions(-)
diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index fdb07199d9c2..1967bee4f076 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -140,6 +140,12 @@ thermal_hwmon_lookup_temp(const struct thermal_hwmon_device *hwmon,
 	return NULL;
 }
 
+static bool thermal_zone_crit_temp_valid(struct thermal_zone_device *tz)
+{
+	unsigned long temp;
+	return tz->ops->get_crit_temp && !tz->ops->get_crit_temp(tz, &temp);
+}
+
 int thermal_add_hwmon_sysfs(struct thermal_zone_device *tz)
 {
 	struct thermal_hwmon_device *hwmon;
@@ -189,21 +195,18 @@ int thermal_add_hwmon_sysfs(struct thermal_zone_device *tz)
 	if (result)
 		goto free_temp_mem;
 
-	if (tz->ops->get_crit_temp) {
-		unsigned long temperature;
-		if (!tz->ops->get_crit_temp(tz, &temperature)) {
-			snprintf(temp->temp_crit.name,
-				 sizeof(temp->temp_crit.name),
+	if (thermal_zone_crit_temp_valid(tz)) {
+		snprintf(temp->temp_crit.name,
+			 sizeof(temp->temp_crit.name),
 				"temp%d_crit", hwmon->count);
-			temp->temp_crit.attr.attr.name = temp->temp_crit.name;
-			temp->temp_crit.attr.attr.mode = 0444;
-			temp->temp_crit.attr.show = temp_crit_show;
-			sysfs_attr_init(&temp->temp_crit.attr.attr);
-			result = device_create_file(hwmon->device,
-						    &temp->temp_crit.attr);
-			if (result)
-				goto unregister_input;
-		}
+		temp->temp_crit.attr.attr.name = temp->temp_crit.name;
+		temp->temp_crit.attr.attr.mode = 0444;
+		temp->temp_crit.attr.show = temp_crit_show;
+		sysfs_attr_init(&temp->temp_crit.attr.attr);
+		result = device_create_file(hwmon->device,
+					    &temp->temp_crit.attr);
+		if (result)
+			goto unregister_input;
 	}
 
 	mutex_lock(&thermal_hwmon_list_lock);
@@ -250,7 +253,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
 	}
 
 	device_remove_file(hwmon->device, &temp->temp_input.attr);
-	if (tz->ops->get_crit_temp)
+	if (thermal_zone_crit_temp_valid(tz))
 		device_remove_file(hwmon->device, &temp->temp_crit.attr);
 
 	mutex_lock(&thermal_hwmon_list_lock);
--
1.9.0
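The mismatch this patch removes can also be shown with a small userspace model. Everything in the sketch is hypothetical: fake_zone stands in for a thermal zone, no_crit_trip for a get_crit_temp callback that exists but reports no valid critical trip point, and crit_attr_created for the sysfs file. It only illustrates why the create and remove paths should share one predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_zone {
	/* Returns 0 and fills *temp on success, nonzero if there is no valid trip. */
	int (*get_crit_temp)(struct fake_zone *z, unsigned long *temp);
	bool crit_attr_created;
};

/* A callback that exists but never reports a valid critical temperature. */
static int no_crit_trip(struct fake_zone *z, unsigned long *temp)
{
	(void)z;
	(void)temp;
	return -1;
}

/* Shared predicate, mirroring the helper added by the patch. */
static bool crit_temp_valid(struct fake_zone *z)
{
	unsigned long temp;

	assert(z != NULL);
	return z->get_crit_temp && !z->get_crit_temp(z, &temp);
}

/* Create path: the attribute is only created when the predicate holds. */
static void fake_add(struct fake_zone *z)
{
	if (crit_temp_valid(z))
		z->crit_attr_created = true;
}

/*
 * Remove path: returns true when it would remove an attribute that was
 * never created. use_shared_check selects the patched behaviour.
 */
static bool fake_remove_mismatch(struct fake_zone *z, bool use_shared_check)
{
	bool will_remove = use_shared_check ? crit_temp_valid(z)
					    : (z->get_crit_temp != NULL);

	return will_remove && !z->crit_attr_created;
}
```

With a zone whose callback is no_crit_trip, the old check (callback exists) tries to remove a file that fake_add() never created, while the shared predicate skips the removal.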
* Re: rmmod thermal, unable to handle kernel NULL pointer dereference
From: Kui Zhang @ 2014-05-21 15:38 UTC
To: Aaron Lu
Cc: linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
Rafael J. Wysocki, Zhang Rui
Thanks, that fixed it.
On Wed, May 21, 2014 at 1:22 AM, Aaron Lu <aaron.lu@intel.com> wrote:
> On 05/21/2014 01:57 PM, Kui Zhang wrote:
>> Hello,
>>
>> I get following error when rmmod thermal.
>>
>> rmmod thermal
>> Killed
>
> Thanks for the report. Here is a fix patch that should solve this
> problem.
>
> From: Aaron Lu <aaron.lu@intel.com>
> Date: Wed, 21 May 2014 16:00:42 +0800
> Subject: [PATCH] ACPI / thermal: fix workqueue destroy order
>
> When the thermal module is to be removed, we should destroy the wq
> acpi_thermal_pm_queue after the ACPI driver's remove callback is
> executed as we will need to flush the workqueue there, or a NULL pointer
> access will be hit.
>
> Reported-by: Kui Zhang <kuizhang@gmail.com>
> Reference: http://www.spinics.net/lists/kernel/msg1747251.html
> Cc: All applicable <stable@vger.kernel.org>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---
> drivers/acpi/thermal.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
> index c1e31a41f949..25bbc55dca89 100644
> --- a/drivers/acpi/thermal.c
> +++ b/drivers/acpi/thermal.c
> @@ -1278,8 +1278,8 @@ static int __init acpi_thermal_init(void)
>
> static void __exit acpi_thermal_exit(void)
> {
> - destroy_workqueue(acpi_thermal_pm_queue);
> acpi_bus_unregister_driver(&acpi_thermal_driver);
> + destroy_workqueue(acpi_thermal_pm_queue);
>
> return;
> }
> --
> 1.9.0
>
* Re: rmmod thermal, unable to handle kernel NULL pointer dereference
From: Rafael J. Wysocki @ 2014-05-26 12:53 UTC
To: Aaron Lu
Cc: Kui Zhang, linux-kernel@vger.kernel.org, ACPI Devel Mailing List,
Zhang Rui
On Wednesday, May 21, 2014 04:22:58 PM Aaron Lu wrote:
> On 05/21/2014 01:57 PM, Kui Zhang wrote:
> > Hello,
> >
> > I get following error when rmmod thermal.
> >
> > rmmod thermal
> > Killed
>
> Thanks for the report. Here is a fix patch that should solve this
> problem.
>
> From: Aaron Lu <aaron.lu@intel.com>
> Date: Wed, 21 May 2014 16:00:42 +0800
> Subject: [PATCH] ACPI / thermal: fix workqueue destroy order
>
> When the thermal module is to be removed, we should destroy the wq
> acpi_thermal_pm_queue after the ACPI driver's remove callback is
> executed as we will need to flush the workqueue there, or a NULL pointer
> access will be hit.
>
> Reported-by: Kui Zhang <kuizhang@gmail.com>
> Reference: http://www.spinics.net/lists/kernel/msg1747251.html
> Cc: All applicable <stable@vger.kernel.org>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
I'm going to push this as a fix for 3.15, thanks!
> ---
> drivers/acpi/thermal.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
> index c1e31a41f949..25bbc55dca89 100644
> --- a/drivers/acpi/thermal.c
> +++ b/drivers/acpi/thermal.c
> @@ -1278,8 +1278,8 @@ static int __init acpi_thermal_init(void)
>
> static void __exit acpi_thermal_exit(void)
> {
> - destroy_workqueue(acpi_thermal_pm_queue);
> acpi_bus_unregister_driver(&acpi_thermal_driver);
> + destroy_workqueue(acpi_thermal_pm_queue);
>
> return;
> }
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.