linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NULL pointer dereference in swsusp_free with 3.17-rc5
@ 2014-09-23 14:50 Bjørn Mork
  2014-09-23 15:24 ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Bjørn Mork @ 2014-09-23 14:50 UTC (permalink / raw)
  To: linux-pm

Hello,

I just tested Debians experimental v3.17-rc5 build and got this after
resuming from hibernation:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
PGD b39c2067 PUD b39c1067 PMD 0 
Oops: 0000 [#1] SMP 
Modules linked in: ctr ccm xt_multiport iptable_filter ip_tables bridge aufs(C) bnep xt_hl nf_log_ipv6 nf_log_common xt_LOG binfmt_misc ip6table_filter ip6_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc 8021q garp stp mrp llc tun loop fuse joydev arc4 iwlmvm mac80211 snd_hda_codec_conexant iwlwifi snd_hda_codec_generic iTCO_wdt iTCO_vendor_support cfg80211 coretemp kvm_intel kvm psmouse evdev serio_raw cdc_acm cdc_mbim cdc_wdm uvcvideo cdc_ncm videobuf2_vmalloc videobuf2_memops usbnet mii videobuf2_core v4l2_common videodev media i2c_i801 lpc_ich mfd_core snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm shpchp snd_timer mei_me mei wmi thinkpad_acpi nvram i915 snd soundcore ac battery ecb drm_kms_helper drm tpm_tis acpi_cpufreq tpm i2c_algo_bit i2ccore video btusb bluetooth processor rfkill button ext4 crc16 mbcache jbd2 nbd sd_mod crc_t10dif crct10dif_generic sg sr_mod cdrom crct10dif_common ahci libahci libata scsi_mod thermal thermal_sys uhci_hcd ehci_pci ehci_hcd usbcore usb_common e1000e ptp pps_core
CPU: 1 PID: 4898 Comm: s2disk Tainted: G         C     3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000
RIP: 0010:[<ffffffff810a8cc1>]  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
RSP: 0018:ffff8800b3b17ea8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001
RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000
RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000
R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590
FS:  00007f406d83b700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0
Stack:
 ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a
 ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000
 ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007
Call Trace:
 [<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0
 [<ffffffff811a955a>] ? __fput+0xca/0x1d0
 [<ffffffff81080627>] ? task_work_run+0x97/0xd0
 [<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0
 [<ffffffff8151452a>] ? int_signal+0x12/0x17
Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40 
RIP  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
 RSP <ffff8800b3b17ea8>
CR2: 0000000000000000
---[ end trace f02be86a1ec0cccb ]---


This is the first 3.17-rc I try, so it might be an issue from 3.17-rc1.
v3.16 is fine.

This problem seems to be reliably repeatable so I can try to narrow it
down a bit.


Bjørn (not subscribed, so please CC me)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-23 14:50 NULL pointer dereference in swsusp_free with 3.17-rc5 Bjørn Mork
@ 2014-09-23 15:24 ` Rafael J. Wysocki
  2014-09-23 17:27   ` Bjørn Mork
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2014-09-23 15:24 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-pm

On Tuesday, September 23, 2014 04:50:25 PM Bjørn Mork wrote:
> Hello,
> 
> I just tested Debians experimental v3.17-rc5 build and got this after
> resuming from hibernation:
> 
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
> PGD b39c2067 PUD b39c1067 PMD 0 
> Oops: 0000 [#1] SMP 
> Modules linked in: ctr ccm xt_multiport iptable_filter ip_tables bridge aufs(C) bnep xt_hl nf_log_ipv6 nf_log_common xt_LOG binfmt_misc ip6table_filter ip6_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc 8021q garp stp mrp llc tun loop fuse joydev arc4 iwlmvm mac80211 snd_hda_codec_conexant iwlwifi snd_hda_codec_generic iTCO_wdt iTCO_vendor_support cfg80211 coretemp kvm_intel kvm psmouse evdev serio_raw cdc_acm cdc_mbim cdc_wdm uvcvideo cdc_ncm videobuf2_vmalloc videobuf2_memops usbnet mii videobuf2_core v4l2_common videodev media i2c_i801 lpc_ich mfd_core snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm shpchp snd_timer mei_me mei wmi thinkpad_acpi nvram i915 snd soundcore ac battery ecb drm_kms_helper drm tpm_tis acpi_cpufreq tpm i2c_algo_bit i2ccore video btusb bluetooth processor rfkill button ext4 crc16 mbcache jbd2 nbd sd_mod crc_t10dif crct10dif_generic sg sr_mod cdrom crct10dif_common ahci libahci libata scsi_mod thermal thermal_sys uhci_hcd ehci_pci ehci_hcd usbcore usb_common e1000e ptp pps_core
> CPU: 1 PID: 4898 Comm: s2disk Tainted: G         C     3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1
> Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
> task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000
> RIP: 0010:[<ffffffff810a8cc1>]  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
> RSP: 0018:ffff8800b3b17ea8  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001
> RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000
> RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000
> R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590
> FS:  00007f406d83b700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0
> Stack:
>  ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a
>  ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000
>  ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007
> Call Trace:
>  [<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0
>  [<ffffffff811a955a>] ? __fput+0xca/0x1d0
>  [<ffffffff81080627>] ? task_work_run+0x97/0xd0
>  [<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0
>  [<ffffffff8151452a>] ? int_signal+0x12/0x17
> Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40 
> RIP  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
>  RSP <ffff8800b3b17ea8>
> CR2: 0000000000000000
> ---[ end trace f02be86a1ec0cccb ]---
> 
> 
> This is the first 3.17-rc I try, so it might be an issue from 3.17-rc1.
> v3.16 is fine.
> 
> This problem seems to be reliably repeatable so I can try to narrow it
> down a bit.

Well, please do.

I would suspect one of these commits:

84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions
0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node
9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation
6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()
3a20cb177961 PM / Hibernate: Implement position keeping in radix tree
07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function
f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap

so I guess you can start from checking them (the topmpost one is the latest).

Rafael

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-23 15:24 ` Rafael J. Wysocki
@ 2014-09-23 17:27   ` Bjørn Mork
  2014-09-23 20:28     ` Rafael J. Wysocki
  2014-09-23 21:20     ` Rafael J. Wysocki
  0 siblings, 2 replies; 14+ messages in thread
From: Bjørn Mork @ 2014-09-23 17:27 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:


> I would suspect one of these commits:
>
> 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions
> 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node
> 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation
> 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()
> 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree
> 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function
> f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap
>
> so I guess you can start from checking them (the topmpost one is the latest).

Thanks.  Yes, you were correct. The bad commit is

 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()

I have confirmed that reverting only this commit on top of a clean
v3.17-rc6 fixes the problem.  I am attaching the context-modified revert
patch I used.


Bjørn


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Revert-PM-Hibernate-Iterate-over-set-bits-instead-of.patch --]
[-- Type: text/x-diff, Size: 2462 bytes --]

>From 92950fd86c2f74ae17840bfc15651b6ae77e43df Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@mork.no>
Date: Tue, 23 Sep 2014 19:18:43 +0200
Subject: [PATCH] Revert "PM / Hibernate: Iterate over set bits instead of
 PFNs in swsusp_free()"

This reverts commit 6efde38f07690652bf0d93f5e4f1a5f496574806.

Conflicts:
	kernel/power/snapshot.c
---
 kernel/power/snapshot.c |   50 ++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index c4b8093c80b3..f1604d8cf489 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -725,14 +725,6 @@ static void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn)
 	clear_bit(bit, addr);
 }
 
-static void memory_bm_clear_current(struct memory_bitmap *bm)
-{
-	int bit;
-
-	bit = max(bm->cur.node_bit - 1, 0);
-	clear_bit(bit, bm->cur.node->data);
-}
-
 static int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn)
 {
 	void *addr;
@@ -1341,35 +1333,23 @@ static struct memory_bitmap copy_bm;
 
 void swsusp_free(void)
 {
-	unsigned long fb_pfn, fr_pfn;
-
-	memory_bm_position_reset(forbidden_pages_map);
-	memory_bm_position_reset(free_pages_map);
-
-loop:
-	fr_pfn = memory_bm_next_pfn(free_pages_map);
-	fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
-
-	/*
-	 * Find the next bit set in both bitmaps. This is guaranteed to
-	 * terminate when fb_pfn == fr_pfn == BM_END_OF_MAP.
-	 */
-	do {
-		if (fb_pfn < fr_pfn)
-			fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
-		if (fr_pfn < fb_pfn)
-			fr_pfn = memory_bm_next_pfn(free_pages_map);
-	} while (fb_pfn != fr_pfn);
-
-	if (fr_pfn != BM_END_OF_MAP && pfn_valid(fr_pfn)) {
-		struct page *page = pfn_to_page(fr_pfn);
+	struct zone *zone;
+	unsigned long pfn, max_zone_pfn;
 
-		memory_bm_clear_current(forbidden_pages_map);
-		memory_bm_clear_current(free_pages_map);
-		__free_page(page);
-		goto loop;
+	for_each_populated_zone(zone) {
+		max_zone_pfn = zone_end_pfn(zone);
+		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
+			if (pfn_valid(pfn)) {
+				struct page *page = pfn_to_page(pfn);
+
+				if (swsusp_page_is_forbidden(page) &&
+				    swsusp_page_is_free(page)) {
+					swsusp_unset_page_forbidden(page);
+					swsusp_unset_page_free(page);
+					__free_page(page);
+				}
+			}
 	}
-
 	nr_copy_pages = 0;
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-23 17:27   ` Bjørn Mork
@ 2014-09-23 20:28     ` Rafael J. Wysocki
  2014-09-24  9:46       ` Joerg Roedel
  2014-09-23 21:20     ` Rafael J. Wysocki
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2014-09-23 20:28 UTC (permalink / raw)
  To: Bjørn Mork, Joerg Roedel; +Cc: linux-pm

On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote:
> 
> --=-=-=
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: quoted-printable
> 
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> 
> 
> > I would suspect one of these commits:
> >
> > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions
> > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node
> > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation
> > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws=
> usp_free()
> > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree
> > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function
> > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap
> >
> > so I guess you can start from checking them (the topmpost one is the late=
> st).
> 
> Thanks.  Yes, you were correct. The bad commit is
> 
>  6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu=
> sp_free()
> 
> I have confirmed that reverting only this commit on top of a clean
> v3.17-rc6 fixes the problem.  I am attaching the context-modified revert
> patch I used.

OK, thanks!

Joerg, can you please have a look at this?


> --=-=-=
> Content-Type: text/x-diff
> Content-Disposition: inline;
>  filename=0001-Revert-PM-Hibernate-Iterate-over-set-bits-instead-of.patch
> 
> From 92950fd86c2f74ae17840bfc15651b6ae77e43df Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@mork.no>
> Date: Tue, 23 Sep 2014 19:18:43 +0200
> Subject: [PATCH] Revert "PM / Hibernate: Iterate over set bits instead of
>  PFNs in swsusp_free()"
> 
> This reverts commit 6efde38f07690652bf0d93f5e4f1a5f496574806.
> 
> Conflicts:
> 	kernel/power/snapshot.c
> ---
>  kernel/power/snapshot.c |   50 ++++++++++++++---------------------------------
>  1 file changed, 15 insertions(+), 35 deletions(-)
> 
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index c4b8093c80b3..f1604d8cf489 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -725,14 +725,6 @@ static void memory_bm_clear_bit(struct memory_bitmap *bm, unsigned long pfn)
>  	clear_bit(bit, addr);
>  }
>  
> -static void memory_bm_clear_current(struct memory_bitmap *bm)
> -{
> -	int bit;
> -
> -	bit = max(bm->cur.node_bit - 1, 0);
> -	clear_bit(bit, bm->cur.node->data);
> -}
> -
>  static int memory_bm_test_bit(struct memory_bitmap *bm, unsigned long pfn)
>  {
>  	void *addr;
> @@ -1341,35 +1333,23 @@ static struct memory_bitmap copy_bm;
>  
>  void swsusp_free(void)
>  {
> -	unsigned long fb_pfn, fr_pfn;
> -
> -	memory_bm_position_reset(forbidden_pages_map);
> -	memory_bm_position_reset(free_pages_map);
> -
> -loop:
> -	fr_pfn = memory_bm_next_pfn(free_pages_map);
> -	fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
> -
> -	/*
> -	 * Find the next bit set in both bitmaps. This is guaranteed to
> -	 * terminate when fb_pfn == fr_pfn == BM_END_OF_MAP.
> -	 */
> -	do {
> -		if (fb_pfn < fr_pfn)
> -			fb_pfn = memory_bm_next_pfn(forbidden_pages_map);
> -		if (fr_pfn < fb_pfn)
> -			fr_pfn = memory_bm_next_pfn(free_pages_map);
> -	} while (fb_pfn != fr_pfn);
> -
> -	if (fr_pfn != BM_END_OF_MAP && pfn_valid(fr_pfn)) {
> -		struct page *page = pfn_to_page(fr_pfn);
> +	struct zone *zone;
> +	unsigned long pfn, max_zone_pfn;
>  
> -		memory_bm_clear_current(forbidden_pages_map);
> -		memory_bm_clear_current(free_pages_map);
> -		__free_page(page);
> -		goto loop;
> +	for_each_populated_zone(zone) {
> +		max_zone_pfn = zone_end_pfn(zone);
> +		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++)
> +			if (pfn_valid(pfn)) {
> +				struct page *page = pfn_to_page(pfn);
> +
> +				if (swsusp_page_is_forbidden(page) &&
> +				    swsusp_page_is_free(page)) {
> +					swsusp_unset_page_forbidden(page);
> +					swsusp_unset_page_free(page);
> +					__free_page(page);
> +				}
> +			}
>  	}
> -
>  	nr_copy_pages = 0;
>  	nr_meta_pages = 0;
>  	restore_pblist = NULL;
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-23 17:27   ` Bjørn Mork
  2014-09-23 20:28     ` Rafael J. Wysocki
@ 2014-09-23 21:20     ` Rafael J. Wysocki
  2014-09-24  7:45       ` Bjørn Mork
  1 sibling, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2014-09-23 21:20 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-pm, Joerg Roedel

On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote:
> 
> --=-=-=
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: quoted-printable
> 
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> 
> 
> > I would suspect one of these commits:
> >
> > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions
> > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node
> > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation
> > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws=
> usp_free()
> > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree
> > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function
> > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap
> >
> > so I guess you can start from checking them (the topmpost one is the late=
> st).
> 
> Thanks.  Yes, you were correct. The bad commit is
> 
>  6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu=
> sp_free()
> 
> I have confirmed that reverting only this commit on top of a clean
> v3.17-rc6 fixes the problem.  I am attaching the context-modified revert
> patch I used.

Instead of reverting the commit, can you please check if adding an "if (page)"
check around 

		memory_bm_clear_current(forbidden_pages_map);
		memory_bm_clear_current(free_pages_map);
		__free_page(page);

in swsusp_free() makes the NULL pointer deref go away?

Rafael


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-23 21:20     ` Rafael J. Wysocki
@ 2014-09-24  7:45       ` Bjørn Mork
  2014-09-24  9:51         ` Joerg Roedel
  0 siblings, 1 reply; 14+ messages in thread
From: Bjørn Mork @ 2014-09-24  7:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm, Joerg Roedel

"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:

> On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote:
>> 
>> --=-=-=
>> Content-Type: text/plain; charset=utf-8
>> Content-Transfer-Encoding: quoted-printable
>> 
>> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>> 
>> 
>> > I would suspect one of these commits:
>> >
>> > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions
>> > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node
>> > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation
>> > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws=
>> usp_free()
>> > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree
>> > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function
>> > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap
>> >
>> > so I guess you can start from checking them (the topmpost one is the late=
>> st).
>> 
>> Thanks.  Yes, you were correct. The bad commit is
>> 
>>  6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu=
>> sp_free()
>> 
>> I have confirmed that reverting only this commit on top of a clean
>> v3.17-rc6 fixes the problem.  I am attaching the context-modified revert
>> patch I used.
>
> Instead of reverting the commit, can you please check if adding an "if (page)"
> check around 
>
> 		memory_bm_clear_current(forbidden_pages_map);
> 		memory_bm_clear_current(free_pages_map);
> 		__free_page(page);
>
> in swsusp_free() makes the NULL pointer deref go away?

Tested.  But on a hunch, I also added this just to be sure:


@@ -1343,7 +1343,15 @@ void swsusp_free(void)
 {
        unsigned long fb_pfn, fr_pfn;
 
+WARN_ON(!forbidden_pages_map);
+if (!forbidden_pages_map)
+       return;
+
        memory_bm_position_reset(forbidden_pages_map);
+WARN_ON(!free_pages_map);
+if (!free_pages_map)
+       return;
+
        memory_bm_position_reset(free_pages_map);
 



And got:


[   34.500851] WARNING: CPU: 1 PID: 5052 at kernel/power/snapshot.c:1346 swsusp_free+0x25/0x160()
[   34.500854] Modules linked in: xt_multiport iptable_filter ip_tables dm_mod bnep xt_hl nf_log_ipv6 nf_log_common xt_LOG binfmt_misc ip6table_filter ip6_tables x_tables nfsd nfs_acl nfs lockd fscache sunrpc 8021q garp stp mrp llc tun loop fuse snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support arc4 coretemp kvm_intel cdc_mbim kvm cdc_wdm uvcvideo cdc_ncm videobuf2_vmalloc evdev videobuf2_memops videobuf2_core psmouse cdc_acm v4l2_common usbnet mii videodev serio_raw i2c_i801 iwlmvm snd_hda_intel snd_hda_controller mac80211 snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss iwlwifi i915 lpc_ich i2c_algo_bit mfd_core drm_kms_helper snd_pcm cfg80211 snd_timer drm wmi thinkpad_acpi nvram snd soundcore battery ecb ac btusb bluetooth video rfkill button acpi_cpufreq processor ext4 crc16 jbd2 mbcache nbd sg sd_mod crc_t10dif sr_mod crct10dif_generic cdrom crct10dif_common microcode ahci libahci libata scsi_mod uhci_hcd ehci_pci ehci_hcd e1000e usbcore ptp usb_common pps_core thermal thermal_sys
[   34.501050] CPU: 1 PID: 5052 Comm: s2disk Not tainted 3.17.0-rc6+ #260
[   34.501054] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
[   34.501057]  0000000000000009 ffff88021e9cfde8 ffffffff813bea17 0000000000000000
[   34.501066]  0000000000000000 ffff88021e9cfe28 ffffffff8103eb60 ffffffff8107f8be
[   34.501075]  ffffffff8107c902 ffff8800ae809a00 0000000000000010 ffff880232149690
[   34.501083] Call Trace:
[   34.501092]  [<ffffffff813bea17>] dump_stack+0x4e/0x68
[   34.501100]  [<ffffffff8103eb60>] warn_slowpath_common+0x7c/0x96
[   34.501107]  [<ffffffff8107f8be>] ? lock_system_sleep+0x22/0x24
[   34.501113]  [<ffffffff8107c902>] ? swsusp_free+0x25/0x160
[   34.501119]  [<ffffffff8103eb8f>] warn_slowpath_null+0x15/0x17
[   34.501125]  [<ffffffff8107c902>] swsusp_free+0x25/0x160
[   34.501131]  [<ffffffff8107fb82>] snapshot_release+0x13/0x68
[   34.501138]  [<ffffffff81136c78>] __fput+0x10d/0x1e5
[   34.501144]  [<ffffffff81136d80>] ____fput+0x9/0xb
[   34.501151]  [<ffffffff81056a51>] task_work_run+0x81/0xb0
[   34.501159]  [<ffffffff810026d8>] do_notify_resume+0x55/0x66
[   34.501166]  [<ffffffff811ee7ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   34.501172]  [<ffffffff813c4a98>] int_signal+0x12/0x17
[   34.501177] ---[ end trace fd197b669f9d81a3 ]---

where 1346 is

bjorn@canardo:/usr/local/src/build-tmp/linux$ sed -ne 1346p kernel/power/snapshot.c
WARN_ON(!forbidden_pages_map);


So it looks like that map for some reason is uninitialized when
swsusp_free() is called after a hibernate->resume cycle.  Is this only
hitting me?  If so, then I *really* wonder what I'm doing out of the
ordinary...



Bjørn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-23 20:28     ` Rafael J. Wysocki
@ 2014-09-24  9:46       ` Joerg Roedel
  0 siblings, 0 replies; 14+ messages in thread
From: Joerg Roedel @ 2014-09-24  9:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Bjørn Mork, linux-pm

On Tue, Sep 23, 2014 at 10:28:00PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, September 23, 2014 07:27:06 PM Bjørn Mork wrote:
> > 
> > --=-=-=
> > Content-Type: text/plain; charset=utf-8
> > Content-Transfer-Encoding: quoted-printable
> > 
> > "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> > 
> > 
> > > I would suspect one of these commits:
> > >
> > > 84c91b7ae07c PM / hibernate: avoid unsafe pages in e820 reserved regions
> > > 0f7d83e85dbd PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node
> > > 9047eb629e5c PM / Hibernate: Remove the old memory-bitmap implementation
> > > 6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in sws=
> > usp_free()
> > > 3a20cb177961 PM / Hibernate: Implement position keeping in radix tree
> > > 07a338236fdc PM / Hibernate: Add memory_rtree_find_bit function
> > > f469f02dc6fa PM / Hibernate: Create a Radix-Tree to store memory bitmap
> > >
> > > so I guess you can start from checking them (the topmpost one is the late=
> > st).
> > 
> > Thanks.  Yes, you were correct. The bad commit is
> > 
> >  6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu=
> > sp_free()
> > 
> > I have confirmed that reverting only this commit on top of a clean
> > v3.17-rc6 fixes the problem.  I am attaching the context-modified revert
> > patch I used.
> 
> OK, thanks!
> 
> Joerg, can you please have a look at this?

Yes, I have a look.


	Joerg



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-24  7:45       ` Bjørn Mork
@ 2014-09-24  9:51         ` Joerg Roedel
  2014-09-24 10:17           ` Bjørn Mork
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2014-09-24  9:51 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Rafael J. Wysocki, linux-pm

On Wed, Sep 24, 2014 at 09:45:26AM +0200, Bjørn Mork wrote:
> >> Thanks.  Yes, you were correct. The bad commit is
> >> 
> >>  6efde38f0769 PM / Hibernate: Iterate over set bits instead of PFNs in swsu=
> >> sp_free()
> >> 
> >> I have confirmed that reverting only this commit on top of a clean
> >> v3.17-rc6 fixes the problem.  I am attaching the context-modified revert
> >> patch I used.

This is weird, because ...

> >
> > Instead of reverting the commit, can you please check if adding an "if (page)"
> > check around 
> >
> > 		memory_bm_clear_current(forbidden_pages_map);
> > 		memory_bm_clear_current(free_pages_map);
> > 		__free_page(page);
> >
> > in swsusp_free() makes the NULL pointer deref go away?
> 
> Tested.  But on a hunch, I also added this just to be sure:
> 
> 
> @@ -1343,7 +1343,15 @@ void swsusp_free(void)
>  {
>         unsigned long fb_pfn, fr_pfn;
>  
> +WARN_ON(!forbidden_pages_map);
> +if (!forbidden_pages_map)
> +       return;
> +
>         memory_bm_position_reset(forbidden_pages_map);
> +WARN_ON(!free_pages_map);
> +if (!free_pages_map)
> +       return;
> +
>         memory_bm_position_reset(free_pages_map);

... the old code did not check for a valid forbidden_pages_map and
free_pages_map either, so it should crash there too.


	Joerg


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-24  9:51         ` Joerg Roedel
@ 2014-09-24 10:17           ` Bjørn Mork
  2014-09-24 23:44             ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Bjørn Mork @ 2014-09-24 10:17 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Rafael J. Wysocki, linux-pm

Joerg Roedel <jroedel@suse.de> writes:

>> @@ -1343,7 +1343,15 @@ void swsusp_free(void)
>>  {
>>         unsigned long fb_pfn, fr_pfn;
>>  
>> +WARN_ON(!forbidden_pages_map);
>> +if (!forbidden_pages_map)
>> +       return;
>> +
>>         memory_bm_position_reset(forbidden_pages_map);
>> +WARN_ON(!free_pages_map);
>> +if (!free_pages_map)
>> +       return;
>> +
>>         memory_bm_position_reset(free_pages_map);
>
> ... the old code did not check for a valid forbidden_pages_map and
> free_pages_map either, so it should crash there too.

Well, it did test AFAICS...  The 3.16 code is

int swsusp_page_is_forbidden(struct page *page)
{
        return forbidden_pages_map ?
                memory_bm_test_bit(forbidden_pages_map, page_to_pfn(page)) : 0;
}

..

void swsusp_free(void)
{
..
                                if (swsusp_page_is_forbidden(page) &&
                                    swsusp_page_is_free(page)) {
                                        swsusp_unset_page_forbidden(page);
                                        swsusp_unset_page_free(page);
                                        __free_page(page);
                                }






Bjørn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-24 10:17           ` Bjørn Mork
@ 2014-09-24 23:44             ` Rafael J. Wysocki
  2014-09-25  7:20               ` Bjørn Mork
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2014-09-24 23:44 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Joerg Roedel, linux-pm

On Wednesday, September 24, 2014 12:17:18 PM Bjørn Mork wrote:
> Joerg Roedel <jroedel@suse.de> writes:
> 
> >> @@ -1343,7 +1343,15 @@ void swsusp_free(void)
> >>  {
> >>         unsigned long fb_pfn, fr_pfn;
> >>  
> >> +WARN_ON(!forbidden_pages_map);
> >> +if (!forbidden_pages_map)
> >> +       return;
> >> +
> >>         memory_bm_position_reset(forbidden_pages_map);
> >> +WARN_ON(!free_pages_map);
> >> +if (!free_pages_map)
> >> +       return;
> >> +
> >>         memory_bm_position_reset(free_pages_map);
> >
> > ... the old code did not check for a valid forbidden_pages_map and
> > free_pages_map either, so it should crash there too.
> 
> Well, it did test AFAICS...  The 3.16 code is
> 
> int swsusp_page_is_forbidden(struct page *page)
> {
>         return forbidden_pages_map ?
>                 memory_bm_test_bit(forbidden_pages_map, page_to_pfn(page)) : 0;
> }
> 
> ..
> 
> void swsusp_free(void)
> {
> ..
>                                 if (swsusp_page_is_forbidden(page) &&
>                                     swsusp_page_is_free(page)) {
>                                         swsusp_unset_page_forbidden(page);
>                                         swsusp_unset_page_free(page);
>                                         __free_page(page);
>                                 }
> 
> 
> 
> 
> 

OK

I've decided to go with a revert for 3.17, as we don't seem to have an immediate
fix and the final 3.17 may be as close as this Sunday.  So I'm going to send my
final pull request for 3.17 to Linus tomorrow or early on Friday.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-24 23:44             ` Rafael J. Wysocki
@ 2014-09-25  7:20               ` Bjørn Mork
  2014-09-25  9:13                 ` Joerg Roedel
  0 siblings, 1 reply; 14+ messages in thread
From: Bjørn Mork @ 2014-09-25  7:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Joerg Roedel, linux-pm

"Rafael J. Wysocki" <rjw@rjwysocki.net> writes:

> I've decided to go with a revert for 3.17, as we don't seem to have an immediate
> fix and the final 3.17 may be as close as this Sunday.  So I'm going to send my
> final pull request for 3.17 to Linus tomorrow or early on Friday.

Sounds safest to me, FWIW.

For the next round of this, I think the only missing part was some test
like

        if (!forbidden_pages_map || !free_pages_map)
           goto return_without_freeing_anything;

at the beginning of swsusp_free().  I think we can agree that it isn't
necessary to repeat that test for every page like the old code did :-)

But I I believe it would be useful to analyze exactly why this is
necessary, and possibly add a comment with that explanation.  I don't
have any other arguments for the test than the old code and the oops.
If this is how it is designed then that would be nice to spell out.  If
not, then maybe there is a flaw somewhere else that should be fixed
instead.

And BTW, I believe it would be useful if at least one more person in the
world tested hibernation between each release ;-)


Bjørn


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-25  7:20               ` Bjørn Mork
@ 2014-09-25  9:13                 ` Joerg Roedel
  2014-09-25 10:54                   ` Bjørn Mork
  0 siblings, 1 reply; 14+ messages in thread
From: Joerg Roedel @ 2014-09-25  9:13 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Rafael J. Wysocki, linux-pm

On Thu, Sep 25, 2014 at 09:20:58AM +0200, Bjørn Mork wrote:
> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
> 
> > I've decided to go with a revert for 3.17, as we don't seem to have an immediate
> > fix and the final 3.17 may be as close as this Sunday.  So I'm going to send my
> > final pull request for 3.17 to Linus tomorrow or early on Friday.
> 
> Sounds safest to me, FWIW.

Yes, sorry for the delay, I am still fighting with my cold and couldn't
get around to send a fix sooner :/

> For the next round of this, I think the only missing part was some test
> like
> 
>         if (!forbidden_pages_map || !free_pages_map)
>            goto return_without_freeing_anything;

Right, this is pretty much the fix. Can you please test the attached
patch?

> And BTW, I believe it would be useful if at least one more person in the
> world tested hibernation between each release ;-)

Well, I tested these patches on at least 4 or 5 different hardware
configurations. I also know of other people testing hibernation with -rc
kernels, but this is the first report of this issue I have seen. I
wonder what it different in your setup so that you trigger this bug.

Anyway, it would be great if you could test the patch below :)

Thanks,

	Joerg

From fe599eff60cfbfbb1f894dc476ee28f38aef954b Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel@suse.de>
Date: Thu, 25 Sep 2014 11:04:40 +0200
Subject: [PATCH] PM: Hibernate: Fix NULL pointer access in swsusp_free

The optimized version of swsusp_free does not check the
bitmap pointers anymore, which may cause a NULL pointer
dereference and a kernel crash. Fix it by adding the checks
and bail out if one of them is NULL.

Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 kernel/power/snapshot.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index c4b8093..791a618 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1343,6 +1343,9 @@ void swsusp_free(void)
 {
 	unsigned long fb_pfn, fr_pfn;
 
+	if (!forbidden_pages_map || !free_pages_map)
+		goto out;
+
 	memory_bm_position_reset(forbidden_pages_map);
 	memory_bm_position_reset(free_pages_map);
 
@@ -1370,6 +1373,7 @@ loop:
 		goto loop;
 	}
 
+out:
 	nr_copy_pages = 0;
 	nr_meta_pages = 0;
 	restore_pblist = NULL;
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-25  9:13                 ` Joerg Roedel
@ 2014-09-25 10:54                   ` Bjørn Mork
  2014-09-25 20:26                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Bjørn Mork @ 2014-09-25 10:54 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Rafael J. Wysocki, linux-pm

Joerg Roedel <jroedel@suse.de> writes:
> On Thu, Sep 25, 2014 at 09:20:58AM +0200, Bjørn Mork wrote:
>> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>> 
>> > I've decided to go with a revert for 3.17, as we don't seem to have an immediate
>> > fix and the final 3.17 may be as close as this Sunday.  So I'm going to send my
>> > final pull request for 3.17 to Linus tomorrow or early on Friday.
>> 
>> Sounds safest to me, FWIW.
>
> Yes, sorry for the delay, I am still fighting with my cold and couldn't
> get around to send a fix sooner :/

If it's any comfort, a cold was the reason I found this :-)

I didn't plan on testing 3.17 at all. Instead I wanted to beat on
Debian's 3.16 packages for a while to make sure it is in perfect
condition for the jessie release.  But then I catched the cold and felt
like just relaxing with a movie.  And instantly hit the i915 ring
initialization bug. Which has a workaround in the queue for 3.16, so
there is nothing to worry about:
https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.16/drm-i915-read-head-register-back-in-init_ring_common-to-enforce-ordering.patch

Except that this bug hit me there and then and I wasn't exactly in the
mood for building a new kernel.  So I just installed Debian's
experimental 3.17-rc5 build as a quickfix instead and watched the movie.

Then, much later, I noticed the oops.  Which I wouldn't have done
without the cold.

>> For the next round of this, I think the only missing part was some test
>> like
>> 
>>         if (!forbidden_pages_map || !free_pages_map)
>>            goto return_without_freeing_anything;
>
> Right, this is pretty much the fix. Can you please test the attached
> patch?
>
>> And BTW, I believe it would be useful if at least one more person in the
>> world tested hibernation between each release ;-)
>
> Well, I tested these patches on at least 4 or 5 different hardware
> configurations. I also know of other people testing hibernation with -rc
> kernels, but this is the first report of this issue I have seen. I
> wonder what it different in your setup so that you trigger this bug.

I do wonder about that as well. AFAIK I don't do anything extraordinary.
My hardware is somewhat dated but should still be pretty common.

I use "s2disk" from the Debian wheezy "uswsusp" package (version
1.0+20110509-3) to hibernate the system.  Don't think I do anything
fancy.  I just run s2disk and that's that.

The fact that the test was there before indicates that the NULL maps are
to be expected, but maybe that is really just painting over some
hardware related problem? My laptop is a Lenovo Thinkpad X301 from
~2008, having much of the hardware in common with most laptops from that
time:

bjorn@nemi:~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub [8086:2a40] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07)
00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a43] (rev 07)
00:03.0 Communication controller [0780]: Intel Corporation Mobile 4 Series Chipset MEI Controller [8086:2a44] (rev 07)
00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit Network Connection [8086:10f5] (rev 03)
00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 [8086:2937] (rev 03)
00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 [8086:2938] (rev 03)
00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 [8086:2939] (rev 03)
00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation 82801I (ICH9 Family) HD Audio Controller [8086:293e] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 [8086:2940] (rev 03)
00:1c.1 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 [8086:2942] (rev 03)
00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 03)
00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 [8086:2935] (rev 03)
00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 [8086:2936] (rev 03)
00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 03)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 93)
00:1f.0 ISA bridge [0601]: Intel Corporation ICH9M-E LPC Interface Controller [8086:2917] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] [8086:2929] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930] (rev 03)
03:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 63)

bjorn@nemi:~$ lsusb 
Bus 005 Device 003: ID 1199:a001 Sierra Wireless, Inc. 
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 008 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 007 Device 002: ID 8087:07dc Intel Corp. 
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 003: ID 17ef:4807 Lenovo UVC Camera
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 002: ID 0a5c:2145 Broadcom Corp. BCM2045B (BDC-2.1) [Bluetooth Controller]
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub


The only odd parts are the newer 7260 wifi card (with the 8087:07dc
additional bluetooth device), and the even newer EM7345 LTE modem
(1199:a001).  There is of course a slight chance that one of these
devices cause something odd to happen during hibernate/resume.

The newer bluetooth device does trigger a firmware related warning
during resume, which I haven't bothered investigating yet.  I guess it
could theoretically be related to other resume issues:

Sep 25 12:19:52 nemi kernel: [   51.128904] ------------[ cut here ]------------
Sep 25 12:19:52 nemi kernel: [   51.128915] WARNING: CPU: 0 PID: 5119 at drivers/base/firmware_class.c:1124 _request_firmware+0x277/0x68d()
Sep 25 12:19:52 nemi kernel: [   51.128920] Modules linked in: xt_multiport iptable_filter ip_tables bnep dm_mod xt_hl nf_log_ipv6 nf_log_common xt_LOG binfmt_misc ip6table_filter ip6_tables x_tables nfsd nfs_acl nfs lockd fscache sunrpc 8021q garp stp mrp llc tun loop fuse iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant snd_hda_codec_generic cdc_mbim cdc_wdm cdc_ncm usbnet coretemp kvm_intel mii arc4 cdc_acm kvm evdev psmouse serio_raw uvcvideo ecb videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common btusb videodev bluetooth snd_hda_intel iwlmvm snd_hda_controller mac80211 snd_hda_codec snd_hwdep snd_pcm_oss lpc_ich mfd_core i2c_i801 snd_mixer_oss iwlwifi snd_pcm cfg80211 snd_timer wmi thinkpad_acpi nvram snd soundcore rfkill ac battery i915 i2c_algo_bit drm_kms_helper video drm acpi_cpufreq button processor ext4 crc16 jbd2 mbcache nbd sg sd_mod crc_t10dif crct10dif_generic sr_mod crct10dif_common cdrom microcode ahci libahci libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core thermal thermal_sys
Sep 25 12:19:52 nemi kernel: [   51.129255] CPU: 0 PID: 5119 Comm: kworker/u9:0 Tainted: G        W      3.17.0-rc6+ #261
Sep 25 12:19:52 nemi kernel: [   51.129261] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
Sep 25 12:19:52 nemi kernel: [   51.129305] Workqueue: hci0 hci_power_on [bluetooth]
Sep 25 12:19:52 nemi kernel: [   51.129312]  0000000000000009 ffff88023184fae8 ffffffff813be9c7 0000000000003dea
Sep 25 12:19:52 nemi kernel: [   51.129326]  0000000000000000 ffff88023184fb28 ffffffff8103eb60 ffff88023184faf8
Sep 25 12:19:52 nemi kernel: [   51.129340]  ffffffff812a61a5 00000000fffffff5 ffff880223e6b180 ffff8800b17dd4c0
Sep 25 12:19:52 nemi kernel: [   51.129355] Call Trace:
Sep 25 12:19:52 nemi kernel: [   51.129367]  [<ffffffff813be9c7>] dump_stack+0x4e/0x68
Sep 25 12:19:52 nemi kernel: [   51.129377]  [<ffffffff8103eb60>] warn_slowpath_common+0x7c/0x96
Sep 25 12:19:52 nemi kernel: [   51.129389]  [<ffffffff812a61a5>] ? _request_firmware+0x277/0x68d
Sep 25 12:19:52 nemi kernel: [   51.129400]  [<ffffffff8103eb8f>] warn_slowpath_null+0x15/0x17
Sep 25 12:19:52 nemi kernel: [   51.129411]  [<ffffffff812a61a5>] _request_firmware+0x277/0x68d
Sep 25 12:19:52 nemi kernel: [   51.129422]  [<ffffffff812a667b>] request_firmware+0x30/0x44
Sep 25 12:19:52 nemi kernel: [   51.129449]  [<ffffffffa058bcd4>] btusb_setup_intel+0x297/0x675 [btusb]
Sep 25 12:19:52 nemi kernel: [   51.129513]  [<ffffffffa0071366>] ? usb_autopm_put_interface+0x35/0x39 [usbcore]
Sep 25 12:19:52 nemi kernel: [   51.129584]  [<ffffffffa05ddccf>] hci_dev_do_open+0x13b/0x986 [bluetooth]
Sep 25 12:19:52 nemi kernel: [   51.129596]  [<ffffffff810533c2>] ? process_one_work+0x136/0x3bf
Sep 25 12:19:52 nemi kernel: [   51.129653]  [<ffffffffa05de561>] hci_power_on+0x47/0x16a [bluetooth]
Sep 25 12:19:52 nemi kernel: [   51.129664]  [<ffffffff81053487>] process_one_work+0x1fb/0x3bf
Sep 25 12:19:52 nemi kernel: [   51.129675]  [<ffffffff810533c2>] ? process_one_work+0x136/0x3bf
Sep 25 12:19:52 nemi kernel: [   51.129710]  [<ffffffff81053841>] worker_thread+0x1cc/0x2a3
Sep 25 12:19:52 nemi kernel: [   51.129721]  [<ffffffff81053675>] ? process_scheduled_works+0x2a/0x2a
Sep 25 12:19:52 nemi kernel: [   51.129731]  [<ffffffff81057dc0>] kthread+0xb5/0xbd
Sep 25 12:19:52 nemi kernel: [   51.129743]  [<ffffffff810759e3>] ? trace_hardirqs_on+0xd/0xf
Sep 25 12:19:52 nemi kernel: [   51.129754]  [<ffffffff81057d0b>] ? __kthread_parkme+0x5c/0x5c
Sep 25 12:19:52 nemi kernel: [   51.129765]  [<ffffffff813c472c>] ret_from_fork+0x7c/0xb0
Sep 25 12:19:52 nemi kernel: [   51.129775]  [<ffffffff81057d0b>] ? __kthread_parkme+0x5c/0x5c
Sep 25 12:19:52 nemi kernel: [   51.129782] ---[ end trace 059b5e27114304b7 ]---
Sep 25 12:19:52 nemi kernel: [   51.129791] bluetooth hci0: firmware: intel/ibt-hw-37.7.bseq will not be loaded
Sep 25 12:19:52 nemi kernel: [   51.129802] Bluetooth: hci0 failed to open default Intel fw file: intel/ibt-hw-37.7.bseq


And I have always(?) had the strange NMI warning in resume logs: 

 Sep 25 12:19:55 nemi kernel: [   54.623454] Uhhuh. NMI received for unknown reason 31 on CPU 0.

but Google has taught me to simply ignore that.  I see it as just an
example of why it's pointless to log firmware errors without a clear
indication what the user is supposed to do about it.  I might be wrong
:-)

That's all I got.  I was hoping someone could explain under what
circumstances those maps will be NULL.

> Anyway, it would be great if you could test the patch below :)

The patch works fine for me.  Thanks.  I look forward to the next
revision of this set.  It looks like a really nice cleanup. And I am
sorry that I didn't catch the bug earlier in the 3.17 cycle.


Bjørn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: NULL pointer dereference in swsusp_free with 3.17-rc5
  2014-09-25 10:54                   ` Bjørn Mork
@ 2014-09-25 20:26                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2014-09-25 20:26 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Joerg Roedel, linux-pm

On Thursday, September 25, 2014 12:54:56 PM Bjørn Mork wrote:
> Joerg Roedel <jroedel@suse.de> writes:
> > On Thu, Sep 25, 2014 at 09:20:58AM +0200, Bjørn Mork wrote:
> >> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:

[cut]

> 
> > Anyway, it would be great if you could test the patch below :)
> 
> The patch works fine for me.  Thanks.  I look forward to the next
> revision of this set.  It looks like a really nice cleanup. And I am
> sorry that I didn't catch the bug earlier in the 3.17 cycle.

I'm only going to revert the commit in question as the rest should just
work without this one, only less efficiently.

So I'll be waiting for a resend of it with the patch that you've just tested
folded in after 3.18-rc is out.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-09-25 20:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-23 14:50 NULL pointer dereference in swsusp_free with 3.17-rc5 Bjørn Mork
2014-09-23 15:24 ` Rafael J. Wysocki
2014-09-23 17:27   ` Bjørn Mork
2014-09-23 20:28     ` Rafael J. Wysocki
2014-09-24  9:46       ` Joerg Roedel
2014-09-23 21:20     ` Rafael J. Wysocki
2014-09-24  7:45       ` Bjørn Mork
2014-09-24  9:51         ` Joerg Roedel
2014-09-24 10:17           ` Bjørn Mork
2014-09-24 23:44             ` Rafael J. Wysocki
2014-09-25  7:20               ` Bjørn Mork
2014-09-25  9:13                 ` Joerg Roedel
2014-09-25 10:54                   ` Bjørn Mork
2014-09-25 20:26                     ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).