linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Bjørn Mork" <bjorn@mork.no>
To: Joerg Roedel <jroedel@suse.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>, linux-pm@vger.kernel.org
Subject: Re: NULL pointer dereference in swsusp_free with 3.17-rc5
Date: Thu, 25 Sep 2014 12:54:56 +0200	[thread overview]
Message-ID: <87sijg6kcf.fsf@nemi.mork.no> (raw)
In-Reply-To: <20140925091318.GA4269@suse.de> (Joerg Roedel's message of "Thu, 25 Sep 2014 11:13:18 +0200")

Joerg Roedel <jroedel@suse.de> writes:
> On Thu, Sep 25, 2014 at 09:20:58AM +0200, Bjørn Mork wrote:
>> "Rafael J. Wysocki" <rjw@rjwysocki.net> writes:
>> 
>> > I've decided to go with a revert for 3.17, as we don't seem to have an immediate
>> > fix and the final 3.17 may be as close as this Sunday.  So I'm going to send my
>> > final pull request for 3.17 to Linus tomorrow or early on Friday.
>> 
>> Sounds safest to me, FWIW.
>
> Yes, sorry for the delay, I am still fighting with my cold and couldn't
> get around to send a fix sooner :/

If it's any comfort, a cold was the reason I found this :-)

I didn't plan on testing 3.17 at all. Instead I wanted to beat on
Debian's 3.16 packages for a while to make sure it is in perfect
condition for the jessie release.  But then I catched the cold and felt
like just relaxing with a movie.  And instantly hit the i915 ring
initialization bug. Which has a workaround in the queue for 3.16, so
there is nothing to worry about:
https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.16/drm-i915-read-head-register-back-in-init_ring_common-to-enforce-ordering.patch

Except that this bug hit me there and then and I wasn't exactly in the
mood for building a new kernel.  So I just installed Debian's
experimental 3.17-rc5 build as a quickfix instead and watched the movie.

Then, much later, I noticed the oops.  Which I wouldn't have done
without the cold.

>> For the next round of this, I think the only missing part was some test
>> like
>> 
>>         if (!forbidden_pages_map || !free_pages_map)
>>            goto return_without_freeing_anything;
>
> Right, this is pretty much the fix. Can you please test the attached
> patch?
>
>> And BTW, I believe it would be useful if at least one more person in the
>> world tested hibernation between each release ;-)
>
> Well, I tested these patches on at least 4 or 5 different hardware
> configurations. I also know of other people testing hibernation with -rc
> kernels, but this is the first report of this issue I have seen. I
> wonder what it different in your setup so that you trigger this bug.

I do wonder about that as well. AFAIK I don't do anything extraordinary.
My hardware is somewhat dated but should still be pretty common.

I use "s2disk" from the Debian wheezy "uswsusp" package (version
1.0+20110509-3) to hibernate the system.  Don't think I do anything
fancy.  I just run s2disk and that's that.

The fact that the test was there before indicates that the NULL maps are
to be expected, but maybe that is really just painting over some
hardware related problem? My laptop is a Lenovo Thinkpad X301 from
~2008, having much of the hardware in common with most laptops from that
time:

bjorn@nemi:~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub [8086:2a40] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07)
00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a43] (rev 07)
00:03.0 Communication controller [0780]: Intel Corporation Mobile 4 Series Chipset MEI Controller [8086:2a44] (rev 07)
00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit Network Connection [8086:10f5] (rev 03)
00:1a.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 [8086:2937] (rev 03)
00:1a.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 [8086:2938] (rev 03)
00:1a.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 [8086:2939] (rev 03)
00:1a.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 [8086:293c] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation 82801I (ICH9 Family) HD Audio Controller [8086:293e] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 [8086:2940] (rev 03)
00:1c.1 PCI bridge [0604]: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 [8086:2942] (rev 03)
00:1d.0 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 [8086:2934] (rev 03)
00:1d.1 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 [8086:2935] (rev 03)
00:1d.2 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 [8086:2936] (rev 03)
00:1d.7 USB controller [0c03]: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 [8086:293a] (rev 03)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 93)
00:1f.0 ISA bridge [0601]: Intel Corporation ICH9M-E LPC Interface Controller [8086:2917] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] [8086:2929] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation 82801I (ICH9 Family) SMBus Controller [8086:2930] (rev 03)
03:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 63)

bjorn@nemi:~$ lsusb 
Bus 005 Device 003: ID 1199:a001 Sierra Wireless, Inc. 
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 008 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 007 Device 002: ID 8087:07dc Intel Corp. 
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 003: ID 17ef:4807 Lenovo UVC Camera
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 002: ID 0a5c:2145 Broadcom Corp. BCM2045B (BDC-2.1) [Bluetooth Controller]
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub


The only odd parts are the newer 7260 wifi card (with the 8087:07dc
additional bluetooth device), and the even newer EM7345 LTE modem
(1199:a001).  There is of course a slight chance that one of these
devices cause something odd to happen during hibernate/resume.

The newer bluetooth device does trigger a firmware related warning
during resume, which I haven't bothered investigating yet.  I guess it
could theoretically be related to other resume issues:

Sep 25 12:19:52 nemi kernel: [   51.128904] ------------[ cut here ]------------
Sep 25 12:19:52 nemi kernel: [   51.128915] WARNING: CPU: 0 PID: 5119 at drivers/base/firmware_class.c:1124 _request_firmware+0x277/0x68d()
Sep 25 12:19:52 nemi kernel: [   51.128920] Modules linked in: xt_multiport iptable_filter ip_tables bnep dm_mod xt_hl nf_log_ipv6 nf_log_common xt_LOG binfmt_misc ip6table_filter ip6_tables x_tables nfsd nfs_acl nfs lockd fscache sunrpc 8021q garp stp mrp llc tun loop fuse iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant snd_hda_codec_generic cdc_mbim cdc_wdm cdc_ncm usbnet coretemp kvm_intel mii arc4 cdc_acm kvm evdev psmouse serio_raw uvcvideo ecb videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common btusb videodev bluetooth snd_hda_intel iwlmvm snd_hda_controller mac80211 snd_hda_codec snd_hwdep snd_pcm_oss lpc_ich mfd_core i2c_i801 snd_mixer_oss iwlwifi snd_pcm cfg80211 snd_timer wmi thinkpad_acpi nvram snd soundcore rfkill ac battery i915 i2c_algo_bit drm_kms_helper video drm acpi_cpufreq button processor ext4 crc16 jbd2 mbcache nbd sg sd_mod crc_t10dif crct10dif_generic sr_mod crct10dif_common cdrom microcode ahci libahci libata scsi_mod ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core thermal thermal_sys
Sep 25 12:19:52 nemi kernel: [   51.129255] CPU: 0 PID: 5119 Comm: kworker/u9:0 Tainted: G        W      3.17.0-rc6+ #261
Sep 25 12:19:52 nemi kernel: [   51.129261] Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
Sep 25 12:19:52 nemi kernel: [   51.129305] Workqueue: hci0 hci_power_on [bluetooth]
Sep 25 12:19:52 nemi kernel: [   51.129312]  0000000000000009 ffff88023184fae8 ffffffff813be9c7 0000000000003dea
Sep 25 12:19:52 nemi kernel: [   51.129326]  0000000000000000 ffff88023184fb28 ffffffff8103eb60 ffff88023184faf8
Sep 25 12:19:52 nemi kernel: [   51.129340]  ffffffff812a61a5 00000000fffffff5 ffff880223e6b180 ffff8800b17dd4c0
Sep 25 12:19:52 nemi kernel: [   51.129355] Call Trace:
Sep 25 12:19:52 nemi kernel: [   51.129367]  [<ffffffff813be9c7>] dump_stack+0x4e/0x68
Sep 25 12:19:52 nemi kernel: [   51.129377]  [<ffffffff8103eb60>] warn_slowpath_common+0x7c/0x96
Sep 25 12:19:52 nemi kernel: [   51.129389]  [<ffffffff812a61a5>] ? _request_firmware+0x277/0x68d
Sep 25 12:19:52 nemi kernel: [   51.129400]  [<ffffffff8103eb8f>] warn_slowpath_null+0x15/0x17
Sep 25 12:19:52 nemi kernel: [   51.129411]  [<ffffffff812a61a5>] _request_firmware+0x277/0x68d
Sep 25 12:19:52 nemi kernel: [   51.129422]  [<ffffffff812a667b>] request_firmware+0x30/0x44
Sep 25 12:19:52 nemi kernel: [   51.129449]  [<ffffffffa058bcd4>] btusb_setup_intel+0x297/0x675 [btusb]
Sep 25 12:19:52 nemi kernel: [   51.129513]  [<ffffffffa0071366>] ? usb_autopm_put_interface+0x35/0x39 [usbcore]
Sep 25 12:19:52 nemi kernel: [   51.129584]  [<ffffffffa05ddccf>] hci_dev_do_open+0x13b/0x986 [bluetooth]
Sep 25 12:19:52 nemi kernel: [   51.129596]  [<ffffffff810533c2>] ? process_one_work+0x136/0x3bf
Sep 25 12:19:52 nemi kernel: [   51.129653]  [<ffffffffa05de561>] hci_power_on+0x47/0x16a [bluetooth]
Sep 25 12:19:52 nemi kernel: [   51.129664]  [<ffffffff81053487>] process_one_work+0x1fb/0x3bf
Sep 25 12:19:52 nemi kernel: [   51.129675]  [<ffffffff810533c2>] ? process_one_work+0x136/0x3bf
Sep 25 12:19:52 nemi kernel: [   51.129710]  [<ffffffff81053841>] worker_thread+0x1cc/0x2a3
Sep 25 12:19:52 nemi kernel: [   51.129721]  [<ffffffff81053675>] ? process_scheduled_works+0x2a/0x2a
Sep 25 12:19:52 nemi kernel: [   51.129731]  [<ffffffff81057dc0>] kthread+0xb5/0xbd
Sep 25 12:19:52 nemi kernel: [   51.129743]  [<ffffffff810759e3>] ? trace_hardirqs_on+0xd/0xf
Sep 25 12:19:52 nemi kernel: [   51.129754]  [<ffffffff81057d0b>] ? __kthread_parkme+0x5c/0x5c
Sep 25 12:19:52 nemi kernel: [   51.129765]  [<ffffffff813c472c>] ret_from_fork+0x7c/0xb0
Sep 25 12:19:52 nemi kernel: [   51.129775]  [<ffffffff81057d0b>] ? __kthread_parkme+0x5c/0x5c
Sep 25 12:19:52 nemi kernel: [   51.129782] ---[ end trace 059b5e27114304b7 ]---
Sep 25 12:19:52 nemi kernel: [   51.129791] bluetooth hci0: firmware: intel/ibt-hw-37.7.bseq will not be loaded
Sep 25 12:19:52 nemi kernel: [   51.129802] Bluetooth: hci0 failed to open default Intel fw file: intel/ibt-hw-37.7.bseq


And I have always(?) had the strange NMI warning in resume logs: 

 Sep 25 12:19:55 nemi kernel: [   54.623454] Uhhuh. NMI received for unknown reason 31 on CPU 0.

but Google has taught me to simply ignore that.  I see it as just an
example of why it's pointless to log firmware errors without a clear
indication what the user is supposed to do about it.  I might be wrong
:-)

That's all I got.  I was hoping someone could explain under what
circumstances those maps will be NULL.

> Anyway, it would be great if you could test the patch below :)

The patch works fine for me.  Thanks.  I look forward to the next
revision of this set.  It looks like a really nice cleanup. And I am
sorry that I didn't catch the bug earlier in the 3.17 cycle.


Bjørn

  reply	other threads:[~2014-09-25 10:56 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-23 14:50 NULL pointer dereference in swsusp_free with 3.17-rc5 Bjørn Mork
2014-09-23 15:24 ` Rafael J. Wysocki
2014-09-23 17:27   ` Bjørn Mork
2014-09-23 20:28     ` Rafael J. Wysocki
2014-09-24  9:46       ` Joerg Roedel
2014-09-23 21:20     ` Rafael J. Wysocki
2014-09-24  7:45       ` Bjørn Mork
2014-09-24  9:51         ` Joerg Roedel
2014-09-24 10:17           ` Bjørn Mork
2014-09-24 23:44             ` Rafael J. Wysocki
2014-09-25  7:20               ` Bjørn Mork
2014-09-25  9:13                 ` Joerg Roedel
2014-09-25 10:54                   ` Bjørn Mork [this message]
2014-09-25 20:26                     ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sijg6kcf.fsf@nemi.mork.no \
    --to=bjorn@mork.no \
    --cc=jroedel@suse.de \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).