* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] [not found] <CAGKbab941Nh7Sy1NTZC6ySxG_P5g7HpjATQ_GSCvDY8y=qgmHA@mail.gmail.com> @ 2016-01-19 16:38 ` Alan Stern 2016-01-19 16:52 ` Paul Menzel 0 siblings, 1 reply; 22+ messages in thread From: Alan Stern @ 2016-01-19 16:38 UTC (permalink / raw) To: Erich Schubert Cc: Paul Menzel, Ben Hutchings, SCSI development list, 801925, Alexandre Rossi On Tue, 19 Jan 2016, Erich Schubert wrote: > Hi, > Attached are photos of the Kernel null pointer BUG that I'm observing. > > These shots are with 4.4.0-rc8. > As you can see, I have a similar trace to Paul, but the error occurs > one stack frame earlier? Yours is only slightly similar to Paul's. He got an error in sr_runtime_suspend, but your error occurs in sd_resume -- a completely different function in a different source file. > Maybe Alex's issue is the same bug, but triggered slightly differently or > just the kernel compiled differently. > __rpm_callback, scsi_autopm_put_device, __pm_runtime_resume, sd_probe > is present in all of these traces. > > Sorry, I do not have a lot of time right now to help testing or debugging. I can't tell what's going wrong without some real debugging. This means somebody has to build and test a patched kernel. There are no problems on my computer, so it will have to be one or more of you guys. Alan Stern ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-01-19 16:38 ` NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] Alan Stern @ 2016-01-19 16:52 ` Paul Menzel 2016-01-19 21:08 ` Alan Stern 0 siblings, 1 reply; 22+ messages in thread From: Paul Menzel @ 2016-01-19 16:52 UTC (permalink / raw) To: Alan Stern Cc: Erich Schubert, Ben Hutchings, SCSI development list, 801925, Alexandre Rossi [-- Attachment #1: Type: text/plain, Size: 1827 bytes --] Dear Alan, dear Erich, On Tuesday, 19.01.2016, at 11:38 -0500, Alan Stern wrote: > On Tue, 19 Jan 2016, Erich Schubert wrote: > > Attached are photos of the Kernel null pointer BUG that I'm observing. > > > > These shots are with 4.4.0-rc8. > > As you can see, I have a similar trace to Paul, but the error occurs > > one stack frame earlier? > > Yours is only slightly similar to Paul's. He got an error in > sr_runtime_suspend, but your error occurs in sd_resume -- a completely > different function in a different source file. if I remember correctly, it happened in different places for me too. In the backlog you should see that Ben gave me a patch to try, and then the original error wasn’t triggered, as it failed somewhere else. > > Maybe Alex's issue is the same bug, but triggered slightly differently or > > just the kernel compiled differently. > > __rpm_callback, scsi_autopm_put_device, __pm_runtime_resume, sd_probe > > is present in all of these traces. > > > > Sorry, I do not have a lot of time right now to help testing or debugging. > > I can't tell what's going wrong without some real debugging. This > means somebody has to build and test a patched kernel. There are no > problems on my computer, so it will have to be one or more of you > guys. Could you please attach the debugging patch. Hopefully Alexandre, Erich, or I will have some spare time to build an image from it. Alan, thank you a lot for being so responsive and helpful! 
Thanks, Paul -- GPG-Schlüssel: 33623E9B Fingerabdruck = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B Giant Monkey Software Engineering GmbH Brunnenstr. 7D 10119 Berlin Mitte Geschäftsführer Adrian Fuhrmann, Lion Vollnhals und Paul Menzel USt-IdNr.: DE281524720 HRB 139495 B Amtsgericht Charlottenburg [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-01-19 16:52 ` Paul Menzel @ 2016-01-19 21:08 ` Alan Stern 2016-01-19 23:20 ` Paul Menzel 2016-01-20 22:07 ` Alexandre Rossi 0 siblings, 2 replies; 22+ messages in thread From: Alan Stern @ 2016-01-19 21:08 UTC (permalink / raw) To: Paul Menzel Cc: Erich Schubert, Ben Hutchings, SCSI development list, 801925, Alexandre Rossi [-- Attachment #1: Type: TEXT/PLAIN, Size: 567 bytes --] On Tue, 19 Jan 2016, Paul Menzel wrote: > Could you please attach the debugging patch. Hopefully Alexandre, Erich, > or I will have some spare time to build an image from it. Actually, this patch is an attempt at a fix. After looking more carefully at your log pictures, I realized what the problem must be. It's too bad nobody was able to capture a log where the error occurred in sr_runtime_suspend, though -- all the logs in the bug report show sd_runtime_resume. > Alan, thank you a lot for being so responsive and helpful! You're welcome. 
Alan Stern [-- Attachment #2: Type: TEXT/PLAIN, Size: 1577 bytes --] drivers/scsi/sd.c | 7 +++++-- drivers/scsi/sr.c | 4 ++++ 2 files changed, 9 insertions(+), 2 deletions(-) Index: usb-4.4/drivers/scsi/sd.c =================================================================== --- usb-4.4.orig/drivers/scsi/sd.c +++ usb-4.4/drivers/scsi/sd.c @@ -3275,8 +3275,8 @@ static int sd_suspend_common(struct devi struct scsi_disk *sdkp = dev_get_drvdata(dev); int ret = 0; - if (!sdkp) - return 0; /* this can happen */ + if (!sdkp) /* E.g.: runtime suspend following sd_remove() */ + return 0; if (sdkp->WCE && sdkp->media_present) { sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n"); @@ -3315,6 +3315,9 @@ static int sd_resume(struct device *dev) { struct scsi_disk *sdkp = dev_get_drvdata(dev); + if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */ + return 0; + if (!sdkp->device->manage_start_stop) return 0; Index: usb-4.4/drivers/scsi/sr.c =================================================================== --- usb-4.4.orig/drivers/scsi/sr.c +++ usb-4.4/drivers/scsi/sr.c @@ -144,6 +144,9 @@ static int sr_runtime_suspend(struct dev { struct scsi_cd *cd = dev_get_drvdata(dev); + if (!cd) /* E.g.: runtime suspend following sr_remove() */ + return 0; + if (cd->media_present) return -EBUSY; else @@ -985,6 +988,7 @@ static int sr_remove(struct device *dev) scsi_autopm_get_device(cd->device); del_gendisk(cd->disk); + dev_set_drvdata(dev, NULL); mutex_lock(&sr_ref_mutex); kref_put(&cd->kref, sr_kref_release); ^ permalink raw reply [flat|nested] 22+ messages in thread
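For readers following the patch above: the guard it adds to sr_runtime_suspend() can be modelled in user space. This is only a minimal sketch, not kernel code. The names model_dev, model_cd, model_get_drvdata(), model_set_drvdata() and model_runtime_suspend() are hypothetical stand-ins for struct device, struct scsi_cd, dev_get_drvdata(), dev_set_drvdata() and sr_runtime_suspend(). It assumes, as the comments in the patch suggest, that a runtime PM callback can run while the driver data pointer is still NULL or has already been cleared by remove.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cd: only the field the callback reads. */
struct model_cd {
    int media_present;
};

/* Hypothetical stand-in for struct device: only the driver data pointer. */
struct model_dev {
    void *drvdata;
};

/* Models dev_get_drvdata(): returns NULL until probe has stored a pointer. */
static void *model_get_drvdata(const struct model_dev *dev)
{
    assert(dev != NULL);
    return dev->drvdata;
}

/* Models dev_set_drvdata(): remove stores NULL here, as the patch does in sr_remove(). */
static void model_set_drvdata(struct model_dev *dev, void *data)
{
    assert(dev != NULL);
    dev->drvdata = data;
}

/*
 * Models the patched sr_runtime_suspend(): return early when no driver
 * data is attached instead of dereferencing a NULL pointer.
 */
static int model_runtime_suspend(struct model_dev *dev)
{
    struct model_cd *cd = model_get_drvdata(dev);

    if (!cd)                /* callback ran before probe finished or after remove */
        return 0;

    if (cd->media_present)  /* refuse to suspend while a disc is loaded */
        return -EBUSY;

    return 0;
}
```

Without the early return, the first read of cd->media_present is the kind of access that faults when cd is NULL. The sketch does not model locking or the runtime PM usage counter.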
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-01-19 21:08 ` Alan Stern @ 2016-01-19 23:20 ` Paul Menzel 2016-01-20 22:07 ` Alexandre Rossi 1 sibling, 0 replies; 22+ messages in thread From: Paul Menzel @ 2016-01-19 23:20 UTC (permalink / raw) To: Alan Stern Cc: Erich Schubert, Ben Hutchings, SCSI development list, 801925, Alexandre Rossi [-- Attachment #1: Type: text/plain, Size: 1008 bytes --] Dear Alan, On Tuesday, 19.01.2016, at 16:08 -0500, Alan Stern wrote: > On Tue, 19 Jan 2016, Paul Menzel wrote: > > > Could you please attach the debugging patch. Hopefully Alexandre, Erich, > > or I will have some spare time to build an image from it. > > Actually, this patch is an attempt at a fix. After looking more > carefully at your log pictures, I realized what the problem must be. that indeed fixed it for me. I applied your patch on linux-image-4.4.0-rc8-686 [1] and was able to get to the LUKS passphrase dialog. Awesome! Thank you very, very much! […] Thanks, Paul [1] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686 -- GPG-Schlüssel: 33623E9B Fingerabdruck = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B Giant Monkey Software Engineering GmbH Brunnenstr. 7D 10119 Berlin Mitte Geschäftsführer Adrian Fuhrmann, Lion Vollnhals und Paul Menzel USt-IdNr.: DE281524720 HRB 139495 B Amtsgericht Charlottenburg [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-01-19 21:08 ` Alan Stern 2016-01-19 23:20 ` Paul Menzel @ 2016-01-20 22:07 ` Alexandre Rossi 2016-02-09 16:47 ` Paul Menzel 1 sibling, 1 reply; 22+ messages in thread From: Alexandre Rossi @ 2016-01-20 22:07 UTC (permalink / raw) To: Alan Stern Cc: Paul Menzel, Erich Schubert, Ben Hutchings, SCSI development list, 801925 Hi, >> Could you please attach the debugging patch. Hopefully Alexandre, Erich, >> or I will have some spare time to build an image from it. > > Actually, this patch is an attempt at a fix. After looking more > carefully at your log pictures, I realized what the problem must be. > > It's too bad nobody was able to capture a log where the error > occurred in sr_runtime_suspend, though -- all the logs in the bug > report show sd_runtime_resume. I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian). It still crashes at boot, but the stack trace is different: it happens in blk_post_runtime_resume. Maybe I'm bitten by a different bug, or maybe I need to try with 4.4. I'll post the captured log when I have access to a wired network. I'd be happy to provide the logs of a debugging patch. Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-01-20 22:07 ` Alexandre Rossi @ 2016-02-09 16:47 ` Paul Menzel 2016-02-09 19:56 ` Alexandre Rossi 0 siblings, 1 reply; 22+ messages in thread From: Paul Menzel @ 2016-02-09 16:47 UTC (permalink / raw) To: Alexandre Rossi Cc: Alan Stern, Erich Schubert, Ben Hutchings, SCSI development list, 801925 [-- Attachment #1: Type: text/plain, Size: 1707 bytes --] Dear Debian and Linux folks, On Wednesday, 20.01.2016, at 23:07 +0100, Alexandre Rossi wrote: > >> Could you please attach the debugging patch. Hopefully Alexandre, Erich, > >> or I will have some spare time to build an image from it. > > > > Actually, this patch is an attempt at a fix. After looking more > > carefully at your log pictures, I realized what the problem must be. > > > > It's too bad nobody was able to capture a log where the error > > occurred in sr_runtime_suspend, though -- all the logs in the bug > > report show sd_runtime_resume. > > I just tested the patch applied on top of 4.3.3 (4.3.3-6 in Debian). > > It still crashes at boot, but the stack trace is different: it happens > in blk_post_runtime_resume. Maybe I'm bitten by a different bug, or maybe > I need to try with 4.4. > > I'll post the captured log when I have access to a wired network. I'd > be happy to provide the logs of a debugging patch. I tried Linux 4.3.5-1 [1], which entered Debian Sid/unstable yesterday, and I get the same null pointer dereference as Alexandre. As this is Linux 4.3 and not 4.4, I guess this is a different problem though. Alexandre, were you able to capture the stack trace? I’d submit a new bug report with this. Thanks, Paul [1] http://metadata.ftp-master.debian.org/changelogs/main/l/linux/linux_4.3.5-1_changelog -- GPG-Schlüssel: 33623E9B Fingerabdruck = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B Giant Monkey Software Engineering GmbH Brunnenstr. 
7D 10119 Berlin Mitte Geschäftsführer Adrian Fuhrmann, Lion Vollnhals und Paul Menzel USt-IdNr.: DE281524720 HRB 139495 B Amtsgericht Charlottenburg [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-02-09 16:47 ` Paul Menzel @ 2016-02-09 19:56 ` Alexandre Rossi 2016-02-09 20:51 ` Ben Hutchings 2016-02-18 16:27 ` Alan Stern 0 siblings, 2 replies; 22+ messages in thread From: Alexandre Rossi @ 2016-02-09 19:56 UTC (permalink / raw) To: Paul Menzel Cc: Alan Stern, Erich Schubert, Ben Hutchings, SCSI development list, 801925 [-- Attachment #1: Type: text/plain, Size: 352 bytes --] Hi, netconsole does not seem to work so early in the boot process this time. > As this is Linux 4.3 and not 4.4, I guess this is a different problem > though. Alexandre, were you able to capture the stack trace? I’d submit > a new bug report with this. Here is a photo. Please ping me if you need me to test some debugging patches. Alex [-- Attachment #2: null-pointer-dereference-blk_post_runtime_resume.jpeg --] [-- Type: image/jpeg, Size: 160584 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-02-09 19:56 ` Alexandre Rossi @ 2016-02-09 20:51 ` Ben Hutchings 2016-02-18 16:27 ` Alan Stern 1 sibling, 0 replies; 22+ messages in thread From: Ben Hutchings @ 2016-02-09 20:51 UTC (permalink / raw) To: Alexandre Rossi, Paul Menzel Cc: Alan Stern, Erich Schubert, SCSI development list, 801925 [-- Attachment #1: Type: text/plain, Size: 746 bytes --] On Tue, 2016-02-09 at 20:56 +0100, Alexandre Rossi wrote: > Hi, > > netconsole does not seem to work so early in the boot process this time. > > > As this is Linux 4.3 and not 4.4, I guess this is a different problem > > though. Alexandre, were you able to capture the stack trace? I’d submit > > a new bug report with this. > > Here is a photo. Please ping me if you need me to test some debugging patches. I'm pretty sure this crash is fixed by commit 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM"), which I've now queued up for 4.3 (though it's already in 4.4, which I'll probably upload to unstable soon). Ben. -- Ben Hutchings Design a system any fool can use, and only a fool will want to use it. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 811 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-02-09 19:56 ` Alexandre Rossi 2016-02-09 20:51 ` Ben Hutchings @ 2016-02-18 16:27 ` Alan Stern 2016-02-22 23:15 ` Alexandre Rossi 1 sibling, 1 reply; 22+ messages in thread From: Alan Stern @ 2016-02-18 16:27 UTC (permalink / raw) To: Alexandre Rossi Cc: Paul Menzel, Erich Schubert, Ben Hutchings, SCSI development list, 801925 On Tue, 9 Feb 2016, Alexandre Rossi wrote: > Hi, > > netconsole does not seem to work so early in the boot process this time. > > > As this is Linux 4.3 and not 4.4, I guess this is a different problem > > though. Alexandre, were you able to capture the stack trace? I’d submit > > a new bug report with this. > > Here is a photo. Please ping me if you need me to test some debugging patches. It looks like the problem occurs in blk_post_runtime_resume(). Since there have been recent changes to this routine, it's hard to tell whether you're using the most up-to-date code. In particular, the first few lines of blk_post_runtime_resume() in block/blk-core.c should look like this: void blk_post_runtime_resume(struct request_queue *q, int err) { if (!q->dev) return; The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM"), which was added to the mainline kernel between 4.3 and 4.4. I don't know what the commit ID would be for a .stable kernel. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
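The early return Alan quotes from blk_post_runtime_resume() can be illustrated with a small user-space model. This is a hedged sketch under stated assumptions, not the kernel function: model_queue, model_rpm_status, its enumerators, the runs counter and model_post_runtime_resume() are hypothetical names, and the status handling is a simplification of what the block layer does. The only behaviour taken from the thread is the `if (!q->dev) return;` test at the top.

```c
#include <stddef.h>

/* Hypothetical, simplified runtime PM states for the model queue. */
enum model_rpm_status {
    MODEL_RPM_SUSPENDED,
    MODEL_RPM_ACTIVE
};

/* Hypothetical stand-in for struct request_queue. */
struct model_queue {
    void *dev;                        /* NULL until runtime PM is set up for the queue */
    enum model_rpm_status rpm_status;
    int runs;                         /* counts how often the queue was restarted */
};

/*
 * Models the guarded resume hook: a queue that was never set up for
 * runtime PM (dev == NULL) is left untouched; otherwise the status is
 * updated according to the resume result.
 */
static void model_post_runtime_resume(struct model_queue *q, int err)
{
    if (!q->dev)        /* the test quoted above: nothing to do for this queue */
        return;

    if (!err) {
        q->rpm_status = MODEL_RPM_ACTIVE;
        q->runs++;      /* stand-in for restarting request processing */
    } else {
        q->rpm_status = MODEL_RPM_SUSPENDED;
    }
}
```

In the model, a resume arriving before the queue has a device attached simply does nothing, which is the effect the added test is meant to have.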
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-02-18 16:27 ` Alan Stern @ 2016-02-22 23:15 ` Alexandre Rossi 2016-02-23 15:14 ` Alan Stern 0 siblings, 1 reply; 22+ messages in thread From: Alexandre Rossi @ 2016-02-22 23:15 UTC (permalink / raw) To: Alan Stern Cc: Paul Menzel, Erich Schubert, Ben Hutchings, SCSI development list, 801925 Hello, >> > As this is Linux 4.3 and not 4.4, I guess this is a different problem >> > though. Alexandre, were you able to capture the stack trace? I’d submit >> > a new bug report with this. >> >> Here is a photo. Please ping me if you need me to test some debugging patches. > > It looks like the problem occurs in blk_post_runtime_resume(). Since > there have been recent changes to this routine, it's hard to tell > whether you're using the most up-to-date code. > > In particular, the first few lines of blk_post_runtime_resume() in > block/blk-core.c should look like this: > > void blk_post_runtime_resume(struct request_queue *q, int err) > { > if (!q->dev) > return; > > The test was introduced by commit 4fd41a8552af ("SCSI: Fix NULL pointer > dereference in runtime PM"), which was added to the mainline kernel > between 4.3 and 4.4. I don't know what the commit ID would be for a > .stable kernel. Okay, now I've tried with 4.4. The oops does not occur. So this is fixed for me in 4.4. If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix crashes in sd and sr runtime PM") is not enough to backport. Something in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer dereference in runtime PM"), is also needed. Thanks a lot, Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2016-02-22 23:15 ` Alexandre Rossi @ 2016-02-23 15:14 ` Alan Stern 0 siblings, 0 replies; 22+ messages in thread From: Alan Stern @ 2016-02-23 15:14 UTC (permalink / raw) To: Alexandre Rossi Cc: Paul Menzel, Erich Schubert, Ben Hutchings, SCSI development list, 801925 On Tue, 23 Feb 2016, Alexandre Rossi wrote: > Okay now I've tried with 4.4. The oops does not occur. So this is > fixed for me in 4.4. > > If there is interest in backporting to 4.3, 13b438914341 ("SCSI: fix > crashes in sd and sr runtime PM") is not enough to backport. Something > in 4.4, most probably 4fd41a8552af ("SCSI: Fix NULL pointer > dereference in runtime PM") is also needed. Although that commit isn't in 4.3.x yet, it should be added soon. Maybe in the next release. Alan Stern ^ permalink raw reply [flat|nested] 22+ messages in thread
* NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] @ 2015-10-16 1:05 Paul Menzel 2015-10-16 7:54 ` Paul Menzel 0 siblings, 1 reply; 22+ messages in thread From: Paul Menzel @ 2015-10-16 1:05 UTC (permalink / raw) To: linux-scsi; +Cc: James E. J. Bottomley [-- Attachment #1: Type: text/plain, Size: 2470 bytes --] Dear Linux SCSI folks, using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd from 227-1 to 227-2 [1] along with other packages, the system doesn’t start up anymore: the /dev/md1 device doesn’t seem to be found, and I am dropped into a shell from the initramfs (BusyBox). Having only wireless LAN and no serial or USB debug capabilities, and since mounting a USB storage device did not work, I manually copied the beginning of the Oops. ``` BUG: unable to handle kernel NULL pointer dereference at 00000014 IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] *pdpt = 000000003696e001 *pde = 000000000000000000 Oops: 0000 [#1] SMP Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+) CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1 Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009 task: f68dd040 ti: f6988000 task.ti: f6988000 EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1 EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod] EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000 ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0 Stack: af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000 f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8 f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008 Call Trace: […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod] […] ? 
__rpm_callback+0x27/0x60 […] ``` I also tried to boot with Linux 4.1, and it fails the same way. Is that a known problem, and has it been fixed in the meantime? It’d be great if you could help me get the system to boot again. Please tell me if you need more information to debug this issue and I’ll do my best to get it. Thanks, Paul [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog -- GPG-Schlüssel: 33623E9B Fingerabdruck = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B Giant Monkey Software Engineering GmbH Brunnenstr. 7D 10119 Berlin Mitte Geschäftsführer Adrian Fuhrmann, Lion Vollnhals und Paul Menzel USt-IdNr.: DE281524720 HRB 139495 B Amtsgericht Charlottenburg [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2015-10-16 1:05 Paul Menzel @ 2015-10-16 7:54 ` Paul Menzel 2015-10-16 8:52 ` Paul Menzel 2015-10-20 1:39 ` Ben Hutchings 0 siblings, 2 replies; 22+ messages in thread From: Paul Menzel @ 2015-10-16 7:54 UTC (permalink / raw) To: James E. J. Bottomley, linux-scsi; +Cc: submit [-- Attachment #1: Type: text/plain, Size: 4268 bytes --] Package: linux-image-4.2.0-1-686-pae Version: 4.2.3-2 Severity: important Dear Linux SCSI folks, please don’t include the address submit@bugs.debian.org in your reply. On Friday, 16.10.2015, at 03:05 +0200, Paul Menzel wrote: > using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd > from 227-1 to 227-2 [1] along with other packages, the system doesn’t start up > anymore: the /dev/md1 device doesn’t seem to be found, and I am > dropped into a shell from the initramfs (BusyBox). > > Having only wireless LAN and no serial or USB debug capabilities, and > since mounting a USB storage device did not work, I manually copied the beginning > of the Oops. 
> > ``` > BUG: unable to handle kernel NULL pointer dereference at 00000014 > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] > *pdpt = 000000003696e001 *pde = 000000000000000000 > Oops: 0000 [#1] SMP > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+) > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1 > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009 > task: f68dd040 ti: f6988000 task.ti: f6988000 > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1 > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod] > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000 > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88 > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0 > Stack: > af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000 > f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8 > f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008 > Call Trace: > […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod] > […] ? __rpm_callback+0x27/0x60 > […] > ``` > > I also tried to boot with Linux 4.1, and it fails the same way. > > Is that a known problem, and has it been fixed in the meantime? It’d be > great if you could help me get the system to boot again. Please tell me > if you need more information to debug this issue and I’ll do my best to > get it. Ben Hutchings asked me to test the patch below to get more debug information. 
``` diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index 8bd54a6..dd5b5b2 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev) { struct scsi_cd *cd = dev_get_drvdata(dev); + if (WARN_ON(!cd)) { + pr_info("%s: cd == NULL; power.usage_count = %d\n", + __func__, atomic_read(&dev->power.usage_count)); + return 0; + } + if (cd->media_present) return -EBUSY; else @@ -652,7 +658,13 @@ static int sr_probe(struct device *dev) struct scsi_cd *cd; int minor, error; - scsi_autopm_get_device(sdev); + error = scsi_autopm_get_device(sdev); + if (error) { + pr_err("%s: scsi_autopm_get_device returned %d\n", + __func__, error); + return error; + } + error = -ENODEV; if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM) goto fail; @@ -719,6 +731,9 @@ static int sr_probe(struct device *dev) if (register_cdrom(&cd->cdi)) goto fail_put; + pr_info("%s: power.usage_count = %d\n", + __func__, atomic_read(&dev->power.usage_count)); + /* * Initialize block layer runtime PM stuffs before the * periodic event checking request gets started in add_disk. ``` I’ll try that as soon as a spare drive has arrived, where I can copy the data to as a backup. More thoughts are welcome! Especially, if that error suggests a failing drive or not. Thanks, Paul > [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog -- GPG-Schlüssel: 33623E9B Fingerabdruck = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B Giant Monkey Software Engineering GmbH Brunnenstr. 7D 10119 Berlin Mitte Geschäftsführer Adrian Fuhrmann, Lion Vollnhals und Paul Menzel USt-IdNr.: DE281524720 HRB 139495 B Amtsgericht Charlottenburg [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2015-10-16 7:54 ` Paul Menzel @ 2015-10-16 8:52 ` Paul Menzel 2015-10-20 1:39 ` Ben Hutchings 1 sibling, 0 replies; 22+ messages in thread From: Paul Menzel @ 2015-10-16 8:52 UTC (permalink / raw) To: linux-scsi; +Cc: James E. J. Bottomley, 801925 [-- Attachment #1: Type: text/plain, Size: 4735 bytes --] Dear Linux SCSI folks, On Friday, 16.10.2015, at 09:54 +0200, Paul Menzel wrote: > Package: linux-image-4.2.0-1-686-pae > Version: 4.2.3-2 > Severity: important > please don’t include the address submit@bugs.debian.org in your reply. this issue is now also tracked in the Debian Bug Tracking System [2] and has the number #801925 [3]. Please keep that address in CC. > On Friday, 16.10.2015, at 03:05 +0200, Paul Menzel wrote: > > > using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd > > from 227-1 to 227-2 [1] along with other packages, the system doesn’t start up > > anymore: the /dev/md1 device doesn’t seem to be found, and I am > > dropped into a shell from the initramfs (BusyBox). > > > > Having only wireless LAN and no serial or USB debug capabilities, and > > since mounting a USB storage device did not work, I manually copied the beginning > > of the Oops. 
> > > > ``` > > BUG: unable to handle kernel NULL pointer dereference at 00000014 > > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] > > *pdpt = 000000003696e001 *pde = 000000000000000000 > > Oops: 0000 [#1] SMP > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+) > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1 > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009 > > task: f68dd040 ti: f6988000 task.ti: f6988000 > > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1 > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod] > > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000 > > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88 > > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0 > > Stack: > > af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000 > > f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8 > > f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008 > > Call Trace: > > […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod] > > […] ? __rpm_callback+0x27/0x60 > > […] > > ``` > > > > I also tried to boot with Linux 4.1, and it fails the same way. > > > > Is that a known problem, and has it been fixed in the meantime? It’d be > > great if you could help me get the system to boot again. Please tell me > > if you need more information to debug this issue and I’ll do my best to > > get it. > > Ben Hutchings asked me to test the patch below to get more debug > information. 
> > ``` > diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c > index 8bd54a6..dd5b5b2 100644 > --- a/drivers/scsi/sr.c > +++ b/drivers/scsi/sr.c > @@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev) > { > struct scsi_cd *cd = dev_get_drvdata(dev); > > + if (WARN_ON(!cd)) { > + pr_info("%s: cd == NULL; power.usage_count = %d\n", > + __func__, atomic_read(&dev->power.usage_count)); > + return 0; > + } > + > if (cd->media_present) > return -EBUSY; > else > @@ -652,7 +658,13 @@ static int sr_probe(struct device *dev) > struct scsi_cd *cd; > int minor, error; > > - scsi_autopm_get_device(sdev); > + error = scsi_autopm_get_device(sdev); > + if (error) { > + pr_err("%s: scsi_autopm_get_device returned %d\n", > + __func__, error); > + return error; > + } > + > error = -ENODEV; > if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM) > goto fail; > @@ -719,6 +731,9 @@ static int sr_probe(struct device *dev) > if (register_cdrom(&cd->cdi)) > goto fail_put; > > + pr_info("%s: power.usage_count = %d\n", > + __func__, atomic_read(&dev->power.usage_count)); > + > /* > * Initialize block layer runtime PM stuffs before the > * periodic event checking request gets started in add_disk. > ``` > > I’ll try that as soon as a spare drive has arrived, where I can copy the > data to as a backup. > > More thoughts are welcome! Especially, if that error suggests a failing > drive or not. Thanks, Paul > > [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog [2] https://www.debian.org/Bugs/ [3] https://bugs.debian.org/801925 -- GPG-Schlüssel: 33623E9B Fingerabdruck = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B Giant Monkey Software Engineering GmbH Brunnenstr. 
7D 10119 Berlin Mitte Geschäftsführer Adrian Fuhrmann, Lion Vollnhals und Paul Menzel USt-IdNr.: DE281524720 HRB 139495 B Amtsgericht Charlottenburg [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] 2015-10-16 7:54 ` Paul Menzel 2015-10-16 8:52 ` Paul Menzel @ 2015-10-20 1:39 ` Ben Hutchings 2015-10-31 9:39 ` Paul Menzel 1 sibling, 1 reply; 22+ messages in thread From: Ben Hutchings @ 2015-10-20 1:39 UTC (permalink / raw) To: Paul Menzel, James E. J. Bottomley, linux-scsi; +Cc: submit [-- Attachment #1: Type: text/plain, Size: 2027 bytes --] On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote: [...] > > BUG: unable to handle kernel NULL pointer dereference at 00000014 > > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] > > *pdpt = 000000003696e001 *pde = 000000000000000000 > > Oops: 0000 [#1] SMP > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+) > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1 > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009 > > task: f68dd040 ti: f6988000 task.ti: f6988000 > > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1 > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod] > > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000 > > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88 > > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0 > > Stack: > > af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000 > > f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8 > > f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008 > > Call Trace: > > […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod] > > […] ? __rpm_callback+0x27/0x60 > > […] [...] > Ben Hutchings asked me to test the patch below to get more debug > information. [...] Well, that didn't help much. 
Paul hit another oops, this time in sd_mod but again apparently related to runtime PM. My patch only touched sr_mod. This time he sent photos of the complete oops; see <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15> and <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15> Ben. -- Ben Hutchings The first rule of tautology club is the first rule of tautology club. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 811 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
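Editorial aside, not part of the original message: the location string in the oops above, `sr_runtime_suspend+0xc/0x20 [sr_mod]`, follows the kernel's symbol+offset/size notation. It says the faulting instruction lies 0xc bytes into a function that is 0x20 bytes long, in module sr_mod. The following is a minimal Python sketch of splitting such a string into its parts. The helper name `parse_oops_location`, the `OopsLocation` type, and the regular expression are assumptions made for illustration; they are not kernel or Debian tooling.

```python
import re
from typing import NamedTuple, Optional


class OopsLocation(NamedTuple):
    """Pieces of a 'symbol+0xOFF/0xSIZE [module]' string (hypothetical type)."""
    symbol: str
    offset: int            # bytes from the start of the function
    size: int              # total size of the function in bytes
    module: Optional[str]  # None when no [module] suffix is present


# Assumed pattern: symbol, '+0x' hex offset, '/0x' hex size, optional '[module]'.
_LOCATION_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_oops_location(text: str) -> Optional[OopsLocation]:
    """Return the first symbol+offset/size location found in text, if any."""
    match = _LOCATION_RE.search(text)
    if match is None:
        return None
    return OopsLocation(
        symbol=match["symbol"],
        offset=int(match["offset"], 16),
        size=int(match["size"], 16),
        module=match["module"],
    )


if __name__ == "__main__":
    print(parse_oops_location("EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]"))
```

Applied to the two traces discussed in this thread, such a helper would make it easy to see at a glance that one report faults in sr_mod and the other in sd_mod.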
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-20 1:39 ` Ben Hutchings
@ 2015-10-31 9:39   ` Paul Menzel
  2015-11-01 1:56     ` Alan Stern
  2015-11-01 2:05     ` Alan Stern
  0 siblings, 2 replies; 22+ messages in thread
From: Paul Menzel @ 2015-10-31 9:39 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: James E. J. Bottomley, AlanStern, linux-scsi, 801925

[-- Attachment #1: Type: text/plain, Size: 2992 bytes --]

Control: notfound -1 3.19-1~exp1
Control: found -1 4.2.5-1

On Tuesday, 20.10.2015, at 02:39 +0100, Ben Hutchings wrote:
> On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote:
> [...]
> > > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > *pdpt = 000000003696e001 *pde = 000000000000000000
> > > Oops: 0000 [#1] SMB
> > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firwire_ohci firewire_core crc_iti_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOs P01-B2 11/06/2009
> > > task: f68dd040 ti: f6988000 task.ti: f6988000
> > > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> > > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > > Stack:
> > > af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> > > f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> > > f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > > Call Trace:
> > > […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> > > […] ? __rpm_callback+0x27/0x60
> > > […]
> [...]
> > Ben Hutchings asked me to test the patch below to get more debug
> > information.
> [...]
>
> Well, that didn't help much. Paul hit another oops, this time in
> sd_mod but again apparently related to runtime PM. My patch only
> touched sr_mod.
>
> This time he sent photos of the complete oops; see
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> and
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

after backing up my data, I tested a little bit more, and using Linux
3.19 the drive is detected and the system boots.

Does anything stand out what changed in this area between Linux 3.19 and
4.1?

Thanks

Paul

-- 
go~mus | Visitor management
▶ 18 to 20 November 2015 // Messe Köln, Stand D054

Visit us at EXPONATEC and get to know the visitor management software
used by leading museum associations in Europe.

More information about go~mus is available at https://www.gomus.de

~

GPG key: 33623E9B
Fingerprint = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B

Giant Monkey Software Engineering GmbH
Brunnenstr. 7D
10119 Berlin Mitte

Managing directors: Adrian Fuhrmann, Lion Vollnhals and Paul Menzel
VAT ID: DE281524720
HRB 139495 B, Amtsgericht Charlottenburg

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-31 9:39 ` Paul Menzel
@ 2015-11-01 1:56   ` Alan Stern
  2015-11-01 2:05   ` Alan Stern
  1 sibling, 0 replies; 22+ messages in thread
From: Alan Stern @ 2015-11-01 1:56 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Ben Hutchings, James E. J. Bottomley, linux-scsi, 801925

On Sat, 31 Oct 2015, Paul Menzel wrote:

> > Well, that didn't help much. Paul hit another oops, this time in
> > sd_mod but again apparently related to runtime PM. My patch only
> > touched sr_mod.
> >
> > This time he sent photos of the complete oops; see
> > <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> > and
> > <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>
>
> after backing up my data, I tested a little bit more, and using Linux
> 3.19 the drive is detected and the system boots.
>
> Does anything stand out what changed in this area between Linux 3.19 and
> 4.1?

I believe the problem shown in that photo was fixed by commit
49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
which was merged in 4.2 and has been back-ported to various stable
releases.

Alan Stern
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-10-31 9:39 ` Paul Menzel
  2015-11-01 1:56   ` Alan Stern
@ 2015-11-01 2:05   ` Alan Stern
  2016-01-09 15:23     ` Paul Menzel
  1 sibling, 1 reply; 22+ messages in thread
From: Alan Stern @ 2015-11-01 2:05 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Ben Hutchings, James E. J. Bottomley, SCSI development list, 801925

On Sat, 31 Oct 2015, Alan Stern wrote:

> I believe the problem shown in that photo was fixed by commit
> 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
> which was merged in 4.2 and has been back-ported to various stable
> releases.

On second thought, it seems more likely that this issue probably was
_caused_ by that commit. The fix can be found in these two emails:

	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
	http://marc.info/?l=linux-scsi&m=144185208525611&w=2

which have not been merged yet as far as I know even though they were
submitted back in September.

Alan Stern
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2015-11-01 2:05 ` Alan Stern
@ 2016-01-09 15:23   ` Paul Menzel
  2016-01-09 16:36     ` Alan Stern
  0 siblings, 1 reply; 22+ messages in thread
From: Paul Menzel @ 2016-01-09 15:23 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ben Hutchings, James E. J. Bottomley, SCSI development list, 801925, Erich Schubert, Alexandre Rossi

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]

Version: 4.4~rc8-1~exp1

Dear Alan,


Thank you for your help!

There were some follow-ups to the bug report [1], but I think you and I
were not in CC.

On Saturday, 31.10.2015, at 22:05 -0400, Alan Stern wrote:
> On Sat, 31 Oct 2015, Alan Stern wrote:
>
> > I believe the problem shown in that photo was fixed by commit
> > 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
> > which was merged in 4.2 and has been back-ported to various stable
> > releases.
>
> On second thought, it seems more likely that this issue probably was
> _caused_ by that commit. The fix can be found in these two emails:
>
> 	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
> 	http://marc.info/?l=linux-scsi&m=144185208525611&w=2
>
> which have not been merged yet as far as I know even though they were
> submitted back in September.

I can only say, that I am still unable to boot my system with Linux
4.4-rc8 [2]. Are these patches included there?


Thanks,

Paul


[1] https://bugs.debian.org/801925
[2] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686-pae-dbg

-- 
GPG key: 33623E9B
Fingerprint = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B

Giant Monkey Software Engineering GmbH
Brunnenstr. 7D
10119 Berlin Mitte

Managing directors: Adrian Fuhrmann, Lion Vollnhals and Paul Menzel
VAT ID: DE281524720
HRB 139495 B, Amtsgericht Charlottenburg

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-09 15:23 ` Paul Menzel
@ 2016-01-09 16:36   ` Alan Stern
  2016-01-10 11:44     ` Erich Schubert
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Stern @ 2016-01-09 16:36 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Ben Hutchings, James E. J. Bottomley, SCSI development list, 801925, Erich Schubert, Alexandre Rossi

On Sat, 9 Jan 2016, Paul Menzel wrote:

> Version: 4.4~rc8-1~exp1
>
> Dear Alan,
>
>
> Thank you for your help!
>
> There were some follow-ups to the bug report [1], but I think you and I
> were not in CC.

I wasn't.

> > 	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
> > 	http://marc.info/?l=linux-scsi&m=144185208525611&w=2

> I can only say, that I am still unable to boot my system with Linux
> 4.4-rc8 [2]. Are these patches included there?

They are. I don't see how they could cause a NULL pointer dereference
in sd_resume(), though. If you revert them, does the problem go away?

Also, can you add some debugging statements to sd_resume() so we can
see where the NULL pointer comes from?

Alan Stern
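Editorial aside, not part of the original message: in the oops quoted earlier in this thread, the fault address (CR2) is 00000014 while EAX is 00000000. One common reading, offered here as an assumption rather than a diagnosis, is that the code read a structure member at byte offset 0x14 through a NULL base pointer. Debugging statements of the kind requested above would aim to show which pointer that was. The Python sketch below uses `ctypes` with a made-up layout to show how a fault address maps back to a member offset. `HypotheticalDevice` and all of its field names are invented for illustration and do not describe any real kernel structure.

```python
import ctypes


class HypotheticalDevice(ctypes.Structure):
    """Invented layout: five 32-bit members, then the member of interest."""
    _fields_ = [
        ("field_0", ctypes.c_uint32),         # offset 0x00
        ("field_1", ctypes.c_uint32),         # offset 0x04
        ("field_2", ctypes.c_uint32),         # offset 0x08
        ("field_3", ctypes.c_uint32),         # offset 0x0c
        ("field_4", ctypes.c_uint32),         # offset 0x10
        ("member_at_0x14", ctypes.c_uint32),  # offset 0x14
    ]


def candidate_fields(struct_type, fault_addr, base=0):
    """List members whose address equals fault_addr for the given base pointer.

    With base=0 (a NULL pointer) the fault address is simply the member offset.
    """
    return [
        name
        for name, _ctype in struct_type._fields_
        if base + getattr(struct_type, name).offset == fault_addr
    ]


if __name__ == "__main__":
    # CR2 from the quoted oops, interpreted against the invented layout.
    print(candidate_fields(HypotheticalDevice, 0x14))
```

With the real structure definitions from the kernel source in question, the same arithmetic narrows down which member access faulted, which in turn points at the pointer that was NULL.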
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-09 16:36 ` Alan Stern
@ 2016-01-10 11:44   ` Erich Schubert
  2016-01-10 15:32     ` Alan Stern
  0 siblings, 1 reply; 22+ messages in thread
From: Erich Schubert @ 2016-01-10 11:44 UTC (permalink / raw)
  To: Alan Stern
  Cc: Paul Menzel, Ben Hutchings, James E. J. Bottomley, SCSI development list, 801925, Alexandre Rossi

Hi all,
4.4-rc8 does not fix the problem for me.
Anything beyond 4.1.0 remains unable to boot this computer.

Unfortunately, because the error occurs during early SCSI
initialization, I do not have easy access to the log - no disk, no
network.
It happens during SATA initialization: "scsi_runtime_resume".

So my back trace looks different than Alex in
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=42;filename=scsi-null-pointer-dereference.log;bug=801925;att=1
but like the one Paul is seeing:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=15;filename=20151020_006.jpg;bug=801925;att=3

I will try to do a photo next time, too.

Here is some dmesg output from a successful boot on 4.1.0:
Note there are some ACPI Errors there (but probably not related).

---
ahci 0000:00:1f.2: version 3.0
ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled
ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 3 Gbps 0x1 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ems apst
scsi host0: ahci
scsi host1: ahci
scsi host2: ahci
scsi host3: ahci
scsi host4: ahci
scsi host5: ahci
ata1: SATA max UDMA/133 abar m2048@0xc0728000 port 0xc0728100 irq 30
ata2: DUMMY
ata3: DUMMY
ata4: DUMMY
ata5: DUMMY
ata6: DUMMY
usb 3-1: new high-speed USB device number 2 using ehci-pci
usb 4-1: new high-speed USB device number 2 using ehci-pci
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD] (Node ffff8802458b1608), AE_NOT_FOUND (20150410/psparse-536)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF] (Node ffff8802458b15e0), AE_NOT_FOUND (20150410/psparse-536)
ata1.00: ATA-8: TOSHIBA THNSNS256GMCP, TA2ABBF0, max UDMA/133
ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD] (Node ffff8802458b1608), AE_NOT_FOUND (20150410/psparse-536)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF] (Node ffff8802458b15e0), AE_NOT_FOUND (20150410/psparse-536)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA TOSHIBA THNSNS25 BBF0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI disk
PM: Starting manual resume from disk
PM: Hibernation image partition 8:6 present
PM: Looking for hibernation image.
PM: Image not found (code -22)
PM: Hibernation image not present or could not be loaded.
---

On Sat, Jan 9, 2016 at 5:36 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 9 Jan 2016, Paul Menzel wrote:
>
>> Version: 4.4~rc8-1~exp1
>>
>> Dear Alan,
>>
>>
>> Thank you for your help!
>>
>> There were some follow-ups to the bug report [1], but I think you and I
>> were not in CC.
>
> I wasn't.
>
>> > http://marc.info/?l=linux-scsi&m=144185206825609&w=2
>> > http://marc.info/?l=linux-scsi&m=144185208525611&w=2
>
>> I can only say, that I am still unable to boot my system with Linux
>> 4.4-rc8 [2]. Are these patches included there?
>
> They are. I don't see how they could cause a NULL pointer dereference
> in sd_resume(), though. If you revert them, does the problem go away?
>
> Also, can you add some debugging statements to sd_resume() so we can
> see where the NULL pointer comes from?
>
> Alan Stern
>
* Re: NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
  2016-01-10 11:44 ` Erich Schubert
@ 2016-01-10 15:32   ` Alan Stern
  0 siblings, 0 replies; 22+ messages in thread
From: Alan Stern @ 2016-01-10 15:32 UTC (permalink / raw)
  To: Erich Schubert
  Cc: Paul Menzel, Ben Hutchings, SCSI development list, 801925, Alexandre Rossi

On Sun, 10 Jan 2016, Erich Schubert wrote:

> Hi all,
> 4.4-rc8 does not fix the problem for me.
> Anything beyond 4.1.0 remains unable to boot this computer.
>
> Unfortunately, because the error occurs during early SCSI
> initialization, I do not have easy access to the log - no disk, no
> network.
> It happens during SATA initialization: "scsi_runtime_resume".

You didn't include any debugging information. However...

> So my back trace looks different than Alex in
> https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=42;filename=scsi-null-pointer-dereference.log;bug=801925;att=1
> but like the one Paul is seeing:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=15;filename=20151020_006.jpg;bug=801925;att=3

The information in that bug report says that the failure happens in
sr_runtime_resume, not in scsi_runtime_resume. Compare with the
Subject: line in this email thread.

> I will try to do a photo next time, too.

If I send you a patch, can you build and test it?

Alan Stern
end of thread, other threads:[~2016-02-23 15:14 UTC | newest]
Thread overview: 22+ messages
[not found] <CAGKbab941Nh7Sy1NTZC6ySxG_P5g7HpjATQ_GSCvDY8y=qgmHA@mail.gmail.com>
2016-01-19 16:38 ` NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod] Alan Stern
2016-01-19 16:52 ` Paul Menzel
2016-01-19 21:08 ` Alan Stern
2016-01-19 23:20 ` Paul Menzel
2016-01-20 22:07 ` Alexandre Rossi
2016-02-09 16:47 ` Paul Menzel
2016-02-09 19:56 ` Alexandre Rossi
2016-02-09 20:51 ` Ben Hutchings
2016-02-18 16:27 ` Alan Stern
2016-02-22 23:15 ` Alexandre Rossi
2016-02-23 15:14 ` Alan Stern
2015-10-16 1:05 Paul Menzel
2015-10-16 7:54 ` Paul Menzel
2015-10-16 8:52 ` Paul Menzel
2015-10-20 1:39 ` Ben Hutchings
2015-10-31 9:39 ` Paul Menzel
2015-11-01 1:56 ` Alan Stern
2015-11-01 2:05 ` Alan Stern
2016-01-09 15:23 ` Paul Menzel
2016-01-09 16:36 ` Alan Stern
2016-01-10 11:44 ` Erich Schubert
2016-01-10 15:32 ` Alan Stern