public inbox for linux-ia64@vger.kernel.org
* Kernel problem on rx2800 i2
@ 2019-06-21 20:08 Frank Scheiner
  2019-06-26 15:58 ` Christoph Hellwig
  2019-06-26 16:15 ` John Paul Adrian Glaubitz
  0 siblings, 2 replies; 3+ messages in thread
From: Frank Scheiner @ 2019-06-21 20:08 UTC (permalink / raw)
  To: linux-ia64

Hi there,

Recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
i2 that occurs during kernel boot:

```
[    0.000000] Linux version 4.19.0-5-itanium
(debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian
8.3.0-10~ia64.1)) #1 SMP Debian 4.19.37-3 (2019-05-18)
[    0.000000] EFI v2.10 by HP:
[    0.000000] efi:  SALsystab=0x6fdd63a18  ACPI 2.0=0x3d3c4014
HCDP=0x6ffff8798  SMBIOS=0x3d368000
[    0.000000] booting generic kernel on platform dig
[    0.000000] PCDP: v3 at 0x6ffff8798
[    0.000000] earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
[    0.000000] bootconsole [uart8250] enabled
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
[    0.000000] ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2
00000001      01000013)
[...]
[   13.993718] Unpacking initramfs...
[...]
[   22.655630] Run /init as init process
[   22.818930] SCSI subsystem initialized
[   22.844653] ACPI: bus type USB registered
[   22.878940] HP HPSA Driver (v 3.4.20-125)
[   22.930628] usbcore: registered new interface driver usbfs
[   23.072034] usbcore: registered new interface driver hub
[   23.072925] hpsa 0000:01:00.0: Logical aborts not supported
[   23.150942] usbcore: registered new device driver usb
[   23.231690] hpsa 0000:01:00.0: HP SSD Smart Path aborts not supported
[   23.306942] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   23.417101] systemd-udevd[115]: NaT consumption 2216203124768 [1]
[   23.488663] ehci-pci: EHCI PCI platform driver
[   23.490942] uhci_hcd: USB Universal Host Controller Interface driver
[   23.420927] Modules linked in: uhci_hcd(+) ehci_pci(+) ehci_hcd
hpsa(+) scsi_transport_sas usbcore scsi_mod usb_common
[   23.420927]
[   23.420927] CPU: 6 PID: 115 Comm: systemd-udevd Not tainted
4.19.0-5-itanium #1 Debian 4.19.37-3
[   23.420927] Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
[   23.420927] psr : 0000121008026010 ifs : 8000000000002046 ip  :
[<a0000001002af041>]    Not tainted (4.19.0-5-itanium Debian 4.19.37-3)
[   23.420927] ip is at __alloc_pages_nodemask+0x261/0x20c0
[   23.420927] unat: 0000000000000000 pfs : 0000000000000793 rsc :
0000000000000003
[   23.420927] rnat: 0000000000000000 bsps: 0000000000000000 pr  :
85aaa9a99a6a6659
[   23.420927] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr:
0009804c8a70433f
[   23.420927] csd : 0000000000000000 ssd : 0000000000000000
[   23.420927] b0  : a0000001001710e0 b6  : a0000001003948c0 b7  :
a0000001000469c0
[   23.420927] f6  : 10012bffff00000000000 f7  : 1003e00000000000bffff
[   23.420927] f8  : 1003e0000000000003fc0 f9  : 1003effffffffffffffab
[   23.420927] f10 : 10016818d087e7cd81a78 f11 : 1003e000000000000002a
[   23.420927] r1  : a0000001015d6ba0 r2  : a0000001013643c8 r3  :
fffffffffffc04b8
[   23.420927] r8  : 0000000000001440 r9  : e000000001507708 r10 :
0000000000000008
[   23.420927] r11 : ffffffffffd8d818 r12 : e000000682fcfbd0 r13 :
e000000682fc8000
[   23.420927] r14 : a0000001013643b8 r15 : ffffffffffd8d828 r16 :
00000000007fffff
[   23.420927] r17 : 0000000000000008 r18 : 0000000000000000 r19 :
e000000001507710
[   23.420927] r20 : 0000000000000000 r21 : 0000000000002500 r22 :
0000000000000000
[   23.420927] r23 : 0000000000000000 r24 : 0000000000000000 r25 :
0000000000000000
[   23.420927] r26 : 0000000000000000 r27 : 0000000000000000 r28 :
e000000682fc87b0
[   23.420927] r29 : 0000000000200000 r30 : 0000000000000000 r31 :
0000000000000000
[   23.420927]
[   23.420927] Call Trace:
[   23.420927]  [<a000000100014bd0>] show_stack+0x90/0xc0
[   23.420927]                                 sp=e000000682fcf790
bsp=e000000682fc9c80
[   23.420927]  [<a0000001000152d0>] show_regs+0x6d0/0xa00
[   23.420927]                                 sp=e000000682fcf960
bsp=e000000682fc9c10
[   23.420927]  [<a000000100029330>] die+0x1b0/0x460
[   23.420927]                                 sp=e000000682fcf980
bsp=e000000682fc9bc8
[   23.420927]  [<a000000100e75100>] ia64_fault+0x5a0/0xf60
[   23.420927]                                 sp=e000000682fcf980
bsp=e000000682fc9b70
[   23.420927]  [<a00000010000c9c0>] ia64_leave_kernel+0x0/0x270
[   23.420927]                                 sp=e000000682fcfa00
bsp=e000000682fc9b70
[   23.420927]  [<a0000001002af040>] __alloc_pages_nodemask+0x260/0x20c0
[   23.420927]                                 sp=e000000682fcfbd0
bsp=e000000682fc9938
[   23.420927]  [<a0000001001710e0>] dma_direct_alloc+0x140/0x2e0
[   23.420927]                                 sp=e000000682fcfc40
bsp=e000000682fc98c0
[   23.420927]  [<a000000100173910>] swiotlb_alloc+0x50/0x2e0
[   23.420927]                                 sp=e000000682fcfc40
bsp=e000000682fc9868
```

The machine doesn't continue booting afterwards. It boots fine with a
4.14.x kernel with Gentoo patches, but no later minor kernel version
with Gentoo patches works on it. With some testing I could narrow the
range in which the problematic change could have been introduced to
between 4.15.x and 4.16.x. Bisecting between tag v4.15.18 (good) and
tag v4.16-rc1 (bad) pointed to commit
543cea9accd9804307541cb93d3ed7ec94b07237 ([1]) as the first bad commit.

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
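
As background on the method: bisecting is a binary search over the
commits between a known-good and a known-bad version. The following is
a minimal sketch of that search, assuming a linear history; `git bisect`
itself works on the full commit graph. The commit names and the `is_bad`
callback are hypothetical stand-ins for "build this commit, boot it on
the rx2800 i2 and check for the oops".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear, oldest-to-newest history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the culprit is at mid or earlier
        else:
            good = mid   # the culprit is after mid
    return commits[bad]


# Hypothetical linear history c0 (good) ... c99 (bad), with the culprit at c47.
history = [f"c{i}" for i in range(100)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 47)
```

The number of test boots grows only with the logarithm of the number of
candidate commits, which is what keeps this practical on slow hardware.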

The kernel messages from the problematic kernels built during bisection
look different from those of the Debian v4.19.37 kernel shown above, though:

```
Linux version 4.15.0-rc7-00047-g543cea9accd9-dirty (root@rx2800-i2) (gcc
version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #1 SMP Thu Jun 13 22:16:30 CEST 2019
EFI v2.10 by HP:
efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
SMBIOS=0x3d368000
booting generic kernel on platform dig
PCDP: v3 at 0xdffff8798
earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
bootconsole [uart8250] enabled
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
01000013)
[...]
Trying to unpack rootfs image as initramfs...
[...]
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA
mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc ems
Unable to handle kernel NULL pointer dereference (address 0000000000001688)
swapper/0[1]: Oops 11012296146944 [1]
Modules linked in:

CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.15.0-rc7-00047-g543cea9accd9-dirty #1
Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
psr : 00001210084a6010 ifs : 8000000000001734 ip  : [<a000000100180401>]
    Not tainted (4.15.0-rc7-00047-g543cea9accd9-dirty)
ip is at __alloc_pages_nodemask+0x1a1/0x1670
unat: 0000000000000000 pfs : 0000000000001734 rsc : 0000000000000003
rnat: 000000038c5ad78d bsps: 000000000001003e pr  : 565595a66aa65799
ldrs: 0000000000000000 ccv : 000000032e40a799 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001802c0 b6  : a000000100050b50 b7  : a0000001007e83d0
f6  : 1003e0000000000000000 f7  : 1003e00000000000164ff
f8  : 1003e0000000000000f00 f9  : 1003e000000000000000f
f10 : 1003e0000000000000400 f11 : 1003e0000000000003c00
r1  : a00000010155edc0 r2  : a0000001012b5e90 r3  : 0000000001ffffff
r8  : 0000000000001680 r9  : 0000000000250015 r10 : e000000001519980
r11 : e000000001519988 r12 : e000000d8334fcf0 r13 : e000000d83348000
r14 : ffffffffffd570d0 r15 : 0000000000000008 r16 : e000000001519990
r17 : 0000000000000000 r18 : 0000000000001680 r19 : 0000000000000000
r20 : 0000000000000000 r21 : 0000000000000000 r22 : 0000000000000000
r23 : 0000000000000000 r24 : ffffffffffd570c0 r25 : a0000001012b5e80
r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000001688
r29 : 0000000000000358 r30 : 0000000000000000 r31 : 0000000000000081

Call Trace:
  [<a000000100013760>] show_stack+0x40/0x90
                                 sp=e000000d8334f8c0 bsp=e000000d83349890
  [<a0000001000140e0>] show_regs+0x930/0x940
                                 sp=e000000d8334fa90 bsp=e000000d83349820
  [<a00000010003a7d0>] die+0x1a0/0x2f0
                                 sp=e000000d8334fa90 bsp=e000000d833497d8
  [<a000000100063140>] ia64_do_page_fault+0x830/0xa30
                                 sp=e000000d8334fa90 bsp=e000000d83349740
  [<a00000010000c400>] ia64_leave_kernel+0x0/0x270
                                 sp=e000000d8334fb20 bsp=e000000d83349740
  [<a000000100180400>] __alloc_pages_nodemask+0x1a0/0x1670
                                 sp=e000000d8334fcf0 bsp=e000000d83349598
  [<a000000100d70100>] dma_direct_alloc+0x170/0x470
                                 sp=e000000d8334fd50 bsp=e000000d83349518
  [<a0000001006a8770>] swiotlb_alloc+0x50/0x90
                                 sp=e000000d8334fd50 bsp=e000000d833494d8
  [<a00000010083abd0>] dmam_alloc_coherent+0x250/0x2c0
                                 sp=e000000d8334fd50 bsp=e000000d83349488
  [<a0000001009990c0>] ahci_port_start+0x2f0/0x4b0
                                 sp=e000000d8334fd50 bsp=e000000d83349440
  [<a000000100958490>] ata_host_start+0x310/0x470
                                 sp=e000000d8334fd60 bsp=e000000d833493d0
  [<a000000100964a70>] ata_host_activate+0x20/0x290
                                 sp=e000000d8334fd60 bsp=e000000d83349370
  [<a000000100999570>] ahci_host_activate+0x2f0/0x300
                                 sp=e000000d8334fd60 bsp=e000000d83349300
  [<a0000001009923d0>] ahci_init_one+0x1580/0x20b0
                                 sp=e000000d8334fd60 bsp=e000000d83349258
  [<a0000001006d0610>] local_pci_probe+0x90/0x150
                                 sp=e000000d8334fdc0 bsp=e000000d83349218
  [<a0000001006d1a30>] pci_device_probe+0x2f0/0x310
                                 sp=e000000d8334fdc0 bsp=e000000d833491d8
  [<a0000001008229f0>] driver_probe_device+0x520/0x720
                                 sp=e000000d8334fde0 bsp=e000000d83349170
  [<a000000100822d10>] __driver_attach+0x120/0x190
                                 sp=e000000d8334fde0 bsp=e000000d83349140
  [<a00000010081ec00>] bus_for_each_dev+0x120/0x140
                                 sp=e000000d8334fde0 bsp=e000000d83349100
  [<a000000100821bf0>] driver_attach+0x40/0x60
                                 sp=e000000d8334fdf0 bsp=e000000d833490e0
  [<a0000001008211b0>] bus_add_driver+0x400/0x4a0
                                 sp=e000000d8334fdf0 bsp=e000000d83349090
  [<a000000100823fc0>] driver_register+0x240/0x2d0
                                 sp=e000000d8334fdf0 bsp=e000000d83349068
  [<a0000001006cfde0>] __pci_register_driver+0xa0/0xc0
                                 sp=e000000d8334fdf0 bsp=e000000d83349038
  [<a0000001010ecdb0>] ahci_pci_driver_init+0x50/0x70
                                 sp=e000000d8334fdf0 bsp=e000000d83349020
  [<a00000010000a950>] do_one_initcall+0x290/0x2a0
                                 sp=e000000d8334fdf0 bsp=e000000d83348fe0
  [<a0000001010a1c10>] kernel_init_freeable+0x400/0x430
                                 sp=e000000d8334fe30 bsp=e000000d83348f78
  [<a000000100d93860>] kernel_init+0x20/0x280
                                 sp=e000000d8334fe30 bsp=e000000d83348f58
  [<a00000010000c1f0>] call_payload+0x50/0x80
                                 sp=e000000d8334fe30 bsp=e000000d83348f40
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b
```

...but because of the result below (spoiler: a v4.19.37 kernel that works
on my rx2800 i2), I assume they're caused by the very same issue.
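
One way to check that assumption is to compare the call paths of the two
oopses. The sketch below is only an illustration: the regular expression
and the set of "reporting" frames are chosen here for the example and
are not part of any kernel tooling. It extracts the function names from
excerpts of the two traces above and compares the innermost frames. Note
that the Debian trace as posted ends at swiotlb_alloc.

```python
import re
from typing import List

# Matches entries such as "[<a0000001001710e0>] dma_direct_alloc+0x140/0x2e0".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames that only report the fault; they are not part of the failing path.
REPORTING = {"show_stack", "show_regs", "die", "ia64_fault",
             "ia64_do_page_fault", "ia64_leave_kernel"}


def failing_path(log: str) -> List[str]:
    """Return traced function names, innermost first, minus reporting frames."""
    return [m.group(1) for m in FRAME.finditer(log)
            if m.group(1) not in REPORTING]


debian_4_19 = """
[   23.420927]  [<a000000100e75100>] ia64_fault+0x5a0/0xf60
[   23.420927]  [<a00000010000c9c0>] ia64_leave_kernel+0x0/0x270
[   23.420927]  [<a0000001002af040>] __alloc_pages_nodemask+0x260/0x20c0
[   23.420927]  [<a0000001001710e0>] dma_direct_alloc+0x140/0x2e0
[   23.420927]  [<a000000100173910>] swiotlb_alloc+0x50/0x2e0
"""

bisect_4_15 = """
  [<a000000100063140>] ia64_do_page_fault+0x830/0xa30
  [<a00000010000c400>] ia64_leave_kernel+0x0/0x270
  [<a000000100180400>] __alloc_pages_nodemask+0x1a0/0x1670
  [<a000000100d70100>] dma_direct_alloc+0x170/0x470
  [<a0000001006a8770>] swiotlb_alloc+0x50/0x90
  [<a00000010083abd0>] dmam_alloc_coherent+0x250/0x2c0
"""

path_debian = failing_path(debian_4_19)
path_bisect = failing_path(bisect_4_15)
same_innermost = path_debian[:3] == path_bisect[:3]
```

Both excerpts fault in __alloc_pages_nodemask, reached via
dma_direct_alloc and swiotlb_alloc. That is consistent with a common
cause but does not prove it, since the fault types differ (NaT
consumption versus NULL pointer dereference).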

Starting at tag v4.19.37 I then reverted the following commits:

* cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80 [2]

* 16e73adbca76fd18733278cb688b0ddb4cad162c [3]

* 9d37c094dacda531ac3e529dd4dd139e3c0b7811 [4]

* 4fac8076df854aa4ddb8acbf6cce9d337300219e [5]

* 543cea9accd9804307541cb93d3ed7ec94b07237 [6]

...and compiled a kernel using the localmodconfig target to create a
minimal config. The resulting kernel booted fine on my rx2800 i2:

```
Linux version 4.19.37-00005-g55bd603c2590-dirty (root@rx2800-i2) (gcc
version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #1 SMP Thu Jun 20 23:58:57 CEST 2019
EFI v2.10 by HP:
efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
SMBIOS=0x3d368000
booting generic kernel on platform dig
PCDP: v3 at 0xdffff8798
earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
bootconsole [uart8250] enabled
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
01000013)
[...]
  * Starting sshd ...
  [ ok ]
  * Starting local ...
  [ ok ]


This is rx2800-i2[...] (Linux ia64 4.19.37-00005-g55bd603c2590-dirty)
20:49:42
```

[2]:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80

[3]:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=16e73adbca76fd18733278cb688b0ddb4cad162c

[4]:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d37c094dacda531ac3e529dd4dd139e3c0b7811

[5]:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4fac8076df854aa4ddb8acbf6cce9d337300219e

[6]:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
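
For anyone who wants to reproduce the test kernel, the procedure can be
sketched as follows. This is an assumption about the exact commands, not
a transcript of what was run: the reverts are applied in the order
listed above, `git revert --no-edit` is used to avoid the editor, and
the `tree` argument is a hypothetical path to a kernel git checkout.
`make localmodconfig` may still ask questions about new config symbols.

```python
import subprocess
from typing import List

BASE_TAG = "v4.19.37"

# Commits reverted in the report, in the order they are listed there.
REVERTS = [
    "cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80",
    "16e73adbca76fd18733278cb688b0ddb4cad162c",
    "9d37c094dacda531ac3e529dd4dd139e3c0b7811",
    "4fac8076df854aa4ddb8acbf6cce9d337300219e",
    "543cea9accd9804307541cb93d3ed7ec94b07237",
]


def build_plan(base: str, reverts: List[str]) -> List[List[str]]:
    """Return the commands to run, in order, inside a kernel git tree."""
    plan = [["git", "checkout", base]]
    plan += [["git", "revert", "--no-edit", sha] for sha in reverts]
    plan += [["make", "localmodconfig"], ["make"]]
    return plan


def run_plan(plan: List[List[str]], tree: str) -> None:
    """Execute the plan in the given tree, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, cwd=tree, check=True)


plan = build_plan(BASE_TAG, REVERTS)
```

The "-00005-" in the version string of the working kernel is consistent
with five commits on top of the v4.19.37 tag, one per revert.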

****

Please note:

* that I'm always using the "ia64: fix ptrace" patch ([7]) in addition,
as I'm compiling with gcc 7.3.0 on Gentoo;

[7]: https://lore.kernel.org/patchwork/patch/884685/

* that the original problem only shows on my rx2800 i2 and not on my
other ia64 gear (rx4640 with Madison, rx2620 with Montecito and rx2660
with Montvale), so it could be related to the different system
architecture of the Tukwila-based rx2800 i2 (UMA => NUMA IIC).

I just tried to compile a more recent v5.2-rc5 kernel with the above
commits reverted, but that fails. There seem to have been further
changes since v4.19.37, for which I would still need to find the
respective commits to revert. But I assume this work may not be needed
for further examination of the problem, so I'm not pursuing it for
now. If it is needed, please let me know.

James Clarke already had an idea of what could be involved in this issue.
Maybe he can give his assessment.

If you want me to try a patch for a specific Linux version, please let
me know. The same goes if you need further information from me.

Cheers
Frank

* Re: Kernel problem on rx2800 i2
  2019-06-21 20:08 Kernel problem on rx2800 i2 Frank Scheiner
@ 2019-06-26 15:58 ` Christoph Hellwig
  2019-06-26 16:15 ` John Paul Adrian Glaubitz
  1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2019-06-26 15:58 UTC (permalink / raw)
  To: linux-ia64

I'm running out of luck trying to understand the issue with the
zone list.  Adding the ia64 mailing list in addition to Tony
to see if someone can figure out how an alloc_pages_node call for the
node stored in an AHCI PCIe pci_dev could cause an oops in the
zonelist lookup.
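
To make the question concrete, here is a toy model of the lookup in
Python. It is not kernel code and the failure it shows is only a
hypothesis, not something confirmed in this thread: it assumes the node
id stored in the device names a node for which no per-node data exists.
The node ids, zone names and the `local_node` parameter are made up for
the example.

```python
from typing import Dict, List, Optional

NUMA_NO_NODE = -1

# Toy stand-in for the per-node data reached through NODE_DATA(nid).
# Node 0 has memory; node 1 exists as an id but has no per-node data.
node_data: Dict[int, Optional[List[str]]] = {
    0: ["node0/Normal", "node0/DMA"],
    1: None,
}


def node_zonelist(nid: int) -> List[str]:
    """Model of the zonelist lookup; fails if the node has no data."""
    zonelist = node_data.get(nid)
    if zonelist is None:
        raise LookupError(f"no zonelist for node {nid}")
    return zonelist


def alloc_pages_node(nid: int, local_node: int = 0) -> str:
    """Return the first zone of the preferred node's zonelist."""
    if nid == NUMA_NO_NODE:
        nid = local_node  # no preference: fall back to the caller's node
    return node_zonelist(nid)[0]
```

In this model `alloc_pages_node(1)` raises an error. The kernel lookup
has no such check, so under the same assumption it would dereference a
small offset from a NULL base, which would at least fit the reported
"NULL pointer dereference (address 0000000000001688)".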

On Fri, Jun 21, 2019 at 10:08:06PM +0200, Frank Scheiner wrote:
> Hi there,
>
> recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
> i2 happening during kernel boot:
>
> ```
> [    0.000000] Linux version 4.19.0-5-itanium
> (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian
> 8.3.0-10~ia64.1)) #1 SMP Debian 4.19.37-3 (2019-05-18)
> [    0.000000] EFI v2.10 by HP:
> [    0.000000] efi:  SALsystab=0x6fdd63a18  ACPI 2.0=0x3d3c4014
> HCDP=0x6ffff8798  SMBIOS=0x3d368000
> [    0.000000] booting generic kernel on platform dig
> [    0.000000] PCDP: v3 at 0x6ffff8798
> [    0.000000] earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
> [    0.000000] bootconsole [uart8250] enabled
> [    0.000000] ACPI: Early table checksum verification disabled
> [    0.000000] ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
> [    0.000000] ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2
> 00000001      01000013)
> [...]
> [   13.993718] Unpacking initramfs...
> [...]
> [   22.655630] Run /init as init process
> [   22.818930] SCSI subsystem initialized
> [   22.844653] ACPI: bus type USB registered
> [   22.878940] HP HPSA Driver (v 3.4.20-125)
> [   22.930628] usbcore: registered new interface driver usbfs
> [   23.072034] usbcore: registered new interface driver hub
> [   23.072925] hpsa 0000:01:00.0: Logical aborts not supported
> [   23.150942] usbcore: registered new device driver usb
> [   23.231690] hpsa 0000:01:00.0: HP SSD Smart Path aborts not supported
> [   23.306942] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [   23.417101] systemd-udevd[115]: NaT consumption 2216203124768 [1]
> [   23.488663] ehci-pci: EHCI PCI platform driver
> [   23.490942] uhci_hcd: USB Universal Host Controller Interface driver
> [   23.420927] Modules linked in: uhci_hcd(+) ehci_pci(+) ehci_hcd
> hpsa(+) scsi_transport_sas usbcore scsi_mod usb_common
> [   23.420927]
> [   23.420927] CPU: 6 PID: 115 Comm: systemd-udevd Not tainted
> 4.19.0-5-itanium #1 Debian 4.19.37-3
> [   23.420927] Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
> [   23.420927] psr : 0000121008026010 ifs : 8000000000002046 ip  :
> [<a0000001002af041>]    Not tainted (4.19.0-5-itanium Debian 4.19.37-3)
> [   23.420927] ip is at __alloc_pages_nodemask+0x261/0x20c0
> [   23.420927] unat: 0000000000000000 pfs : 0000000000000793 rsc :
> 0000000000000003
> [   23.420927] rnat: 0000000000000000 bsps: 0000000000000000 pr  :
> 85aaa9a99a6a6659
> [   23.420927] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr:
> 0009804c8a70433f
> [   23.420927] csd : 0000000000000000 ssd : 0000000000000000
> [   23.420927] b0  : a0000001001710e0 b6  : a0000001003948c0 b7  :
> a0000001000469c0
> [   23.420927] f6  : 10012bffff00000000000 f7  : 1003e00000000000bffff
> [   23.420927] f8  : 1003e0000000000003fc0 f9  : 1003effffffffffffffab
> [   23.420927] f10 : 10016818d087e7cd81a78 f11 : 1003e000000000000002a
> [   23.420927] r1  : a0000001015d6ba0 r2  : a0000001013643c8 r3  :
> fffffffffffc04b8
> [   23.420927] r8  : 0000000000001440 r9  : e000000001507708 r10 :
> 0000000000000008
> [   23.420927] r11 : ffffffffffd8d818 r12 : e000000682fcfbd0 r13 :
> e000000682fc8000
> [   23.420927] r14 : a0000001013643b8 r15 : ffffffffffd8d828 r16 :
> 00000000007fffff
> [   23.420927] r17 : 0000000000000008 r18 : 0000000000000000 r19 :
> e000000001507710
> [   23.420927] r20 : 0000000000000000 r21 : 0000000000002500 r22 :
> 0000000000000000
> [   23.420927] r23 : 0000000000000000 r24 : 0000000000000000 r25 :
> 0000000000000000
> [   23.420927] r26 : 0000000000000000 r27 : 0000000000000000 r28 :
> e000000682fc87b0
> [   23.420927] r29 : 0000000000200000 r30 : 0000000000000000 r31 :
> 0000000000000000
> [   23.420927]
> [   23.420927] Call Trace:
> [   23.420927]  [<a000000100014bd0>] show_stack+0x90/0xc0
> [   23.420927]                                 sp=e000000682fcf790
> bsp=e000000682fc9c80
> [   23.420927]  [<a0000001000152d0>] show_regs+0x6d0/0xa00
> [   23.420927]                                 sp=e000000682fcf960
> bsp=e000000682fc9c10
> [   23.420927]  [<a000000100029330>] die+0x1b0/0x460
> [   23.420927]                                 sp=e000000682fcf980
> bsp=e000000682fc9bc8
> [   23.420927]  [<a000000100e75100>] ia64_fault+0x5a0/0xf60
> [   23.420927]                                 sp=e000000682fcf980
> bsp=e000000682fc9b70
> [   23.420927]  [<a00000010000c9c0>] ia64_leave_kernel+0x0/0x270
> [   23.420927]                                 sp=e000000682fcfa00
> bsp=e000000682fc9b70
> [   23.420927]  [<a0000001002af040>] __alloc_pages_nodemask+0x260/0x20c0
> [   23.420927]                                 sp=e000000682fcfbd0
> bsp=e000000682fc9938
> [   23.420927]  [<a0000001001710e0>] dma_direct_alloc+0x140/0x2e0
> [   23.420927]                                 sp=e000000682fcfc40
> bsp=e000000682fc98c0
> [   23.420927]  [<a000000100173910>] swiotlb_alloc+0x50/0x2e0
> [   23.420927]                                 sp=e000000682fcfc40
> bsp=e000000682fc9868
> ```
>
> The machine doesn't continue boot afterwards. The machine boots fine
> with a 4.14.x with Gentoo patches but also no later minor kernel version
> with Gentoo patches works on it. With some testing I could limit the
> Linux versions, between which the problematic change could have been
> introduced, to 4.15.x and 4.16.x. Bisecting between tag v4.15.18 (good)
> and tag v4.16-rc1 (bad) pointed to commit
> 543cea9accd9804307541cb93d3ed7ec94b07237 ([1]) as first bad commit.
>
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
>
> The kernel messages with problematic kernels from the bisecting process
> look different to the ones from the above shown v4.19.37 from Debian though:
>
> ```
> Linux version 4.15.0-rc7-00047-g543cea9accd9-dirty (root@rx2800-i2) (gcc
> version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #1 SMP Thu Jun 13 22:16:30 CEST 2019
> EFI v2.10 by HP:
> efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
> SMBIOS=0x3d368000
> booting generic kernel on platform dig
> PCDP: v3 at 0xdffff8798
> earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
> bootconsole [uart8250] enabled
> ACPI: Early table checksum verification disabled
> ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
> ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
> 01000013)
> [...]
> Trying to unpack rootfs image as initramfs...
> [...]
> Loading Adaptec I2O RAID: Version 2.4 Build 5go
> Detecting Adaptec I2O RAID controllers...
> ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA
> mode
> ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc ems
> Unable to handle kernel NULL pointer dereference (address 0000000000001688)
> swapper/0[1]: Oops 11012296146944 [1]
> Modules linked in:
>
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.15.0-rc7-00047-g543cea9accd9-dirty #1
> Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
> psr : 00001210084a6010 ifs : 8000000000001734 ip  : [<a000000100180401>]
>    Not tainted (4.15.0-rc7-00047-g543cea9accd9-dirty)
> ip is at __alloc_pages_nodemask+0x1a1/0x1670
> unat: 0000000000000000 pfs : 0000000000001734 rsc : 0000000000000003
> rnat: 000000038c5ad78d bsps: 000000000001003e pr  : 565595a66aa65799
> ldrs: 0000000000000000 ccv : 000000032e40a799 fpsr: 0009804c8a70433f
> csd : 0000000000000000 ssd : 0000000000000000
> b0  : a0000001001802c0 b6  : a000000100050b50 b7  : a0000001007e83d0
> f6  : 1003e0000000000000000 f7  : 1003e00000000000164ff
> f8  : 1003e0000000000000f00 f9  : 1003e000000000000000f
> f10 : 1003e0000000000000400 f11 : 1003e0000000000003c00
> r1  : a00000010155edc0 r2  : a0000001012b5e90 r3  : 0000000001ffffff
> r8  : 0000000000001680 r9  : 0000000000250015 r10 : e000000001519980
> r11 : e000000001519988 r12 : e000000d8334fcf0 r13 : e000000d83348000
> r14 : ffffffffffd570d0 r15 : 0000000000000008 r16 : e000000001519990
> r17 : 0000000000000000 r18 : 0000000000001680 r19 : 0000000000000000
> r20 : 0000000000000000 r21 : 0000000000000000 r22 : 0000000000000000
> r23 : 0000000000000000 r24 : ffffffffffd570c0 r25 : a0000001012b5e80
> r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000001688
> r29 : 0000000000000358 r30 : 0000000000000000 r31 : 0000000000000081
>
> Call Trace:
>  [<a000000100013760>] show_stack+0x40/0x90
>                                 sp=e000000d8334f8c0 bsp=e000000d83349890
>  [<a0000001000140e0>] show_regs+0x930/0x940
>                                 sp=e000000d8334fa90 bsp=e000000d83349820
>  [<a00000010003a7d0>] die+0x1a0/0x2f0
>                                 sp=e000000d8334fa90 bsp=e000000d833497d8
>  [<a000000100063140>] ia64_do_page_fault+0x830/0xa30
>                                 sp=e000000d8334fa90 bsp=e000000d83349740
>  [<a00000010000c400>] ia64_leave_kernel+0x0/0x270
>                                 sp=e000000d8334fb20 bsp=e000000d83349740
>  [<a000000100180400>] __alloc_pages_nodemask+0x1a0/0x1670
>                                 sp=e000000d8334fcf0 bsp=e000000d83349598
>  [<a000000100d70100>] dma_direct_alloc+0x170/0x470
>                                 sp=e000000d8334fd50 bsp=e000000d83349518
>  [<a0000001006a8770>] swiotlb_alloc+0x50/0x90
>                                 sp=e000000d8334fd50 bsp=e000000d833494d8
>  [<a00000010083abd0>] dmam_alloc_coherent+0x250/0x2c0
>                                 sp=e000000d8334fd50 bsp=e000000d83349488
>  [<a0000001009990c0>] ahci_port_start+0x2f0/0x4b0
>                                 sp=e000000d8334fd50 bsp=e000000d83349440
>  [<a000000100958490>] ata_host_start+0x310/0x470
>                                 sp=e000000d8334fd60 bsp=e000000d833493d0
>  [<a000000100964a70>] ata_host_activate+0x20/0x290
>                                 sp=e000000d8334fd60 bsp=e000000d83349370
>  [<a000000100999570>] ahci_host_activate+0x2f0/0x300
>                                 sp=e000000d8334fd60 bsp=e000000d83349300
>  [<a0000001009923d0>] ahci_init_one+0x1580/0x20b0
>                                 sp=e000000d8334fd60 bsp=e000000d83349258
>  [<a0000001006d0610>] local_pci_probe+0x90/0x150
>                                 sp=e000000d8334fdc0 bsp=e000000d83349218
>  [<a0000001006d1a30>] pci_device_probe+0x2f0/0x310
>                                 sp=e000000d8334fdc0 bsp=e000000d833491d8
>  [<a0000001008229f0>] driver_probe_device+0x520/0x720
>                                 sp=e000000d8334fde0 bsp=e000000d83349170
>  [<a000000100822d10>] __driver_attach+0x120/0x190
>                                 sp=e000000d8334fde0 bsp=e000000d83349140
>  [<a00000010081ec00>] bus_for_each_dev+0x120/0x140
>                                 sp=e000000d8334fde0 bsp=e000000d83349100
>  [<a000000100821bf0>] driver_attach+0x40/0x60
>                                 sp=e000000d8334fdf0 bsp=e000000d833490e0
>  [<a0000001008211b0>] bus_add_driver+0x400/0x4a0
>                                 sp=e000000d8334fdf0 bsp=e000000d83349090
>  [<a000000100823fc0>] driver_register+0x240/0x2d0
>                                 sp=e000000d8334fdf0 bsp=e000000d83349068
>  [<a0000001006cfde0>] __pci_register_driver+0xa0/0xc0
>                                 sp=e000000d8334fdf0 bsp=e000000d83349038
>  [<a0000001010ecdb0>] ahci_pci_driver_init+0x50/0x70
>                                 sp=e000000d8334fdf0 bsp=e000000d83349020
>  [<a00000010000a950>] do_one_initcall+0x290/0x2a0
>                                 sp=e000000d8334fdf0 bsp=e000000d83348fe0
>  [<a0000001010a1c10>] kernel_init_freeable+0x400/0x430
>                                 sp=e000000d8334fe30 bsp=e000000d83348f78
>  [<a000000100d93860>] kernel_init+0x20/0x280
>                                 sp=e000000d8334fe30 bsp=e000000d83348f58
>  [<a00000010000c1f0>] call_payload+0x50/0x80
>                                 sp=e000000d8334fe30 bsp=e000000d83348f40
> Disabling lock debugging due to kernel taint
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> ---[ end Kernel panic - not syncing: Attempted to kill init!
> exitcode=0x0000000b
> ```
>
> ...but because of the result below - spoiler: a v4.19.37 kernel working
> on my rx2800 i2 - I assume they're created by the very same issue.
>
> Starting at tag v4.19.37 I then reverted the following commits:
>
> * cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80 [2]
>
> * 16e73adbca76fd18733278cb688b0ddb4cad162c [3]
>
> * 9d37c094dacda531ac3e529dd4dd139e3c0b7811 [4]
>
> * 4fac8076df854aa4ddb8acbf6cce9d337300219e [5]
>
> * 543cea9accd9804307541cb93d3ed7ec94b07237 [6]
>
> ...and compiled a kernel using the localmodconfig target to create a
> minimal config. The resulting kernel booted fine on my rx2800 i2:
>
> ```
> Linux version 4.19.37-00005-g55bd603c2590-dirty (root@rx2800-i2) (gcc
> version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #1 SMP Thu Jun 20 23:58:57 CEST 2019
> EFI v2.10 by HP:
> efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
> SMBIOS=0x3d368000
> booting generic kernel on platform dig
> PCDP: v3 at 0xdffff8798
> earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
> bootconsole [uart8250] enabled
> ACPI: Early table checksum verification disabled
> ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
> ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
> 01000013)
> [...]
>  * Starting sshd ...
>  [ ok ]
>  * Starting local ...
>  [ ok ]
>
>
> This is rx2800-i2[...] (Linux ia64 4.19.37-00005-g55bd603c2590-dirty)
> 20:49:42
> ```
>
> [2]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80
>
> [3]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=16e73adbca76fd18733278cb688b0ddb4cad162c
>
> [4]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d37c094dacda531ac3e529dd4dd139e3c0b7811
>
> [5]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4fac8076df854aa4ddb8acbf6cce9d337300219e
>
> [6]:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
>
> ****
>
> Please note:
>
> * that I'm always using the "ia64: fix ptrace" patch ([7]) in addition,
> as I'm compiling with gcc 7.3.0 on Gentoo;
>
> [7]: https://lore.kernel.org/patchwork/patch/884685/
>
> * that the original problem only shows on my rx2800 i2 and not on my
> other ia64 gear (rx4640 with Madison, rx2620 with Montecito and rx2660
> with Montvale), so could be related to the different system architecture
> of the Tukwila based rx2800 i2 (UMA => NUMA IIC);
>
> I just now tried to compile a more recent v5.2-rc5 kernel with the above
> commits reverted, but that fails. There seem to have been further
> changes made since v4.19.37 for which I would still need to find the
> respective commits to revert. But I assume this work could be unneeded
> for a further examination of the problem, so I don't follow this for
> now. If it is needed please let me know.
>
> James Clarke already had an idea what could be involved in this issue.
> Maybe he can give his assessment.
>
> If you want me to try a patch for a specific Linux version, please let
> me know. The same if you need further information from me.
>
> Cheers
> Frank
---end quoted text---

* Re: Kernel problem on rx2800 i2
  2019-06-21 20:08 Kernel problem on rx2800 i2 Frank Scheiner
  2019-06-26 15:58 ` Christoph Hellwig
@ 2019-06-26 16:15 ` John Paul Adrian Glaubitz
  1 sibling, 0 replies; 3+ messages in thread
From: John Paul Adrian Glaubitz @ 2019-06-26 16:15 UTC (permalink / raw)
  To: linux-ia64

Hi!

I'm CC'ing Michael Karcher, who is really good at tracking down
such bugs.

Adrian

On 6/26/19 5:58 PM, Christoph Hellwig wrote:
> I'm running out of luck trying to understand the issue with the
> zone list.  Adding the ia64 mailing list in addition to Tony
> to see if someone can figure out how a alloc_pages_node for the
> node stored in an AHCI PCIe pci_dev could cause an oops in the
> zonelist lookup.
> 
> On Fri, Jun 21, 2019 at 10:08:06PM +0200, Frank Scheiner wrote:
>> Hi there,
>>
>> recent testing of a Debian v4.19.37 kernel showed a problem on my rx2800
>> i2 happening during kernel boot:
>>
>> ```
>> [    0.000000] Linux version 4.19.0-5-itanium
>> (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian
>> 8.3.0-10~ia64.1)) #1 SMP Debian 4.19.37-3 (2019-05-18)
>> [    0.000000] EFI v2.10 by HP:
>> [    0.000000] efi:  SALsystab=0x6fdd63a18  ACPI 2.0=0x3d3c4014
>> HCDP=0x6ffff8798  SMBIOS=0x3d368000
>> [    0.000000] booting generic kernel on platform dig
>> [    0.000000] PCDP: v3 at 0x6ffff8798
>> [    0.000000] earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
>> [    0.000000] bootconsole [uart8250] enabled
>> [    0.000000] ACPI: Early table checksum verification disabled
>> [    0.000000] ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
>> [    0.000000] ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2
>> 00000001      01000013)
>> [...]
>> [   13.993718] Unpacking initramfs...
>> [...]
>> [   22.655630] Run /init as init process
>> [   22.818930] SCSI subsystem initialized
>> [   22.844653] ACPI: bus type USB registered
>> [   22.878940] HP HPSA Driver (v 3.4.20-125)
>> [   22.930628] usbcore: registered new interface driver usbfs
>> [   23.072034] usbcore: registered new interface driver hub
>> [   23.072925] hpsa 0000:01:00.0: Logical aborts not supported
>> [   23.150942] usbcore: registered new device driver usb
>> [   23.231690] hpsa 0000:01:00.0: HP SSD Smart Path aborts not supported
>> [   23.306942] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
>> [   23.417101] systemd-udevd[115]: NaT consumption 2216203124768 [1]
>> [   23.488663] ehci-pci: EHCI PCI platform driver
>> [   23.490942] uhci_hcd: USB Universal Host Controller Interface driver
>> [   23.420927] Modules linked in: uhci_hcd(+) ehci_pci(+) ehci_hcd
>> hpsa(+) scsi_transport_sas usbcore scsi_mod usb_common
>> [   23.420927]
>> [   23.420927] CPU: 6 PID: 115 Comm: systemd-udevd Not tainted
>> 4.19.0-5-itanium #1 Debian 4.19.37-3
>> [   23.420927] Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
>> [   23.420927] psr : 0000121008026010 ifs : 8000000000002046 ip  :
>> [<a0000001002af041>]    Not tainted (4.19.0-5-itanium Debian 4.19.37-3)
>> [   23.420927] ip is at __alloc_pages_nodemask+0x261/0x20c0
>> [   23.420927] unat: 0000000000000000 pfs : 0000000000000793 rsc :
>> 0000000000000003
>> [   23.420927] rnat: 0000000000000000 bsps: 0000000000000000 pr  :
>> 85aaa9a99a6a6659
>> [   23.420927] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr:
>> 0009804c8a70433f
>> [   23.420927] csd : 0000000000000000 ssd : 0000000000000000
>> [   23.420927] b0  : a0000001001710e0 b6  : a0000001003948c0 b7  :
>> a0000001000469c0
>> [   23.420927] f6  : 10012bffff00000000000 f7  : 1003e00000000000bffff
>> [   23.420927] f8  : 1003e0000000000003fc0 f9  : 1003effffffffffffffab
>> [   23.420927] f10 : 10016818d087e7cd81a78 f11 : 1003e000000000000002a
>> [   23.420927] r1  : a0000001015d6ba0 r2  : a0000001013643c8 r3  :
>> fffffffffffc04b8
>> [   23.420927] r8  : 0000000000001440 r9  : e000000001507708 r10 :
>> 0000000000000008
>> [   23.420927] r11 : ffffffffffd8d818 r12 : e000000682fcfbd0 r13 :
>> e000000682fc8000
>> [   23.420927] r14 : a0000001013643b8 r15 : ffffffffffd8d828 r16 :
>> 00000000007fffff
>> [   23.420927] r17 : 0000000000000008 r18 : 0000000000000000 r19 :
>> e000000001507710
>> [   23.420927] r20 : 0000000000000000 r21 : 0000000000002500 r22 :
>> 0000000000000000
>> [   23.420927] r23 : 0000000000000000 r24 : 0000000000000000 r25 :
>> 0000000000000000
>> [   23.420927] r26 : 0000000000000000 r27 : 0000000000000000 r28 :
>> e000000682fc87b0
>> [   23.420927] r29 : 0000000000200000 r30 : 0000000000000000 r31 :
>> 0000000000000000
>> [   23.420927]
>> [   23.420927] Call Trace:
>> [   23.420927]  [<a000000100014bd0>] show_stack+0x90/0xc0
>> [   23.420927]                                 sp=e000000682fcf790
>> bsp=e000000682fc9c80
>> [   23.420927]  [<a0000001000152d0>] show_regs+0x6d0/0xa00
>> [   23.420927]                                 sp=e000000682fcf960
>> bsp=e000000682fc9c10
>> [   23.420927]  [<a000000100029330>] die+0x1b0/0x460
>> [   23.420927]                                 sp=e000000682fcf980
>> bsp=e000000682fc9bc8
>> [   23.420927]  [<a000000100e75100>] ia64_fault+0x5a0/0xf60
>> [   23.420927]                                 sp=e000000682fcf980
>> bsp=e000000682fc9b70
>> [   23.420927]  [<a00000010000c9c0>] ia64_leave_kernel+0x0/0x270
>> [   23.420927]                                 sp=e000000682fcfa00
>> bsp=e000000682fc9b70
>> [   23.420927]  [<a0000001002af040>] __alloc_pages_nodemask+0x260/0x20c0
>> [   23.420927]                                 sp=e000000682fcfbd0
>> bsp=e000000682fc9938
>> [   23.420927]  [<a0000001001710e0>] dma_direct_alloc+0x140/0x2e0
>> [   23.420927]                                 sp=e000000682fcfc40
>> bsp=e000000682fc98c0
>> [   23.420927]  [<a000000100173910>] swiotlb_alloc+0x50/0x2e0
>> [   23.420927]                                 sp=e000000682fcfc40
>> bsp=e000000682fc9868
>> ```
>>
>> The machine doesn't continue booting afterwards. It boots fine with a
>> 4.14.x kernel with Gentoo patches, but no later kernel version with
>> Gentoo patches works on it. With some testing I could narrow the range
>> in which the problematic change was introduced to between 4.15.x and
>> 4.16.x. Bisecting between tag v4.15.18 (good) and tag v4.16-rc1 (bad)
>> pointed to commit 543cea9accd9804307541cb93d3ed7ec94b07237 ([1]) as the
>> first bad commit.
>>
>> [1]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
>>
>> The kernel messages from the problematic kernels built during bisecting
>> look different from those of the Debian v4.19.37 kernel shown above, though:
>>
>> ```
>> Linux version 4.15.0-rc7-00047-g543cea9accd9-dirty (root@rx2800-i2) (gcc
>> version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #1 SMP Thu Jun 13 22:16:30 CEST 2019
>> EFI v2.10 by HP:
>> efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
>> SMBIOS=0x3d368000
>> booting generic kernel on platform dig
>> PCDP: v3 at 0xdffff8798
>> earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
>> bootconsole [uart8250] enabled
>> ACPI: Early table checksum verification disabled
>> ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
>> ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
>> 01000013)
>> [...]
>> Trying to unpack rootfs image as initramfs...
>> [...]
>> Loading Adaptec I2O RAID: Version 2.4 Build 5go
>> Detecting Adaptec I2O RAID controllers...
>> ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA
>> mode
>> ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ccc ems
>> Unable to handle kernel NULL pointer dereference (address 0000000000001688)
>> swapper/0[1]: Oops 11012296146944 [1]
>> Modules linked in:
>>
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.15.0-rc7-00047-g543cea9accd9-dirty #1
>> Hardware name: hp Integrity rx2800 i2, BIOS 01.93 09/12/2012
>> psr : 00001210084a6010 ifs : 8000000000001734 ip  : [<a000000100180401>]
>>    Not tainted (4.15.0-rc7-00047-g543cea9accd9-dirty)
>> ip is at __alloc_pages_nodemask+0x1a1/0x1670
>> unat: 0000000000000000 pfs : 0000000000001734 rsc : 0000000000000003
>> rnat: 000000038c5ad78d bsps: 000000000001003e pr  : 565595a66aa65799
>> ldrs: 0000000000000000 ccv : 000000032e40a799 fpsr: 0009804c8a70433f
>> csd : 0000000000000000 ssd : 0000000000000000
>> b0  : a0000001001802c0 b6  : a000000100050b50 b7  : a0000001007e83d0
>> f6  : 1003e0000000000000000 f7  : 1003e00000000000164ff
>> f8  : 1003e0000000000000f00 f9  : 1003e000000000000000f
>> f10 : 1003e0000000000000400 f11 : 1003e0000000000003c00
>> r1  : a00000010155edc0 r2  : a0000001012b5e90 r3  : 0000000001ffffff
>> r8  : 0000000000001680 r9  : 0000000000250015 r10 : e000000001519980
>> r11 : e000000001519988 r12 : e000000d8334fcf0 r13 : e000000d83348000
>> r14 : ffffffffffd570d0 r15 : 0000000000000008 r16 : e000000001519990
>> r17 : 0000000000000000 r18 : 0000000000001680 r19 : 0000000000000000
>> r20 : 0000000000000000 r21 : 0000000000000000 r22 : 0000000000000000
>> r23 : 0000000000000000 r24 : ffffffffffd570c0 r25 : a0000001012b5e80
>> r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000001688
>> r29 : 0000000000000358 r30 : 0000000000000000 r31 : 0000000000000081
>>
>> Call Trace:
>>  [<a000000100013760>] show_stack+0x40/0x90
>>                                 sp=e000000d8334f8c0 bsp=e000000d83349890
>>  [<a0000001000140e0>] show_regs+0x930/0x940
>>                                 sp=e000000d8334fa90 bsp=e000000d83349820
>>  [<a00000010003a7d0>] die+0x1a0/0x2f0
>>                                 sp=e000000d8334fa90 bsp=e000000d833497d8
>>  [<a000000100063140>] ia64_do_page_fault+0x830/0xa30
>>                                 sp=e000000d8334fa90 bsp=e000000d83349740
>>  [<a00000010000c400>] ia64_leave_kernel+0x0/0x270
>>                                 sp=e000000d8334fb20 bsp=e000000d83349740
>>  [<a000000100180400>] __alloc_pages_nodemask+0x1a0/0x1670
>>                                 sp=e000000d8334fcf0 bsp=e000000d83349598
>>  [<a000000100d70100>] dma_direct_alloc+0x170/0x470
>>                                 sp=e000000d8334fd50 bsp=e000000d83349518
>>  [<a0000001006a8770>] swiotlb_alloc+0x50/0x90
>>                                 sp=e000000d8334fd50 bsp=e000000d833494d8
>>  [<a00000010083abd0>] dmam_alloc_coherent+0x250/0x2c0
>>                                 sp=e000000d8334fd50 bsp=e000000d83349488
>>  [<a0000001009990c0>] ahci_port_start+0x2f0/0x4b0
>>                                 sp=e000000d8334fd50 bsp=e000000d83349440
>>  [<a000000100958490>] ata_host_start+0x310/0x470
>>                                 sp=e000000d8334fd60 bsp=e000000d833493d0
>>  [<a000000100964a70>] ata_host_activate+0x20/0x290
>>                                 sp=e000000d8334fd60 bsp=e000000d83349370
>>  [<a000000100999570>] ahci_host_activate+0x2f0/0x300
>>                                 sp=e000000d8334fd60 bsp=e000000d83349300
>>  [<a0000001009923d0>] ahci_init_one+0x1580/0x20b0
>>                                 sp=e000000d8334fd60 bsp=e000000d83349258
>>  [<a0000001006d0610>] local_pci_probe+0x90/0x150
>>                                 sp=e000000d8334fdc0 bsp=e000000d83349218
>>  [<a0000001006d1a30>] pci_device_probe+0x2f0/0x310
>>                                 sp=e000000d8334fdc0 bsp=e000000d833491d8
>>  [<a0000001008229f0>] driver_probe_device+0x520/0x720
>>                                 sp=e000000d8334fde0 bsp=e000000d83349170
>>  [<a000000100822d10>] __driver_attach+0x120/0x190
>>                                 sp=e000000d8334fde0 bsp=e000000d83349140
>>  [<a00000010081ec00>] bus_for_each_dev+0x120/0x140
>>                                 sp=e000000d8334fde0 bsp=e000000d83349100
>>  [<a000000100821bf0>] driver_attach+0x40/0x60
>>                                 sp=e000000d8334fdf0 bsp=e000000d833490e0
>>  [<a0000001008211b0>] bus_add_driver+0x400/0x4a0
>>                                 sp=e000000d8334fdf0 bsp=e000000d83349090
>>  [<a000000100823fc0>] driver_register+0x240/0x2d0
>>                                 sp=e000000d8334fdf0 bsp=e000000d83349068
>>  [<a0000001006cfde0>] __pci_register_driver+0xa0/0xc0
>>                                 sp=e000000d8334fdf0 bsp=e000000d83349038
>>  [<a0000001010ecdb0>] ahci_pci_driver_init+0x50/0x70
>>                                 sp=e000000d8334fdf0 bsp=e000000d83349020
>>  [<a00000010000a950>] do_one_initcall+0x290/0x2a0
>>                                 sp=e000000d8334fdf0 bsp=e000000d83348fe0
>>  [<a0000001010a1c10>] kernel_init_freeable+0x400/0x430
>>                                 sp=e000000d8334fe30 bsp=e000000d83348f78
>>  [<a000000100d93860>] kernel_init+0x20/0x280
>>                                 sp=e000000d8334fe30 bsp=e000000d83348f58
>>  [<a00000010000c1f0>] call_payload+0x50/0x80
>>                                 sp=e000000d8334fe30 bsp=e000000d83348f40
>> Disabling lock debugging due to kernel taint
>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>
>> ---[ end Kernel panic - not syncing: Attempted to kill init!
>> exitcode=0x0000000b
>> ```
>>
>> ...but because of the result below - spoiler: a v4.19.37 kernel working
>> on my rx2800 i2 - I assume they're created by the very same issue.
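
One hedged reading of the oops in the bisect kernel, offered as an assumption rather than a confirmed diagnosis: the faulting address 0x1688 equals r8 (0x1680) plus 8, and 8 is the offset of `zone_idx` in a 64-bit `struct zoneref` (a `struct zone *` followed by an `int`). That would fit a zonelist pointer derived from a NULL per-node base, but the role of r8 and the meaning of the 0x1680 offset are inferred from the register dump only and have not been checked against this build. A small Python sketch of the arithmetic, with the struct layout modelled via ctypes:

```python
import ctypes


class Zoneref(ctypes.Structure):
    # Models struct zoneref on a 64-bit target: zone pointer, then zone_idx.
    _fields_ = [("zone", ctypes.c_uint64), ("zone_idx", ctypes.c_int32)]


FAULT_ADDR = 0x1688  # "NULL pointer dereference (address 0000000000001688)"
R8 = 0x1680          # r8 from the dump; ASSUMED to hold the zonelist pointer

# Distance between the faulting address and the assumed zonelist pointer.
delta = FAULT_ADDR - R8
print(hex(delta), delta == Zoneref.zone_idx.offset)
```

If the assumption holds, the question becomes why the node recorded for the AHCI device has no usable node data on this machine.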
>>
>> Starting at tag v4.19.37 I then reverted the following commits:
>>
>> * cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80 [2]
>>
>> * 16e73adbca76fd18733278cb688b0ddb4cad162c [3]
>>
>> * 9d37c094dacda531ac3e529dd4dd139e3c0b7811 [4]
>>
>> * 4fac8076df854aa4ddb8acbf6cce9d337300219e [5]
>>
>> * 543cea9accd9804307541cb93d3ed7ec94b07237 [6]
>>
>> ...and compiled a kernel using the localmodconfig target to create a
>> minimal config. The resulting kernel booted fine on my rx2800 i2:
>>
>> ```
>> Linux version 4.19.37-00005-g55bd603c2590-dirty (root@rx2800-i2) (gcc
>> version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) #1 SMP Thu Jun 20 23:58:57 CEST 2019
>> EFI v2.10 by HP:
>> efi:  SALsystab=0xdfdd63a18  ACPI 2.0=0x3d3c4014  HCDP=0xdffff8798
>> SMBIOS=0x3d368000
>> booting generic kernel on platform dig
>> PCDP: v3 at 0xdffff8798
>> earlycon: uart8250 at I/O port 0x4000 (options '115200n8')
>> bootconsole [uart8250] enabled
>> ACPI: Early table checksum verification disabled
>> ACPI: RSDP 0x000000003D3C4014 000024 (v02 HP    )
>> ACPI: XSDT 0x000000003D3C4580 000124 (v01 HP     RX2800-2 00000001
>> 01000013)
>> [...]
>>  * Starting sshd ...
>>  [ ok ]
>>  * Starting local ...
>>  [ ok ]
>>
>>
>> This is rx2800-i2[...] (Linux ia64 4.19.37-00005-g55bd603c2590-dirty)
>> 20:49:42
>> ```
>>
>> [2]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80
>>
>> [3]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=16e73adbca76fd18733278cb688b0ddb4cad162c
>>
>> [4]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9d37c094dacda531ac3e529dd4dd139e3c0b7811
>>
>> [5]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4fac8076df854aa4ddb8acbf6cce9d337300219e
>>
>> [6]:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=543cea9accd9804307541cb93d3ed7ec94b07237
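
The revert-and-rebuild procedure described above could be scripted roughly as follows. This is a minimal sketch, not the commands that were actually run: the order of the reverts (taken from the list as written), the detached checkout, and the helper names `build_commands` and `run` are assumptions for illustration. It does not apply the "ia64: fix ptrace" patch, and by default it only prints the commands.

```python
import subprocess

# Commits reverted in the report, in the order they are listed there.
REVERTS = [
    "cf65a0f6f6ff7631ba0ac0513a14ca5b65320d80",
    "16e73adbca76fd18733278cb688b0ddb4cad162c",
    "9d37c094dacda531ac3e529dd4dd139e3c0b7811",
    "4fac8076df854aa4ddb8acbf6cce9d337300219e",
    "543cea9accd9804307541cb93d3ed7ec94b07237",
]


def build_commands(base_tag, commits):
    """Return the git/make invocations as argument lists; nothing is run."""
    cmds = [["git", "checkout", "--detach", base_tag]]
    cmds += [["git", "revert", "--no-edit", c] for c in commits]
    cmds.append(["make", "localmodconfig"])  # minimal config for this machine
    cmds.append(["make"])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_commands("v4.19.37", REVERTS))
```

The `-00005-` in the version string of the working kernel (4.19.37-00005-g55bd603c2590-dirty) is consistent with five commits on top of the v4.19.37 tag.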
>>
>> ****
>>
>> Please note:
>>
>> * that I'm always using the "ia64: fix ptrace" patch ([7]) in addition,
>> as I'm compiling with gcc 7.3.0 on Gentoo;
>>
>> [7]: https://lore.kernel.org/patchwork/patch/884685/
>>
>> * that the original problem only shows on my rx2800 i2 and not on my
>> other ia64 gear (rx4640 with Madison, rx2620 with Montecito and rx2660
>> with Montvale), so could be related to the different system architecture
>> of the Tukwila based rx2800 i2 (UMA => NUMA IIC);
>>
>> I just tried to compile a more recent v5.2-rc5 kernel with the above
>> commits reverted, but that fails. Further changes seem to have been
>> made since v4.19.37, for which I would still need to find the
>> respective commits to revert. I assume this work is not needed to
>> examine the problem further, so I'm not pursuing it for now. If it is
>> needed, please let me know.
>>
>> James Clarke already had an idea what could be involved in this issue.
>> Maybe he can give his assessment.
>>
>> If you want me to try a patch for a specific Linux version, please let
>> me know. The same if you need further information from me.
>>
>> Cheers
>> Frank
> ---end quoted text---
> 

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
