* [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
@ 2005-06-03 11:25 Vivek Goyal
2005-06-03 11:54 ` Richard B. Johnson
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Vivek Goyal @ 2005-06-03 11:25 UTC (permalink / raw)
To: linux kernel mailing list, greg
Cc: Fastboot mailing list, Morton Andrew Morton, Alan Stern,
Eric W. Biederman
[-- Attachment #1: Type: text/plain, Size: 5535 bytes --]
Hi,
In kdump, general driver initialization issues sometimes crop up in the
second kernel because devices are not shut down during the crash. These
devices keep sending interrupts while the second kernel is booting and the
drivers are not expecting any interrupts yet.
In some cases, we are observing a storm of interrupts while the second kernel
is booting, and the kernel disables that irq line. This may be a case of a
stuck irq line, because it is a shared level-triggered irq and there is no
driver loaded for the device.
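To make the stuck-line theory a bit more concrete, here is a rough userspace
model (not kernel code) of how such a line ends up disabled. All names are
made up for illustration, and the 100000/99900 thresholds are what I believe
the spurious-interrupt accounting uses, so treat them as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one shared, level-triggered IRQ line. */
struct irq_line_model {
	unsigned int irq_count;      /* interrupts seen in the current window */
	unsigned int irqs_unhandled; /* of those, how many no handler claimed */
	bool disabled;               /* set once the line is given up on */
};

/* Assumed thresholds: check every 100000 interrupts, give up if more
 * than 99900 of them went unclaimed. */
enum { MODEL_WINDOW = 100000, MODEL_UNHANDLED_LIMIT = 99900 };

/* Account for one interrupt; 'handled' is false when nobody claimed it. */
static void model_note_interrupt(struct irq_line_model *line, bool handled)
{
	assert(line != NULL);
	if (line->disabled)
		return;
	if (!handled)
		line->irqs_unhandled++;
	if (++line->irq_count < MODEL_WINDOW)
		return;
	/* End of window: decide whether the line looks stuck. */
	line->irq_count = 0;
	if (line->irqs_unhandled > MODEL_UNHANDLED_LIMIT)
		line->disabled = true;
	line->irqs_unhandled = 0;
}

/* A device keeps the line asserted. Returns how many interrupts were
 * taken before the line was disabled, or 'max' if it never was. */
static unsigned int model_storm(struct irq_line_model *line, bool handled,
				unsigned int max)
{
	unsigned int taken = 0;

	while (taken < max && !line->disabled) {
		model_note_interrupt(line, handled);
		taken++;
	}
	return taken;
}
```

With no driver bound to the interrupting device, the only registered handler
belongs to some other device on the same line and claims nothing, so every
interrupt counts as unhandled and the line is shut off for everyone sharing it.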
So, we need something generic which disables interrupt generation from a device
until a driver has been registered for that device and the driver is ready to
receive interrupts. The PCI specification (ver 2.3 onwards) introduced an
interrupt disable bit in the command register to disable interrupt generation
from the device. This can come in handy here. In the capture kernel, traverse
all the PCI devices and disable interrupt generation. Re-enable interrupt
generation once the driver for that device registers, maybe after the probe
handler has run. In the probe handler, the driver can reset the device or
register for the irq so that it can handle any interrupt from the device after
that.
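As a rough sketch of that sequence, here is a small userspace model with a
mock command register. The model_* names and the struct are invented for
illustration and are not kernel interfaces; the only hardware fact assumed is
that the interrupt disable bit is bit 10 (0x0400) of the command register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CMD_INTX_DISABLE 0x0400u /* command register bit 10 (PCI 2.3) */

/* Hypothetical stand-in for a PCI function; this is not struct pci_dev. */
struct model_pci_dev {
	uint16_t command;  /* mock command register */
	bool has_intx_bit; /* false: pre-2.3 device, bit 10 cannot be set */
	bool probe_ok;     /* whether a driver's probe would succeed */
};

/* Mock config write: a pre-2.3 device ignores writes to bit 10. */
static void model_write_command(struct model_pci_dev *dev, uint16_t val)
{
	assert(dev != NULL);
	if (!dev->has_intx_bit)
		val &= (uint16_t)~MODEL_CMD_INTX_DISABLE;
	dev->command = val;
}

/* Set or clear bit 10, then read back to see whether it took effect. */
static bool model_set_intx(struct model_pci_dev *dev, bool enable)
{
	uint16_t cmd;

	assert(dev != NULL);
	cmd = dev->command;
	if (enable)
		cmd &= (uint16_t)~MODEL_CMD_INTX_DISABLE;
	else
		cmd |= MODEL_CMD_INTX_DISABLE;
	model_write_command(dev, cmd);
	return ((dev->command & MODEL_CMD_INTX_DISABLE) == 0) == enable;
}

/* Capture-kernel boot: mask every device first, then unmask a device
 * only after its driver's probe has succeeded. Returns how many devices
 * were unmasked by a successful probe. */
static size_t model_boot(struct model_pci_dev *devs, size_t n)
{
	size_t i, unmasked = 0;

	for (i = 0; i < n; i++)
		model_set_intx(&devs[i], false);
	for (i = 0; i < n; i++)
		if (devs[i].probe_ok && model_set_intx(&devs[i], true))
			unmasked++;
	return unmasked;
}
```

A device without a driver stays masked, which is the point of the exercise.
The model also shows the limitation: a pre-2.3 device ignores the bit, so it
can still hold a shared line asserted.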
Greg mentioned that there are reasons that we cannot disable all PCI
interrupts. Meanwhile I am going through the archives to find out more about it.
In previous conversations, Alan Stern had raised the issue of the console also
not working if interrupts are disabled on all the devices. I am not sure,
but this should work at least for serial consoles and VGA text consoles.
That may be sufficient to capture the dump.
Attached is a hack patch which is by no means complete. I have one machine
with some PCI 2.3 compliant hardware. After applying this hack, the
problem does not occur, at least on this machine. Also attached is the serial
console log, which shows one kind of problem due to unwanted interrupts.
Bugme 4631 and 4573 are two more instances of driver initialization failure
in the second kernel.
---
o In kdump, devices are not shut down/reset after a crash and this leads to
various driver initialization failures while the second kernel is booting.
Most of them seem to happen because the system/driver is receiving
interrupts from a device when it is not prepared to do so.
o This patch tries to solve the problem by disabling interrupts at the
PCI level for all the devices. These interrupts are re-enabled once the
driver for that device registers. Currently this enabling and disabling
is done only in the dump capture kernel.
o This is for devices compliant with PCI specification v2.3 or higher.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---
linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci-driver.c | 3 +
linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci.c | 4 ++
linux-2.6.12-rc5-mm1-16M-root/include/linux/pci.h | 33 +++++++++++++++++
3 files changed, 39 insertions(+), 1 deletion(-)
diff -puN drivers/pci/pci.c~kdump-pci-interrupt-disable drivers/pci/pci.c
--- linux-2.6.12-rc5-mm1-16M/drivers/pci/pci.c~kdump-pci-interrupt-disable 2005-06-03 14:35:26.000000000 +0530
+++ linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci.c 2005-06-03 15:59:19.000000000 +0530
@@ -823,6 +823,10 @@ static int __devinit pci_init(void)
while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
pci_fixup_device(pci_fixup_final, dev);
+
+ /* A hack to disable interrupts from all PCI devices in
+ * capture kernel. */
+ pci_disable_device_intx(dev);
}
return 0;
}
diff -puN drivers/pci/pci-driver.c~kdump-pci-interrupt-disable drivers/pci/pci-driver.c
--- linux-2.6.12-rc5-mm1-16M/drivers/pci/pci-driver.c~kdump-pci-interrupt-disable 2005-06-03 14:35:26.000000000 +0530
+++ linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci-driver.c 2005-06-03 14:35:26.000000000 +0530
@@ -248,7 +248,8 @@ static int pci_device_probe(struct devic
error = __pci_device_probe(drv, pci_dev);
if (error)
pci_dev_put(pci_dev);
-
+ else
+ pci_enable_device_intx(pci_dev);
return error;
}
diff -puN include/linux/pci.h~kdump-pci-interrupt-disable include/linux/pci.h
--- linux-2.6.12-rc5-mm1-16M/include/linux/pci.h~kdump-pci-interrupt-disable 2005-06-03 14:35:26.000000000 +0530
+++ linux-2.6.12-rc5-mm1-16M-root/include/linux/pci.h 2005-06-03 16:06:32.000000000 +0530
@@ -901,6 +901,39 @@ extern void pci_disable_msix(struct pci_
extern void msi_remove_pci_irq_vectors(struct pci_dev *dev);
#endif
+#ifdef CONFIG_CRASH_DUMP
+static inline int pci_enable_device_intx(struct pci_dev *dev)
+{
+ u16 pci_command;
+
+ /* Enable Interrupt generation if not already enabled */
+ pci_read_config_word(dev, PCI_COMMAND, &pci_command);
+ if (pci_command & PCI_COMMAND_INTX_DISABLE) {
+ pci_command &= ~PCI_COMMAND_INTX_DISABLE;
+ pci_write_config_word(dev, PCI_COMMAND, pci_command);
+ }
+ return 0;
+}
+
+static inline int pci_disable_device_intx(struct pci_dev *dev)
+{
+ u16 pci_command;
+
+ /* Disable Interrupt generation if not already disabled */
+ pci_read_config_word(dev, PCI_COMMAND, &pci_command);
+ if (!(pci_command & PCI_COMMAND_INTX_DISABLE)) {
+ pci_command |= PCI_COMMAND_INTX_DISABLE;
+ pci_write_config_word(dev, PCI_COMMAND, pci_command);
+ }
+ return 0;
+}
+#else
+static inline int pci_enable_device_intx(struct pci_dev *dev)
+{ return 0; }
+static inline int pci_disable_device_intx(struct pci_dev *dev)
+{ return 0; }
+#endif /* CONFIG_CRASH_DUMP */
+
#endif /* CONFIG_PCI */
/* Include architecture-dependent settings and functions */
_
[-- Attachment #2: pci-logs.txt --]
[-- Type: text/plain, Size: 16715 bytes --]
[root@llm15p ~]# SysRq : Trigger a crashdump
Linux version 2.6.12-rc5-mm1-16M (root@llm15p.in.ibm.com) (gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #16 Fri Jun 3 14:24:23 IST 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 000000000009dc00 (usable)
BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
BIOS-e820: 0000000000100000 - 00000000c7fcb940 (usable)
BIOS-e820: 00000000c7fcb940 - 00000000c7fcf800 (ACPI data)
BIOS-e820: 00000000c7fcf800 - 00000000c8000000 (reserved)
BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
user-defined physical RAM map:
user: 0000000000000000 - 00000000000a0000 (usable)
user: 0000000001000000 - 0000000001486000 (usable)
user: 0000000001526400 - 0000000005000000 (usable)
0MB HIGHMEM available.
80MB LOWMEM available.
DMI 2.3 present.
Allocating PCI resources starting at 05000000 (gap: 05000000:fb000000)
Built 1 zonelists
Initializing CPU#0
Kernel command line: root=/dev/sda3 console=tty0 console=ttyS1,38400 rhgb memmap=exactmap memmap=640K@0K memmap=4632K@16384K memmap=60263K@21657K elfcorehdr=21656K
PID hash table entries: 512 (order: 9, 8192 bytes)
Detected 3600.765 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Unknown interrupt or fault at EIP 00000292 00000060 c140c6a3
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 59464k/81920k available (3030k kernel code, 5880k reserved, 1098k data, 196k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 7209.88 BogoMIPS (lpj=14419769)
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
CPU: Intel(R) Xeon(TM) CPU 3.60GHz stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
ACPI: setting ELCR to 0200 (from 0c20)
checking if image is initramfs... it is
Freeing initrd memory: 538k freed
softlockup thread 0 started up.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd71e, last bus=9
PCI: Using MMCONFIG
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050408
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] segment is 0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
ACPI: PCI Interrupt Link [LP00] (IRQs *5)
ACPI: PCI Interrupt Link [LP01] (IRQs *11)
ACPI: PCI Interrupt Link [LP02] (IRQs *10)
ACPI: PCI Interrupt Link [LP03] (IRQs *11)
ACPI: PCI Interrupt Link [LP04] (IRQs *11)
ACPI: PCI Interrupt Link [LP05] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP06] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP07] (IRQs *5)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 16 devices
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
pnp: 00:00: ioport range 0x510-0x517 could not be reserved
pnp: 00:00: ioport range 0x504-0x507 could not be reserved
pnp: 00:00: ioport range 0x500-0x503 could not be reserved
pnp: 00:00: ioport range 0x520-0x53f has been reserved
pnp: 00:00: ioport range 0x540-0x547 has been reserved
pnp: 00:00: ioport range 0x460-0x461 has been reserved
pnp: 00:0e: ioport range 0x400-0x43f has been reserved
Machine check exception polling timer started.
audit: initializing netlink socket (disabled)
audit(1117808719.712:1): initialized
inotify device minor=63
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
ACPI: Power Button (FF) [PWRF]
ACPI: CPU0 (power states: C1[C1])
lp: driver loaded but no devices found
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
cn_fork is registered
PNP: PS/2 controller has invalid data port 0x64; using default 0x60
PNP: PS/2 controller has invalid command port 0x60; using default 0x64
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP(,...)]
lp0: using parport0 (interrupt-driven).
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
tg3.c:v3.29 (May 23, 2005)
ACPI: PCI Interrupt Link [LP00] enabled at IRQ 5
PCI: setting IRQ 5 as level-triggered
ACPI: PCI Interrupt 0000:06:00.0[A] -> Link [LP00] -> GSI 5 (level, low) -> IRQ 5
eth0: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCIX:100MHz:32-bit) 10/100/1000BaseT Ethernet 00:11:25:3f:6f:10
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[76180000]
ACPI: PCI Interrupt 0000:07:00.0[A] -> Link [LP00] -> GSI 5 (level, low) -> IRQ 5
eth1: Tigon3 [partno(BCM95721) rev 4101 PHY(5750)] (PCIX:100MHz:32-bit) 10/100/1000BaseT Ethernet 00:11:25:3f:6f:11
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[76180000]
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI Interrupt 0000:00:1f.1[A]: no GSI - using IRQ 0
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x0480-0x0487, BIOS settings: hda:DMA, hdb:DMA
hda: BTC CD-ROM F523E, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide-disk: probe of 0.0 failed with error 1
hda: ATAPI 48X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
PCI: Enabling device 0000:03:07.0 (0156 -> 0157)
ACPI: PCI Interrupt Link [LP02] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:03:07.0[A] -> Link [LP02] -> GSI 10 (level, low) -> IRQ 10
PCI: Enabling device 0000:03:07.1 (0156 -> 0157)
ACPI: PCI Interrupt Link [LP03] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI Interrupt 0000:03:07.1[B] -> Link [LP03] -> GSI 11 (level, low) -> IRQ 11
scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
(scsi1:A:2): 320.000MB/s transfers (160.000MHz DT|IU|RTI, 16bit)
(scsi1:A:3): 320.000MB/s transfers (160.000MHz DT|IU|RTI, 16bit)
(scsi1:A:4): 320.000MB/s transfers (160.000MHz DT|IU|RTI, 16bit)
(scsi1:A:5): 320.000MB/s transfers (160.000MHz DT|IU|RTI, 16bit)
Vendor: IBM-ESXS Model: VPR073C3-ETS10FN Rev: S330
Type: Direct-Access ANSI SCSI revision: 04
scsi1:A:2:0: Tagged Queuing enabled. Depth 32
Vendor: IBM-ESXS Model: VPR073C3-ETS10FN Rev: S330
Type: Direct-Access ANSI SCSI revision: 04
scsi1:A:3:0: Tagged Queuing enabled. Depth 32
Vendor: IBM-ESXS Model: VPR073C3-ETS10FN Rev: S330
Type: Direct-Access ANSI SCSI revision: 04
scsi1:A:4:0: Tagged Queuing enabled. Depth 32
Vendor: IBM-ESXS Model: VPR073C3-ETS10FN Rev: S330
Type: Direct-Access ANSI SCSI revision: 04
scsi1:A:5:0: Tagged Queuing enabled. Depth 32
Vendor: IBM Model: 02R0962a S320 1 Rev: 1
Type: Processor ANSI SCSI revision: 02
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 >
Attached scsi disk sda at scsi1, channel 0, id 2, lun 0
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdb: drive cache: write through
sdb:
Attached scsi disk sdb at scsi1, channel 0, id 3, lun 0
SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdc: drive cache: write through
SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdc: drive cache: write through
sdc:
Attached scsi disk sdc at scsi1, channel 0, id 4, lun 0
SCSI device sdd: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdd: drive cache: write through
SCSI device sdd: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdd: drive cache: write through
sdd:
Attached scsi disk sdd at scsi1, channel 0, id 5, lun 0
Attached scsi generic sg0 at scsi1, channel 0, id 2, lun 0, type 0
Attached scsi generic sg1 at scsi1, channel 0, id 3, lun 0, type 0
Attached scsi generic sg2 at scsi1, channel 0, id 4, lun 0, type 0
Attached scsi generic sg3 at scsi1, channel 0, id 5, lun 0, type 0
Attached scsi generic sg4 at scsi1, channel 0, id 8, lun 0, type 3
ieee1394: raw1394: /dev/raw1394 device initialized
usbmon: debugs is not available
ACPI: PCI Interrupt Link [LP07] enabled at IRQ 5
ACPI: PCI Interrupt 0000:00:1d.7[D] -> Link [LP07] -> GSI 5 (level, low) -> IRQ 5
ehci_hcd 0000:00:1d.7: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: irq 5, io mem 0xf0000000
ehci_hcd 0000:00:1d.7: USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
hub 1-0:1.0: USB hub found
irq 11: nobody cared (try booting with the "irqpoll" option.
[<c103a3ef>] __report_bad_irq+0x2a/0x8d
[<c1039c28>] handle_IRQ_event+0x39/0x6d
[<c103a4eb>] note_interrupt+0x7f/0xe4
[<c1039dc5>] __do_IRQ+0x169/0x194
[<c1004cc6>] do_IRQ+0x1b/0x28
[<c1003256>] common_interrupt+0x1a/0x20
[<c101d61b>] __do_softirq+0x2b/0x88
[<c101d69e>] do_softirq+0x26/0x2a
[<c101d759>] irq_exit+0x33/0x35
[<c1004ccb>] do_IRQ+0x20/0x28
[<c1003256>] common_interrupt+0x1a/0x20
[<c1018916>] release_console_sem+0x43/0xf6
[<c101875f>] vprintk+0x16c/0x277
[<c1074dd8>] d_lookup+0x23/0x46
[<c10748ac>] d_alloc+0x136/0x1a6
[<c10185ef>] printk+0x17/0x1b
[<c120f33f>] hub_probe+0xa0/0x168
[<c1096dd1>] sysfs_new_dirent+0x1f/0x64
[<c120d37f>] usb_probe_interface+0x59/0x71
[<c116c9b2>] driver_probe_device+0x2f/0xa7
[<c116ca2a>] __device_attach+0x0/0x5
[<c116c259>] bus_for_each_drv+0x58/0x78
[<c116ca8f>] device_attach+0x60/0x64
[<c116ca2a>] __device_attach+0x0/0x5
[<c116c383>] bus_add_device+0x29/0xa6
[<c116f9c7>] device_pm_add+0x56/0x9c
[<c116b7c3>] device_add+0xc5/0x14a
[<c121557e>] usb_set_configuration+0x346/0x528
[<c120fa47>] usb_new_device+0xab/0x1be
[<c120007b>] ohci_iso_recv_set_channel_mask+0x3/0xc6
[<c12124e1>] register_root_hub+0xaf/0x162
[<c12133bf>] usb_add_hcd+0x172/0x396
[<c1217bce>] usb_hcd_pci_probe+0x26e/0x375
[<c10284e7>] __call_usermodehelper+0x0/0x61
[<c110cbe4>] pci_device_probe_static+0x40/0x54
[<c110cc27>] __pci_device_probe+0x2f/0x42
[<c110cc63>] pci_device_probe+0x29/0x47
[<c116c9b2>] driver_probe_device+0x2f/0xa7
[<c116ca93>] __driver_attach+0x0/0x43
[<c116cad4>] __driver_attach+0x41/0x43
[<c116c1c2>] bus_for_each_dev+0x5a/0x7a
[<c116cafc>] driver_attach+0x26/0x2a
[<c116ca93>] __driver_attach+0x0/0x43
[<c116c5cb>] bus_add_driver+0x6b/0xa5
[<c110ce97>] pci_register_driver+0x5e/0x7c
[<c14246f5>] init+0x1d/0x25
[<c140c7ed>] do_initcalls+0x54/0xb6
[<c100029c>] init+0x0/0x10c
[<c100029c>] init+0x0/0x10c
[<c10002c6>] init+0x2a/0x10c
[<c1001348>] kernel_thread_helper+0x0/0xb
[<c100134d>] kernel_thread_helper+0x5/0xb
handlers:
[<c11e3421>] (ahd_linux_isr+0x0/0x284)
Disabling IRQ #11
hub 1-0:1.0: 4 ports detected
USB Universal Host Controller Interface driver v2.3
ACPI: PCI Interrupt 0000:00:1d.0[A] -> Link [LP00] -> GSI 5 (level, low) -> IRQ 5
uhci_hcd 0000:00:1d.0: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 5, io base 0x00002200
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> Link [LP03] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:1d.1: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 11, io base 0x00002600
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
usb 3-2: new full speed USB device using uhci_hcd and address 2
usbcore: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
input: USB HID v1.10 Keyboard [IBM PPC I/F] on usb-0000:00:1d.1-2
input: USB HID v1.10 Mouse [IBM PPC I/F] on usb-0000:00:1d.1-2
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
mice: PS/2 mouse device common for all mice
Advanced Linux Sound Architecture Driver Version 1.0.9rc3 (Thu Mar 24 10:33:39 2005 UTC).
ALSA device list:
No soundcards found.
oprofile: using timer interrupt.
NET: Registered protocol family 2
input: AT Translated Set 2 keyboard on isa0060/serio0
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
ip_conntrack version 2.1 (640 buckets, 5120 max) - 212 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
input: PS/2 Generic Mouse on isa0060/serio1
ipt_recent v0.3.2: Stephen Frost <sfrost@snowman.net>. http://snowman.net/projects/ipt_recent/
arp_tables: (C) 2002 David S. Miller
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI wakeup devices:
PCI0
ACPI: (supports S0 S4 S5)
Freeing unused kernel memory: 196k freed
Red Hat nash version 4.1.18 starting
Mounted /proc filesystem
Mounting sysfs
Creating /dev
Starting udev
Loading scsi_modscsi_mod: version magic '2.6.9-5.EL 686 REGPARM 4KSTACKS gcc-3.4' should be '2.6.12-rc5-mm1-16M preempt PENTIUM4 gcc-3.4'
.ko module
sd_mod: version magic '2.6.9-5.EL 686 REGPARM 4KSTACKS gcc-3.4' should be '2.6.12-rc5-mm1-16M preempt PENTIUM4 gcc-3.4'
insmod: error inaic79xx: version magic '2.6.9-5.EL 686 REGPARM 4KSTACKS gcc-3.4' should be '2.6.12-rc5-mm1-16M preempt PENTIUM4 gcc-3.4'
serting '/lib/scjbd: version magic '2.6.9-5.EL 686 REGPARM 4KSTACKS gcc-3.4' should be '2.6.12-rc5-mm1-16M preempt PENTIUM4 gcc-3.4'
si_mod.ko': -1 Iext3: version magic '2.6.9-5.EL 686 REGPARM 4KSTACKS gcc-3.4' should be '2.6.12-rc5-mm1-16M preempt PENTIUM4 gcc-3.4'
nvalid module format
ERROR: /bin/insmod exited abnormally!
Loading sd_mod.ko module
insmod: error inserting '/lib/sd_mod.ko': -1 Invalid module format
ERROR: /bin/insmod exited abnormally!
Loading aic79xx.ko module
insmod: error inserting '/lib/aic79xx.ko': -1 Invalid module format
ERROR: /bin/insmod exited abnormally!
Loading jbd.ko module
insmod: error inserting '/lib/jbd.ko': -1 Invalid module format
ERROR: /bin/insmod exited abnormally!
Loading ext3.ko module
insmod: error inserting '/lib/ext3.ko': -1 Invalid module format
ERROR: /bin/insmod exited abnormally!
Creating root device
Mounting root filesystem
scsi1:0:2:0: Attempting to abort cmd c4dbab00: 0x28 0x0 0x2 0x8f 0xf9 0x76 0x0 0x0 0x2 0x0
scsi1:0:2:0: Command already completed
scsi1:0:2:0: Attempting to abort cmd c4dbab00: 0x0 0x0 0x0 0x0 0x0 0x0
scsi1:0:2:0: Command already completed
Recovery code sleeping
Recovery code awake
Timer Expired
scsi1: Device reset returning 0x2003
Recovery SCB completes
scsi1:0:2:0: Attempting to abort cmd c4dbab00: 0x0 0x0 0x0 0x0 0x0 0x0
(scsi1:A:2:0): Task Management Func 0x0 Complete
Kernel panic - not syncing: Unexpected TaskMgmt Func
* Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-03 11:25 [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel Vivek Goyal
@ 2005-06-03 11:54 ` Richard B. Johnson
2005-06-03 15:24 ` Alan Stern
2005-06-03 18:21 ` Greg KH
2 siblings, 0 replies; 16+ messages in thread
From: Richard B. Johnson @ 2005-06-03 11:54 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux kernel mailing list, greg, Fastboot mailing list,
Morton Andrew Morton, Alan Stern, Eric W. Biederman
Yes. We need something like that, since one must now 'enable' the
device to have its final IRQ routing established. This has created
dire consequences for existing drivers.
However, I have the following comments about the patch.
> + if (!(pci_command & PCI_COMMAND_INTX_DISABLE)) {
|
|
|___ This is never needed. Just read the register,
do your thing, then write it back.
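Something along these lines, as an untested sketch. It uses a mock config
space so it compiles outside the kernel; the sketch_* names are invented
here and are not the patch's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INTX_DISABLE 0x0400u /* command register bit 10 */

/* Mock of the one config register we care about. */
struct sketch_cfg {
	uint16_t command;
	unsigned int writes; /* how many config writes were issued */
};

static uint16_t sketch_read_command(const struct sketch_cfg *cfg)
{
	assert(cfg != NULL);
	return cfg->command;
}

static void sketch_write_command(struct sketch_cfg *cfg, uint16_t val)
{
	assert(cfg != NULL);
	cfg->command = val;
	cfg->writes++;
}

/* Read, set the bit, write back. No test of the current state. */
static void sketch_disable_intx(struct sketch_cfg *cfg)
{
	uint16_t cmd = sketch_read_command(cfg);

	cmd |= SKETCH_INTX_DISABLE;
	sketch_write_command(cfg, cmd);
}

/* Read, clear the bit, write back. */
static void sketch_enable_intx(struct sketch_cfg *cfg)
{
	uint16_t cmd = sketch_read_command(cfg);

	cmd &= (uint16_t)~SKETCH_INTX_DISABLE;
	sketch_write_command(cfg, cmd);
}
```

The result is the same whether or not the bit was already in the wanted
state; the only cost of dropping the test is one redundant config write.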
On Fri, 3 Jun 2005, Vivek Goyal wrote:
> Hi,
>
> In kdump, sometimes, general driver initialization issues seems to be cropping
> in second kernel due to devices not being shutdown during crash and these
> devices are sending interrupts while second kernel is booting and drivers are
> not expecting any interrupts yet.
>
> In some cases, we are observing a storm of interrupts while second kernel
> is booting and kernel disables that irq line. May be the case of stuck irq
> line because it is shared level triggered irq and there is no driver
> loaded for the device.
>
> So, we need something generic which disables interrupt generation from device
> until a driver has been registered for that device and driver is ready to
> receive the interrupts. PCI specifications (ver 2.3 onwards), have introduced
> interrupt disable bit in command register to disable interrupt generation
> from the device. This can become handy here. In capture kernel, traverse all
> the PCI devices, disable interrupt generation. Enable the interrupt generation
> back once the driver for that device registers. May be after the probe handler
> has run. In probe handler, driver can reset the device or register for irq so
> that it can handle any interrupt from the device after that.
>
> Greg mentioned that there are reasons that we can not disable all pci
> interrupts. Meanwhile I am going through archives to find more about it.
>
> In previous conversations, Alan Stern had raised the issue of console also
> not working if interrupts are disabled on all the devices. I am not sure
> but this should be working at least for serial consoles and vga text consoles.
> May be sufficient to capture the dump.
>
> Attached is a hack patch which is by no means complete. I have got one machine
> which has got some PCI 2.3 compliant hardware. After applying this hack this
> problem does not occur atleast on this machine. Attached is the serial console
> log which shows one kind of problem due to unwanted interrupts.
>
> Bugme 4631 and 4573 are two more instances of driver initialization failure
> in second kernel.
>
>
>
>
>
>
> ---
> o In kdump, devices are not shutdown/reset after a crash and this leads to
> various driver initialization failures while second kernel is booting.
> Most of them seem to be happening due to the fact that system/driver
> is receiving the interrupts from device when it is not prepared to do so.
>
> o This patch tries to solve the problem by disabling the interrupts at
> PCI level for all the devices. These interrupts are enabled back once the
> driver for that device registers. Currenty this enabling and disabling
> is done only in dump capture kernel.
>
> o This is for devices compliant to PCI specification v2.3 or higher.
>
>
>
> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
> ---
>
> linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci-driver.c | 3 +
> linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci.c | 4 ++
> linux-2.6.12-rc5-mm1-16M-root/include/linux/pci.h | 33 +++++++++++++++++
> 3 files changed, 39 insertions(+), 1 deletion(-)
>
> diff -puN drivers/pci/pci.c~kdump-pci-interrupt-disable drivers/pci/pci.c
> --- linux-2.6.12-rc5-mm1-16M/drivers/pci/pci.c~kdump-pci-interrupt-disable 2005-06-03 14:35:26.000000000 +0530
> +++ linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci.c 2005-06-03 15:59:19.000000000 +0530
> @@ -823,6 +823,10 @@ static int __devinit pci_init(void)
>
> while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
> pci_fixup_device(pci_fixup_final, dev);
> +
> + /* A hack to disable interrupts from all PCI devices in
> + * capture kernel. */
> + pci_disable_device_intx(dev);
> }
> return 0;
> }
> diff -puN drivers/pci/pci-driver.c~kdump-pci-interrupt-disable drivers/pci/pci-driver.c
> --- linux-2.6.12-rc5-mm1-16M/drivers/pci/pci-driver.c~kdump-pci-interrupt-disable 2005-06-03 14:35:26.000000000 +0530
> +++ linux-2.6.12-rc5-mm1-16M-root/drivers/pci/pci-driver.c 2005-06-03 14:35:26.000000000 +0530
> @@ -248,7 +248,8 @@ static int pci_device_probe(struct devic
> error = __pci_device_probe(drv, pci_dev);
> if (error)
> pci_dev_put(pci_dev);
> -
> + else
> + pci_enable_device_intx(pci_dev);
> return error;
> }
>
> diff -puN include/linux/pci.h~kdump-pci-interrupt-disable include/linux/pci.h
> --- linux-2.6.12-rc5-mm1-16M/include/linux/pci.h~kdump-pci-interrupt-disable 2005-06-03 14:35:26.000000000 +0530
> +++ linux-2.6.12-rc5-mm1-16M-root/include/linux/pci.h 2005-06-03 16:06:32.000000000 +0530
> @@ -901,6 +901,39 @@ extern void pci_disable_msix(struct pci_
> extern void msi_remove_pci_irq_vectors(struct pci_dev *dev);
> #endif
>
> +#ifdef CONFIG_CRASH_DUMP
> +static inline int pci_enable_device_intx(struct pci_dev *dev)
> +{
> + u16 pci_command;
> +
> + /* Enable Interrupt generation if not already enabled */
> + pci_read_config_word(dev, PCI_COMMAND, &pci_command);
> + if (pci_command & PCI_COMMAND_INTX_DISABLE) {
> + pci_command &= ~PCI_COMMAND_INTX_DISABLE;
> + pci_write_config_word(dev, PCI_COMMAND, pci_command);
> + }
> + return 0;
> +}
> +
> +static inline int pci_disable_device_intx(struct pci_dev *dev)
> +{
> + u16 pci_command;
> +
> + /* Disable Interrupt generation if not already disabled */
> + pci_read_config_word(dev, PCI_COMMAND, &pci_command);
> + if (!(pci_command & PCI_COMMAND_INTX_DISABLE)) {
> + pci_command |= PCI_COMMAND_INTX_DISABLE;
> + pci_write_config_word(dev, PCI_COMMAND, pci_command);
> + }
> + return 0;
> +}
> +#else
> +static inline int pci_enable_device_intx(struct pci_dev *dev)
> +{ return 0; }
> +static inline int pci_disable_device_intx(struct pci_dev *dev)
> +{ return 0; }
> +#endif /* CONFIG_CRASH_DUMP */
> +
> #endif /* CONFIG_PCI */
>
> /* Include architecture-dependent settings and functions */
> _
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.
* Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-03 11:25 [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel Vivek Goyal
2005-06-03 11:54 ` Richard B. Johnson
@ 2005-06-03 15:24 ` Alan Stern
2005-06-03 18:26 ` Eric W. Biederman
2005-06-03 18:21 ` Greg KH
2 siblings, 1 reply; 16+ messages in thread
From: Alan Stern @ 2005-06-03 15:24 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux kernel mailing list, greg, Fastboot mailing list,
Morton Andrew Morton, Eric W. Biederman
On Fri, 3 Jun 2005, Vivek Goyal wrote:
> In previous conversations, Alan Stern had raised the issue of console also
> not working if interrupts are disabled on all the devices. I am not sure
> but this should be working at least for serial consoles and vga text consoles.
> May be sufficient to capture the dump.
This isn't an issue for x86. It affects other architectures, in which the
system console is managed during the early stages of booting by the
platform firmware. I suppose serial consoles would always work.
Alan Stern
* Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-03 11:25 [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel Vivek Goyal
2005-06-03 11:54 ` Richard B. Johnson
2005-06-03 15:24 ` Alan Stern
@ 2005-06-03 18:21 ` Greg KH
2005-06-03 18:36 ` Eric W. Biederman
2 siblings, 1 reply; 16+ messages in thread
From: Greg KH @ 2005-06-03 18:21 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux kernel mailing list, Fastboot mailing list,
Morton Andrew Morton, Alan Stern, Eric W. Biederman
On Fri, Jun 03, 2005 at 04:55:24PM +0530, Vivek Goyal wrote:
> Hi,
>
> In kdump, sometimes, general driver initialization issues seems to be cropping
> in second kernel due to devices not being shutdown during crash and these
> devices are sending interrupts while second kernel is booting and drivers are
> not expecting any interrupts yet.
What are the errors you are seeing?
How would the drivers be able to be getting interrupts delivered to them
if they haven't registered the irq handler yet?
> In some cases, we are observing a storm of interrupts while second kernel
> is booting and kernel disables that irq line. May be the case of stuck irq
> line because it is shared level triggered irq and there is no driver
> loaded for the device.
>
> So, we need something generic which disables interrupt generation from device
> until a driver has been registered for that device and driver is ready to
> receive the interrupts. PCI specifications (ver 2.3 onwards), have introduced
> interrupt disable bit in command register to disable interrupt generation
> from the device. This can become handy here. In capture kernel, traverse all
> the PCI devices, disable interrupt generation. Enable the interrupt generation
> back once the driver for that device registers. May be after the probe handler
> has run. In probe handler, driver can reset the device or register for irq so
> that it can handle any interrupt from the device after that.
>
> Greg mentioned that there are reasons that we can not disable all pci
> interrupts. Meanwhile I am going through archives to find more about it.
You recalled the conversation in the next paragraph:
> In previous conversations, Alan Stern had raised the issue of console also
> not working if interrupts are disabled on all the devices. I am not sure
> but this should be working at least for serial consoles and vga text consoles.
> May be sufficient to capture the dump.
That's the main objection a lot of people had to this kind of patch,
last time it was discussed.
thanks,
greg k-h
* Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-03 15:24 ` Alan Stern
@ 2005-06-03 18:26 ` Eric W. Biederman
0 siblings, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2005-06-03 18:26 UTC (permalink / raw)
To: Alan Stern
Cc: Vivek Goyal, linux kernel mailing list, greg,
Fastboot mailing list, Morton Andrew Morton
Alan Stern <stern@rowland.harvard.edu> writes:
> On Fri, 3 Jun 2005, Vivek Goyal wrote:
>
> > In previous conversations, Alan Stern had raised the issue of console also
> > not working if interrupts are disabled on all the devices. I am not sure
> > but this should be working at least for serial consoles and vga text consoles.
>
> > May be sufficient to capture the dump.
>
> This isn't an issue for x86. It affects other architectures, in which the
> system console is managed during the early stages of booting by the
> platform firmware. I suppose serial consoles would always work.
In the plain kexec case that should be doable. I don't think
I have heard of a kdump case where we can work with the platform
firmware.
Eric
* Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-03 18:21 ` Greg KH
@ 2005-06-03 18:36 ` Eric W. Biederman
2005-06-04 13:18 ` Denis Vlasenko
0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2005-06-03 18:36 UTC (permalink / raw)
To: Greg KH
Cc: Vivek Goyal, linux kernel mailing list, Fastboot mailing list,
Morton Andrew Morton, Alan Stern
Greg KH <greg@kroah.com> writes:
> On Fri, Jun 03, 2005 at 04:55:24PM +0530, Vivek Goyal wrote:
> > Hi,
> >
> > In kdump, sometimes, general driver initialization issues seems to be cropping
>
> > in second kernel due to devices not being shutdown during crash and these
> > devices are sending interrupts while second kernel is booting and drivers are
>
> > not expecting any interrupts yet.
>
> What are the errors you are seeing?
> How would the drivers be able to be getting interrupts delivered to them
> if they haven't registered the irq handler yet?
As I recall the drivers were not getting the interrupts but the interrupts
were happening. To stop being spammed the kernel disables the irq line,
at the interrupt controller. Then when the driver registered the
interrupt it would never receive the interrupt.
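The sequence described above can be sketched in plain C. This is a
user-space model, not kernel code: the struct, the function names and
the two thresholds are assumptions chosen for illustration. It only
shows why a line that screams before any handler is registered ends up
masked for the driver that registers later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative thresholds (assumptions, not taken from this thread). */
#define SAMPLE_WINDOW   100000u  /* interrupts examined per window */
#define UNHANDLED_LIMIT  99900u  /* unhandled count that trips the disable */

struct irq_line {
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* of those, how many no handler claimed */
    bool disabled;           /* masked at the interrupt controller */
};

/*
 * Account for one interrupt on the line. `handled` is false when no
 * registered handler claimed it, or when no handler is registered yet.
 * Returns true if this call caused the line to be disabled.
 */
bool note_irq(struct irq_line *line, bool handled)
{
    assert(line != NULL);
    if (line->disabled)
        return false;
    if (!handled)
        line->unhandled++;
    if (++line->count < SAMPLE_WINDOW)
        return false;

    /* Window complete: decide, then start a fresh window. */
    bool stuck = line->unhandled > UNHANDLED_LIMIT;
    line->count = 0;
    line->unhandled = 0;
    if (stuck)
        line->disabled = true;
    return stuck;
}

/* A driver that registers afterwards finds the line already masked. */
bool driver_can_receive(const struct irq_line *line)
{
    return !line->disabled;
}
```

Feeding SAMPLE_WINDOW unclaimed interrupts into note_irq() masks the
line, and driver_can_receive() then stays false no matter when the
driver shows up.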
Eric
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
@ 2005-06-04 10:31 Maneesh Soni
0 siblings, 0 replies; 16+ messages in thread
From: Maneesh Soni @ 2005-06-04 10:31 UTC (permalink / raw)
To: Greg KH
Cc: Vivek Goyal, Morton Andrew Morton, Alan Stern,
Fastboot mailing list, linux kernel mailing list,
Eric W. Biederman
Quoting Greg KH <greg@kroah.com>:
> On Fri, Jun 03, 2005 at 04:55:24PM +0530, Vivek Goyal wrote:
> > Hi,
> >
> > In kdump, sometimes, general driver initialization issues seems to be
> cropping
> > in second kernel due to devices not being shutdown during crash and these
>
> > devices are sending interrupts while second kernel is booting and drivers
> are
> > not expecting any interrupts yet.
>
> What are the errors you are seeing?
> How would the drivers be able to be getting interrupts delivered to them
> if they haven't registered the irq handler yet?
The boot log with the error messages probably should have been inlined
rather than sent as an attachment. This is from the attachment in Vivek's post:
irq 11: nobody cared (try booting with the "irqpoll" option.
[<c103a3ef>] __report_bad_irq+0x2a/0x8d
[<c1039c28>] handle_IRQ_event+0x39/0x6d
[<c103a4eb>] note_interrupt+0x7f/0xe4
[<c1039dc5>] __do_IRQ+0x169/0x194
[<c1004cc6>] do_IRQ+0x1b/0x28
[<c1003256>] common_interrupt+0x1a/0x20
[<c101d61b>] __do_softirq+0x2b/0x88
[<c101d69e>] do_softirq+0x26/0x2a
[<c101d759>] irq_exit+0x33/0x35
[<c1004ccb>] do_IRQ+0x20/0x28
[<c1003256>] common_interrupt+0x1a/0x20
[<c1018916>] release_console_sem+0x43/0xf6
[<c101875f>] vprintk+0x16c/0x277
[<c1074dd8>] d_lookup+0x23/0x46
[<c10748ac>] d_alloc+0x136/0x1a6
[<c10185ef>] printk+0x17/0x1b
[<c120f33f>] hub_probe+0xa0/0x168
[<c1096dd1>] sysfs_new_dirent+0x1f/0x64
[<c120d37f>] usb_probe_interface+0x59/0x71
[<c116c9b2>] driver_probe_device+0x2f/0xa7
[<c116ca2a>] __device_attach+0x0/0x5
[<c116c259>] bus_for_each_drv+0x58/0x78
[<c116ca8f>] device_attach+0x60/0x64
[<c116ca2a>] __device_attach+0x0/0x5
[<c116c383>] bus_add_device+0x29/0xa6
[<c116f9c7>] device_pm_add+0x56/0x9c
[<c116b7c3>] device_add+0xc5/0x14a
[<c121557e>] usb_set_configuration+0x346/0x528
[<c120fa47>] usb_new_device+0xab/0x1be
[<c120007b>] ohci_iso_recv_set_channel_mask+0x3/0xc6
[<c12124e1>] register_root_hub+0xaf/0x162
[<c12133bf>] usb_add_hcd+0x172/0x396
[<c1217bce>] usb_hcd_pci_probe+0x26e/0x375
[<c10284e7>] __call_usermodehelper+0x0/0x61
[<c110cbe4>] pci_device_probe_static+0x40/0x54
[<c110cc27>] __pci_device_probe+0x2f/0x42
[<c110cc63>] pci_device_probe+0x29/0x47
[<c116c9b2>] driver_probe_device+0x2f/0xa7
[<c116ca93>] __driver_attach+0x0/0x43
[<c116cad4>] __driver_attach+0x41/0x43
[<c116c1c2>] bus_for_each_dev+0x5a/0x7a
[<c116cafc>] driver_attach+0x26/0x2a
[<c116ca93>] __driver_attach+0x0/0x43
[<c116c5cb>] bus_add_driver+0x6b/0xa5
[<c110ce97>] pci_register_driver+0x5e/0x7c
[<c14246f5>] init+0x1d/0x25
[<c140c7ed>] do_initcalls+0x54/0xb6
[<c100029c>] init+0x0/0x10c
[<c100029c>] init+0x0/0x10c
[<c10002c6>] init+0x2a/0x10c
[<c1001348>] kernel_thread_helper+0x0/0xb
[<c100134d>] kernel_thread_helper+0x5/0xb
handlers:
[<c11e3421>] (ahd_linux_isr+0x0/0x284)
Disabling IRQ #11
Thanks
Maneesh
--
Maneesh Soni
IBM Linux Technology Center
IBM India Software Labs,
Bangalore, India
Ph. 91-80-25044990
email: maneesh@in.ibm.com
* Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-03 18:36 ` Eric W. Biederman
@ 2005-06-04 13:18 ` Denis Vlasenko
2005-06-04 13:43 ` [Fastboot] " Dipankar Sarma
0 siblings, 1 reply; 16+ messages in thread
From: Denis Vlasenko @ 2005-06-04 13:18 UTC (permalink / raw)
To: Eric W. Biederman, Greg KH
Cc: Vivek Goyal, linux kernel mailing list, Fastboot mailing list,
Andrew Morton, Alan Stern
On Friday 03 June 2005 21:36, Eric W. Biederman wrote:
> Greg KH <greg@kroah.com> writes:
>
> > On Fri, Jun 03, 2005 at 04:55:24PM +0530, Vivek Goyal wrote:
> > > Hi,
> > >
> > > In kdump, sometimes, general driver initialization issues seems to be cropping
> >
> > > in second kernel due to devices not being shutdown during crash and these
> > > devices are sending interrupts while second kernel is booting and drivers are
> >
> > > not expecting any interrupts yet.
> >
> > What are the errors you are seeing?
> > How would the drivers be able to be getting interrupts delivered to them
> > if they haven't registered the irq handler yet?
>
> As I recall the drivers were not getting the interrupts but the interrupts
> were happening. To stop being spammed the kernel disables the irq line,
> at the interrupt controller. Then when the driver registered the
> interrupt it would never receive the interrupt.
Shouldn't the kernel keep all interrupt lines initially disabled
(sans platform-specific magic), and enable each line only when
a device driver requests an IRQ? This sounds simpler to do...
--
vda
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-04 13:18 ` Denis Vlasenko
@ 2005-06-04 13:43 ` Dipankar Sarma
2005-06-04 14:03 ` Dipankar Sarma
0 siblings, 1 reply; 16+ messages in thread
From: Dipankar Sarma @ 2005-06-04 13:43 UTC (permalink / raw)
To: Denis Vlasenko
Cc: Eric W. Biederman, Greg KH, Andrew Morton, Alan Stern,
Fastboot mailing list, linux kernel mailing list
On Sat, Jun 04, 2005 at 04:18:24PM +0300, Denis Vlasenko wrote:
> On Friday 03 June 2005 21:36, Eric W. Biederman wrote:
> >
> > As I recall the drivers were not getting the interrupts but the interrupts
> > were happening. To stop being spammed the kernel disables the irq line,
> > at the interrupt controller. Then when the driver registered the
> > interrupt it would never receive the interrupt.
>
> Shouldn't kernel keep all interrupt lines initially disabled
> (sans platform-specific magic), and enable each line only when
> a device driver requests IRQ? This sounds simpler to do...
This doesn't help the kdump folks. An interrupt pending from a device
since the first boot can flood the system when another driver
sharing the irq gets enabled in the second boot (kdump boot).
The disabling of interrupts needs to be done on a per-device
basis for kdump to avoid this problem.
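As a rough illustration of per-device disabling, the sketch below
toggles the interrupt disable bit (bit 10 of the 16-bit command
register at config offset 0x04, defined from PCI 2.3 onwards). It is a
user-space model: struct mock_pci_dev and the cfg_* helpers are
hypothetical stand-ins for real config space accessors, and devices
that predate PCI 2.3 may simply ignore the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND               0x04   /* 16-bit command register offset */
#define PCI_COMMAND_INTX_DISABLE  0x400  /* bit 10: interrupt disable */

/* Hypothetical stand-in for a device: its first 64 config bytes. */
struct mock_pci_dev {
    uint8_t config[64];
};

uint16_t cfg_read16(const struct mock_pci_dev *dev, unsigned int off)
{
    assert(off + 1 < sizeof(dev->config));
    /* PCI configuration space is little-endian. */
    return (uint16_t)(dev->config[off] | (dev->config[off + 1] << 8));
}

void cfg_write16(struct mock_pci_dev *dev, unsigned int off, uint16_t val)
{
    assert(off + 1 < sizeof(dev->config));
    dev->config[off] = (uint8_t)(val & 0xff);
    dev->config[off + 1] = (uint8_t)(val >> 8);
}

/* Set or clear the interrupt disable bit, leaving other bits alone. */
void set_intx(struct mock_pci_dev *dev, bool enable)
{
    uint16_t cmd = cfg_read16(dev, PCI_COMMAND);

    if (enable)
        cmd &= (uint16_t)~PCI_COMMAND_INTX_DISABLE;
    else
        cmd |= PCI_COMMAND_INTX_DISABLE;
    cfg_write16(dev, PCI_COMMAND, cmd);
}

/* Capture-kernel boot: silence every device before any driver loads. */
void disable_all_intx(struct mock_pci_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        set_intx(&devs[i], false);
}
```

In this model the capture kernel would call disable_all_intx() once
during bus enumeration, and set_intx(dev, true) for a single device
after its driver's probe handler has run.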
That said, I am not sure what the issue with the console
drivers is. What good is the irq to the console driver if
it hasn't requested it? Why should disabling it affect
consoles? The interrupt will get enabled as soon as the driver
requests it, as per Vivek's patch. Am I missing something here?
Thanks
Dipankar
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-04 13:43 ` [Fastboot] " Dipankar Sarma
@ 2005-06-04 14:03 ` Dipankar Sarma
2005-06-04 21:14 ` Eric W. Biederman
0 siblings, 1 reply; 16+ messages in thread
From: Dipankar Sarma @ 2005-06-04 14:03 UTC (permalink / raw)
To: Denis Vlasenko
Cc: Andrew Morton, Alan Stern, Eric W. Biederman, Greg KH,
Fastboot mailing list, linux kernel mailing list
On Sat, Jun 04, 2005 at 07:13:06PM +0530, Dipankar Sarma wrote:
> That said, I am not sure what is the issue with the console
> drivers. What good is the irq for the console driver if
> it hasn't requested for it ? Why should disabling it affect
> consoles ? The interrupt will get enabled as soon as the driver
> requests for it as per Vivek's patch. Am I missing something here ?
Doh! The answer is in earlier emails - fw controlled pci consoles.
Thanks
Dipankar
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-04 14:03 ` Dipankar Sarma
@ 2005-06-04 21:14 ` Eric W. Biederman
0 siblings, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2005-06-04 21:14 UTC (permalink / raw)
To: dipankar
Cc: Denis Vlasenko, Andrew Morton, Alan Stern, Greg KH,
Fastboot mailing list, linux kernel mailing list
Dipankar Sarma <dipankar@in.ibm.com> writes:
> On Sat, Jun 04, 2005 at 07:13:06PM +0530, Dipankar Sarma wrote:
> > That said, I am not sure what is the issue with the console
> > drivers. What good is the irq for the console driver if
> > it hasn't requested for it ? Why should disabling it affect
> > consoles ? The interrupt will get enabled as soon as the driver
> > requests for it as per Vivek's patch. Am I missing something here ?
>
> Doh! The answer is in earlier emails - fw controlled pci consoles.
There is still the question of whether fw controlled pci consoles
intersect with the consoles used during kdump. I would be
very surprised if they intersected, but they might.
Eric
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-07 16:21 ` Grant Grundler
@ 2005-06-07 18:42 ` Eric W. Biederman
2005-06-08 4:02 ` Grant Grundler
2005-06-08 6:38 ` Vivek Goyal
0 siblings, 2 replies; 16+ messages in thread
From: Eric W. Biederman @ 2005-06-07 18:42 UTC (permalink / raw)
To: Grant Grundler
Cc: Morton Andrew Morton, awilliam, greg, Fastboot mailing list,
linux kernel mailing list, Bodo Eggert, stern, bjorn.helgaas
Grant Grundler <grundler@parisc-linux.org> writes:
> On Tue, Jun 07, 2005 at 03:59:18AM -0600, Eric W. Biederman wrote:
> > > *lots* of PCI devices predate PCI2.3. Possibly even the majority.
> >
> > In general generic hardware bits for disabling DMA, disabling interrupts
> > and the like are all advisory. With the current architecture things
> > will work properly even if you don't manage to disable DMA (assuming
> > you don't reassign IOMMU entries at least).
>
> ISTR, pSeries (IBM), some alpha, some sparc64, and parisc (64-bit) require
> use of the IOMMU for *any* DMA. ie IOMMU entries need to be programmed.
> Probably want to make a choice to ignore those arches for now
> or sort out how to deal with an IOMMU.
How to deal with an IOMMU has been sorted out, but so far no one
has actually done it. What has been discussed previously is simply
reserving a handful of IOMMU entries, and then only using those
in the crash recovery kernel. This is essentially what we do with DMA
on architectures that don't have an IOMMU, and it seems quite safe
enough there.
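A minimal sketch of the reservation idea, as a user-space model. The
table size, the number of reserved entries and every name here are
assumptions for illustration; nothing below reflects an existing IOMMU
interface. The normal kernel allocates only from the low range, so
stale DMA from the crashed kernel can only target entries the capture
kernel never reprograms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOMMU_ENTRIES   16  /* illustrative table size */
#define KDUMP_RESERVED   4  /* entries set aside for the capture kernel */

struct iommu_table {
    bool in_use[IOMMU_ENTRIES];
};

/* Allocate one entry from [lo, hi); returns its index, or -1 if none. */
int alloc_range(struct iommu_table *t, size_t lo, size_t hi)
{
    assert(t != NULL && lo <= hi && hi <= IOMMU_ENTRIES);
    for (size_t i = lo; i < hi; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            return (int)i;
        }
    }
    return -1;
}

/* First kernel: never touches the reserved tail of the table. */
int alloc_normal(struct iommu_table *t)
{
    return alloc_range(t, 0, IOMMU_ENTRIES - KDUMP_RESERVED);
}

/* Capture kernel: uses only the reserved tail. */
int alloc_kdump(struct iommu_table *t)
{
    return alloc_range(t, IOMMU_ENTRIES - KDUMP_RESERVED, IOMMU_ENTRIES);
}
```

Even when alloc_normal() has exhausted its range, alloc_kdump() still
succeeds, and the two ranges never overlap.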
> > Shared interrupts are an interesting case. The simplest solution I can
> > think of for a crash dump capture kernel is to periodically poll
> > the hardware, as if all interrupts are shared. At that level
> > I think we could get away with ignoring all hardware interrupt sources.
>
> Yes, that's perfectly ok. We are no longer in a multitasking env.
Well, we are at least capable of multitasking, but that is no longer the
primary focus. Having polling as at least an option should make
debugging easier. Last I looked, Andrew's kernel had an irqpoll option
to do something very like this.
> > Does anyone know of a anything that would break by always polling
> > the hardware? I guess there could be a problem with drivers
> > that don't understand shared interrupts, are there enough of those
> > to be an issue.
>
> PCI requires drivers support Shared IRQs.
> A few oddballs might be broken but I expect networking/mass storage
> drivers get this right.
Agreed. Which means any drivers we really need for dumping the system
should be fine. If the drivers don't work in that mode at least the
concept of fixing it won't be controversial.
Eric
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-07 18:42 ` [Fastboot] " Eric W. Biederman
@ 2005-06-08 4:02 ` Grant Grundler
2005-06-08 4:38 ` Eric W. Biederman
2005-06-08 6:38 ` Vivek Goyal
1 sibling, 1 reply; 16+ messages in thread
From: Grant Grundler @ 2005-06-08 4:02 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Grant Grundler, Morton Andrew Morton, awilliam, greg,
Fastboot mailing list, linux kernel mailing list, Bodo Eggert,
stern, bjorn.helgaas
On Tue, Jun 07, 2005 at 12:42:42PM -0600, Eric W. Biederman wrote:
> The howto deal with an IOMMU has been sorted out but so far no one
> has actually done it. What has been discussed previously is simply
> reserving a handful of IOMMU entries,
How? with dma_alloc_consistent() or some special hook?
I'm just curious.
...
> and then only using those
> in the crash recover kernel. This is essentially what we do with DMA
> on architectures that don't have an IOMMU and it seems quite safe
> enough there.
Yeah, in general that should be feasible.
One might be able to trivially allocate a small, separate IO PDIR
just for KDUMP and switch to that. Key thing is it be physically
contiguous in memory. Very little code is involved with IO Pdir
setup for both parisc and IA64. I can't speak for Alpha/sparc/ppc/et al.
...
> Well we are at least capable of multitasking but that is no longer the
> primary focus. Having polling as at least an option should make
> debugging easier. Last I looked, Andrew's kernel had an irqpoll option
> to do something very like this.
You could run the itimer but I don't see why you should.
Kdump is essentially an embedded linux kernel. It really
doesn't need to be preemptive multitasking either.
Anyway, sounds like you guys are on the right track.
thanks,
grant
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-08 4:02 ` Grant Grundler
@ 2005-06-08 4:38 ` Eric W. Biederman
0 siblings, 0 replies; 16+ messages in thread
From: Eric W. Biederman @ 2005-06-08 4:38 UTC (permalink / raw)
To: Grant Grundler
Cc: Morton Andrew Morton, Bodo Eggert, stern, awilliam, greg,
Fastboot mailing list, linux kernel mailing list, bjorn.helgaas
Grant Grundler <grundler@parisc-linux.org> writes:
> On Tue, Jun 07, 2005 at 12:42:42PM -0600, Eric W. Biederman wrote:
> > The howto deal with an IOMMU has been sorted out but so far no one
> > has actually done it. What has been discussed previously is simply
> > reserving a handful of IOMMU entries,
>
> How? with dma_alloc_consistent() or some special hook?
> I'm just curious.
We didn't get that far but I believe the idea was a special hook.
> ...
> > and then only using those
> > in the crash recover kernel. This is essentially what we do with DMA
> > on architectures that don't have an IOMMU and it seems quite safe
> > enough there.
>
> Yeah, in general that should be feasible.
>
> One might be able to trivially allocate a small, separate IO PDIR
> just for KDUMP and switch to that. Key thing is it be physically
> contiguous in memory. Very little code is involved with IO Pdir
> setup for both parisc and IA64. I can't speak for Alpha/sparc/ppc/et al.
Cool.
> ...
> > Well we are at least capable of multitasking but that is no longer the
> > primary focus. Having polling as at least an option should make
> > debugging easier. Last I looked, Andrew's kernel had an irqpoll option
> > to do something very like this.
>
> You could run the itimer but I don't see why you should.
> Kdump is essentially an embedded linux kernel. It really
> doesn't need to be preemptive multitasking either.
It is mostly a matter of minimizing differences from the norm.
> Anyway, sounds like you guys are on the right track.
Thanks. It just takes a while for the simple solutions to
get there.
Eric
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-07 18:42 ` [Fastboot] " Eric W. Biederman
2005-06-08 4:02 ` Grant Grundler
@ 2005-06-08 6:38 ` Vivek Goyal
2005-06-08 11:23 ` Vivek Goyal
1 sibling, 1 reply; 16+ messages in thread
From: Vivek Goyal @ 2005-06-08 6:38 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Grant Grundler, Morton Andrew Morton, Bodo Eggert, stern,
awilliam, greg, Fastboot mailing list, linux kernel mailing list,
bjorn.helgaas
On Tue, Jun 07, 2005 at 12:42:42PM -0600, Eric W. Biederman wrote:
> Grant Grundler <grundler@parisc-linux.org> writes:
>
> > On Tue, Jun 07, 2005 at 03:59:18AM -0600, Eric W. Biederman wrote:
> > > > *lots* of PCI devices predate PCI2.3. Possibly even the majority.
> > >
> > > In general generic hardware bits for disabling DMA, disabling interrupts
> > > and the like are all advisory. With the current architecture things
> > > will work properly even if you don't manage to disable DMA (assuming
> > > you don't reassign IOMMU entries at least).
> >
> > ISTR, pSeries (IBM), some alpha, some sparc64, and parisc (64-bit) require
> > use of the IOMMU for *any* DMA. ie IOMMU entries need to be programmed.
> > Probably want to make a choice to ignore those arches for now
> > or sort out how to deal with an IOMMU.
>
> The howto deal with an IOMMU has been sorted out but so far no one
> has actually done it. What has been discussed previously is simply
> reserving a handful of IOMMU entries, and then only using those
> in the crash recover kernel. This is essentially what we do with DMA
> on architectures that don't have an IOMMU and it seems quite safe
> enough there.
>
> > > Shared interrupts are an interesting case. The simplest solution I can
> > > think of for a crash dump capture kernel is to periodically poll
> > > the hardware, as if all interrupts are shared. At that level
> > > I think we could get away with ignoring all hardware interrupt sources.
> >
> > Yes, that's perfectly ok. We are no longer in a multitasking env.
>
> Well we are at least capable of multitasking but that is no longer the
> primary focus. Having polling as at least an option should make
> debugging easier. Last I looked, Andrew's kernel had an irqpoll option
> to do something very like this.
>
If I understand this right, the idea is to leave all irqs masked (except
the timer one) and invoke all the irq handlers whenever a timer interrupt
occurs. This is automatically equivalent to drivers polling their devices
for any interrupt.
As you mentioned, the irqpoll option comes close. If enabled, it invokes
all the irq handlers on every timer interrupt (IRQ0). The only difference is
that irqs are not masked (unless the kernel masks them due to excessive
unhandled interrupts).
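The polling scheme can be modelled in a few lines of user-space C. This
is only a sketch of the idea, not the irqpoll implementation: the
table, the handler signature and counter_handler() are hypothetical.
Every registered handler runs on each tick and reports whether its
device had work, exactly as it would on a shared line.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8  /* illustrative table size */

/* Returns nonzero if the device behind this handler had work pending. */
typedef int (*poll_handler_t)(void *dev);

struct handler_slot {
    poll_handler_t fn;
    void *dev;
};

struct poll_table {
    struct handler_slot slot[MAX_HANDLERS];
    size_t used;
};

void poll_init(struct poll_table *t)
{
    assert(t != NULL);
    t->used = 0;
}

/* Register a handler; returns 0 on success, -1 if the table is full. */
int poll_register(struct poll_table *t, poll_handler_t fn, void *dev)
{
    assert(t != NULL && fn != NULL);
    if (t->used == MAX_HANDLERS)
        return -1;
    t->slot[t->used].fn = fn;
    t->slot[t->used].dev = dev;
    t->used++;
    return 0;
}

/*
 * Called from the timer tick: run every registered handler, as if all
 * of them shared one interrupt line. Returns how many claimed work.
 */
size_t poll_tick(struct poll_table *t)
{
    size_t claimed = 0;

    for (size_t i = 0; i < t->used; i++)
        if (t->slot[i].fn(t->slot[i].dev))
            claimed++;
    return claimed;
}

/* Hypothetical device: an int counting its pending events. */
int counter_handler(void *dev)
{
    int *pending = dev;

    if (*pending == 0)
        return 0;   /* not ours, same as on a shared IRQ */
    (*pending)--;
    return 1;
}
```

A handler that cannot tell "not mine" from "mine" breaks under this
scheme, which is the same requirement shared PCI interrupts already
place on drivers.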
I tried booting the kdump kernel with the irqpoll option. It gets a
little further than the previous point of failure (boot without irqpoll)
but panics later. Following is the stack trace.
mptbase: Initiating ioc0 bringup
Unable to handle kernel NULL pointer dereference at virtual address 00000608
printing eip:
c11e1b73
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:
CPU: 0
EIP: 0060:[<c11e1b73>] Not tainted VLI
EFLAGS: 00010006 (2.6.12-rc6-mm1-16M)
EIP is at mptscsih_io_done+0x23/0x350
eax: c1778400 ebx: 00000000 ecx: 00000600 edx: 00000250
esi: 00000000 edi: 0000000e ebp: 00000001 esp: c15efcbc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 1, threadinfo=c15ee000 task=c147fa00)
Stack: c15f02d1 c1116cfd c1116d60 c15df660 00000000 0000006c 00000250 00000000
00000000 0000000e 00000001 c11db9ea c1778400 00000600 00000000 00000000
00000600 c1788aa0 c1410e80 00000009 00000001 c1039e7c 00000009 c1778400
Call Trace:
[<c1116cfd>] acpi_ev_gpe_detect+0x83/0x10f
[<c1116d60>] acpi_ev_gpe_detect+0xe6/0x10f
[<c11db9ea>] mpt_interrupt+0xfa/0x1e0
[<c1039e7c>] misrouted_irq+0xec/0x100
[<c103a007>] note_interrupt+0xb7/0xf0
[<c10399c4>] __do_IRQ+0xe4/0xf0
[<c1004e09>] do_IRQ+0x19/0x30
[<c1003246>] common_interrupt+0x1a/0x20
[<c1018d8f>] release_console_sem+0x3f/0xa0
[<c1018c27>] vprintk+0x177/0x220
[<c126ecbd>] pci_read+0x3d/0x50
[<c1101a0a>] kobject_get+0x1a/0x30
[<c116b676>] get_device+0x16/0x30
[<c110b95a>] pci_dev_get+0x1a/0x30
[<c1018aa7>] printk+0x17/0x20
[<c11dcbeb>] mpt_do_ioc_recovery+0x4b/0x540
[<c11dc4ff>] mpt_attach+0x2ef/0x690
[<c11e6e03>] mptspi_probe+0x23/0x3e0
[<c110b4f2>] pci_device_probe_static+0x52/0x70
[<c110b54c>] __pci_device_probe+0x3c/0x50
[<c110b58f>] pci_device_probe+0x2f/0x50
[<c116cb68>] driver_probe_device+0x38/0xb0
[<c116cc60>] __driver_attach+0x0/0x50
[<c116cca7>] __driver_attach+0x47/0x50
[<c116c229>] bus_for_each_dev+0x69/0x80
[<c116ccd6>] driver_attach+0x26/0x30
[<c116cc60>] __driver_attach+0x0/0x50
[<c116c6d8>] bus_add_driver+0x88/0xc0
[<c110b6d0>] pci_device_shutdown+0x0/0x30
[<c110b857>] pci_register_driver+0x67/0x90
[<c142a890>] mptspi_init+0xa0/0xb0
[<c11e37a0>] mptscsih_ioc_reset+0x0/0x170
[<c141483c>] do_initcalls+0x2c/0xc0
[<c142d5fa>] sock_init+0x2a/0x40
[<c1000290>] init+0x0/0x100
[<c10002b5>] init+0x25/0x100
[<c10013a0>] kernel_thread_helper+0x0/0x10
[<c10013a5>] kernel_thread_helper+0x5/0x10
Code: 00 8d bc 27 00 00 00 00 55 57 56 53 83 ec 1c 8b 44 24 30 8b 4c 24 34 8b 7
<0>Kernel panic - not syncing: Fatal exception in interrupt
Thanks
Vivek
* Re: [Fastboot] Re: [RFC/PATCH] Kdump: Disabling PCI interrupts in capture kernel
2005-06-08 6:38 ` Vivek Goyal
@ 2005-06-08 11:23 ` Vivek Goyal
0 siblings, 0 replies; 16+ messages in thread
From: Vivek Goyal @ 2005-06-08 11:23 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Morton Andrew Morton, Bodo Eggert, stern, Grant Grundler,
awilliam, greg, Fastboot mailing list, linux kernel mailing list,
bjorn.helgaas, vgoyal
> >
> > > > Shared interrupts are an interesting case. The simplest solution I can
> > > > think of for a crash dump capture kernel is to periodically poll
> > > > the hardware, as if all interrupts are shared. At that level
> > > > I think we could get away with ignoring all hardware interrupt sources.
> > >
> > > Yes, that's perfectly ok. We are no longer in a multitasking env.
> >
> > Well we are at least capable of multitasking but that is no longer the
> > primary focus. Having polling as at least an option should make
> > > debugging easier. Last I looked, Andrew's kernel had an irqpoll option
> > to do something very like this.
> >
>
> If I understand this right, the idea is that let all irqs be masked (except
> timer one) and invoke all the irq handlers whenever a timer interrupt occurs.
> This will automatically be equivalent to drivers polling their devices for
> any interrupt.
>
> As you mentioned that irqpoll option comes close. If enabled, it invokes
> all the irq handlers on every timer interrupt (IRQ0). The only difference is
> that irqs are not masked (until and unless kernel masks these due to excessive
> unhandled interrupts).
>
> I tried booting kdump kernel with irqpoll option. It seems to be going
> little bit ahead than previous point of failure (boot without irqpoll) but
> panics later. Following is the stack trace.
>
The second kernel booted fine with MPT_DEBUG_IRQ enabled (with the irqpoll
option). There were a few warning messages, though, emitted by the code under
MPT_DEBUG_IRQ. It looks like drivers need to be hardened on a case-by-case
basis to initialize properly even if the underlying device is not in a reset
state.
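One form such hardening could take is sketched below as a user-space
model. It is not the mpt driver's code: struct hba, handle_reply() and
the tag scheme are hypothetical. The point is only that a reply
arriving before initialization finishes, or for a request this kernel
never issued, is dropped instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQUESTS 4  /* illustrative queue depth */

struct request {
    int done;
};

struct hba {
    int ready;                               /* set once init completes */
    struct request *inflight[MAX_REQUESTS];  /* indexed by reply tag */
};

enum reply_result { REPLY_COMPLETED, REPLY_IGNORED };

/*
 * Handle one reply from the device. Replies left over from the crashed
 * kernel have no matching request here, so they are ignored.
 */
enum reply_result handle_reply(struct hba *h, unsigned int tag)
{
    assert(h != NULL);
    if (!h->ready)
        return REPLY_IGNORED;       /* still initializing */
    if (tag >= MAX_REQUESTS || h->inflight[tag] == NULL)
        return REPLY_IGNORED;       /* stale or bogus tag */

    h->inflight[tag]->done = 1;
    h->inflight[tag] = NULL;
    return REPLY_COMPLETED;
}
```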
Thanks
Vivek