* 2.6.23-rc2: WARNING at kernel/irq/resend.c:70 check_irq_resend()
@ 2007-08-08 18:09 Indan Zupancic
0 siblings, 0 replies; 16+ messages in thread
From: Indan Zupancic @ 2007-08-08 18:09 UTC
To: Thomas Gleixner, lkml, netdev
Hi,
Just added an old network card, RTL-8029(AS), ne2k-pci driver, and tried to
expand the network (failed because I didn't use a cross-over cable).
The code snippet that triggered the warning:
/*
 * IRQ resend
 *
 * Is called with interrupts disabled and desc->lock held.
 */
void check_irq_resend(struct irq_desc *desc, unsigned int irq)
{
	unsigned int status = desc->status;

	/*
	 * Make sure the interrupt is enabled, before resending it:
	 */
	desc->chip->enable(irq);

	/*
	 * Temporary hack to figure out more about the problem, which
	 * is causing the ancient network cards to die.
	 */
	if (desc->handle_irq != handle_edge_irq) {
		WARN_ON_ONCE(1);
		return;
	}
and relevant dmesg snippet:
[ 446.836399] eth1: RealTek RTL-8029 found at 0x9000, IRQ 16, 00:00:B4:BB:BF:E5.
[ 855.458468] r8169: eth0: link up
[ 857.500667] r8169: eth0: link up
[ 902.294283] r8169: eth0: link up
[ 4126.348427] r8169: eth0: link up
[ 5598.446233] r8169: eth0: link down
[ 5618.904846] r8169: eth0: link up
[ 5623.133090] r8169: eth0: link down
[ 5624.708071] r8169: eth0: link up
[ 6328.267872] WARNING: at /home/indan/src/git/linux-2.6/kernel/irq/resend.c:70 check_irq_resend()
[ 6328.267896] [<b0135d8f>] check_irq_resend+0x4f/0x8c
[ 6328.267912] [<b0135888>] enable_irq+0x84/0xaa
[ 6328.267918] [<c08e37cc>] ne2k_pci_block_output+0x0/0x138 [ne2k_pci]
[ 6328.267931] [<c08e37cc>] ne2k_pci_block_output+0x0/0x138 [ne2k_pci]
[ 6328.267939] [<c08e37cc>] ne2k_pci_block_output+0x0/0x138 [ne2k_pci]
[ 6328.267947] [<c08e0f8a>] ei_start_xmit+0x29f/0x2b9 [8390]
[ 6328.267966] [<b022a53e>] dev_hard_start_xmit+0x19e/0x1fd
[ 6328.267976] [<b023620a>] __qdisc_run+0x6c/0x151
[ 6328.267986] [<b022c459>] dev_queue_xmit+0x12a/0x271
[ 6328.267991] [<b0260ec9>] arp_send+0x4c/0x64
[ 6328.268002] [<b0260524>] arp_xmit+0x4d/0x51
[ 6328.268009] [<b026117c>] arp_process+0x29b/0x50a
[ 6328.268018] [<b0244f0a>] ip_forward+0x22c/0x23a
[ 6328.268026] [<b02434a8>] ip_rcv_finish+0x0/0x282
[ 6328.268032] [<b0243db9>] ip_rcv+0x479/0x4a8
[ 6328.268037] [<b02434a8>] ip_rcv_finish+0x0/0x282
[ 6328.268044] [<b02614db>] arp_rcv+0xf0/0x104
[ 6328.268049] [<b012ae17>] hrtimer_run_queues+0x12/0x1a1
[ 6328.268055] [<b02613eb>] arp_rcv+0x0/0x104
[ 6328.268061] [<b022a07d>] netif_receive_skb+0x2d9/0x317
[ 6328.268068] [<b022baa0>] process_backlog+0x6d/0xd2
[ 6328.268075] [<b022c1c2>] net_rx_action+0x81/0x14d
[ 6328.268082] [<b011dd84>] __do_softirq+0x35/0x75
[ 6328.268088] [<b0106030>] do_softirq+0x3e/0x8d
[ 6328.268098] [<b0136530>] handle_fasteoi_irq+0x0/0xbe
[ 6328.268103] [<b011dd44>] irq_exit+0x25/0x30
[ 6328.268107] [<b010630f>] do_IRQ+0x94/0xad
[ 6328.268114] [<b0104763>] common_interrupt+0x23/0x28
[ 6328.268123] [<b0102a0c>] default_idle+0x27/0x39
[ 6328.268128] [<b0102347>] cpu_idle+0x41/0x55
[ 6328.268133] [<b032f99b>] start_kernel+0x20e/0x213
[ 6328.268140] [<b032f317>] unknown_bootoption+0x0/0x196
[ 6328.268146] =======================
# cat /proc/interrupts
CPU0
0: 9804926 IO-APIC-edge timer
1: 8270 IO-APIC-edge i8042
9: 0 IO-APIC-fasteoi acpi
12: 181281 IO-APIC-edge i8042
16: 40410 IO-APIC-fasteoi sata_sil, eth1
17: 0 IO-APIC-fasteoi ohci_hcd:usb1, NVidia nForce2
18: 499428 IO-APIC-fasteoi ohci_hcd:usb2
19: 2 IO-APIC-fasteoi ehci_hcd:usb3
20: 689588 IO-APIC-fasteoi eth0
NMI: 0
LOC: 9805098
ERR: 0
MIS: 0
So the card is sharing an IRQ with the disk controller.
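The sharing visible in the table above can also be checked mechanically. What follows is only a minimal sketch, assuming the line layout shown here (a label, the per-CPU counters, the chip name, then a comma-separated handler list). model_count_irq_users() is a hypothetical helper written for illustration, not an existing tool or kernel interface:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the handlers registered on one line of
 * /proc/interrupts. Returns 0 for lines without a handler list
 * (for example "NMI: 0").
 */
unsigned int model_count_irq_users(const char *line)
{
	const char *p;
	unsigned int users;

	assert(line != NULL);

	/* Skip the "16:" style label. */
	p = strchr(line, ':');
	if (p == NULL)
		return 0;
	p++;

	/* Skip the per-CPU counters (runs of whitespace and digits). */
	while (*p != '\0' &&
	       (isspace((unsigned char)*p) || isdigit((unsigned char)*p)))
		p++;

	/* Skip the chip name token, e.g. "IO-APIC-fasteoi". */
	while (*p != '\0' && !isspace((unsigned char)*p))
		p++;
	while (*p != '\0' && isspace((unsigned char)*p))
		p++;

	if (*p == '\0')
		return 0;

	/* What remains is the handler list; commas separate the entries. */
	users = 1;
	for (; *p != '\0'; p++)
		if (*p == ',')
			users++;
	return users;
}
```

Any line for which this returns more than 1 is a shared interrupt line, which is the case for IRQ 16 in the table above.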
I don't know whether the network card died after this warning: for all
practical purposes it wasn't working before, so I couldn't check.
I can provide more information and run some tests if anyone
wants that, just keep in mind that the card isn't connected to
anything.
Greetings,
Indan
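For readers who want to see the shape of the diagnostic in isolation, here is a minimal userspace sketch of the check quoted in the report above. It is not the kernel code: the struct layout, the model_* names and the warning counter are assumptions made for illustration, and only the "warn once and bail out unless the flow handler is the edge handler" logic is taken from the snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_irq_desc;
typedef void (*model_flow_handler_t)(struct model_irq_desc *desc);

/* Hypothetical, heavily reduced stand-in for struct irq_desc. */
struct model_irq_desc {
	model_flow_handler_t handle_irq;	/* flow handler for this line */
	bool pending;				/* arrived while disabled */
	unsigned int resends;			/* resend requests issued */
};

/* Empty stand-ins, used only so their addresses can be compared. */
void model_handle_edge_irq(struct model_irq_desc *desc) { (void)desc; }
void model_handle_fasteoi_irq(struct model_irq_desc *desc) { (void)desc; }

unsigned int model_warnings_printed;

/* Returns true if a resend was requested for this descriptor. */
bool model_check_irq_resend(struct model_irq_desc *desc)
{
	assert(desc != NULL);

	/* Diagnostic path: warn once, then skip the resend entirely. */
	if (desc->handle_irq != model_handle_edge_irq) {
		if (model_warnings_printed == 0) {
			model_warnings_printed++;
			fprintf(stderr, "WARNING: resend skipped, not edge\n");
		}
		return false;
	}

	if (desc->pending) {
		desc->pending = false;
		desc->resends++;
		return true;
	}
	return false;
}
```

In this model a fasteoi line such as IRQ 16 above takes the warning path once and is never resent, while an edge line with a pending interrupt is resent as before.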
* 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
@ 2007-08-09 15:03 John Stoffel
2007-08-09 15:54 ` Jarek Poplawski
0 siblings, 1 reply; 16+ messages in thread
From: John Stoffel @ 2007-08-09 15:03 UTC
To: linux-kernel
Cc: jarkao2, shemminger, vignaud, marcin.slusarz, tglx, mingo,
torvalds, akpm, alan, linux-net, netdev
Hi,
I'm opening this ticket as a new subject, even though it looks like it
might be related to the thread "Networking dies after random time".
Sorry for the wide CC list, but since my network hasn't died since I
rebooted into 2.6.23-rc2 (after 30+ days at 2.6.22-rc7), I'm wondering
if the problem is more than networking related.
Honestly, I haven't gone back over the previous thread in detail, so I
might be missing info here.
System details: Dell Precision 610MT, Intel 440GX chipset, dual PIII
Xeon, 550 MHz, 2 GB RAM (upgraded from 768 MB last night), a mix of IDE,
SCSI and SATA disks in the system. My poor PCI bus! Just upgraded to
2.6.23-rc2. Interrupts look like this:
> cat /proc/interrupts
           CPU0       CPU1
  0:        280          1   IO-APIC-edge      timer
  1:        788          0   IO-APIC-edge      i8042
  6:          1          4   IO-APIC-edge      floppy
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 11:      82410       1239   IO-APIC-edge      Cyclom-Y
 12:        279        106   IO-APIC-edge      i8042
 14:     440901       4266   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 16:    2394727      42983   IO-APIC-fasteoi   ohci_hcd:usb3, Ensoniq AudioPCI, mga@pci:0000:01:00.0
 17:    2237362       1110   IO-APIC-fasteoi   sata_sil, ehci_hcd:usb1, eth0
 18:     126520      31978   IO-APIC-fasteoi   aic7xxx, aic7xxx, ide2, ide3, ohci1394
 19:          0          0   IO-APIC-fasteoi   ohci_hcd:usb2, uhci_hcd:usb4
NMI:          0          0
LOC:   40672484   40672246
ERR:          0
MIS:          0
I've only seen the one Warning oops, and backups and other system
processes have been running for the past 12 hours without a problem.
[ 187.747442] Probing IDE interface ide2...
[ 188.011634] hde: WDC WD1200JB-00CRA1, ATA DISK drive
[ 188.623038] WARNING: at kernel/irq/resend.c:70 check_irq_resend()
[ 188.623105] [<c0149e38>] check_irq_resend+0xa8/0xc0
[ 188.623204] [<c01499d3>] enable_irq+0xc3/0xd0
[ 188.623295] [<f8867280>] probe_hwif+0x670/0x7c0 [ide_core]
[ 188.623448] [<f8869f04>] do_ide_setup_pci_device+0x154/0x480 [ide_core]
[ 188.623571] [<f8867d6c>] probe_hwif_init_with_fixup+0xc/0x90 [ide_core]
[ 188.623690] [<f88817d0>] init_setup_hpt302+0x0/0x30 [hpt366]
[ 188.623791] [<f886a39b>] ide_setup_pci_device+0x7b/0xc0 [ide_core]
[ 188.623909] [<f88817d0>] init_setup_hpt302+0x0/0x30 [hpt366]
[ 188.624004] [<f88811ed>] hpt366_init_one+0x8d/0xa0 [hpt366]
[ 188.624095] [<f88817d0>] init_setup_hpt302+0x0/0x30 [hpt366]
[ 188.624187] [<f8881e50>] init_chipset_hpt366+0x0/0x680 [hpt366]
[ 188.624281] [<f8882680>] init_hwif_hpt366+0x0/0x380 [hpt366]
[ 188.624372] [<f8881800>] init_dma_hpt366+0x0/0xe0 [hpt366]
[ 188.624466] [<c0265fc6>] pci_device_probe+0x56/0x80
[ 188.624565] [<c02d0f8e>] driver_probe_device+0x8e/0x190
[ 188.624669] [<c02d11fe>] __driver_attach+0x9e/0xa0
[ 188.624756] [<c02d038a>] bus_for_each_dev+0x3a/0x60
[ 188.624845] [<c02d0e06>] driver_attach+0x16/0x20
[ 188.624932] [<c02d1160>] __driver_attach+0x0/0xa0
[ 188.625017] [<c02d075a>] bus_add_driver+0x8a/0x1b0
[ 188.625107] [<c0266173>] __pci_register_driver+0x53/0xa0
[ 188.625197] [<c0144d5d>] sys_init_module+0x13d/0x1820
[ 188.625315] [<f8844000>] snd_timer_find+0x0/0x90 [snd_timer]
[ 188.625424] [<c0149530>] disable_irq+0x0/0x30
[ 188.625513] [<c0108b7d>] sys_mmap2+0xcd/0xd0
[ 188.625612] [<c0104266>] syscall_call+0x7/0xb
[ 188.625701] [<c0410000>] rpc_get_inode+0x0/0x80
[ 188.625798] =======================
[ 188.625871] hde: selected mode 0x45
[ 188.626817] ide2 at 0xecf8-0xecff,0xecf2 on irq 18
[ 188.627080] Probing IDE interface ide3...
[ 188.891165] hdg: WDC WD1200JB-00EVA0, ATA DISK drive
[ 189.502580] hdg: selected mode 0x45
[ 189.503698] ide3 at 0xece0-0xece7,0xecda on irq 18
Let
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-09 15:03 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend() John Stoffel
@ 2007-08-09 15:54 ` Jarek Poplawski
2007-08-10 8:05 ` Thomas Gleixner
0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2007-08-09 15:54 UTC
To: John Stoffel
Cc: linux-kernel, shemminger, vignaud, marcin.slusarz, tglx, mingo,
torvalds, akpm, alan, linux-net, netdev
On Thu, Aug 09, 2007 at 11:03:03AM -0400, John Stoffel wrote:
>
> Hi,
Hi, read below, please...
>
> I'm opening this ticket as a new subject, even though it looks like it
> might be related to the thread "Networking dies after random time".
> Sorry for the wide CC list, but since my network hasn't died since I
> rebooted into 2.6.23-rc2 (after 30+ days at 2.6.22-rc7), I'm wondering
> if the problem is more than networking related.
>
> Honestly, I haven't gone back over the previous thread in detail, so I
> might be missing info here.
>
> System details: Dell Precision 610MT, Intel 440GX chipset, Dual PIII
> Xeon, 550Mhz, 2gb RAM (upgraded from 768Mb last night), a mix of IDE,
> SCSI and SATA disks in the system. My poor PCI bus! Just upgraded to
> 2.6.23-rc2. Interrupts looks like this:
>
> > cat /proc/interrupts
>            CPU0       CPU1
>   0:        280          1   IO-APIC-edge      timer
>   1:        788          0   IO-APIC-edge      i8042
>   6:          1          4   IO-APIC-edge      floppy
>   8:          0          1   IO-APIC-edge      rtc
>   9:          0          0   IO-APIC-fasteoi   acpi
>  11:      82410       1239   IO-APIC-edge      Cyclom-Y
>  12:        279        106   IO-APIC-edge      i8042
>  14:     440901       4266   IO-APIC-edge      libata
>  15:          0          0   IO-APIC-edge      libata
>  16:    2394727      42983   IO-APIC-fasteoi   ohci_hcd:usb3, Ensoniq AudioPCI, mga@pci:0000:01:00.0
>  17:    2237362       1110   IO-APIC-fasteoi   sata_sil, ehci_hcd:usb1, eth0
>  18:     126520      31978   IO-APIC-fasteoi   aic7xxx, aic7xxx, ide2, ide3, ohci1394
>  19:          0          0   IO-APIC-fasteoi   ohci_hcd:usb2, uhci_hcd:usb4
> NMI:          0          0
> LOC:   40672484   40672246
> ERR:          0
> MIS:          0
>
> I've only seen the one Warning oops, and backups and other system
> processes have been running for the past 12 hours without a problem.
>
>
> [ 187.747442] Probing IDE interface ide2...
> [ 188.011634] hde: WDC WD1200JB-00CRA1, ATA DISK drive
> [ 188.623038] WARNING: at kernel/irq/resend.c:70 check_irq_resend()
> [ 188.623105] [<c0149e38>] check_irq_resend+0xa8/0xc0
> [ 188.623204] [<c01499d3>] enable_irq+0xc3/0xd0
> [ 188.623295] [<f8867280>] probe_hwif+0x670/0x7c0 [ide_core]
> [ 188.623448] [<f8869f04>] do_ide_setup_pci_device+0x154/0x480 [ide_core]
> [ 188.623571] [<f8867d6c>] probe_hwif_init_with_fixup+0xc/0x90 [ide_core]
> [ 188.623690] [<f88817d0>] init_setup_hpt302+0x0/0x30 [hpt366]
> [ 188.623791] [<f886a39b>] ide_setup_pci_device+0x7b/0xc0 [ide_core]
> [ 188.623909] [<f88817d0>] init_setup_hpt302+0x0/0x30 [hpt366]
> [ 188.624004] [<f88811ed>] hpt366_init_one+0x8d/0xa0 [hpt366]
> [ 188.624095] [<f88817d0>] init_setup_hpt302+0x0/0x30 [hpt366]
> [ 188.624187] [<f8881e50>] init_chipset_hpt366+0x0/0x680 [hpt366]
> [ 188.624281] [<f8882680>] init_hwif_hpt366+0x0/0x380 [hpt366]
> [ 188.624372] [<f8881800>] init_dma_hpt366+0x0/0xe0 [hpt366]
> [ 188.624466] [<c0265fc6>] pci_device_probe+0x56/0x80
> [ 188.624565] [<c02d0f8e>] driver_probe_device+0x8e/0x190
> [ 188.624669] [<c02d11fe>] __driver_attach+0x9e/0xa0
> [ 188.624756] [<c02d038a>] bus_for_each_dev+0x3a/0x60
> [ 188.624845] [<c02d0e06>] driver_attach+0x16/0x20
> [ 188.624932] [<c02d1160>] __driver_attach+0x0/0xa0
> [ 188.625017] [<c02d075a>] bus_add_driver+0x8a/0x1b0
> [ 188.625107] [<c0266173>] __pci_register_driver+0x53/0xa0
> [ 188.625197] [<c0144d5d>] sys_init_module+0x13d/0x1820
> [ 188.625315] [<f8844000>] snd_timer_find+0x0/0x90 [snd_timer]
> [ 188.625424] [<c0149530>] disable_irq+0x0/0x30
> [ 188.625513] [<c0108b7d>] sys_mmap2+0xcd/0xd0
> [ 188.625612] [<c0104266>] syscall_call+0x7/0xb
> [ 188.625701] [<c0410000>] rpc_get_inode+0x0/0x80
> [ 188.625798] =======================
> [ 188.625871] hde: selected mode 0x45
> [ 188.626817] ide2 at 0xecf8-0xecff,0xecf2 on irq 18
> [ 188.627080] Probing IDE interface ide3...
> [ 188.891165] hdg: WDC WD1200JB-00EVA0, ATA DISK drive
> [ 189.502580] hdg: selected mode 0x45
> [ 189.503698] ide3 at 0xece0-0xece7,0xecda on irq 18
>
>
> Let
I may be missing something (I'm in a bit of a hurry now), but this
warning is purely diagnostic and does not by itself mean anything is wrong!
Unless something really is wrong... Then please try to be more explicit.
If you prefer not to see it, my patch proposal is somewhere
in this older thread:
Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
Date: Wed, 8 Aug 2007 13:00:37 +0200
On the other hand, if it works OK, it would be better to leave it as it
is so that it gets more testing...
Regards,
Jarek P.
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-09 15:54 ` Jarek Poplawski
@ 2007-08-10 8:05 ` Thomas Gleixner
2007-08-10 8:23 ` Jarek Poplawski
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2007-08-10 8:05 UTC
To: Jarek Poplawski
Cc: John Stoffel, linux-kernel, shemminger, vignaud, marcin.slusarz,
mingo, torvalds, akpm, alan, linux-net, netdev
On Thu, 2007-08-09 at 17:54 +0200, Jarek Poplawski wrote:
> I'm not sure I don't miss anything (a little in hurry now), but this
> warning's aim was purely diagnostical and nothing wrong is meant!
> Unless there is something wrong... Then please try to be more explicit.
>
> If you prefer to not see this, there is my patch proposal somewhere
> in this older thread:
> Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
> Date: Wed, 8 Aug 2007 13:00:37 +0200
>
> On the other hand, if it works OK, it would be better to let it be
> tested more like this...
Hmm. This solution is still just papering over the real problem. The
delayed disable just re-sends level interrupts unnecessarily. I have a
fix (needs some testing) for this, which I'll send out tomorrow, when I'm
really back from vacation.
But suppressing the resend is not fixing the driver problem. The problem
can show up with spurious interrupts and with interrupts on a shared PCI
interrupt line at any time. It just might take weeks instead of minutes.
Alan,
is there anything that can be done at the driver level?
tglx
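The point about the delayed disable can be made concrete with a toy model. The sketch below is a simplified userspace simulation and not kernel code: the model_* names, the struct fields and the resend_level switch are all assumptions made for illustration. It assumes that disable only sets a software flag, that the line is masked and marked pending when an interrupt arrives while disabled, and that a level-triggered line still asserted by the device is delivered again by the hardware as soon as it is unmasked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_trigger { MODEL_EDGE, MODEL_LEVEL };

/* Hypothetical model of one interrupt line. */
struct model_irq_line {
	enum model_trigger trigger;
	bool asserted;		/* level only: device still holds the line */
	bool masked;		/* masked at the interrupt controller */
	bool disabled;		/* software flag set by the lazy disable */
	bool pending;		/* an interrupt arrived while disabled */
	unsigned int handled;	/* number of handler invocations */
};

/* Delayed disable: only set the flag, leave the hardware unmasked. */
void model_disable_irq(struct model_irq_line *l)
{
	assert(l != NULL);
	l->disabled = true;
}

/* The controller hands an interrupt to the flow handler. */
static void model_deliver(struct model_irq_line *l)
{
	if (l->masked)
		return;
	if (l->disabled) {
		/* First arrival after a lazy disable: mask and remember. */
		l->masked = true;
		l->pending = true;
		return;
	}
	l->handled++;
	l->asserted = false;	/* the handler services the device */
}

void model_device_raise(struct model_irq_line *l)
{
	assert(l != NULL);
	if (l->trigger == MODEL_LEVEL)
		l->asserted = true;
	model_deliver(l);
}

/* Returns how many handler invocations this enable caused. */
unsigned int model_enable_irq(struct model_irq_line *l, bool resend_level)
{
	unsigned int before;

	assert(l != NULL);
	before = l->handled;
	l->disabled = false;
	l->masked = false;

	/* A level line that is still asserted fires again on unmask. */
	if (l->trigger == MODEL_LEVEL && l->asserted)
		model_deliver(l);

	/* Replay of the interrupt that was recorded as pending. */
	if (l->pending) {
		l->pending = false;
		if (l->trigger == MODEL_EDGE || resend_level)
			l->handled++;
	}
	return l->handled - before;
}
```

Under these assumptions a level-triggered line is handled twice per event when the resend is applied to it (once from the unmask, once from the replay), and once when the resend is limited to edge-triggered lines. An edge-triggered line needs the replay, because the masked edge is otherwise lost.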
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 8:05 ` Thomas Gleixner
@ 2007-08-10 8:23 ` Jarek Poplawski
2007-08-10 8:30 ` Ingo Molnar
0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2007-08-10 8:23 UTC
To: Thomas Gleixner
Cc: John Stoffel, linux-kernel, shemminger, vignaud, marcin.slusarz,
mingo, torvalds, akpm, alan, linux-net, netdev
On Fri, Aug 10, 2007 at 10:05:40AM +0200, Thomas Gleixner wrote:
> On Thu, 2007-08-09 at 17:54 +0200, Jarek Poplawski wrote:
> > I'm not sure I don't miss anything (a little in hurry now), but this
> > warning's aim was purely diagnostical and nothing wrong is meant!
> > Unless there is something wrong... Then please try to be more explicit.
> >
> > If you prefer to not see this, there is my patch proposal somewhere
> > in this older thread:
> > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
> > Date: Wed, 8 Aug 2007 13:00:37 +0200
> >
> > On the other hand, if it works OK, it would be better to let it be
> > tested more like this...
>
> Hmm. This solution is still just pampering over the real problem. The
> delayed disable just re-sends level interrupts unnecessarily. I have a
> fix (needs some testing) for this, which I send out tomorrow, when I'm
> really back from vacation.
>
> But suppressing the resend is not fixing the driver problem. The problem
> can show up with spurious interrupts and with interrupts on a shared PCI
> interrupt line at any time. It just might take weeks instead of minutes.
Doesn't this look like a slight change of mind? Well, there are probably
two other solutions (both need more testing): _SW_RESEND, and disabling
without delay for level-triggered interrupts only...
Jarek P.
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 8:23 ` Jarek Poplawski
@ 2007-08-10 8:30 ` Ingo Molnar
2007-08-10 8:49 ` Jarek Poplawski
0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-08-10 8:30 UTC
To: Jarek Poplawski
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> > Hmm. This solution is still just pampering over the real problem.
> > The delayed disable just re-sends level interrupts unnecessarily. I
> > have a fix (needs some testing) for this, which I send out tomorrow,
> > when I'm really back from vacation.
> >
> > But suppressing the resend is not fixing the driver problem. The
> > problem can show up with spurious interrupts and with interrupts on
> > a shared PCI interrupt line at any time. It just might take weeks
> > instead of minutes.
>
> Doesn't it look like a little change of mind? [...]
what change of mind do you mean exactly?
> [...] Well, there are probably (but need more testing) two other
> solutions: _SW_RESEND and disabling without delay for levels only...
IIRC Marcin tested software-resend and it didn't fix the hang. That
strongly points in the direction of a driver bug (or a genirq bug) being
made more prominent by the genirq change - not any hardware detail such
as the APIC vector-retrigger sequence.
While we'd like to see the suspected driver bug (or any higher level
genirq bug) fixed, we'll undo the effect of the genirq change (because
it is causing a regression). We'll also add a separate, optional
irq-debugging feature that generates high-rate interrupts on any shared
irq line. (and thus artificially stresses the robustness of the driver
and the genirq layer against spurious interrupts.)
Ingo
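The robustness being asked of drivers here can be sketched in a few lines. This is a userspace illustration only, assuming each device exposes a status flag that its handler checks before touching anything. The model_* names are hypothetical and the stress loop is not the proposed irq-debugging feature itself, just a stand-in for its effect on a shared line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 } model_irqreturn_t;

/* Hypothetical device sharing an interrupt line with others. */
struct model_dev {
	bool irq_status;	/* device really has work to do */
	unsigned int serviced;	/* times the device was serviced */
};

/* A handler that tolerates spurious calls: check first, then act. */
model_irqreturn_t model_dev_handler(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->irq_status)
		return MODEL_IRQ_NONE;	/* not ours, touch nothing */
	dev->irq_status = false;
	dev->serviced++;
	return MODEL_IRQ_HANDLED;
}

/* Call every handler on the shared line; return how many claimed it. */
unsigned int model_dispatch_shared(struct model_dev *devs, size_t n)
{
	unsigned int handled = 0;
	size_t i;

	assert(devs != NULL);
	for (i = 0; i < n; i++)
		if (model_dev_handler(&devs[i]) == MODEL_IRQ_HANDLED)
			handled++;
	return handled;
}

/* Fire the line repeatedly; return how many rounds nobody claimed. */
unsigned int model_stress_spurious(struct model_dev *devs, size_t n,
				   unsigned int rounds)
{
	unsigned int unclaimed = 0;
	unsigned int r;

	for (r = 0; r < rounds; r++)
		if (model_dispatch_shared(devs, n) == 0)
			unclaimed++;
	return unclaimed;
}
```

A driver that survives this kind of stress leaves its device state unchanged on every unclaimed round. A handler that assumes every call means real work is the kind of driver bug suspected above.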
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 8:30 ` Ingo Molnar
@ 2007-08-10 8:49 ` Jarek Poplawski
2007-08-10 8:56 ` Ingo Molnar
0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2007-08-10 8:49 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
On Fri, Aug 10, 2007 at 10:30:50AM +0200, Ingo Molnar wrote:
>
> * Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> > > Hmm. This solution is still just pampering over the real problem.
> > > The delayed disable just re-sends level interrupts unnecessarily. I
> > > have a fix (needs some testing) for this, which I send out tomorrow,
> > > when I'm really back from vacation.
> > >
> > > But suppressing the resend is not fixing the driver problem. The
> > > problem can show up with spurious interrupts and with interrupts on
> > > a shared PCI interrupt line at any time. It just might take weeks
> > > instead of minutes.
> >
> > Doesn't it look like a little change of mind? [...]
>
> what change of mind do you mean exactly?
>
> > [...] Well, there are probably (but need more testing) two other
> > solutions: _SW_RESEND and disabling without delay for levels only...
>
> IIRC Marcin tested software-resend and it didnt fix the hang. That
> strongly points in the direction of a driver bug (or a genirq bug) being
> made more prominent by the genirq change - not any hardware detail such
> as the APIC vector-retrigger sequence.
>
> While we'd like to see the suspected driver bug (or any higher level
> genirq bug) fixed, we'll undo the effect of the genirq change (because
> it is causing a regression). We'll also add a separate, optional
> irq-debugging feature that generates high-rate interrupts on any shared
> irq line. (and thus artificially stresses the robustness of the driver
> and the genirq layer against spurious interrupts.)
Not exactly so... I've sent a modified version of your software-resend
patch, and it seems to work OK.
Jarek P.
From marcin.slusarz@gmail.com Wed Aug 8 13:20:02 2007
From: Marcin Ślusarz <marcin.slusarz@gmail.com>
...
Subject: Re: 2.6.20->2.6.21 - networking dies after random time
...
2007/8/7, Jarek Poplawski <jarkao2@o2.pl>:
> So, let's try this idea as well: a modified version of Ingo's
> "x86: activate HARDIRQS_SW_RESEND" patch.
> (Don't forget about make oldconfig before make.)
> For testing only.
>
> Cheers,
> Jarek P.
>
> PS: alas there was not even time for "compile checking"...
>
> ---
>
> diff -Nurp 2.6.22.1-/arch/i386/Kconfig 2.6.22.1/arch/i386/Kconfig
> --- 2.6.22.1-/arch/i386/Kconfig 2007-07-09 01:32:17.000000000 +0200
> +++ 2.6.22.1/arch/i386/Kconfig 2007-08-07 13:13:03.000000000 +0200
> @@ -1252,6 +1252,10 @@ config GENERIC_PENDING_IRQ
> depends on GENERIC_HARDIRQS && SMP
> default y
>
> +config HARDIRQS_SW_RESEND
> + bool
> + default y
> +
> config X86_SMP
> bool
> depends on SMP && !X86_VOYAGER
> diff -Nurp 2.6.22.1-/arch/x86_64/Kconfig 2.6.22.1/arch/x86_64/Kconfig
> --- 2.6.22.1-/arch/x86_64/Kconfig 2007-07-09 01:32:17.000000000 +0200
> +++ 2.6.22.1/arch/x86_64/Kconfig 2007-08-07 13:13:03.000000000 +0200
> @@ -690,6 +690,10 @@ config GENERIC_PENDING_IRQ
> depends on GENERIC_HARDIRQS && SMP
> default y
>
> +config HARDIRQS_SW_RESEND
> + bool
> + default y
> +
> menu "Power management options"
>
> source kernel/power/Kconfig
> diff -Nurp 2.6.22.1-/kernel/irq/manage.c 2.6.22.1/kernel/irq/manage.c
> --- 2.6.22.1-/kernel/irq/manage.c 2007-07-09 01:32:17.000000000 +0200
> +++ 2.6.22.1/kernel/irq/manage.c 2007-08-07 13:13:03.000000000 +0200
> @@ -169,6 +169,14 @@ void enable_irq(unsigned int irq)
> desc->depth--;
> }
> spin_unlock_irqrestore(&desc->lock, flags);
> +#ifdef CONFIG_HARDIRQS_SW_RESEND
> + /*
> + * Do a bh disable/enable pair to trigger any pending
> + * irq resend logic:
> + */
> + local_bh_disable();
> + local_bh_enable();
> +#endif
> }
> EXPORT_SYMBOL(enable_irq);
>
> diff -Nurp 2.6.22.1-/kernel/irq/resend.c 2.6.22.1/kernel/irq/resend.c
> --- 2.6.22.1-/kernel/irq/resend.c 2007-07-09 01:32:17.000000000 +0200
> +++ 2.6.22.1/kernel/irq/resend.c 2007-08-07 13:57:54.000000000 +0200
> @@ -62,16 +62,24 @@ void check_irq_resend(struct irq_desc *d
> */
> desc->chip->enable(irq);
>
> + /*
> + * Temporary hack to figure out more about the problem, which
> + * is causing the ancient network cards to die.
> + */
> +
> if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
> desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;
>
> - if (!desc->chip || !desc->chip->retrigger ||
> - !desc->chip->retrigger(irq)) {
> + if (desc->handle_irq == handle_edge_irq) {
> + if (desc->chip->retrigger)
> + desc->chip->retrigger(irq);
> + return;
> + }
> #ifdef CONFIG_HARDIRQS_SW_RESEND
> - /* Set it pending and activate the softirq: */
> - set_bit(irq, irqs_resend);
> - tasklet_schedule(&resend_tasklet);
> + WARN_ON_ONCE(1);
> + /* Set it pending and activate the softirq: */
> + set_bit(irq, irqs_resend);
> + tasklet_schedule(&resend_tasklet);
> #endif
> - }
> }
> }
>
Works fine with:
WARNING: at kernel/irq/resend.c:79 check_irq_resend()
Call Trace:
[<ffffffff8025e660>] check_irq_resend+0xc0/0xd0
[<ffffffff8025e1cd>] enable_irq+0xed/0xf0
[<ffffffff8807f21d>] :8390:ei_start_xmit+0x14d/0x30c
[<ffffffff8024d055>] lock_release_non_nested+0xe5/0x190
[<ffffffff80539b78>] __qdisc_run+0x98/0x1f0
[<ffffffff80539b8e>] __qdisc_run+0xae/0x1f0
[<ffffffff8052b65e>] dev_hard_start_xmit+0x26e/0x2d0
[<ffffffff80539ba0>] __qdisc_run+0xc0/0x1f0
[<ffffffff8052dc2f>] dev_queue_xmit+0x24f/0x310
[<ffffffff805337a7>] neigh_resolve_output+0xe7/0x290
[<ffffffff8054f5c0>] dst_output+0x0/0x10
[<ffffffff80552aff>] ip_output+0x19f/0x340
[<ffffffff80551f77>] ip_queue_xmit+0x217/0x430
[<ffffffff80563b2a>] tcp_transmit_skb+0x40a/0x7c0
[<ffffffff805657bb>] __tcp_push_pending_frames+0x11b/0x940
[<ffffffff8055972a>] tcp_sendmsg+0x87a/0xc80
[<ffffffff80577735>] inet_sendmsg+0x45/0x80
[<ffffffff8051e2d4>] sock_aio_write+0x104/0x120
[<ffffffff80285fc1>] do_sync_write+0xf1/0x130
[<ffffffff80243290>] autoremove_wake_function+0x0/0x40
[<ffffffff802868e9>] vfs_write+0x159/0x170
[<ffffffff80286ef0>] sys_write+0x50/0x90
[<ffffffff802097fe>] system_call+0x7e/0x83
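The resend decision in the patch quoted above can be summarized with a small userspace model: edge-triggered lines get a hardware retrigger, everything else is recorded in a bitmap and replayed later from deferred context. This is a sketch under stated assumptions, not the kernel implementation. The model_* names, the 32-line limit and the counters are hypothetical, and the tasklet is reduced to an ordinary function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_IRQS 32u

/* One bit per interrupt line waiting for a software replay. */
uint32_t model_irqs_resend;
unsigned int model_replayed[MODEL_NR_IRQS];
unsigned int model_hw_retriggers[MODEL_NR_IRQS];

/* Decide how a pending interrupt on this line gets resent. */
void model_resend(unsigned int irq, bool is_edge)
{
	assert(irq < MODEL_NR_IRQS);

	if (is_edge) {
		/* Edge: ask the hardware to raise the vector again. */
		model_hw_retriggers[irq]++;
		return;
	}
	/* Otherwise: mark the line and leave the replay to software. */
	model_irqs_resend |= UINT32_C(1) << irq;
}

/* Stand-in for the resend tasklet; returns the number of replays. */
unsigned int model_run_resend_tasklet(void)
{
	unsigned int replays = 0;
	unsigned int irq;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++) {
		uint32_t bit = UINT32_C(1) << irq;

		if (!(model_irqs_resend & bit))
			continue;
		model_irqs_resend &= ~bit;
		model_replayed[irq]++;	/* the handler would run here */
		replays++;
	}
	return replays;
}
```

Because the bitmap holds one bit per line, several resend requests for the same level-triggered line before the tasklet runs collapse into a single replay.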
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 8:49 ` Jarek Poplawski
@ 2007-08-10 8:56 ` Ingo Molnar
2007-08-10 9:12 ` Jarek Poplawski
0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-08-10 8:56 UTC
To: Jarek Poplawski
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> > > [...] Well, there are probably (but need more testing) two other
> > > solutions: _SW_RESEND and disabling without delay for levels
> > > only...
> >
> > IIRC Marcin tested software-resend and it didnt fix the hang. That
> > strongly points in the direction of a driver bug (or a genirq bug)
> > being made more prominent by the genirq change - not any hardware
> > detail such as the APIC vector-retrigger sequence.
> >
> > While we'd like to see the suspected driver bug (or any higher level
> > genirq bug) fixed, we'll undo the effect of the genirq change
> > (because it is causing a regression). We'll also add a separate,
> > optional irq-debugging feature that generates high-rate interrupts
> > on any shared irq line. (and thus artificially stresses the
> > robustness of the driver and the genirq layer against spurious
> > interrupts.)
>
> Not exactly so... I've send modified version of your software-resend
> patch, and it seems to work OK.
ah, i completely missed that! Thanks :-)
this changes the picture completely and makes the IO-APIC/local-APIC hw
retrigger code/logic the main suspect. I think you're right that it's quite
bogus to hw-retrigger level irqs, and that could be confusing the
IO-APIC (or the local APIC, or both).
and i think i see why my first sw-resend patch didn't do the trick:
> > - if (!desc->chip || !desc->chip->retrigger ||
> > - !desc->chip->retrigger(irq)) {
> > + if (desc->handle_irq == handle_edge_irq) {
> > + if (desc->chip->retrigger)
> > + desc->chip->retrigger(irq);
> > + return;
> > + }
> > #ifdef CONFIG_HARDIRQS_SW_RESEND
we used the hw-resend method unconditionally, right?
Ingo
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 8:56 ` Ingo Molnar
@ 2007-08-10 9:12 ` Jarek Poplawski
2007-08-10 9:33 ` Ingo Molnar
0 siblings, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2007-08-10 9:12 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
On Fri, Aug 10, 2007 at 10:56:11AM +0200, Ingo Molnar wrote:
...
> this changes the picture completely and makes the IO-APIC/local-APIC hw
> retrigger code/logic the main suspect. I think you right that it's quite
> bogus to hw-retrigger level irqs, and that could be confusing the
> IO-APIC (or the local APIC, or both).
>
> and i think i see why my first sw-resend patch didnt do the trick:
>
> > > - if (!desc->chip || !desc->chip->retrigger ||
> > > - !desc->chip->retrigger(irq)) {
> > > + if (desc->handle_irq == handle_edge_irq) {
> > > + if (desc->chip->retrigger)
> > > + desc->chip->retrigger(irq);
> > > + return;
> > > + }
> > > #ifdef CONFIG_HARDIRQS_SW_RESEND
>
> we used the hw-resend method unconditionally, right?
Right: unconditionally, provided they are not edge-triggered...
But since not resending at all seems to work so well in testing,
I thought _SW_RESEND could be considered an unnecessarily
complicated alternative.
Now I'm a bit confused...
Jarek P.
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 9:12 ` Jarek Poplawski
@ 2007-08-10 9:33 ` Ingo Molnar
2007-08-10 10:05 ` Jarek Poplawski
2007-08-10 10:13 ` Stephen Hemminger
0 siblings, 2 replies; 16+ messages in thread
From: Ingo Molnar @ 2007-08-10 9:33 UTC
To: Jarek Poplawski
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> > > > + }
> > > > #ifdef CONFIG_HARDIRQS_SW_RESEND
> >
> > we used the hw-resend method unconditionally, right?
>
> Right: unconditionally on a condition they are not edges...
>
> But, since not resending at all seems to work so good in testing, I
> thought, _SW_RESEND could be considered as an unnecessarily
> complicated alternative.
>
> Now, I'm a bit confused...
the idea is multi-pronged:
- Primarily, we want to fix the regression. 2.6.20 worked, 2.6.21
didn't, that has to be fixed, no matter what - end of story. But we've
got a wide selection of patches for that purpose now, so what matters
at this point is the secondary question:
- we want to know _why exactly_ the hang happens. We now have a pretty
good theory: hw-resend hangs the IO-APIC. (there is a delicate dance
between local APICs and IO-APICs for level-triggered irqs, and if we
interject via hw-resending via the local APIC, existing races, hw
bugs or weaknesses in our hw-resend implementation might be exposed)
and even though we now have a wide selection of patches we really want
to get to the bottom of the problem so that we can fix the bug that got
exposed: apparently hw resend doesn't always work with level-triggered
irqs.
Note that the hw-resend sequence can trigger _even without our original
patch that triggered the regression_, it's just much less likely to
happen, so this is a pre-existing IO-APIC/APIC code bug that could
trigger anytime, and which we want to see fixed.
To confirm this theory - does the debug-patch below fix the hang? If it
fixes the hang then the theory is confirmed and then the right solution
is to retrigger an IRQ for level-triggered irqs with the proper
trigger-type set.
Ingo
------------------>
Not-Signed-off-by: Ingo Molnar <mingo@elte.hu>
Index: linux/arch/i386/kernel/io_apic.c
===================================================================
--- linux.orig/arch/i386/kernel/io_apic.c
+++ linux/arch/i386/kernel/io_apic.c
@@ -735,7 +735,8 @@ void fastcall send_IPI_self(int vector)
* Wait for idle.
*/
apic_wait_icr_idle();
- cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
+ cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL |
+ APIC_INT_LEVELTRIG;
/*
* Send the IPI. The write to APIC_ICR fires this off.
*/
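To make the one-line change in the debug patch above easier to read, here is a small sketch of how the ICR command word is assembled from its flag bits. The constant values are written from memory of the kernel's APIC definitions and should be treated as assumptions to verify against the headers. model_self_ipi_cfg() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed from the kernel's APIC definitions; verify before use. */
#define APIC_DM_FIXED		0x00000u	/* fixed delivery mode */
#define APIC_DEST_LOGICAL	0x00800u	/* logical destination mode */
#define APIC_INT_LEVELTRIG	0x08000u	/* level trigger mode */
#define APIC_DEST_SELF		0x40000u	/* destination shorthand: self */

/*
 * Hypothetical helper: build the command word for a self-IPI, with or
 * without the level-trigger bit that the debug patch adds.
 */
uint32_t model_self_ipi_cfg(uint8_t vector, int level_triggered)
{
	uint32_t cfg;

	/* Vectors 0-15 are not valid for fixed-delivery interrupts. */
	assert(vector >= 16);

	cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
	if (level_triggered)
		cfg |= APIC_INT_LEVELTRIG;
	return cfg;
}
```

With these values the only difference between the two variants is the single trigger-mode bit, which is the property the experiment is meant to isolate.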
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 9:33 ` Ingo Molnar
@ 2007-08-10 10:05 ` Jarek Poplawski
2007-08-10 10:16 ` Ingo Molnar
2007-08-10 10:13 ` Stephen Hemminger
1 sibling, 1 reply; 16+ messages in thread
From: Jarek Poplawski @ 2007-08-10 10:05 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
On Fri, Aug 10, 2007 at 11:33:53AM +0200, Ingo Molnar wrote:
>
> * Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> > > > > + }
> > > > > #ifdef CONFIG_HARDIRQS_SW_RESEND
> > >
> > > we used the hw-resend method unconditionally, right?
> >
> > Right: unconditionally on a condition they are not edges...
> >
> > But, since not resending at all seems to work so good in testing, I
> > thought, _SW_RESEND could be considered as an unnecessarily
> > complicated alternative.
> >
> > Now, I'm a bit confused...
>
> the idea is multi-pronged:
>
> - Primarily, we want to fix the regression. 2.6.20 worked, 2.6.21
> didnt, that has to be fixed, no matter what - end of story. But we've
> got a wide selection of patches for that purpose now, so what matters
> at this point is the secondary question:
>
> - we want to know _why exactly_ the hang happens. We now have a pretty
> good theory: hw-resend hangs the IO-APIC. (there is a delicate dance
> between local APICs and IO-APICs for level-triggered irqs, and if we
> interject via hw-resending via the local APIC, existing races, hw
> bugs or weaknesses in our hw-resend implementation might be exposed)
>
> and even though we now have a wide selection of patches we really want
> to get to the bottom of the problem so that we can fix the bug that got
> exposed: apparently hw resend doesnt always work with level-triggered
> irqs.
>
> Note that the hw-resend sequence can trigger _even without our original
> patch that triggered the regression_, it's just much less likely to
> happen, so this is a pre-existing IO-APIC/APIC code bug that could
> trigger anytime, and which we want to see fixed.
>
> To confirm this theory - does the debug-patch below fix the hang? If it
> fixes the hang then the theory is confirmed and then the right solution
> is to retrigger an IRQ for level-triggered irqs with the proper
> trigger-type set.
>
> Ingo
Ingo: I think you have to do this for x86_64 too, and send_IPI_mask is
probably used for this there (but I may be missing something...).
I think Marcin will not be able to do this and report before Monday,
but,
Jean-Baptiste: of course Ingo's or Thomas' current patches are
more urgent, so could you break off the current test and try this
(maybe after Ingo acks it?) with e.g. clean 2.6.23-rc1 or 2.6.22?
Jarek P.
>
> ------------------>
> Not-Signed-off-by: Ingo Molnar <mingo@elte.hu>
>
> Index: linux/arch/i386/kernel/io_apic.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/io_apic.c
> +++ linux/arch/i386/kernel/io_apic.c
> @@ -735,7 +735,8 @@ void fastcall send_IPI_self(int vector)
> * Wait for idle.
> */
> apic_wait_icr_idle();
> - cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
> + cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL |
> + APIC_INT_LEVELTRIG;
> /*
> * Send the IPI. The write to APIC_ICR fires this off.
> */
>
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 9:33 ` Ingo Molnar
2007-08-10 10:05 ` Jarek Poplawski
@ 2007-08-10 10:13 ` Stephen Hemminger
1 sibling, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2007-08-10 10:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jarek Poplawski, Thomas Gleixner, John Stoffel, linux-kernel,
vignaud, marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
On Fri, 10 Aug 2007 11:33:53 +0200
Ingo Molnar <mingo@elte.hu> wrote:
>
> * Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> > > > > + }
> > > > > #ifdef CONFIG_HARDIRQS_SW_RESEND
> > >
> > > we used the hw-resend method unconditionally, right?
> >
> > Right: unconditionally on a condition they are not edges...
> >
> > But, since not resending at all seems to work so good in testing, I
> > thought, _SW_RESEND could be considered as an unnecessarily
> > complicated alternative.
> >
> > Now, I'm a bit confused...
>
> the idea is multi-pronged:
>
> - Primarily, we want to fix the regression. 2.6.20 worked, 2.6.21
> didnt, that has to be fixed, no matter what - end of story. But we've
> got a wide selection of patches for that purpose now, so what matters
> at this point is the secondary question:
>
> - we want to know _why exactly_ the hang happens. We now have a pretty
> good theory: hw-resend hangs the IO-APIC. (there is a delicate dance
> between local APICs and IO-APICs for level-triggered irqs, and if we
> interject via hw-resending via the local APIC, existing races, hw
> bugs or weaknesses in our hw-resend implementation might be exposed)
>
> and even though we now have a wide selection of patches we really want
> to get to the bottom of the problem so that we can fix the bug that got
> exposed: apparently hw resend doesnt always work with level-triggered
> irqs.
>
> Note that the hw-resend sequence can trigger _even without our original
> patch that triggered the regression_, it's just much less likely to
> happen, so this is a pre-existing IO-APIC/APIC code bug that could
> trigger anytime, and which we want to see fixed.
>
> To confirm this theory - does the debug-patch below fix the hang? If it
> fixes the hang then the theory is confirmed and then the right solution
> is to retrigger an IRQ for level-triggered irqs with the proper
> trigger-type set.
>
All this might explain some of the IRQ loss I saw with sky2 on the Mac mini.
Basically, the device would act as if it had missed an IRQ. The chip and PCI registers
all said "device has asserted IRQ", but the IRQ handler never got called.
Then again, the problem might be completely different, since this was with
PCI-E in either MSI or INTA mode.
The workaround was to periodically call the soft IRQ handler, which would
clear the IRQ, but it's not something I want to keep.
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 10:05 ` Jarek Poplawski
@ 2007-08-10 10:16 ` Ingo Molnar
2007-08-13 7:13 ` Marcin Ślusarz
0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2007-08-10 10:16 UTC (permalink / raw)
To: Jarek Poplawski
Cc: Thomas Gleixner, John Stoffel, linux-kernel, shemminger, vignaud,
marcin.slusarz, torvalds, akpm, alan, linux-net, netdev
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> Ingo: I think, you have to do this in x86_64, and there is probably
> send_IPI_mask used for this (but I can miss something...).
indeed - full patch below.
Ingo
---
arch/i386/kernel/io_apic.c | 3 ++-
arch/x86_64/kernel/genapic.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
Index: linux/arch/i386/kernel/io_apic.c
===================================================================
--- linux.orig/arch/i386/kernel/io_apic.c
+++ linux/arch/i386/kernel/io_apic.c
@@ -735,7 +735,8 @@ void fastcall send_IPI_self(int vector)
 	 * Wait for idle.
 	 */
 	apic_wait_icr_idle();
-	cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
+	cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL |
+		APIC_INT_LEVELTRIG;
 	/*
 	 * Send the IPI. The write to APIC_ICR fires this off.
 	 */
Index: linux/arch/x86_64/kernel/genapic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/genapic.c
+++ linux/arch/x86_64/kernel/genapic.c
@@ -62,5 +62,6 @@ void __init setup_apic_routing(void)
 
 void send_IPI_self(int vector)
 {
-	__send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
+	__send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL |
+		APIC_INT_LEVELTRIG);
 }
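
For anyone following along, here is a rough user-space sketch of what the
i386 hunk above changes in the value written to APIC_ICR. It is only bit
arithmetic and says nothing about how the hardware reacts to the bit. The
numeric values are assumptions recalled from the kernel's apicdef.h of that
era, and the self_ipi_cfg_*() helper names are hypothetical, not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ICR bit values, assumed to match the kernel's apicdef.h of that era.
 * Check them against your own tree before relying on them.
 */
#define APIC_DM_FIXED       0x00000u  /* delivery mode: fixed           */
#define APIC_DEST_LOGICAL   0x00800u  /* destination mode: logical      */
#define APIC_INT_LEVELTRIG  0x08000u  /* trigger mode: level (bit 15)   */
#define APIC_DEST_SELF      0x40000u  /* destination shorthand: self    */

/* Hypothetical helper: ICR value built by send_IPI_self() before the patch. */
uint32_t self_ipi_cfg_before(uint32_t vector)
{
    assert(vector <= 0xffu);          /* the vector field is 8 bits wide */
    return APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
}

/* Hypothetical helper: the same value with the debug patch applied. */
uint32_t self_ipi_cfg_after(uint32_t vector)
{
    return self_ipi_cfg_before(vector) | APIC_INT_LEVELTRIG;
}

/* Bits that differ between the two values for a given vector. */
uint32_t self_ipi_cfg_diff(uint32_t vector)
{
    return self_ipi_cfg_before(vector) ^ self_ipi_cfg_after(vector);
}
```

Under these assumed values, self_ipi_cfg_diff() returns only the trigger-mode
bit for any valid vector, so the debug patch changes nothing else about the
self-IPI.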
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
@ 2007-08-10 11:35 Jean-Baptiste Vignaud
0 siblings, 0 replies; 16+ messages in thread
From: Jean-Baptiste Vignaud @ 2007-08-10 11:35 UTC (permalink / raw)
To: jarkao2
Cc: mingo, tglx, john, linux-kernel, shemminger, marcin.slusarz,
torvalds, akpm, alan, linux-net, netdev
> Ingo: I think, you have to do this in x86_64, and there is probably
> send_IPI_mask used for this (but I can miss something...).
>
> I think, Marcin will not be able to do this and report before monday,
> but,
> Jean-Baptiste: of course current Ingo's or Thomas' patches are
> more urgent, so if you could break the current test and try this
> (maybe after Ingo acks this yet?) with eg. clean 2.6.23-rc1 or 2.6.22?
>
I'm compiling 2.6.23-rc1 with http://lkml.org/lkml/diff/2007/8/10/101/1
When it has finished, I'll stop the current test (so far about 100 GB of network traffic and still OK) to try it.
Jb
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
@ 2007-08-10 12:27 Jean-Baptiste Vignaud
0 siblings, 0 replies; 16+ messages in thread
From: Jean-Baptiste Vignaud @ 2007-08-10 12:27 UTC (permalink / raw)
To: mingo
Cc: jarkao2, tglx, john, linux-kernel, shemminger, marcin.slusarz,
torvalds, akpm, alan, linux-net, netdev
see below
> arch/i386/kernel/io_apic.c | 3 ++-
> arch/x86_64/kernel/genapic.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> Index: linux/arch/i386/kernel/io_apic.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/io_apic.c
> +++ linux/arch/i386/kernel/io_apic.c
> @@ -735,7 +735,8 @@ void fastcall send_IPI_self(int vector)
> * Wait for idle.
> */
> apic_wait_icr_idle();
> - cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
> + cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL |
> + APIC_INT_LEVELTRIG;
> /*
> * Send the IPI. The write to APIC_ICR fires this off.
> */
> Index: linux/arch/x86_64/kernel/genapic.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/genapic.c
> +++ linux/arch/x86_64/kernel/genapic.c
> @@ -62,5 +62,6 @@ void __init setup_apic_routing(void)
>
> void send_IPI_self(int vector)
> {
> - __send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
> + __send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL |
> + APIC_INT_LEVELTRIG);
> }
>
Clean 2.6.23-rc1 with this patch:
Aug 10 14:12:09 loki kernel: NETDEV WATCHDOG: eth2: transmit timed out
Aug 10 14:12:09 loki kernel: eth2: transmit timed out, tx_status 00 status e601.
Aug 10 14:12:09 loki kernel: diagnostics: net 0ccc media 8880 dma 0000003a fifo 8000
Aug 10 14:12:09 loki kernel: eth2: Interrupt posted but not delivered -- IRQ blocked by another device?
Aug 10 14:12:09 loki kernel: Flags; bus-master 1, dirty 231829(5) current 231829(5)
Aug 10 14:12:09 loki kernel: Transmit list 00000000 vs. ffff81007eaad520.
Aug 10 14:12:09 loki kernel: 0: @ffff81007eaad200 length 80000115 status 0c010115
Aug 10 14:12:09 loki kernel: 1: @ffff81007eaad2a0 length 8000005c status 0c01005c
Aug 10 14:12:09 loki kernel: 2: @ffff81007eaad340 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 3: @ffff81007eaad3e0 length 8000002a status 8001002a
Aug 10 14:12:09 loki kernel: 4: @ffff81007eaad480 length 8000005c status 8c01005c
Aug 10 14:12:09 loki kernel: 5: @ffff81007eaad520 length 80000042 status 00010042
Aug 10 14:12:09 loki kernel: 6: @ffff81007eaad5c0 length 8000007b status 0001007b
Aug 10 14:12:09 loki kernel: 7: @ffff81007eaad660 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 8: @ffff81007eaad700 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 9: @ffff81007eaad7a0 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 10: @ffff81007eaad840 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 11: @ffff81007eaad8e0 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 12: @ffff81007eaad980 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 13: @ffff81007eaada20 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 14: @ffff81007eaadac0 length 8000002a status 0001002a
Aug 10 14:12:09 loki kernel: 15: @ffff81007eaadb60 length 8000002a status 0001002a
I did not have to wait long for this to occur (1-2 minutes).
Jb
* Re: 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend()
2007-08-10 10:16 ` Ingo Molnar
@ 2007-08-13 7:13 ` Marcin Ślusarz
0 siblings, 0 replies; 16+ messages in thread
From: Marcin Ślusarz @ 2007-08-13 7:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jarek Poplawski, Thomas Gleixner, John Stoffel, linux-kernel,
shemminger, vignaud, torvalds, akpm, alan, linux-net, netdev
2007/8/10, Ingo Molnar <mingo@elte.hu>:
> Index: linux/arch/i386/kernel/io_apic.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/io_apic.c
> +++ linux/arch/i386/kernel/io_apic.c
> @@ -735,7 +735,8 @@ void fastcall send_IPI_self(int vector)
> * Wait for idle.
> */
> apic_wait_icr_idle();
> - cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
> + cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL |
> + APIC_INT_LEVELTRIG;
> /*
> * Send the IPI. The write to APIC_ICR fires this off.
> */
> Index: linux/arch/x86_64/kernel/genapic.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/genapic.c
> +++ linux/arch/x86_64/kernel/genapic.c
> @@ -62,5 +62,6 @@ void __init setup_apic_routing(void)
>
> void send_IPI_self(int vector)
> {
> - __send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
> + __send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL |
> + APIC_INT_LEVELTRIG);
> }
>
network card timed out as usual ;)
Marcin
end of thread, other threads:[~2007-08-13 7:13 UTC | newest]
Thread overview: 16+ messages
2007-08-09 15:03 2.6.23-rc2: WARNING: at kernel/irq/resend.c:70 check_irq_resend() John Stoffel
2007-08-09 15:54 ` Jarek Poplawski
2007-08-10 8:05 ` Thomas Gleixner
2007-08-10 8:23 ` Jarek Poplawski
2007-08-10 8:30 ` Ingo Molnar
2007-08-10 8:49 ` Jarek Poplawski
2007-08-10 8:56 ` Ingo Molnar
2007-08-10 9:12 ` Jarek Poplawski
2007-08-10 9:33 ` Ingo Molnar
2007-08-10 10:05 ` Jarek Poplawski
2007-08-10 10:16 ` Ingo Molnar
2007-08-13 7:13 ` Marcin Ślusarz
2007-08-10 10:13 ` Stephen Hemminger
-- strict thread matches above, loose matches on Subject: below --
2007-08-10 12:27 Jean-Baptiste Vignaud
2007-08-10 11:35 Jean-Baptiste Vignaud
2007-08-08 18:09 2.6.23-rc2: WARNING " Indan Zupancic