* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
@ 2007-08-10 8:15 Jean-Baptiste Vignaud
2007-08-10 8:37 ` Jarek Poplawski
From: Jean-Baptiste Vignaud @ 2007-08-10 8:15 UTC
To: jarkao2
Cc: marcin.slusarz, mingo, tglx, torvalds, linux-kernel, shemminger,
linux-net, netdev, akpm, alan
> So, we still have to wait for the exact explanation...
>
> Thanks very much Marcin!
>
> I think this one is still left for you to test?
> Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
> Date: Wed, 8 Aug 2007 13:00:37 +0200
>
> If it's not too much trouble, it would be interesting to try this with
> a different CONFIG_HZ too, e.g. you could start with 100 (I guess you
> already tested a very similar thing in 2.6.23-rc2 with 1000(?)).
>
> Jean-Baptiste: you can skip/break testing of this 'experimental'
OK.
I was still testing on -rc2:
Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
Date: Wed, 8 Aug 2007 13:00:37 +0200
For me, after 1 day 20 hours, the network is still up, with more than 1 TB
of network traffic. HZ was 1000; I'm restarting with HZ=100.
Jb
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-10 8:15 [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time Jean-Baptiste Vignaud
@ 2007-08-10 8:37 ` Jarek Poplawski
2007-08-10 8:48 ` Ingo Molnar
From: Jarek Poplawski @ 2007-08-10 8:37 UTC
To: Jean-Baptiste Vignaud
Cc: marcin.slusarz, mingo, tglx, torvalds, linux-kernel, shemminger,
linux-net, netdev, akpm, alan
On Fri, Aug 10, 2007 at 10:15:53AM +0200, Jean-Baptiste Vignaud wrote:
...
> I was still testing on -rc2:
> Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
> Date: Wed, 8 Aug 2007 13:00:37 +0200
>
> For me, after 1 day 20 hours, the network is still up, with more than 1 TB
> of network traffic. HZ was 1000; I'm restarting with HZ=100.
For me it's enough too, but Thomas seems to have doubts.
You wrote earlier that you also have 2.6.23-rc1 with HARDIRQS_SW_RESEND
prepared. So, if it's not too much trouble, maybe you could try
that first. Thomas may send something tomorrow, so the HZ=100 test can
wait for now, I hope?
Many thanks,
Jarek P.
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-10 8:37 ` Jarek Poplawski
@ 2007-08-10 8:48 ` Ingo Molnar
2007-08-10 9:03 ` Jarek Poplawski
From: Ingo Molnar @ 2007-08-10 8:48 UTC
To: Jarek Poplawski
Cc: Jean-Baptiste Vignaud, marcin.slusarz, tglx, torvalds,
linux-kernel, shemminger, linux-net, netdev, akpm, alan
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> On Fri, Aug 10, 2007 at 10:15:53AM +0200, Jean-Baptiste Vignaud wrote:
> ...
> > I was still testing on -rc2:
> > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
> > Date: Wed, 8 Aug 2007 13:00:37 +0200
> >
> > For me, after 1 day 20 hours, the network is still up, with more than
> > 1 TB of network traffic. HZ was 1000; I'm restarting with HZ=100.
>
> For me it's enough too, but Thomas seems to have doubts.
Seems to doubt what? That rc2 fixes the symptom? That is a sure thing,
and we never doubted that. I think you might have misunderstood what
Thomas said and meant, so please just state your opinion unambiguously
so that we can fix any miscommunication :)
Ingo
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-10 8:48 ` Ingo Molnar
@ 2007-08-10 9:03 ` Jarek Poplawski
2007-08-10 9:08 ` Ingo Molnar
From: Jarek Poplawski @ 2007-08-10 9:03 UTC
To: Ingo Molnar
Cc: Jean-Baptiste Vignaud, marcin.slusarz, tglx, torvalds,
linux-kernel, shemminger, linux-net, netdev, akpm, alan
On Fri, Aug 10, 2007 at 10:48:41AM +0200, Ingo Molnar wrote:
>
> * Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> > On Fri, Aug 10, 2007 at 10:15:53AM +0200, Jean-Baptiste Vignaud wrote:
> > ...
> > > I was still testing on -rc2:
> > > Subject: [patch] genirq: temporary fix for level-triggered IRQ resend
> > > Date: Wed, 8 Aug 2007 13:00:37 +0200
> > >
> > > For me, after 1 day 20 hours, the network is still up, with more than
> > > 1 TB of network traffic. HZ was 1000; I'm restarting with HZ=100.
> >
> > For me it's enough too, but Thomas seems to have doubts.
>
> Seems to doubt what? That rc2 fixes the symptom? That is a sure thing,
> and we never doubted that. I think you might have misunderstood what
> Thomas said and meant, so please just state your opinion unambiguously
> so that we can fix any miscommunication :)
>
> Ingo
>
On 25-07-2007 02:19, Thomas Gleixner wrote:
...
> Actually we only need the resend for edge type interrupts. Level type
> interrupts come back once enable_irq() re-enables the interrupt line.
>
On 10-08-2007 10:05, Thomas Gleixner wrote:
...
> But suppressing the resend is not fixing the driver problem. The problem
> can show up with spurious interrupts and with interrupts on a shared PCI
> interrupt line at any time. It just might take weeks instead of minutes.
Maybe I'm missing something, but these are not the same thing!
So, should Jean-Baptiste or Marcin test this for weeks, or is this enough?
Jarek P.
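Thomas's first statement above (a level-triggered source is still asserted when the line is unmasked again, so it comes back on its own, while an edge that arrives during the masked window is lost unless the kernel resends it) can be illustrated with a toy model. This is a user-space sketch under simplifying assumptions, not kernel code: `struct toy_line`, `toy_raise()` and `toy_unmask()` are hypothetical names, and the `resend` flag stands in for whatever resend mechanism (hardware retrigger or software tasklet) is in use.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_trigger { TOY_EDGE, TOY_LEVEL };

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct toy_line {
    enum toy_trigger trigger;
    bool masked;           /* controller ignores the line while set */
    bool device_asserting; /* level only: device keeps the line active */
    bool sw_pending;       /* an event was seen while masked */
    int delivered;         /* interrupts that reached the handler */
};

/* The device signals an interrupt. */
static void toy_raise(struct toy_line *l)
{
    assert(l);
    if (l->trigger == TOY_LEVEL)
        l->device_asserting = true; /* stays active until serviced */
    if (l->masked)
        l->sw_pending = true;       /* remembered, not delivered */
    else
        l->delivered++;
}

/*
 * enable_irq()-like step: unmask the line and, if `resend` is set,
 * replay an event remembered while masked. Returns true if the event
 * from the masked window ends up delivered.
 */
static bool toy_unmask(struct toy_line *l, bool resend)
{
    bool was_pending;

    assert(l);
    was_pending = l->sw_pending;
    l->masked = false;
    l->sw_pending = false;

    if (l->trigger == TOY_LEVEL && was_pending && l->device_asserting) {
        l->delivered++; /* still-asserted level line fires again by itself */
        return true;
    }
    if (l->trigger == TOY_EDGE && was_pending && resend) {
        l->delivered++; /* a lost edge only comes back via a resend */
        return true;
    }
    return false;
}
```

In this model a masked level line that is raised and then unmasked with `resend == false` is still delivered, while an edge line under the same conditions is not; only with `resend == true` does the edge come back. So suppressing the resend for level interrupts loses nothing here, which is a narrower claim than the driver-bug concern in the second quote.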
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-10 9:03 ` Jarek Poplawski
@ 2007-08-10 9:08 ` Ingo Molnar
2007-08-10 9:19 ` Jarek Poplawski
From: Ingo Molnar @ 2007-08-10 9:08 UTC
To: Jarek Poplawski
Cc: Jean-Baptiste Vignaud, marcin.slusarz, tglx, torvalds,
linux-kernel, shemminger, linux-net, netdev, akpm, alan
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> On 10-08-2007 10:05, Thomas Gleixner wrote:
> ...
> > But suppressing the resend is not fixing the driver problem. The
> > problem can show up with spurious interrupts and with interrupts on
> > a shared PCI interrupt line at any time. It just might take weeks
> > instead of minutes.
>
> Maybe I'm missing something, but these are not the same thing!
_now_ I finally understand what you probably meant: because sw-resend
worked and hw-resend didn't, it's hw-resend that is causing the breakage,
not any driver or irqflow bug - correct?
Ingo
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-10 9:08 ` Ingo Molnar
@ 2007-08-10 9:19 ` Jarek Poplawski
2007-08-10 9:38 ` Ingo Molnar
From: Jarek Poplawski @ 2007-08-10 9:19 UTC
To: Ingo Molnar
Cc: Jean-Baptiste Vignaud, marcin.slusarz, tglx, torvalds,
linux-kernel, shemminger, linux-net, netdev, akpm, alan
On Fri, Aug 10, 2007 at 11:08:33AM +0200, Ingo Molnar wrote:
>
> * Jarek Poplawski <jarkao2@o2.pl> wrote:
>
> > On 10-08-2007 10:05, Thomas Gleixner wrote:
> > ...
> > > But suppressing the resend is not fixing the driver problem. The
> > > problem can show up with spurious interrupts and with interrupts on
> > > a shared PCI interrupt line at any time. It just might take weeks
> > > instead of minutes.
> >
> > Maybe I'm missing something, but these are not the same thing!
>
> _now_ I finally understand what you probably meant: because sw-resend
> worked and hw-resend didn't, it's hw-resend that is causing the breakage,
> not any driver or irqflow bug - correct?
All correct! We also checked the possibility that it's not the hw
itself but the wrong way of handling after the hw (acking too late). That
turned out to be a false idea (or a bad implementation), so it looks like
a hw vs. lapic problem.
Jarek P.
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-10 9:19 ` Jarek Poplawski
@ 2007-08-10 9:38 ` Ingo Molnar
From: Ingo Molnar @ 2007-08-10 9:38 UTC
To: Jarek Poplawski
Cc: Jean-Baptiste Vignaud, marcin.slusarz, tglx, torvalds,
linux-kernel, shemminger, linux-net, netdev, akpm, alan
* Jarek Poplawski <jarkao2@o2.pl> wrote:
> All correct! We also checked the possibility that it's not the hw
> itself but the wrong way of handling after the hw (acking too late). That
> turned out to be a false idea (or a bad implementation), so it looks like
> a hw vs. lapic problem.
I think the problem is that local APIC 'self vectors' might be
edge-triggered by default. I'm not exactly sure whether passing
APIC_INT_LEVELTRIG to send_IPI_self() will truly be translated by the
local APIC into an external IO-APIC ACK sequence (the local APIC might
just treat self-vectors as always-edge), and it might also be that the
mere act of mixing self-triggered vectors with level-triggered external
irqs sometimes confuses the IO-APIC <-> local-APIC messaging. One more
test with the patch below will tell us a bit more about this part of the
story.
Ingo
Index: linux/arch/i386/kernel/io_apic.c
===================================================================
--- linux.orig/arch/i386/kernel/io_apic.c
+++ linux/arch/i386/kernel/io_apic.c
@@ -735,7 +735,8 @@ void fastcall send_IPI_self(int vector)
* Wait for idle.
*/
apic_wait_icr_idle();
- cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;
+ cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL |
+ APIC_INT_LEVELTRIG;
/*
* Send the IPI. The write to APIC_ICR fires this off.
*/
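The patch only ORs one more flag into the ICR value. As a sketch of what changes at the register level, the user-space helpers below (`icr_self()`, `icr_is_level()` and `icr_vector()` are hypothetical names) compose the value the way send_IPI_self() does. The constant values are assumed to match the i386 `apicdef.h` definitions (check them against your tree); the bit positions follow the ICR layout in the Intel SDM (vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, trigger mode in bit 15, destination shorthand in bits 18-19).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the i386 <asm/apicdef.h>; check your tree. */
#define APIC_DM_FIXED      0x00000u /* delivery mode fixed, bits 8-10 */
#define APIC_DEST_LOGICAL  0x00800u /* destination mode logical, bit 11 */
#define APIC_INT_LEVELTRIG 0x08000u /* trigger mode level, bit 15 */
#define APIC_DEST_SELF     0x40000u /* shorthand "self", bits 18-19 = 01 */

_Static_assert(APIC_DEST_LOGICAL == (1u << 11), "dest mode is ICR bit 11");
_Static_assert(APIC_INT_LEVELTRIG == (1u << 15), "trigger mode is ICR bit 15");
_Static_assert(APIC_DEST_SELF == (1u << 18), "self shorthand sets bit 18");

/* Hypothetical helper: ICR low word as built in send_IPI_self(). */
static uint32_t icr_self(uint8_t vector, bool leveltrig)
{
    uint32_t cfg = APIC_DM_FIXED | APIC_DEST_SELF | vector | APIC_DEST_LOGICAL;

    if (leveltrig)
        cfg |= APIC_INT_LEVELTRIG; /* the flag added by the test patch */
    return cfg;
}

/* Trigger-mode bit: 0 = edge, 1 = level. */
static bool icr_is_level(uint32_t cfg)
{
    return (cfg & APIC_INT_LEVELTRIG) != 0;
}

/* The vector number lives in the low 8 bits. */
static uint8_t icr_vector(uint32_t cfg)
{
    return (uint8_t)(cfg & 0xffu);
}
```

For vector 0x31 the two variants differ only in bit 15. Whether the local APIC actually honours that bit for a self IPI is exactly the open question above; the sketch only shows which bit the patch sets.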
* Re: [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
@ 2007-08-10 8:41 Jean-Baptiste Vignaud
From: Jean-Baptiste Vignaud @ 2007-08-10 8:41 UTC
To: jarkao2
Cc: marcin.slusarz, mingo, tglx, torvalds, linux-kernel, shemminger,
linux-net, netdev, akpm, alan
> For me it's enough too, but Thomas seems to have doubts.
>
> You wrote earlier that you also have 2.6.23-rc1 with HARDIRQS_SW_RESEND
> prepared. So, if it's not too much trouble, maybe you could try
> that first. Thomas may send something tomorrow, so the HZ=100 test can
> wait for now, I hope?
OK, I'll test 2.6.23-rc1 with HARDIRQS_SW_RESEND first.
Jb
* Re: 2.6.20->2.6.21 - networking dies after random time
@ 2007-07-31 13:20 Jarek Poplawski
2007-08-06 7:00 ` Marcin Ślusarz
From: Jarek Poplawski @ 2007-07-31 13:20 UTC
To: Marcin Ślusarz
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Jean-Baptiste Vignaud, linux-kernel, shemminger, linux-net,
netdev, Andrew Morton, Alan Cox
On Mon, Jul 30, 2007 at 09:29:38AM +0200, Marcin Ślusarz wrote:
...
> PS: I retested all patches posted in this thread on top of 2.6.22.1,
> and the behavior seen with 2.6.21.3 didn't change. My next tests will
> be on 2.6.22.x only.
Marcin,
I see you're quite busy, but if you are still alive after testing Ingo's
next patch, maybe you could try one more "idea"? No patch this time:
could you try it after adding the boot option "noirqdebug"? (I'd like
to be sure it's not about timing after all.)
Cheers & thanks,
Jarek P.
* Re: 2.6.20->2.6.21 - networking dies after random time
2007-07-31 13:20 Jarek Poplawski
@ 2007-08-06 7:00 ` Marcin Ślusarz
2007-08-06 7:03 ` Ingo Molnar
From: Marcin Ślusarz @ 2007-08-06 7:00 UTC
To: Jarek Poplawski
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Jean-Baptiste Vignaud, linux-kernel, shemminger, linux-net,
netdev, Andrew Morton, Alan Cox
2007/7/31, Jarek Poplawski <jarkao2@o2.pl>:
> Marcin,
>
> I see you're quite busy, but if you are still alive after testing Ingo's
> next patch, maybe you could try one more "idea"? No patch this time:
> could you try it after adding the boot option "noirqdebug"? (I'd like
> to be sure it's not about timing after all.)
It didn't change anything. The network card still timed out.
Marcin
* Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-06 7:00 ` Marcin Ślusarz
@ 2007-08-06 7:03 ` Ingo Molnar
2007-08-07 7:46 ` Marcin Ślusarz
From: Ingo Molnar @ 2007-08-06 7:03 UTC
To: Marcin Ślusarz
Cc: Jarek Poplawski, Thomas Gleixner, Linus Torvalds,
Jean-Baptiste Vignaud, linux-kernel, shemminger, linux-net,
netdev, Andrew Morton, Alan Cox
* Marcin Ślusarz <marcin.slusarz@gmail.com> wrote:
> 2007/7/31, Jarek Poplawski <jarkao2@o2.pl>:
> > Marcin,
> >
> > I see you're quite busy, but if you are still alive after testing Ingo's
> > next patch, maybe you could try one more "idea"? No patch this time:
> > could you try it after adding the boot option "noirqdebug"? (I'd like
> > to be sure it's not about timing after all.)
> It didn't change anything. The network card still timed out.
Please try Jarek's second patch too - there was a missing unmask.
Ingo
-------------->
Subject: genirq: fix simple and fasteoi irq handlers
From: Jarek Poplawski <jarkao2@o2.pl>
After the "genirq: do not mask interrupts by default" patch, interrupts
are no longer masked immediately when disabling is requested, but only
after they happen. However, handle_simple_irq() and handle_fasteoi_irq()
can skip this masking one or more times if the irq is currently being
serviced (IRQ_INPROGRESS), possibly disrupting a driver's work.
Finding the main cause of the problems here, pointing out the offending
patch and writing the first patch that can fix this were all done by
Marcin Slusarz. Additional test patches from Thomas Gleixner and Ingo
Molnar, tested by Marcin Slusarz, helped to narrow down the possible
causes even further. Thanks.
PS: this patch fixes only one evident error here, but there could be
more places affected by the above-mentioned change in irq handling.
PS 2:
After rethinking, IMHO, there are two most probable scenarios here:
1. After a hw resend there could be a conflict between the retriggered
edge-type irq and the next level-type one: e.g. if this level-type irq
(the io_apic is enabled then) is triggered while the retriggered irq is
being serviced (IRQ_INPROGRESS), the handler does goto out with an eoi,
and probably the next such level irqs are triggered and loop, causing
a kind of flood in the io_apic until the retriggered edge's service
has ended.
2. There is something wrong with ioapic_retrigger_irq (less probable,
because this would probably also be seen with 'normal' edge retriggers,
but on the other hand, those could be less common).
So, if it's #1, this fixed patch should work.
But since level types don't really need these retriggers, I think this
"don't mask interrupts by default" idea should be reconsidered: is there
enough gain to risk such hard-to-diagnose errors?
So, IMHO, there should at least be a possibility to turn this off for
level types in the config (it should be a visible option, so people could
find & try this before writing for help or replacing a network card).
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---
diff -Nurp 2.6.23-rc1-/kernel/irq/chip.c 2.6.23-rc1/kernel/irq/chip.c
--- 2.6.23-rc1-/kernel/irq/chip.c 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.23-rc1/kernel/irq/chip.c 2007-08-05 21:49:46.000000000 +0200
@@ -295,12 +295,11 @@ handle_simple_irq(unsigned int irq, stru
spin_lock(&desc->lock);
- if (unlikely(desc->status & IRQ_INPROGRESS))
- goto out_unlock;
kstat_cpu(cpu).irqs[irq]++;
action = desc->action;
- if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
+ if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
+ IRQ_DISABLED)))) {
if (desc->chip->mask)
desc->chip->mask(irq);
desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
@@ -318,6 +317,8 @@ handle_simple_irq(unsigned int irq, stru
spin_lock(&desc->lock);
desc->status &= ~IRQ_INPROGRESS;
+ if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
+ desc->chip->unmask(irq);
out_unlock:
spin_unlock(&desc->lock);
}
@@ -392,18 +393,16 @@ handle_fasteoi_irq(unsigned int irq, str
spin_lock(&desc->lock);
- if (unlikely(desc->status & IRQ_INPROGRESS))
- goto out;
-
desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
kstat_cpu(cpu).irqs[irq]++;
/*
- * If its disabled or no action available
+ * If it's running, disabled or no action available
* then mask it and get out of here:
*/
action = desc->action;
- if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
+ if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
+ IRQ_DISABLED)))) {
desc->status |= IRQ_PENDING;
if (desc->chip->mask)
desc->chip->mask(irq);
@@ -420,6 +419,8 @@ handle_fasteoi_irq(unsigned int irq, str
spin_lock(&desc->lock);
desc->status &= ~IRQ_INPROGRESS;
+ if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
+ desc->chip->unmask(irq);
out:
desc->chip->eoi(irq);
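A single-threaded toy model of the patched handle_fasteoi_irq() control flow may make the change easier to follow. It is a sketch, not kernel code: `struct toy_desc` and `toy_fasteoi()` are hypothetical, the status-bit values are arbitrary, locking and the real handle_IRQ_event() call are omitted, and the chip's mask/unmask/eoi callbacks are replaced by counters. An arrival while the handler runs on another CPU is simulated by setting IRQ_INPROGRESS by hand before the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow kernel/irq; the values here are arbitrary. */
#define IRQ_INPROGRESS 0x1u
#define IRQ_DISABLED   0x2u
#define IRQ_PENDING    0x4u

/* Hypothetical stand-in for irq_desc plus its chip: callbacks are counters. */
struct toy_desc {
    unsigned int status;
    bool has_action; /* a handler is registered */
    int masks;       /* chip->mask() calls */
    int unmasks;     /* chip->unmask() calls */
    int eois;        /* chip->eoi() calls */
    int handled;     /* times the handler actually ran */
};

/* Control flow of handle_fasteoi_irq() after the patch, locking omitted. */
static void toy_fasteoi(struct toy_desc *d)
{
    /* Running elsewhere, disabled or no handler: mask and mark pending. */
    if (!d->has_action || (d->status & (IRQ_INPROGRESS | IRQ_DISABLED))) {
        d->status |= IRQ_PENDING;
        d->masks++;
        goto out;
    }

    d->status |= IRQ_INPROGRESS;
    d->status &= ~IRQ_PENDING;
    d->handled++;            /* handle_IRQ_event() would run here */
    d->status &= ~IRQ_INPROGRESS;

    /* Added by the patch: unmask again unless disabled meanwhile. */
    if (!(d->status & IRQ_DISABLED))
        d->unmasks++;
out:
    d->eois++;               /* EOI is sent on every path */
}
```

With the patch, an interrupt that arrives while the handler is running is masked and marked pending instead of only getting an EOI, and the unmask after the handler finishes reopens the line unless it was disabled in the meantime.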
* Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-06 7:03 ` Ingo Molnar
@ 2007-08-07 7:46 ` Marcin Ślusarz
2007-08-07 8:23 ` Jarek Poplawski
From: Marcin Ślusarz @ 2007-08-07 7:46 UTC
To: Ingo Molnar
Cc: Jarek Poplawski, Thomas Gleixner, Linus Torvalds,
Jean-Baptiste Vignaud, linux-kernel, shemminger, linux-net,
netdev, Andrew Morton, Alan Cox
2007/8/6, Ingo Molnar <mingo@elte.hu>:
> (..)
> Please try Jarek's second patch too - there was a missing unmask.
>
> Ingo
>
> -------------->
> Subject: genirq: fix simple and fasteoi irq handlers
> From: Jarek Poplawski <jarkao2@o2.pl>
> ...
The network card still locks up (tested on 2.6.22.1). I had to upload more
data than usual (~350 MB vs ~1-100 MB) to trigger the bug, but it
might be a coincidence...
Marcin
* Re: 2.6.20->2.6.21 - networking dies after random time
2007-08-07 7:46 ` Marcin Ślusarz
@ 2007-08-07 8:23 ` Jarek Poplawski
[not found] ` <4bacf17f0708070237w19d184b3p7f74b53612edb9a6@mail.gmail.com>
From: Jarek Poplawski @ 2007-08-07 8:23 UTC
To: Marcin Ślusarz
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Jean-Baptiste Vignaud, linux-kernel, shemminger, linux-net,
netdev, Andrew Morton, Alan Cox
On Tue, Aug 07, 2007 at 09:46:36AM +0200, Marcin Ślusarz wrote:
> 2007/8/6, Ingo Molnar <mingo@elte.hu>:
> > (..)
> > Please try Jarek's second patch too - there was a missing unmask.
> >
> > Ingo
> >
> > -------------->
> > Subject: genirq: fix simple and fasteoi irq handlers
> > From: Jarek Poplawski <jarkao2@o2.pl>
...
> The network card still locks up (tested on 2.6.22.1). I had to upload more
> data than usual (~350 MB vs ~1-100 MB) to trigger the bug, but it
> might be a coincidence...
Thanks! It's good news after all - if this were the cause, it would be
really strange that this place doesn't hit more people (it seems there
is some safeguard elsewhere for this).
BTW: I hope Thomas' previous patch to resend.c (with Ingo's warning
added) had no problems with a similar load?
So, once more, I would suspect the hw retrigger code. Ingo, IMHO, this
patch for testing HARDIRQS_SW_RESEND could be reworked so that
desc->chip->retrigger() is done only for edges and the tasklet only
for levels. BTW, I think the current warning in the "temporary" patch
is too early - at that point we don't know whether the ->retrigger()
will actually take place.
Regards,
Jarek P.
PS: Marcin, if you need a break from this testing, let us know!
I think the nature of this bug is understood well enough by now.
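The rework suggested above (hardware retrigger only for edge interrupts, the HARDIRQS_SW_RESEND tasklet for level ones) might look roughly like this as decision logic. This is a hypothetical user-space sketch loosely modelled on check_irq_resend() in kernel/irq/resend.c: `toy_check_resend()` and `enum toy_resend` are invented names, the flag values are arbitrary, and `chip_has_retrigger` stands in for a chip that provides a working ->retrigger().

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow kernel/irq; the values here are arbitrary. */
#define IRQ_PENDING 0x1u
#define IRQ_REPLAY  0x2u
#define IRQ_LEVEL   0x4u

enum toy_resend { TOY_NONE, TOY_HW_RETRIGGER, TOY_SW_TASKLET };

/*
 * Hypothetical decision logic for the proposed rework: replay a pending
 * irq once, via the chip's ->retrigger() for edges and via the software
 * resend tasklet for levels (or when the chip cannot retrigger).
 */
static enum toy_resend toy_check_resend(unsigned int *status,
                                        bool chip_has_retrigger)
{
    /* Nothing pending, or a replay is already in flight. */
    if ((*status & (IRQ_PENDING | IRQ_REPLAY)) != IRQ_PENDING)
        return TOY_NONE;

    *status = (*status & ~IRQ_PENDING) | IRQ_REPLAY;

    if (!(*status & IRQ_LEVEL) && chip_has_retrigger)
        return TOY_HW_RETRIGGER; /* edge: desc->chip->retrigger(irq) */
    return TOY_SW_TASKLET;       /* level: HARDIRQS_SW_RESEND tasklet */
}
```

A pending edge irq on a chip with ->retrigger() takes the hardware path; a pending level irq always takes the software path, and the IRQ_REPLAY bit keeps a second call from replaying the same event again.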
Thread overview: 12+ messages
2007-08-10 8:15 [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time Jean-Baptiste Vignaud
2007-08-10 8:37 ` Jarek Poplawski
2007-08-10 8:48 ` Ingo Molnar
2007-08-10 9:03 ` Jarek Poplawski
2007-08-10 9:08 ` Ingo Molnar
2007-08-10 9:19 ` Jarek Poplawski
2007-08-10 9:38 ` Ingo Molnar
-- strict thread matches above, loose matches on Subject: below --
2007-08-10 8:41 Jean-Baptiste Vignaud
2007-07-31 13:20 Jarek Poplawski
2007-08-06 7:00 ` Marcin Ślusarz
2007-08-06 7:03 ` Ingo Molnar
2007-08-07 7:46 ` Marcin Ślusarz
2007-08-07 8:23 ` Jarek Poplawski
[not found] ` <4bacf17f0708070237w19d184b3p7f74b53612edb9a6@mail.gmail.com>
2007-08-07 9:52 ` Jarek Poplawski
2007-08-07 12:13 ` Jarek Poplawski
2007-08-08 11:09 ` Marcin Ślusarz
2007-08-08 11:42 ` Jarek Poplawski
2007-08-09 9:19 ` [patch (testing)] " Jarek Poplawski
[not found] ` <4bacf17f0708092333n17e0ba19jf2c769531610868d@mail.gmail.com>
2007-08-10 7:10 ` Jarek Poplawski
2007-08-10 10:43 ` Marcin Ślusarz
2007-08-10 11:37 ` Jarek Poplawski