David Miller wrote:
> I wonder if the spurious interrupts trigger exactly at the
>
>     nw64(LD_IM0(LDN_RXDMA(rp->rx_channel)), 0);
>
> in niu_poll_core().
>
> Can you run one more test? Supplement the debugging output
> with:
>
>     "%pS", get_irq_regs()->tpc
>
> so we can see where the program counter is at the time of
> the spurious interrupt?

The tpc at the time of the spurious interrupt is niu_poll+0x99c.
Looking this address up, it's at this line in niu_ldg_rearm():

    nw64(LDG_IMGMT(lp->ldg_num), val);

Since the timer is also reprogrammed when the LDG is rearmed, interrupts
should not have been generated immediately after writing to LDG_IMGMT.

The tpc also showed interrupts happening in net_rx_action. In this case
the LDG has been rearmed, but the timer prevented interrupt delivery
until after niu_poll is done.

> Meanwhile, even if we go with your patch to fix this, we can't
> use it as-is. Let me explain.
>
> Suppose that we get this spurious interrupt right after we unmask the
> interrupt and right before napi_complete(). Your change will make us
> re-mask the interrupts, but without scheduling NAPI.
>
> So once the napi_complete() happens, if no further interrupts trigger
> in that LDG, we'll never process those interrupt events cleared by
> your new code. See what I mean?

Understood.

> I don't know how to fix this, it's full of races. I suppose we could
> recheck if events are pending in the LDG after we do the
> napi_complete() and reschedule NAPI again if so. But that might be
> expensive (several register reads, just to check something that's not
> going to happen most of the time).
>
> I'm also wondering why we see this on Niagara-2 and not on PCI-E
> cards. If the interrupts that go into the NCU unit of Niagara-2 are
> levelled interrupts, and somehow the ARM bit is not implemented
> correctly in the NIU logic when hooked up to NCU instead of PCI-E
> logic, that could explain things.
>
> I bet that our Linux driver is the only one that bangs on the LDG
> mask registers like this.

I tried the test on a T5440, which has a PCI-E NIU (4 x 1GB) card. I could
not reproduce the spurious interrupts. So this bug seems to be limited to
XAUI NIU cards, which also makes it a Niagara-2 specific problem.

Regards,
Hong

[ 2226.589782] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589800] tpc =
[ 2226.589814] LD_IM0 = 0x0000000000000003 [ldf_mask=0x03]
[ 2226.589826] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589855] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589867] tpc =
[ 2226.589878] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.589890] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589915] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589927] tpc =
[ 2226.589938] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.589950] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2226.589974] NIU: eth4 CPU=5 LDG=41 rx_vec=0x2000: spurious interrupt
[ 2226.589986] tpc =
[ 2226.589996] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2226.590008] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.380931] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.380949] tpc =
[ 2229.380962] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.380974] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381003] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381015] tpc =
[ 2229.381026] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381038] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381063] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381075] tpc =
[ 2229.381086] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381097] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2229.381122] NIU: eth4 CPU=58 LDG=40 rx_vec=0x1000: spurious interrupt
[ 2229.381134] tpc =
[ 2229.381145] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2229.381156] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.743967] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.743983] tpc =
[ 2236.743996] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744008] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744034] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744046] tpc =
[ 2236.744058] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744070] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744095] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744107] tpc =
[ 2236.744118] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744130] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]
[ 2236.744155] NIU: eth4 CPU=21 LDG=43 rx_vec=0x8000: spurious interrupt
[ 2236.744167] tpc =
[ 2236.744178] LD_IM0 = 0x0000000000000000 [ldf_mask=0x00]
[ 2236.744190] LDG_IMGMT= 0x0000000000000000 [arm=0x00 timer=0x00]