Re: [PATCH 0/1] NIU: fix spurious interrupts

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Hong H. Pham" <hong.pham@windriver.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, matheos.worku@sun.com
Subject: Re: [PATCH 0/1] NIU: fix spurious interrupts
Date: Wed, 27 May 2009 12:29:38 -0400	[thread overview]
Message-ID: <4A1D6A72.5070801@windriver.com> (raw)
In-Reply-To: <20090525.231622.222754412.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 3567 bytes --]

David Miller wrote:
> I wonder if it's the niu_interrupt path, and all the v0 bits are
> clear.  Yeah, I bet that's it.  We're taking some slowpath interrupt
> for RX or TX counter overflows or errors, and then we try to rearm the
> LDG even though we're already handling normal RX/TX via NAPI.
> 
> But that shouldn't happen, the thing that went into RX/TX NAPI work
> should have turned those interrupt off.  We handle RX normal work and
> error interrupts in the same LDG, and similar for TX, and thus using
> the same interrupt.
> 
> Can you check to see who calls niu_ldg_rearm() when we see it trigger
> the interrupt with NAPI already scheduled?  That will help narrow this
> down even further.  Probably the best thing to do is to get a full
> stack trace using show_stack() or dump_stack().
> 
> This is looking more and more like a driver bug at this point.

I've added a check for v0 being zero in niu_interrupt().  I have not seen
this check succeed, so niu_ldg_rearm() is never invoked prior to or
during a spurious interrupt in niu_interrupt().  So it seems the only
other place where the LDG is rearmed during network activity is in
niu_poll(), which is expected.

Here's something else that doesn't make sense.  I've added a check
in __niu_fastpath_interrupt() and niu_rx_work() to see if the RCR qlen
is zero, i.e. we're scheduling NAPI with no work.  This could happen if
there's a spurious interrupt, but the interrupt is not a run away that
would hang the CPU.  This check happens quite frequently, much more than
the spurious interrupt check.  Also, it happens with both the PCI-E and
XAUI cards (I could not reproduce run away spurious interrupts with a
PCI-E card).

This is a log from a test with 1 TCP stream.  If there's no work to do,
one would expect to see for each __niu_fastpath_interrupt(), there is only
one corresponding niu_rx_work().  i.e. work_done in niu_poll is 0, so NAPI
is done and should not have been rescheduled.  The check for spurious
interrupts in niu_schedule_napi() is not reached.

[  474.387984] niu: __niu_fastpath_interrupt() CPU=58 rx_channel=12 qlen is 0!
[  474.388009] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  474.663008] niu: __niu_fastpath_interrupt() CPU=58 rx_channel=12 qlen is 0!
[  474.663034] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  474.805663] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  475.657501] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  476.139072] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  476.264352] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  476.474596] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!
[  476.698176] niu: __niu_fastpath_interrupt() CPU=58 rx_channel=12 qlen is 0!
[  476.698201] niu: niu_rx_work() CPU=58 rx_channel=12 qlen is zero!

Here's another log excerpt with 1 TCP stream.  In this case NAPI is being
rescheduled multiple times with no work to do.  Again, run away spurious
interrupts did not happen.

[ 1098.870170] niu: niu_rx_work() CPU=50 rx_channel=11 qlen is zero!
[ 1098.879759] niu: __niu_fastpath_interrupt() CPU=50 rx_channel=11 qlen is 0!
[ 1098.904478] niu: __niu_fastpath_interrupt() CPU=50 rx_channel=11 qlen is 0!
[ 1098.908674] niu: __niu_fastpath_interrupt() CPU=50 rx_channel=11 qlen is 0!
[ 1098.908703] niu: niu_rx_work() CPU=50 rx_channel=11 qlen is zero!

In any case, interrupts with qlen being 0 always happen before we see the
run away spurious interrupts (but that could be due to the latter happening
a lot less frequently).

Regards,
Hong





[-- Attachment #2: niu-debug-spurious-interrupts.patch --]
[-- Type: text/plain, Size: 2869 bytes --]

diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index 2b17453..125450a 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -3724,6 +3724,10 @@ static int niu_rx_work(struct napi_struct *napi, struct niu *np,
 	mbox->rx_dma_ctl_stat = 0;
 	mbox->rcrstat_a = 0;
 
+	if (!qlen)
+		printk(KERN_DEBUG PFX "%s() CPU=%i rx_channel=%i qlen is zero!\n",
+		       __FUNCTION__, raw_smp_processor_id(), rp->rx_channel);
+
 	niudbg(RX_STATUS, "%s: niu_rx_work(chan[%d]), stat[%llx] qlen=%d\n",
 	       np->dev->name, rp->rx_channel, (unsigned long long) stat, qlen);
 
@@ -4193,10 +4197,17 @@ static void __niu_fastpath_interrupt(struct niu *np, int ldg, u64 v0)
 	for (i = 0; i < np->num_rx_rings; i++) {
 		struct rx_ring_info *rp = &np->rx_rings[i];
 		int ldn = LDN_RXDMA(rp->rx_channel);
+		u64 qlen;
 
 		if (parent->ldg_map[ldn] != ldg)
 			continue;
 
+		qlen = nr64(RCRSTAT_A(rp->rx_channel)) & RCRSTAT_A_QLEN;
+		if (!qlen) {
+			printk(KERN_DEBUG PFX "%s() CPU=%i rx_channel=%i qlen is 0!\n",
+			       __FUNCTION__, smp_processor_id(), rp->rx_channel);
+		}
+
 		nw64(LD_IM0(ldn), LD_IM0_MASK);
 		if (rx_vec & (1 << rp->rx_channel))
 			niu_rxchan_intr(np, rp, ldn);
@@ -4215,6 +4226,22 @@ static void __niu_fastpath_interrupt(struct niu *np, int ldg, u64 v0)
 	}
 }
 
+static void niu_dump_ldg_irq(struct niu *np, int ldg, u64 v0)
+{
+	static DEFINE_PER_CPU(unsigned long, spurious_count) = { 4 };
+
+	struct niu_parent *parent = np->parent;
+	int qlen, ldn, i;
+
+	if (!__get_cpu_var(spurious_count))
+		return;
+
+	__get_cpu_var(spurious_count)--;
+
+	printk(KERN_DEBUG PFX "%s CPU=%i LDG=%i v0=%p: spurious interrupt\n",
+	       np->dev->name, smp_processor_id(), ldg, (void *)v0);
+}
+
 static void niu_schedule_napi(struct niu *np, struct niu_ldg *lp,
 			      u64 v0, u64 v1, u64 v2)
 {
@@ -4224,7 +4251,8 @@ static void niu_schedule_napi(struct niu *np, struct niu_ldg *lp,
 		lp->v2 = v2;
 		__niu_fastpath_interrupt(np, lp->ldg_num, v0);
 		__napi_schedule(&lp->napi);
-	}
+	} else
+		niu_dump_ldg_irq(np, lp->ldg_num, v0);
 }
 
 static irqreturn_t niu_interrupt(int irq, void *dev_id)
@@ -4251,6 +4279,10 @@ static irqreturn_t niu_interrupt(int irq, void *dev_id)
 		       (unsigned long long) v1,
 		       (unsigned long long) v2);
 
+	if (!v0)
+		printk(KERN_DEBUG PFX "%s() CPU=%i LDG=%i v0 is zero!\n",
+		       __FUNCTION__, smp_processor_id(), ldg);
+
 	if (unlikely(!v0 && !v1 && !v2)) {
 		spin_unlock_irqrestore(&np->lock, flags);
 		return IRQ_NONE;
@@ -4263,8 +4295,11 @@ static irqreturn_t niu_interrupt(int irq, void *dev_id)
 	}
 	if (likely(v0 & ~((u64)1 << LDN_MIF)))
 		niu_schedule_napi(np, lp, v0, v1, v2);
-	else
+	else {
+		printk(KERN_DEBUG PFX "%s() CPU=%i no DMA work, rearming LDG\n",
+		       __FUNCTION__, smp_processor_id());
 		niu_ldg_rearm(np, lp, 1);
+	}
 out:
 	spin_unlock_irqrestore(&np->lock, flags);

     prev parent reply	other threads:[~2009-05-27 16:30 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-11 19:00 [PATCH 0/1] NIU: fix spurious interrupts Hong H. Pham
2009-05-11 19:00 ` [PATCH 1/1] " Hong H. Pham
2009-05-19  5:09 ` [PATCH 0/1] " David Miller
2009-05-19 21:52   ` Hong H. Pham
2009-05-19 22:01     ` David Miller
2009-05-20 15:57       ` Hong H. Pham
2009-05-21  0:37         ` David Miller
2009-05-21 22:18         ` David Miller
2009-05-22  0:40           ` Hong H. Pham
2009-05-22  8:08             ` David Miller
2009-05-22 16:42               ` Hong H. Pham
2009-05-26  6:16                 ` David Miller
2009-05-27 16:29                   ` Hong H. Pham [this message]

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:2b17453 dfblob:125450a )
 OR (
bs:"Re: [PATCH 0/1] NIU: fix spurious interrupts" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A1D6A72.5070801@windriver.com \
    --to=hong.pham@windriver.com \
    --cc=davem@davemloft.net \
    --cc=matheos.worku@sun.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.