public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: David Dillow <dave@thedillows.org>
Cc: "Michael Riepe" <michael.riepe@googlemail.com>,
	"Michael Buesch" <mb@bu3sch.de>,
	"Francois Romieu" <romieu@fr.zoreil.com>,
	"Rui Santos" <rsantos@grupopie.com>,
	"Michael Büker" <m.bueker@berlin.de>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
Date: Fri, 21 Aug 2009 13:57:49 -0700	[thread overview]
Message-ID: <m1skfkrik2.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <1243042174.3580.23.camel@obelisk.thedillows.org> (David Dillow's message of "Fri\, 22 May 2009 21\:29\:34 -0400")

David Dillow <dave@thedillows.org> writes:

> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
>
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK'ing it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources, and loop in the handler until we have all
> sources quiescent.
>
> Signed-off-by: David Dillow <dave@thedillows.org>
> ---
> This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> configurations survive over an hour of testing when it would lockup in
> under 90 seconds before. I am certain of the analysis of the root cause,
> but there may be better ways to fix it. There may also be a theoretical
> race window between the ending of a NAPI poll cycle and a link change
> interrupt coming in, but I'm not sure it would matter. 
>
> Some variant of this should also be applied to the currently running
> stable trees, as the problem is long-standing.

I have what at first glance looks like a problem caused by this
patch.  For the last month since upgrading one of my machines from
2.6.28 to 2.6.30 it has been becomming inaccessible from the
network and I have a few:

NETDEV WATCHDOG: eth0 (r8169): transmit timed out

in my logs and a lot soft lockups that always have rtl8169_interrupt
as the thing that is running.   I suspect your patch has introduced
a near infinite loop in the interrupt handler and is causing these
soft lockups.

Any ideas?

Eric

BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
CPU 3:
Pid: 0, comm: swapper Tainted: G        W  2.6.30-170263.2006.Arora.fc11.x86_64 #1 G33M-S2
RIP: 0010:[<ffffffffa01deacd>]  [<ffffffffa01deacd>] rtl8169_interrupt+0x26f/0x2b7 [r8169]
RSP: 0018:ffff880028070cb0  EFLAGS: 00000206
RAX: 0000000000000050 RBX: ffff880028070d10 RCX: ffff88002807b9e0
RDX: ffffc2000065c03e RSI: ffff88012d79a000 RDI: 0000000000000246
RBP: ffffffff8100c9d3 R08: ffff88012fae0000 R09: ffff880028070ec0
R10: 077321422cb06619 R11: 000000003c5efb73 R12: ffff880028070c30
R13: ffff88012d79a000 R14: ffff88012d79a600 R15: 077321422cb06619
FS:  0000000000000000(0000) GS:ffff88002806d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc10010c000 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>  [<ffffffff81093f0b>] ? handle_IRQ_event+0x6a/0x13f
 [<ffffffff810219fa>] ? apic_write+0x24/0x3a
 [<ffffffff8109607a>] ? handle_edge_irq+0xdb/0x138
 [<ffffffff81012fbd>] ? native_sched_clock+0x2d/0x54
 [<ffffffff8100e996>] ? handle_irq+0x95/0xb7
 [<ffffffff8100df42>] ? do_IRQ+0x6a/0xe9
 [<ffffffff8100c853>] ? ret_from_intr+0x0/0x11
 [<ffffffff8104ba16>] ? __do_softirq+0x5e/0x1b0
 [<ffffffff8100cfcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8100e721>] ? do_softirq+0x51/0xae
 [<ffffffff8104b6d2>] ? irq_exit+0x52/0xa3
 [<ffffffff81020f11>] ? smp_apic_timer_interrupt+0x94/0xb8
 [<ffffffff8100c9d3>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
 [<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
 [<ffffffff8100ae08>] ? enter_idle+0x33/0x49
 [<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
 [<ffffffff8136f30c>] ? start_secondary+0x19c/0x1b7



  parent reply	other threads:[~2009-08-21 20:57 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-04 17:28 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Michael Büker
2009-03-04 22:43 ` Francois Romieu
2009-03-06  0:17   ` Michael Büker
2009-03-08 10:27   ` Tom Weber
2009-03-10  5:42     ` Tom Weber
2009-03-09 12:07   ` Rui Santos
2009-03-13 18:29     ` Rui Santos
2009-03-16 13:07     ` Rui Santos
2009-03-22 21:12       ` Francois Romieu
2009-03-22 21:19         ` Michael Buesch
2009-03-22 22:00           ` Francois Romieu
2009-03-22 22:09             ` Michael Buesch
2009-03-22 22:27               ` Francois Romieu
2009-03-22 22:38                 ` Michael Buesch
2009-03-23 11:47         ` Michael Buesch
2009-03-23 12:47           ` Michael Buesch
2009-03-23 23:47             ` Francois Romieu
2009-03-24  9:43               ` Michael Buesch
2009-03-23 14:29         ` Michael Büker
2009-03-23 14:57           ` Rui Santos
2009-03-23 15:04             ` Michael Büker
2009-03-25 11:40         ` Rui Santos
2009-04-04 17:50           ` Michael Buesch
2009-05-10 13:38             ` Michael Riepe
2009-05-10 15:01               ` Michael S. Zick
2009-05-10 15:10                 ` Michael S. Zick
2009-05-10 15:53               ` Michael Buesch
2009-05-10 16:27                 ` Michael Riepe
2009-05-10 17:09                   ` Michael S. Zick
2009-05-11  0:29               ` David Dillow
2009-05-11 20:48                 ` Michael Buesch
2009-05-11 21:10                   ` Michael Buesch
2009-05-11 21:29                     ` David Dillow
2009-05-11 21:59                       ` Michael Buesch
2009-05-12 20:29                       ` Michael Riepe
2009-05-14  2:38                         ` David Dillow
2009-05-14 18:37                           ` Michael Riepe
2009-05-14 19:14                             ` David Dillow
2009-05-14 19:42                               ` Michael Riepe
2009-05-23  1:29                                 ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
2009-05-23  9:24                                   ` Michael Buesch
2009-05-23 14:35                                     ` Michael Riepe
2009-05-23 14:44                                       ` Michael Buesch
2009-05-23 15:01                                         ` Michael Riepe
2009-05-23 16:40                                           ` Michael Buesch
2009-05-23 14:51                                       ` David Dillow
2009-05-23 16:12                                         ` Michael Riepe
2009-05-23 16:45                                           ` Michael Buesch
2009-05-23 16:46                                           ` David Dillow
2009-05-23 16:50                                             ` Michael Buesch
2009-05-23 16:53                                             ` Michael Riepe
2009-05-23 17:03                                               ` David Dillow
2009-05-24 21:15                                   ` Francois Romieu
2009-05-24 22:55                                     ` David Dillow
2009-05-26  5:55                                   ` David Miller
2009-05-26 18:22                                     ` Michael Buesch
2009-05-26 21:52                                       ` David Miller
2009-05-26 22:14                                         ` David Miller
2009-05-26 22:40                                           ` Michael Riepe
2009-05-26 22:43                                             ` David Miller
2009-05-26 23:10                                               ` David Miller
2009-05-27 16:19                                           ` Michael Buesch
2009-06-16 19:32                                           ` Rui Santos
2009-08-21 20:57                                   ` Eric W. Biederman [this message]
2009-08-21 21:22                                     ` Michael Riepe
2009-08-21 22:59                                     ` David Dillow
2009-08-21 23:34                                       ` David Dillow
2009-08-22  0:24                                         ` Eric W. Biederman
2009-08-22 11:48                                         ` Eric W. Biederman
2009-08-22 12:07                                           ` Eric W. Biederman
2009-08-22 20:43                                             ` David Dillow
2009-08-23 17:17                                               ` Jarek Poplawski
2009-08-23 17:43                                                 ` Michal Soltys
2009-08-23 17:54                                                   ` Jarek Poplawski
2009-08-24  2:37                                               ` Eric W. Biederman
2009-08-25  0:51                                               ` Eric W. Biederman
2009-08-25  2:59                                                 ` David Dillow
2009-08-25 20:22                                                   ` Eric W. Biederman
2009-08-25 20:40                                                     ` David Dillow
2009-08-25 21:24                                                       ` Eric W. Biederman
2009-08-25 21:46                                                         ` David Dillow
2009-08-25 22:19                                                         ` Francois Romieu
2009-08-26  3:47                                                           ` Eric W. Biederman
2009-08-26  7:58                                                           ` [PATCH] r8169: Reduce looping in the interrupt handler Eric W. Biederman
2009-08-26 13:56                                                             ` David Dillow
2009-08-26 13:59                                                               ` David Dillow
2009-08-26 20:02                                                                 ` Eric W. Biederman
2009-08-26 21:30                                                                   ` Francois Romieu
2009-08-26 21:40                                                                     ` Eric W. Biederman
2009-08-27  5:24                                                                       ` Francois Romieu
2009-08-27  5:38                                                                         ` Eric W. Biederman
2009-08-27 23:20                                                                           ` Francois Romieu
2009-08-28  1:17                                                                             ` Eric W. Biederman
2009-08-28  1:29                                                                               ` David Dillow
2009-08-30 20:37                                                                                 ` Francois Romieu
2009-08-30 20:53                                                                                   ` Eric W. Biederman
2009-09-01  3:33                                                                                     ` David Dillow
2009-09-01  9:20                                                                                       ` Francois Romieu
2009-08-25 21:37                                                   ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts Eric W. Biederman
2009-08-25 21:54                                                     ` David Dillow
2009-08-25 23:11                                                       ` Francois Romieu
2009-05-12 11:10                   ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Krzysztof Halasa
2009-05-12 21:45                     ` Michael Riepe
2009-05-13  6:11                       ` Francois Romieu
2009-05-13  6:27                         ` Michael Riepe
2009-05-13 19:34                       ` Krzysztof Halasa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1skfkrik2.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=dave@thedillows.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.bueker@berlin.de \
    --cc=mb@bu3sch.de \
    --cc=michael.riepe@googlemail.com \
    --cc=netdev@vger.kernel.org \
    --cc=romieu@fr.zoreil.com \
    --cc=rsantos@grupopie.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox