From: ebiederm@xmission.com (Eric W. Biederman)
To: David Dillow <dave@thedillows.org>
Cc: "Michael Riepe" <michael.riepe@googlemail.com>,
"Michael Buesch" <mb@bu3sch.de>,
"Francois Romieu" <romieu@fr.zoreil.com>,
"Rui Santos" <rsantos@grupopie.com>,
"Michael Büker" <m.bueker@berlin.de>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
Date: Fri, 21 Aug 2009 13:57:49 -0700
Message-ID: <m1skfkrik2.fsf@fess.ebiederm.org>
In-Reply-To: <1243042174.3580.23.camel@obelisk.thedillows.org> (David Dillow's message of "Fri, 22 May 2009 21:29:34 -0400")

David Dillow <dave@thedillows.org> writes:
> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
>
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK'ing it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources, and loop in the handler until we have all
> sources quiescent.
>
> Signed-off-by: David Dillow <dave@thedillows.org>
> ---
> This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> configurations survive over an hour of testing when it would lockup in
> under 90 seconds before. I am certain of the analysis of the root cause,
> but there may be better ways to fix it. There may also be a theoretical
> race window between the ending of a NAPI poll cycle and a link change
> interrupt coming in, but I'm not sure it would matter.
>
> Some variant of this should also be applied to the currently running
> stable trees, as the problem is long-standing.
I have what at first glance looks like a problem caused by this
patch. For the last month, since upgrading one of my machines from
2.6.28 to 2.6.30, it has been becoming inaccessible from the
network, and my logs show a few instances of:

NETDEV WATCHDOG: eth0 (r8169): transmit timed out

along with a lot of soft lockups that always have rtl8169_interrupt
as the thing that is running. I suspect your patch has introduced
a near-infinite loop in the interrupt handler that is causing these
soft lockups.

Any ideas?

Eric

BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
CPU 3:
Pid: 0, comm: swapper Tainted: G W 2.6.30-170263.2006.Arora.fc11.x86_64 #1 G33M-S2
RIP: 0010:[<ffffffffa01deacd>] [<ffffffffa01deacd>] rtl8169_interrupt+0x26f/0x2b7 [r8169]
RSP: 0018:ffff880028070cb0 EFLAGS: 00000206
RAX: 0000000000000050 RBX: ffff880028070d10 RCX: ffff88002807b9e0
RDX: ffffc2000065c03e RSI: ffff88012d79a000 RDI: 0000000000000246
RBP: ffffffff8100c9d3 R08: ffff88012fae0000 R09: ffff880028070ec0
R10: 077321422cb06619 R11: 000000003c5efb73 R12: ffff880028070c30
R13: ffff88012d79a000 R14: ffff88012d79a600 R15: 077321422cb06619
FS: 0000000000000000(0000) GS:ffff88002806d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc10010c000 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<IRQ> [<ffffffff81093f0b>] ? handle_IRQ_event+0x6a/0x13f
[<ffffffff810219fa>] ? apic_write+0x24/0x3a
[<ffffffff8109607a>] ? handle_edge_irq+0xdb/0x138
[<ffffffff81012fbd>] ? native_sched_clock+0x2d/0x54
[<ffffffff8100e996>] ? handle_irq+0x95/0xb7
[<ffffffff8100df42>] ? do_IRQ+0x6a/0xe9
[<ffffffff8100c853>] ? ret_from_intr+0x0/0x11
[<ffffffff8104ba16>] ? __do_softirq+0x5e/0x1b0
[<ffffffff8100cfcc>] ? call_softirq+0x1c/0x28
[<ffffffff8100e721>] ? do_softirq+0x51/0xae
[<ffffffff8104b6d2>] ? irq_exit+0x52/0xa3
[<ffffffff81020f11>] ? smp_apic_timer_interrupt+0x94/0xb8
[<ffffffff8100c9d3>] ? apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
[<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
[<ffffffff8100ae08>] ? enter_idle+0x33/0x49
[<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
[<ffffffff8136f30c>] ? start_secondary+0x19c/0x1b7