From: Ingo Molnar <mingo@elte.hu>
To: David Miller <davem@davemloft.net>
Cc: vgusev@openvz.org, e1000-devel@lists.sourceforge.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	mcmanus@ducksong.com, ilpo.jarvinen@helsinki.fi,
	kuznet@ms2.inr.ac.ru, xemul@openvz.org
Subject: Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets
Date: Tue, 17 Jun 2008 10:09:58 +0200
Message-ID: <20080617080958.GC12535@elte.hu>
In-Reply-To: <20080617.003832.130616157.davem@davemloft.net>


* David Miller <davem@davemloft.net> wrote:

> From: Ingo Molnar <mingo@elte.hu>
> Date: Tue, 17 Jun 2008 09:26:58 +0200
> 
> > So since there's no clear bug pattern and no sure reproducibility on 
> > my side i'd suggest we track this problem separately and "do 
> > nothing" right now. I've excluded this warning from my 'is the 
> > freshly booted kernel buggy' list of conditions of -tip testing so 
> > it's not holding me up.
> 
> I'm going to push the revert through just to be safe and I think it's 
> a good idea to do so because all of those defer accept changes should 
> be resubmitted as a group for 2.6.27

okay - in that case the full revert is well-tested on my side as well, 
fwiw.

Tested-by: Ingo Molnar <mingo@elte.hu>

> > and i can apply any test-patch if that would be helpful - if it does 
> > a WARN_ON() i'll notice it. (pure extra debug printks with no stack 
> > trace are much harder to notice in automated tests)
> 
> I don't have time to work on your bug, sorry.  Someone else will have 
> to step forward and help you with it.

it's not really "my bug" - i just offered help to debug someone else's 
bug :-) This is pretty common hw so i guess there will be such reports.

Let me describe what i'm doing exactly: i do a lot of randomized testing 
on about a dozen real systems (all across the x86 spectrum) so i tend to 
trigger a lot of mainline bugs pretty early on.

My collection of kernel bugs for the last 8 months shows 1285 bugs 
(kernel crashes or build failures - about 50%/50%) triggered. One 
test-system alone has a serial log of 15 gigabytes - and there's a dozen 
of them. That's about 5 kernel bugs a day handled by me, on average.

These systems have about 10 times the hardware variability of your 
Niagara system for example, and many of them are rather difficult to 
debug (laptops without serial port, etc.). So i physically cannot avoid 
and debug all bugs on all my test-systems, like you do on the Niagara. I 
will report bugs, i'll bisect anything that is bisectable (on average i 
bisect once a day), and i can add patches and report any test-results, 
and i'll of course debug any bugs that look like heavy mainline 
showstoppers.

> FWIW I don't think your TX timeout problem has anything to do with 
> packet ordering.  The TX element of the network device is totally 
> stateless, but it's hanging under some set of circumstances to the 
> point where we timeout and reset the hardware to get it going again.

ok. That's e1000 then. Cc:s added. Stock T60 laptop, 32-bit:

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo ThinkPad T60
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 2000 [size=32]
        Capabilities: <access denied>
        Kernel driver in use: e1000

the problem is this non-fatal warning showing up after bootup, 
sporadically, in a non-reproducible way:

[  173.354049] NETDEV WATCHDOG: eth0: transmit timed out
[  173.354148] ------------[ cut here ]------------
[  173.354221] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0x9a/0xec()
[  173.354298] Modules linked in:
[  173.354421] Pid: 13452, comm: cc1 Tainted: G        W 2.6.26-rc6-00273-g81ae43a-dirty #2573
[  173.354516]  [<c01250ca>] warn_on_slowpath+0x46/0x76
[  173.354641]  [<c011d428>] ? try_to_wake_up+0x1d6/0x1e0
[  173.354815]  [<c01411e9>] ? trace_hardirqs_off+0xb/0xd
[  173.357370]  [<c011d43d>] ? default_wake_function+0xb/0xd
[  173.357370]  [<c014112a>] ? trace_hardirqs_off_caller+0x15/0xc9
[  173.357370]  [<c01411e9>] ? trace_hardirqs_off+0xb/0xd
[  173.357370]  [<c0142c83>] ? trace_hardirqs_on+0xb/0xd
[  173.357370]  [<c0142b33>] ? trace_hardirqs_on_caller+0x16/0x15b
[  173.357370]  [<c0142c83>] ? trace_hardirqs_on+0xb/0xd
[  173.357370]  [<c06bb3c9>] ? _spin_unlock_irqrestore+0x5b/0x71
[  173.357370]  [<c0133d46>] ? __queue_work+0x2d/0x32
[  173.357370]  [<c0134023>] ? queue_work+0x50/0x72
[  173.357483]  [<c0134059>] ? schedule_work+0x14/0x16
[  173.357654]  [<c05c59b8>] dev_watchdog+0x9a/0xec
[  173.357783]  [<c012d456>] run_timer_softirq+0x13d/0x19d
[  173.357905]  [<c05c591e>] ? dev_watchdog+0x0/0xec
[  173.358073]  [<c05c591e>] ? dev_watchdog+0x0/0xec
[  173.360804]  [<c0129ad7>] __do_softirq+0xb2/0x15c
[  173.360804]  [<c0129a25>] ? __do_softirq+0x0/0x15c
[  173.360804]  [<c0105526>] do_softirq+0x84/0xe9
[  173.360804]  [<c0129996>] irq_exit+0x4b/0x88
[  173.360804]  [<c010ec7a>] smp_apic_timer_interrupt+0x73/0x81
[  173.360804]  [<c0103ddd>] apic_timer_interrupt+0x2d/0x34
[  173.360804]  =======================
[  173.360804] ---[ end trace a7919e7f17c0a725 ]---
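
A block like the one above is easy to pick out of a serial log mechanically, 
which is why a WARN_ON() gets noticed in automated tests where a bare printk 
does not. A minimal Python sketch, assuming the log carries the "cut here" 
and "end trace" markers shown above; the name find_warnings and the layout 
of the returned dicts are purely illustrative and not part of any existing 
test harness:

```python
import re

# Marker lines as they appear in the trace above; the "[  173.35...]"
# timestamp prefix is not required for a match.
CUT_HERE = re.compile(r"-+\[ cut here \]-+")
END_TRACE = re.compile(r"---\[ end trace ([0-9a-f]+) \]---")
WARNING = re.compile(r"WARNING: at (\S+):(\d+) (\S+?)\(\)")


def find_warnings(lines):
    """Return one dict per complete WARNING block in an iterable of log lines.

    Hypothetical helper: a block starts at a "cut here" line and is only
    recorded once its matching "end trace" line has been seen.
    """
    found = []
    current = None
    for line in lines:
        if CUT_HERE.search(line):
            # Start of a new block; forget any unterminated previous one.
            current = {"file": None, "line": None, "func": None, "trace_id": None}
            continue
        if current is None:
            continue  # ordinary log output outside a block
        m = WARNING.search(line)
        if m:
            current["file"] = m.group(1)
            current["line"] = int(m.group(2))
            current["func"] = m.group(3)
            continue
        m = END_TRACE.search(line)
        if m:
            current["trace_id"] = m.group(1)
            found.append(current)
            current = None
    return found
```

Feeding it the lines of a serial log (for example an open file object) 
yields one entry per warning, so a test script can fail the boot whenever 
the returned list is non-empty.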

full report can be found at:

   http://lkml.org/lkml/2008/6/13/224

i have 3 other test-systems with e1000 (with a similar CPU) which are 
_not_ showing this symptom, so this could be some model-specific e1000 
issue.

	Ingo

