netdev.vger.kernel.org archive mirror
From: Vitaliy Gusev <vgusev@openvz.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>,
	kuznet@ms2.inr.ac.ru, mcmanus@ducksong.com, xemul@openvz.org,
	netdev@vger.kernel.org, ilpo.jarvinen@helsinki.fi,
	linux-kernel@vger.kernel.org, e1000-devel@lists.sourceforge.net
Subject: Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets
Date: Tue, 17 Jun 2008 12:43:37 +0400
Message-ID: <200806171243.40093.vgusev@openvz.org>
In-Reply-To: <20080617080958.GC12535@elte.hu>

On 17 June 2008 12:09:58 Ingo Molnar wrote:
> * David Miller <davem@davemloft.net> wrote:
> > From: Ingo Molnar <mingo@elte.hu>
> > Date: Tue, 17 Jun 2008 09:26:58 +0200
> >
> > > So since there's no clear bug pattern and no sure reproducibility on
> > > my side i'd suggest we track this problem separately and "do
> > > nothing" right now. I've excluded this warning from my 'is the
> > > freshly booted kernel buggy' list of conditions of -tip testing so
> > > it's not holding me up.
> >
> > I'm going to push the revert through just to be safe and I think it's
> > a good idea to do so because all of those defer accept changes should
> > be resubmitted as a group for 2.6.27
>
> okay - in that case the full revert is well-tested on my side as well,
> fwiw.
>
> Tested-by: Ingo Molnar <mingo@elte.hu>

The revert patch fixes the socket leak problem.
Tested-by: Vitaliy Gusev <vgusev@openvz.org>

>
> > > and i can apply any test-patch if that would be helpful - if it does
> > > a WARN_ON() i'll notice it. (pure extra debug printks with no stack
> > > trace are much harder to notice in automated tests)
> >
> > I don't have time to work on your bug, sorry.  Someone else will have
> > to step forward and help you with it.
>
> it's not really "my bug" - i just offered help to debug someone else's
> bug :-) This is pretty common hw so i guess there will be such reports.
>
> Let me describe what i'm doing exactly: i do a lot of randomized testing
> on about a dozen real systems (all across the x86 spectrum) so i tend to
> trigger a lot of mainline bugs pretty early on.
>
> My collection of kernel bugs for the last 8 months shows 1285 bugs
> (kernel crashes or build failures - about 50%/50%) triggered. One
> test-system alone has a serial log of 15 gigabytes - and there's a dozen
> of them. That's about 5 kernel bugs a day handled by me, on average.
>
> These systems have about 10 times the hardware variability of your
> Niagara system for example, and many of them are rather difficult to
> debug (laptops without serial port, etc.). So i physically cannot avoid
> and debug all bugs on all my test-systems, like you do on the Niagara. I
> will report bugs, i'll bisect anything that is bisectable (on average i
> bisect once a day), and i can add patches and report any test-results,
> and i'll of course debug any bugs that look like heavy mainline
> showstoppers.
>
> > FWIW I don't think your TX timeout problem has anything to do with
> > packet ordering.  The TX element of the network device is totally
> > stateless, but it's hanging under some set of circumstances to the
> > point where we timeout and reset the hardware to get it going again.
>
> ok. That's e1000 then. Cc:s added. Stock T60 laptop, 32-bit:
>
> 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
>         Subsystem: Lenovo ThinkPad T60
>         Flags: bus master, fast devsel, latency 0, IRQ 16
>         Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
>         I/O ports at 2000 [size=32]
>         Capabilities: <access denied>
>         Kernel driver in use: e1000
>
> the problem is this non-fatal warning showing up after bootup,
> sporadically, in a non-reproducible way:
>
> [  173.354049] NETDEV WATCHDOG: eth0: transmit timed out
> [  173.354148] ------------[ cut here ]------------
> [  173.354221] WARNING: at net/sched/sch_generic.c:222 dev_watchdog+0x9a/0xec()
> [  173.354298] Modules linked in:
> [  173.354421] Pid: 13452, comm: cc1 Tainted: G        W 2.6.26-rc6-00273-g81ae43a-dirty #2573
> [  173.354516]  [<c01250ca>] warn_on_slowpath+0x46/0x76
> [  173.354641]  [<c011d428>] ? try_to_wake_up+0x1d6/0x1e0
> [  173.354815]  [<c01411e9>] ? trace_hardirqs_off+0xb/0xd
> [  173.357370]  [<c011d43d>] ? default_wake_function+0xb/0xd
> [  173.357370]  [<c014112a>] ? trace_hardirqs_off_caller+0x15/0xc9
> [  173.357370]  [<c01411e9>] ? trace_hardirqs_off+0xb/0xd
> [  173.357370]  [<c0142c83>] ? trace_hardirqs_on+0xb/0xd
> [  173.357370]  [<c0142b33>] ? trace_hardirqs_on_caller+0x16/0x15b
> [  173.357370]  [<c0142c83>] ? trace_hardirqs_on+0xb/0xd
> [  173.357370]  [<c06bb3c9>] ? _spin_unlock_irqrestore+0x5b/0x71
> [  173.357370]  [<c0133d46>] ? __queue_work+0x2d/0x32
> [  173.357370]  [<c0134023>] ? queue_work+0x50/0x72
> [  173.357483]  [<c0134059>] ? schedule_work+0x14/0x16
> [  173.357654]  [<c05c59b8>] dev_watchdog+0x9a/0xec
> [  173.357783]  [<c012d456>] run_timer_softirq+0x13d/0x19d
> [  173.357905]  [<c05c591e>] ? dev_watchdog+0x0/0xec
> [  173.358073]  [<c05c591e>] ? dev_watchdog+0x0/0xec
> [  173.360804]  [<c0129ad7>] __do_softirq+0xb2/0x15c
> [  173.360804]  [<c0129a25>] ? __do_softirq+0x0/0x15c
> [  173.360804]  [<c0105526>] do_softirq+0x84/0xe9
> [  173.360804]  [<c0129996>] irq_exit+0x4b/0x88
> [  173.360804]  [<c010ec7a>] smp_apic_timer_interrupt+0x73/0x81
> [  173.360804]  [<c0103ddd>] apic_timer_interrupt+0x2d/0x34
> [  173.360804]  =======================
> [  173.360804] ---[ end trace a7919e7f17c0a725 ]---
>
> full report can be found at:
>
>    http://lkml.org/lkml/2008/6/13/224
>
> i have 3 other test-systems with e1000 (with a similar CPU) which are
> _not_ showing this symptom, so this could be some model-specific e1000
> issue.
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
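
Regarding the point above that a WARN_ON() is easier to notice in automated
tests than a bare printk: a minimal sketch of how a test harness might pull
such warnings out of a serial log. It assumes each log line is unwrapped (one
kernel message per line) and that a warning is delimited by the "cut here"
and "end trace" markers seen in the trace above. The function names
extract_warnings() and warning_sites() are hypothetical, not from any
existing tool.

```python
import re

# Markers as they appear in the 2.6.26-era trace quoted above.
CUT_HERE = "------------[ cut here ]------------"
END_TRACE = re.compile(r"---\[ end trace ([0-9a-f]+) \]---")
# Leading printk timestamp such as "[  173.354221] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def extract_warnings(log_text):
    """Return warning blocks, each a list of lines with timestamps removed.

    Hypothetical helper: a block starts after a "cut here" line and ends at
    the matching "end trace" line. Unterminated blocks are dropped.
    """
    blocks = []
    current = None
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        if CUT_HERE in line:
            current = []          # start collecting a new block
            continue
        if current is None:
            continue              # outside any warning block
        if END_TRACE.search(line):
            blocks.append(current)
            current = None
        else:
            current.append(line)
    return blocks


def warning_sites(log_text):
    """Return the 'file:line function' part of each WARNING line found."""
    prefix = "WARNING: at "
    sites = []
    for block in extract_warnings(log_text):
        for line in block:
            if line.startswith(prefix):
                sites.append(line[len(prefix):])
                break
    return sites
```

A harness could fail the boot test whenever warning_sites() returns a
non-empty list, or compare the result against a list of known warnings to
ignore. Logs that were re-wrapped by a mail client would need to be
unwrapped first.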



-- 
Thanks,
Vitaliy Gusev
