netdev.vger.kernel.org archive mirror
From: Richard Kennedy <richard@rsk.demon.co.uk>
To: netdev@vger.kernel.org
Cc: Francois Romieu <romieu@fr.zoreil.com>
Subject: Re: v3.0-rc* intermittent network failure: Test case found!
Date: Mon, 25 Jul 2011 13:01:52 +0100
Message-ID: <4E2D5B30.30003@rsk.demon.co.uk>
In-Reply-To: <1311261527.2980.26.camel@castor.rsk>

On 21/07/11 16:18, Richard Kennedy wrote:
>> Richard Kennedy<richard@rsk.demon.co.uk>  :
>>> I keep seeing a total network failure on v3.0.0-rc*; it is highly
>>> intermittent (anything from 1 hour to 12+ hours), and I don't have a reliable
>>> test case.
>>> When it fails I lose all network comms, but there are no errors in the
>>> system log, no hung tasks reported, nothing. But after it fails the
>>> machine hangs during shutdown, it just never turns off. So I guess
>>> something is getting stuck but I can't find it.
>>

I have found a reliable test case: I can instantly trigger the problem by
starting two instances of rsync at the same time (this is on an x86_64 AMD X2).

e.g.
rsync -a linux-2.6 server:t1 & rsync -a linux-2.6 server:t2 &


If I have a ping running when I trigger the problem, it pauses and then
fails with:

	ping: sendmsg: No buffer space available

But if I start a ping afterwards, it fails with:

...	Destination Host Unreachable
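The test case and the two ping symptoms above could be wrapped in one script so that each attempt (for example each kernel tried while bisecting) is checked the same way. This is a minimal sketch under stated assumptions, not the script used for the report: the host name server, the linux-2.6 tree, the t1/t2 targets, the log path and the 60 second limits are placeholders, and the helper names classify_ping_line, first_failure and reproduce are hypothetical. It starts a ping in the background, runs the two rsyncs concurrently, and then reports which of the two error messages, if any, appeared first.

```shell
#!/bin/sh
# Sketch of a reproducer for the rsync-triggered network stall.
# Assumed settings (hypothetical, override via the environment):
HOST=${HOST:-server}              # remote rsync/ping target
SRC=${SRC:-linux-2.6}             # local tree to copy
PINGLOG=${PINGLOG:-/tmp/ping.log} # where ping output is collected

# Label one line of ping output with the symptom it shows.
classify_ping_line() {
    case "$1" in
        *"No buffer space available"*)    echo enobufs ;;
        *"Destination Host Unreachable"*) echo unreachable ;;
        *" bytes from "*)                 echo ok ;;
        *)                                echo other ;;
    esac
}

# Print the first failure found in a ping log, or "ok" if there is none.
first_failure() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$(classify_ping_line "$line")" in
            enobufs)     echo enobufs; return 0 ;;
            unreachable) echo unreachable; return 0 ;;
        esac
    done < "$1"
    echo ok
}

# Ping in the background while two rsyncs run at the same time.
reproduce() {
    # -w is the iputils ping deadline: ping exits by itself after 60 s,
    # flushing its output, so nothing has to be killed afterwards.
    ping -w 60 "$HOST" > "$PINGLOG" 2>&1 &
    # --timeout makes rsync give up instead of hanging if the link dies.
    rsync -a --timeout=60 "$SRC" "$HOST:t1" &
    rsync -a --timeout=60 "$SRC" "$HOST:t2" &
    wait    # for ping and both rsyncs
    first_failure "$PINGLOG"
}

# Only start the test when asked, so the helpers can be sourced on their own.
if [ "${1:-}" = run ]; then
    reproduce
fi
```

Run it as "sh repro.sh run"; it should print enobufs, unreachable or ok. The ping deadline only needs to cover the first minute because the failure appears as soon as the second rsync starts; on a healthy link the rsyncs may well outlive the ping.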

I have a serial console attached, but I don't really understand what it's
telling me. AFAICT there are no blocked tasks; SysRq-W shows:


SysRq : Show Blocked State
   task                        PC stack   pid father
Sched Debug Version: v0.10, 3.0.0 #46
ktime                                   : 7129717.783042
sched_clk                               : 7126380.221722
cpu_clk                                 : 7129711.544071
jiffies                                 : 4301797008
sched_clock_stable                      : 0
.....[lots more schedule & cpu info]

Now that I have a reliable test case, I can find a last known good kernel
and have a stab at bisecting this, unless anyone has any better
suggestions?

regards
Richard




Thread overview: 4+ messages
2011-07-21 13:49 v3.0-rc* intermittent network failure: how to debug? Richard Kennedy
2011-07-21 14:32 ` Francois Romieu
2011-07-21 15:18   ` Richard Kennedy
2011-07-25 12:01     ` Richard Kennedy [this message]
