From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
To: linux-sctp@vger.kernel.org
Subject: Re: SCTP abort with T-bit set after handshake
Date: Mon, 19 Mar 2018 18:38:00 +0000
Message-ID: <20180319183800.GN9345@localhost.localdomain>
In-Reply-To: <482208C5-8F01-4698-80EB-74DB994382F9@attocore.com>

On Mon, Mar 19, 2018 at 05:06:05PM +0000, David Neil wrote:
> Marcelo,
> Sorry for the slow reply, have been away and then have been struggling to reproduce the problem.

No problem.

> 
> 
> > 
> > A few lines below it will check if an asoc couldn't be found and will
> > increment SCTP_MIB_OUTOFBLUES. There are more places that inc it, but
> > it's a start.
> > 
> > It should show up in netstat -s or /proc/net/sctp/snmp.
> > 
> 
> Have finally caught another instance of the problem while monitoring
> the SCTP statistics. 
> This is not helped by the fact that the out-of-blue counter goes up
> in total by about 600 while running a complete set of tests (I
> assume this is mainly at the end of each test when connections are
> abruptly terminated).

Ouch. This will make it very hard to debug. Even with Neil's
idea of using systemtap, there will likely be too much noise.

> I have therefore been capturing the stats every 100msec and looking
> at the counters at the moment when the problem occurred.
> 
> This shows the out-of-blue counter being incremented at the same
> time as the SCTP connection failure.
> 
...

OK. This didn't help much, sorry. The mere fact that the counter is
going up, with several tests running at once, doesn't tell us
much. It is good information, it's just that now we have to filter out
all the noise that comes with it.
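
One way to cut some of that noise might be to log which other SCTP
counters move in the same sampling interval as the out-of-blue counter.
The following is only a rough sketch: it assumes /proc/net/sctp/snmp
lists one "Name value" pair per line and that the counter of interest
appears there as SctpOutOfBlues. The helper names (parse_snmp, changed,
watch) are made up for illustration.

```python
import time

SNMP_PATH = "/proc/net/sctp/snmp"  # assumed format: "Name <whitespace> value" per line


def parse_snmp(text):
    """Parse counter lines into a dict of name -> int, skipping anything else."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def changed(prev, curr):
    """Return {name: delta} for every counter that moved between two samples."""
    return {
        name: value - prev.get(name, 0)
        for name, value in curr.items()
        if value != prev.get(name, 0)
    }


def watch(path=SNMP_PATH, interval=0.1):
    """Poll the counters and print all deltas whenever SctpOutOfBlues moves."""
    with open(path) as f:
        prev = parse_snmp(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            curr = parse_snmp(f.read())
        delta = changed(prev, curr)
        if "SctpOutOfBlues" in delta:
            # Timestamp so the line can be matched against the test log.
            print(f"{time.time():.3f} {delta}")
        prev = curr
```

Calling watch() on the affected host while the tests run would print one
line per interval in which the out-of-blue counter moved, together with
whatever else changed in that interval.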

> 
> > 
> > Btw, is this test public? Can I run it too?  
> 
> Unfortunately, it is private.
> 
> 
> > Or if you can create a
> > small reproducer, that would be great.
> 
> This would be great if I could figure out what the important elements are in what I am doing.
> The tests are opening and closing and aborting large numbers of connections. 
> Some of the connections are used to exchange a lot of data, others hardly carry anything.
> The connection that fails appears to be fairly random. The timing of when it fails appears to be fairly random.
> The failure only occurs after an average of over an hour of running.
> Any hints at the kind of behaviour that could trigger a failure like this?

I noticed that the association you referenced used the same port on
both hosts. You don't have port re-use happening in there, do you?

I fear you won't have any choice other than trimming this down to a
more specific test.

We could, for example, trigger a panic when the test fails, but then
it's probably too late for us to do any analysis in the vmcore. And we
can't trigger the panic on Abort generation because it will catch the
other expected failures.

One other idea is, if it takes ~1hr to reproduce, try reducing the
pool of tests that are executed in that window and see how it goes.
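
As a sketch of what that narrowing could look like: split the pool in
half, keep whichever half still reproduces, and repeat. The reproduces
callback here is hypothetical. It would have to run the given subset
for long enough (over an hour, given the failure rate) and report
whether the unexpected abort showed up, so a "no" is never fully
conclusive.

```python
def narrow(tests, reproduces):
    """Halve the test pool for as long as one half alone still reproduces.

    `reproduces(subset)` is a hypothetical callback: it runs the subset for
    the chosen time window and returns True if the failure was observed.
    """
    pool = list(tests)
    while len(pool) > 1:
        mid = len(pool) // 2
        first, second = pool[:mid], pool[mid:]
        if reproduces(first):
            pool = first
        elif reproduces(second):
            pool = second
        else:
            # Neither half reproduces on its own: the failure may need tests
            # from both halves, or it simply did not show up in this window.
            break
    return pool
```

If the loop stops early, the remaining pool is still smaller than the
full suite, which should at least make the counters less noisy.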

  M.
