TCP hang in timewait processing


From: Nivedita Singhvi <niv@us.ibm.com>
To: David Miller <davem@redhat.com>
Cc: netdev <netdev@oss.sgi.com>, Elizabeth Kon <bkon@us.ibm.com>,
	jgrimm@us.ibm.com, jgarvey@us.ibm.com
Subject: TCP hang in timewait processing
Date: Sat, 27 Mar 2004 15:27:51 -0800
Message-ID: <40660DF7.9090806@us.ibm.com>

Dave,

We're investigating a hang in TCP that a clustered node
is running into, and I'd appreciate any help whatsoever
on this...

System is running SLES8 + patches (including latest
fixes in timewait stuff) - but is pretty equivalent
to mainline 2.4 kernel from what I can tell.
Problem is reproducible, takes anywhere from several
hours to a day.

The hang occurs because the while loop in tcp_twkill()
never terminates:

while((tw = tcp_tw_death_row[tcp_tw_death_row_slot]) != NULL) {
	tcp_tw_death_row[tcp_tw_death_row_slot] = tw->next_death;
	if (tw->next_death)
		tw->next_death->pprev_death = tw->pprev_death;
	tw->pprev_death = NULL;
	spin_unlock(&tw_death_lock);

	tcp_timewait_kill(tw);
	tcp_tw_put(tw);

	killed++;

	spin_lock(&tw_death_lock);
}

Thanks to some neat detective work by Beth Kon and Joe
Garvey, the culprit seems to be a tw node pointing to
itself. See attached note from Beth at end.

This is possible if a tcp_tw_bucket is freed prematurely
before being taken off the death list. If the node is
at the head of the list, and is freed and then later
reallocated in tcp_time_wait() and reinserted into the
list, (now linked to a new sk) it will end up pointing at
itself. [There might be other ways to end up like this,
but I'm not seeing them]

We come into tcp_tw_schedule() (which puts it into the
death list) with pprev_death cleared by tcp_time_wait().

tcp_tw_schedule() {

	if (tw->pprev_death) {
		...
	} else
		atomic_inc(&tw->refcnt);

	...

	if((tw->next_death = *tpp) != NULL)
		(*tpp)->pprev_death = &tw->next_death;
	*tpp = tw;
	tw->pprev_death = tpp;

If tw is at the head of the list, (*tpp == tw), then
we just created a loop of tw->next_death pointing at tw.
If tw is in other places on the death list, we could
potentially have Y-shaped chains and other garbage...
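
To make the head-of-list case concrete, here is a minimal user-space
sketch. It is not kernel code: struct tw_node is a hypothetical stand-in
that keeps only the two death-list links, death_list_insert() copies the
insertion lines quoted above, and the "free and reallocate" step is
modelled simply by clearing the links while the slot head still points
at the node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for tcp_tw_bucket: only the death-list links. */
struct tw_node {
	struct tw_node *next_death;
	struct tw_node **pprev_death;
};

/* Head insertion, mirroring the quoted tcp_tw_schedule() fragment. */
void death_list_insert(struct tw_node **tpp, struct tw_node *tw)
{
	assert(tpp != NULL && tw != NULL);

	if ((tw->next_death = *tpp) != NULL)
		(*tpp)->pprev_death = &tw->next_death;
	*tpp = tw;
	tw->pprev_death = tpp;
}

/*
 * Assumed failure sequence: the node is inserted, then "freed" and
 * reinitialised without being unlinked, then inserted again.
 * Returns 1 if the node ends up pointing at itself.
 */
int reinsert_head_makes_self_loop(void)
{
	struct tw_node *slot = NULL;
	struct tw_node tw = { NULL, NULL };

	death_list_insert(&slot, &tw);	/* first, legitimate insertion */

	/* Premature free + reuse: links cleared, slot head is now stale. */
	tw.next_death = NULL;
	tw.pprev_death = NULL;

	death_list_insert(&slot, &tw);	/* here *tpp == tw */

	return tw.next_death == &tw;
}
```

Calling reinsert_head_makes_self_loop() returns 1 in this model. Once
next_death points back at the node, each pass of the tcp_twkill loop
stores the same node back into the slot head, so the head never becomes
NULL.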

Does that seem correct, or am I barking up the wrong
tree here?

Just checking at this point for a node pointing to
itself is rather late - the damage has been done in
losing the original linkages from the tcp_tw_bucket
to the other structures which we need to remove as
well, so as to not cause a further mess in the hash
table and death list pointers.

So the question is, is there any path that leads to
us erroneously freeing tcp_tw_bucket without taking it
off the death list?

I've been looking at the tw refcount manipulation
and am trying to identify any possible gratuitous
tcp_tw_put() calls, but haven't successfully isolated
one yet.
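
As an illustration of what one gratuitous put would do, here is a
simplified user-space model. Everything in it is an assumption made for
the sketch: struct tw_model, the plain int refcount, and the rule that
the death list holds the only reference. The real reference accounting
in the kernel involves more holders than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a timewait bucket; not the kernel structure. */
struct tw_model {
	int refcnt;			/* stand-in for the atomic refcnt */
	int freed;			/* set when the count reaches zero */
	struct tw_model *next_death;
	struct tw_model **pprev_death;	/* non-NULL while on the death list */
};

/* Model of a put: drop one reference and "free" at zero. */
void tw_model_put(struct tw_model *tw)
{
	assert(tw != NULL);
	if (--tw->refcnt == 0)
		tw->freed = 1;
}

/* Model of first-time scheduling: the death list takes a reference. */
void tw_model_schedule(struct tw_model **tpp, struct tw_model *tw)
{
	assert(tpp != NULL && tw != NULL);
	if (tw->pprev_death == NULL)
		tw->refcnt++;
	if ((tw->next_death = *tpp) != NULL)
		(*tpp)->pprev_death = &tw->next_death;
	*tpp = tw;
	tw->pprev_death = tpp;
}

/* Returns 1 if the stray puts leave a freed node still on the list. */
int stray_put_frees_linked_node(int extra_puts)
{
	struct tw_model *slot = NULL;
	struct tw_model tw = { 0, 0, NULL, NULL };
	int i;

	tw_model_schedule(&slot, &tw);	/* refcnt is now 1 (the list's) */

	for (i = 0; i < extra_puts; i++)
		tw_model_put(&tw);	/* gratuitous puts, no unlink */

	return tw.freed && tw.pprev_death != NULL;
}
```

With extra_puts == 0 the function returns 0. With a single stray put it
returns 1: the node is marked freed while slot and pprev_death still
reference it, which is the precondition for the reinsertion problem
described above.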

Any ideas or pointers would be very much appreciated!

thanks,
Nivedita

---
From Beth Kon:
I see what is going on here... not sure how it got to this state.

Joe Garvey did excellent work gathering kdb info (and
graciously taught me a lot as he went along) and confirming that the
while loop in tcp_twkill is in an infinite loop.

Here is the code in tcp_twkill that is in an infinite loop:

while((tw = tcp_tw_death_row[tcp_tw_death_row_slot]) != NULL) {
	tcp_tw_death_row[tcp_tw_death_row_slot] = tw->next_death;
	if (tw->next_death)
		tw->next_death->pprev_death = tw->pprev_death;
	tw->pprev_death = NULL;
	spin_unlock(&tw_death_lock);

	tcp_timewait_kill(tw);
	tcp_tw_put(tw);

	killed++;

	spin_lock(&tw_death_lock);
}

Using the data Joe gathered, here is what I see...

[0]kdb> rd
eax = 0x00000001 ebx = 0xc50a7840 ecx = 0xdf615478 edx = 0x00000001
esi = 0x061c3332 edi = 0x00000000 esp = 0xc03e7f10 eip = 0xc02be950
ebp = 0x00000000 xss = 0xc02e0018 xcs = 0x00000010 eflags = 0x00000282
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc03e7edc

In the above register dump, the pointer to the tw being handled in the
tcp_twkill loop is in ebx.

The contents of the tw struct (annotated by me) are:

[0]kdb> mds %ebx tw
0xc50a7840 260f3c09   .<.&    daddr
0xc50a7844 6d0f3c09   .<.m    rcv_saddr
0xc50a7848 8200a3e5   å£..    dport, num
0xc50a784c 00000000   ....    bound_dev_if
0xc50a7850 00000000   ....    next
0xc50a7854 00000000   ....    pprev
0xc50a7858 00000000   ....    bindnext
0xc50a785c c26dcbc8   ÈËmÂ    bind_pprev
[0]kdb>
0xc50a7860 00820506   ....    state, substate, sport
0xc50a7864 00000002   ....    family
0xc50a7868 f9e3ccd0   ÐÌãù   refcnt
0xc50a786c 00002a8f   .*..   hashent
0xc50a7870 00001770   p...   timeout
0xc50a7874 d4ad3cee   î<Ô   rcv_next
0xc50a7878 878fe09e   .à..   send_next
0xc50a787c 000016d0   Ð...   rcv_wnd
[0]kdb>
0xc50a7880 00000000   ....    ts_recent
0xc50a7884 00000000   ....    ts_recent_stamp
0xc50a7888 000353c1   ÁS..    ttd
0xc50a788c 00000000   ....    tb
0xc50a7890 c50a7840   @x.Å    next_death
0xc50a7894 00000000   ....    pprev_death
0xc50a7898 00000000   ....
0xc50a789c 00000000   ....

The above shows that next_death in the structure == ebx, which means this
element of the linked list is pointing to itself. So it is in an infinite loop.
Assuming this is the last element on the linked list, next_death should be null.
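
For checking a captured chain, the self-pointing case and longer loops
can both be caught with a standard tortoise-and-hare walk. The sketch
below is user-space C over a hypothetical struct death_link that holds
only next_death. It is a debugging aid under those assumptions, not a
fix, since by the time a loop exists the original linkage is already
lost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the forward death-list link. */
struct death_link {
	struct death_link *next_death;
};

/* Floyd's cycle detection: returns 1 if the chain from head loops. */
int death_chain_has_cycle(const struct death_link *head)
{
	const struct death_link *slow = head;
	const struct death_link *fast = head;

	while (fast != NULL && fast->next_death != NULL) {
		slow = slow->next_death;		/* one step */
		fast = fast->next_death->next_death;	/* two steps */
		if (slow == fast)
			return 1;
	}
	return 0;	/* reached a NULL terminator */
}

/*
 * Demo: a two-node chain whose tail optionally points at itself,
 * like the next_death value seen in the dump.
 */
int demo_chain(int self_loop)
{
	struct death_link tail = { NULL };
	struct death_link head = { &tail };

	assert(self_loop == 0 || self_loop == 1);
	if (self_loop)
		tail.next_death = &tail;

	return death_chain_has_cycle(&head);
}
```

demo_chain(1) returns 1 and demo_chain(0) returns 0. The walk uses
constant memory and terminates even when the chain loops.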

