netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Håkon Løvdal" <hlovdal@gmail.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>, "Ingo Molnar" <mingo@elte.hu>,
	"David S. Miller" <davem@davemloft.net>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
Date: Wed, 4 Jun 2008 02:10:13 +0200	[thread overview]
Message-ID: <a01a16b50806031710u54b9297bkf1fad9c4f8fd581f@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0805312332270.2760@wrl-59.cs.helsinki.fi>

2008/5/31 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>:
> I didn't really mean tcpdump, I was more thinking of syscall what is the
> syscall where the process is waiting. Though tcpdump might reveal
> something as well about the behavior when nearing the problem,

> Hmm, perhaps periodically checking /proc/net/tcp (e.g., once per 10s) if
> the timeout is larger than TCP_RTO_MAX might allow some script to
> immediately notice when things broke while reproducing it. Storing all
> those once per 10s values shouldn't be a too big either, it could even be
> done in both ends for a single flow (but I'll leave a script to do that on
> Monday).

Hi. I have now tested and taken various logs before/under/after (ps, netstat,
tcpdump, strace, /proc/net/tcp). The logs did not grow as big as I feared
however they are way too big attach here. Any tips on how to distribute?

I mentioned before that the ssh connection died on the new PC, but sometimes
it just hangs as well, i.e. the ssh connection is alive but there is
no data transferred.
The last test I did hang after just 292MB transferred. In this case pv
just hangs
on the new PC, waiting for data that never arrives. On the sending
side tar which
is producing data have died and the corresponding ssh is locked in some
uninterruptible sleep:

//netstat before the hang:
Proto Recv-Q Send-Q Local Address               Foreign Address
     State       PID/Program name    Timer
tcp        0    504 new_pc:22                   old_pc:39464
     ESTABLISHED 17054/sshd: hlovdal on (0.20/0/0)

// after the hang:
Proto Recv-Q Send-Q Local Address               Foreign Address
     State       PID/Program name    Timer
tcp        0   2896 old_pc:22                   new_pc:39464
     ESTABLISHED -                   on (21329441.19/0/0)

old_pc>.ps -elf | sed -n '1p; /17054.*ssh/p
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 D root     17054  4402  0  80   0 -     0 -      00:08 ?
00:00:00 [sshd]
5 Z hlovdal  17058 17054  0  80   0 -     0 -      00:08 ?
00:00:33 [sshd] <defunct>
old_pc>strace -p 17054
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
old_pc>

Many things  in /proc/17054/ are in a non-normal state (exe is an
empty symlink, fd is empty, etc)
but the stat* files seem normal:

/proc/17054/stat
17054 (sshd) D 4402 17054 17054 0 -1 4202756 1007 395 11 2 1 6 0 1 20
0 1 0 80373564 0 0 4294967295 0 0 0 0 0 0 0 4096 16387 0 0 0 17 0 0 0
37 0 0

/proc/17054/statm
0 0 0 0 0 0 0

/proc/17054/status
Name:   sshd
State:  D (disk sleep)
Tgid:   17054
Pid:    17054
PPid:   4402
TracerPid:      0
Uid:    0       0       500     0
Gid:    0       0       500     0
FDSize: 0
Groups:
Threads:        1
SigQ:   0/32639
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180004003
CapInh: 0000000000000000
CapPrm: 00000000ffffffff
CapEff: 00000000ffffffff
voluntary_ctxt_switches:        86
nonvoluntary_ctxt_switches:     43

I also took a sysrq-t dump and here is the the output related to 17054:

sshd          D 000110ab  1972 17054   4402
       f0628aa0 00200046 a3fbe728 000110ab f3788640 a4088c34 000110ab effb1a80
       effb1ab0 effb1ac4 e769a380 c05be260 00000001 f0628aa0 c042f321 effb1ad4
       effb1ad4 effb1a80 e769a3b4 00000000 e769a380 c05bb9aa 00000000 c9e82c00
Call Trace:
 [<c05be260>] lock_sock_nested+0x6c/0x9d
 [<c042f321>] autoremove_wake_function+0x0/0x35
 [<c05bb9aa>] sock_fasync+0x5e/0x113
 [<c05bcc68>] sock_close+0x22/0x2f
 [<c0468b07>] __fput+0xaf/0x180
 [<c046623a>] filp_close+0x51/0x58
 [<c0421318>] put_files_struct+0x5f/0xa7
 [<c0422443>] do_exit+0x208/0x6b5
 [<c045990b>] do_munmap+0x193/0x1ac
 [<c042295b>] sys_exit_group+0x0/0xd
 [<c0404daa>] sysenter_past_esp+0x5f/0x85
 =======================
sshd          ? c04dc1a6  1296 17058  17054
       d240d740 00200046 c0626ea3 c04dc1a6 c7886380 d240d740 00000000 00000010
       d240d740 f0628aa0 f783f440 c04228ec c06d0b50 d240d9fd 000042a2 00000002
       d240d8c4 00000000 f394eb60 f394eb60 f394ebe8 f394ec20 c06c0068 00200006
Call Trace:
 [<c0626ea3>] _spin_unlock+0xf/0x23
 [<c04dc1a6>] _atomic_dec_and_lock+0x22/0x2c
 [<c04228ec>] do_exit+0x6b1/0x6b5
 [<c04060d0>] die+0x1d8/0x1e0
 [<c062838e>] do_page_fault+0x0/0x615
 [<c06288b9>] do_page_fault+0x52b/0x615
 [<c062838e>] do_page_fault+0x0/0x615
 [<c062710a>] error_code+0x6a/0x70
 [<f8c77b39>] nf_conntrack_find_get+0x36/0x3a [nf_conntrack]
 [<f8c78379>] nf_conntrack_in+0x135/0x3e8 [nf_conntrack]
 [<c0423c5d>] local_bh_disable+0xa/0xb
 [<c0626d54>] _read_lock_bh+0x12/0x1a
 [<c0423d2c>] local_bh_enable_ip+0x37/0x4b
 [<f8a4a3a3>] ipt_do_table+0x2af/0x2d8 [ip_tables]
 [<f8c88161>] ipv4_conntrack_local+0x0/0x51 [nf_conntrack_ipv4]
 [<c05dbde6>] nf_iterate+0x3a/0x6e
 [<c05e33b0>] dst_output+0x0/0x7
 [<c05dc12f>] nf_hook_slow+0x57/0xde
 [<c05e33b0>] dst_output+0x0/0x7
 [<c05e5678>] ip_queue_xmit+0x2ff/0x340
 [<c05e33b0>] dst_output+0x0/0x7
 [<c05f34d7>] tcp_transmit_skb+0x6cc/0x6ff
 [<c05f34d7>] tcp_transmit_skb+0x6cc/0x6ff
 [<c04057bb>] common_interrupt+0x23/0x28
 [<c05f2acc>] tcp_snd_test+0x13/0xe1
 [<c05f554c>] tcp_push_one+0xb3/0xd4
 [<c05eb11e>] tcp_sendmsg+0x7f1/0xa3b
 [<c05ebae4>] tcp_recvmsg+0x60b/0x717
 [<c04bcb75>] avc_has_perm_noaudit+0x389/0x431
 [<c04c57e4>] selinux_netlbl_inode_permission+0x40/0xa3
 [<c05bbd73>] sock_aio_write+0xe3/0xef
 [<c04678e1>] do_sync_write+0xc6/0x109
 [<c0626f11>] _spin_unlock_irqrestore+0x14/0x2a
 [<c042f321>] autoremove_wake_function+0x0/0x35
 [<c0468135>] vfs_write+0xb5/0x14b
 [<c046869f>] sys_write+0x41/0x67
 [<c0404daa>] sysenter_past_esp+0x5f/0x85
 =======================

All the other ssh zombies are also stuck in lock_sock_nested/_spin_unlock
(except one pair of lock_sock_nested/do_exit).

Any other tests I should do/logs you want?

BR Håkon Løvdal

  parent reply	other threads:[~2008-06-04  0:10 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-26 11:56 [bug] stuck localhost TCP connections, v2.6.26-rc3+ Ingo Molnar
2008-05-26 13:28 ` Ilpo Järvinen
2008-05-26 13:59   ` Ingo Molnar
2008-05-26 14:12     ` Ingo Molnar
2008-05-26 14:17       ` Ingo Molnar
2008-05-26 14:29         ` Ingo Molnar
2008-05-26 14:43         ` Ilpo Järvinen
2008-05-26 14:58       ` Ilpo Järvinen
2008-05-26 16:23         ` Ingo Molnar
2008-05-26 16:32           ` Ilpo Järvinen
2008-05-26 16:54             ` Ingo Molnar
2008-05-26 17:08               ` Ilpo Järvinen
2008-05-26 18:12                 ` Ingo Molnar
2008-05-26 20:41                   ` Ingo Molnar
2008-05-26 21:20                     ` Ilpo Järvinen
2008-05-30 16:23   ` Ray Lee
2008-05-26 16:24 ` Arjan van de Ven
2008-05-28  9:27 ` Peter Zijlstra
2008-05-31 14:25   ` Håkon Løvdal
2008-05-31 16:09     ` Ilpo Järvinen
2008-05-31 17:22       ` Ilpo Järvinen
2008-05-31 17:58       ` Håkon Løvdal
2008-05-31 18:37         ` Ilpo Järvinen
2008-05-31 20:25           ` Håkon Løvdal
2008-05-31 21:39             ` Ilpo Järvinen
2008-05-31 21:45               ` Håkon Løvdal
2008-06-04  0:10               ` Håkon Løvdal [this message]
2008-06-04 11:14                 ` Ilpo Järvinen
2008-06-04 14:00                   ` Håkon Løvdal
2008-06-04 15:09                     ` Ilpo Järvinen
2008-06-06  9:32                       ` Håkon Løvdal
2008-06-09 19:24                         ` Ilpo Järvinen
2008-06-10 23:26                           ` Håkon Løvdal
2008-06-11 13:39                             ` Ilpo Järvinen
2008-06-19  0:30                               ` Håkon Løvdal
2008-05-29  8:45 ` Ingo Molnar
2008-05-29 11:14   ` Ilpo Järvinen
2008-05-29 11:22     ` Ingo Molnar
2008-05-29 13:05       ` Evgeniy Polyakov
2008-05-29 13:43         ` Ingo Molnar
2008-05-29 13:08       ` Ingo Molnar
2008-05-29 13:48         ` Ilpo Järvinen
2008-05-30 11:09         ` Ingo Molnar
2008-05-30 21:12           ` Ilpo Järvinen
2008-05-30 18:18       ` Ingo Molnar
2008-05-31  6:09         ` Ingo Molnar
2008-05-31 11:46           ` Ilpo Järvinen
2008-05-31 12:18             ` Ilpo Järvinen
2008-05-31 12:54               ` Ingo Molnar
2008-05-31 12:58                 ` Ilpo Järvinen
2008-05-31 16:35                   ` Ingo Molnar
2008-05-31 22:46                     ` Patrick McManus
2008-06-01  5:51                       ` Ilpo Järvinen
2008-06-01  6:04                       ` Eric Dumazet
2008-06-02  9:23                         ` Ingo Molnar
2008-06-03  9:40                     ` [fixed] [patch] " Ingo Molnar
2008-06-03 14:41                       ` Patrick McManus
2008-06-03 21:46                       ` Ilpo Järvinen
2008-06-03 22:01                         ` Ilpo Järvinen
2008-06-03 22:03                           ` David Miller
2008-06-03 22:10                             ` Ilpo Järvinen
2008-06-03 23:22                             ` Ilpo Järvinen
2008-06-03 23:54                               ` Joe Perches
2008-06-04  6:25                                 ` Ilpo Järvinen
2008-06-04  2:54                               ` Patrick McManus
2008-06-04  6:42                                 ` Ilpo Järvinen
2008-06-05 14:22                               ` Ingo Molnar
2008-06-05 18:00                                 ` Ilpo Järvinen
2008-06-05 21:13                                   ` Ilpo Järvinen
2008-06-05 23:29                                     ` Patrick McManus
2008-06-06 10:03                                       ` Ilpo Järvinen
2008-06-06 17:11                                         ` Patrick McManus
2008-06-06 17:33                                           ` Ingo Molnar
2008-06-06 18:19                                             ` Ilpo Järvinen
2008-06-06 18:39                                               ` Ingo Molnar
2008-06-06 19:49                                                 ` Ilpo Järvinen
2008-06-06 20:08                                                 ` Patrick McManus
2008-06-06 21:12                                                   ` Ilpo Järvinen
2008-06-06 21:23                                                     ` Arjan van de Ven
2008-06-06 21:28                                                       ` Ilpo Järvinen
2008-06-10 22:49                                                   ` David Miller
2008-06-06 18:25                                           ` Ilpo Järvinen
2008-06-10 22:32                               ` David Miller
2008-06-11 13:10                                 ` Patrick McManus
2008-06-11 15:13                                 ` Ilpo Järvinen
2008-06-04  7:23                         ` Ingo Molnar
2008-06-04 18:24                           ` David Miller
2008-06-04 20:56                             ` Ilpo Järvinen
2008-06-04 21:55                               ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2008-05-30 16:31 Ray Lee
2008-05-30 21:11 ` Ilpo Järvinen
2008-05-31  6:03   ` Evgeniy Polyakov
2008-05-31 10:05     ` Ilpo Järvinen
2008-06-02  6:19       ` Herbert Xu
2008-06-02 11:53         ` Ilpo Järvinen
2008-06-02 14:08           ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a01a16b50806031710u54b9297bkf1fad9c4f8fd581f@mail.gmail.com \
    --to=hlovdal@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=ilpo.jarvinen@helsinki.fi \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=netdev@vger.kernel.org \
    --cc=rjw@sisk.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).