* 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket
From: Eric Wong @ 2013-01-11 0:49 UTC
To: netdev
Cc: Eric Dumazet, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
Minchan Kim, Andrew Morton, Linus Torvalds

The below Ruby script reproduces the issue for me with write()
getting stuck, usually within a few iterations (sometimes up to 100).

I've reproduced this with 3.8-rc2 and rc3, even with Mel's partial
revert patch in <20130110194212.GJ13304@suse.de> applied.

I can not reproduce this with 3.7.1+
stable-queue 2afd72f59c518da18853192ceeebead670ced5ea
So this seems to be a new bug from the 3.8 cycle...

Fortunately, this bug is far easier for me to reproduce than the ppoll+send
(toosleepy) failures.

Both socat and ruby (Ruby 1.8, 1.9, 2.0 should all work), along with
common shell tools (dd, sh, cat) are required for testing this:

	# 100 iterations, raise/lower the number if needed
	ruby the_script_below.rb 100

lsof -p 15236 reveals this:

ruby 15236 ew 5u IPv4 23066 0t0 TCP localhost:33728->localhost:38658 (CLOSE_WAIT)

	$ strace -f -p 15236
	Process 15236 attached - interrupt to quit
	write(5, "byebye!\n", 8

So write() to fd=5 is blocked, but the lsof shows the socket is already
in CLOSE_WAIT state. I expect write() to give me -EPIPE here since
the socat process on the reading end is long dead.
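
For comparison, here is a minimal sketch of the expected behavior, separate from the reproducer below. It assumes a loopback interface is available; the helper name write_to_dead_peer and its timeout argument are made up for illustration. The writer keeps calling write(2) on a connection whose peer has already closed. On a kernel that handles the peer's RST, the loop should end with Errno::EPIPE (or possibly Errno::ECONNRESET); if write() hangs as described above, the timeout fires instead.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper, not part of the reproducer: keep writing to a
# loopback TCP connection whose peer has gone away, and report how the
# write loop ends.
def write_to_dead_peer(limit = 10)
  srv  = TCPServer.new('127.0.0.1', 0)           # random loopback port
  peer = TCPSocket.new('127.0.0.1', srv.addr[1]) # stands in for socat
  conn = srv.accept
  conn.sync = true                               # do not buffer output
  peer.close                                     # the reader goes away
  Timeout.timeout(limit) do
    loop { conn.write("byebye!\n") }             # write(2) in a loop
  end
rescue Errno::EPIPE, Errno::ECONNRESET => e
  e.class                                        # the outcome I expect
rescue Timeout::Error
  :blocked                                       # write() never failed
ensure
  [conn, peer, srv].each { |s| s.close if s && !s.closed? }
end

p write_to_dead_peer
```

The sketch only covers the socket side of the test case; it does not use the FIFOs, socat, or dd from the full reproducer.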

This could be an issue with sk_stream_wait_memory() that Eric Dumazet
alluded to when I was chasing the toosleepy problem:

	$ cat /proc/15236/stack
	[<ffffffff8129fb19>] release_sock+0xe5/0x11b
	[<ffffffff812a6328>] sk_stream_wait_memory+0x1f7/0x1fc
	[<ffffffff81040d3a>] autoremove_wake_function+0x0/0x2a
	[<ffffffff812d8ebf>] tcp_sendmsg+0x710/0x86d
	[<ffffffff81000e34>] __switch_to+0x235/0x3c5
	[<ffffffff81299c0d>] sock_aio_write+0x102/0x10d
	[<ffffffff810d0b66>] do_sync_write+0x88/0xc1
	[<ffffffff810d1476>] vfs_write+0xb3/0xda
	[<ffffffff81036613>] ptrace_notify+0x5d/0x76
	[<ffffffff810d158e>] sys_write+0x58/0x92
	[<ffffffff81322669>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

I know many of you may not be familiar with Ruby, so I've tried to
comment the code as much as possible. Feel free to ask for
clarification.

-------------------------------- 8< -------------------------------
require 'socket'
require 'tempfile'
$stdout.sync = $stderr.sync = true # don't buffer any output

# Read the number of iterations (first command-line argument)
# (ARGV[0] in Ruby is argv[1] in C)
iterations = ARGV[0].to_i
iterations > 0 or abort "Usage: #$0 ITERATIONS (iterations should be > 0)"

# Capture a temporary file name for using in the shell script (below)
tmp = Tempfile.new('out')
out = tmp.path

# This is an array of FIFO path names, we create two of them:
fifos = []
%w(a b).each do |name|
  fifotmp = Tempfile.new([name, '.fifo'])

  # get the pathname from the Tempfile object
  fifoname = fifotmp.path
  fifos << fifoname

  # This unlinks the temporary pathname so mkfifo can succeed
  # (yes, there's a tiny race here but unlikely to ever happen).
  fifotmp.close!

  # create the FIFO
  system("mkfifo", fifoname) or abort "mkfifo #{fifoname} failed: #$?"
end

# bind to a random TCP port over loopback
addr = "127.0.0.1"
srv = TCPServer.new(addr, 0)
port = srv.addr[1]

# Start the TCP server in a child process
pid = fork do
  begin
    response = "byebye!\n"
    n = -1 # count the client number, first client is n==0
    while client = srv.accept # this is an accept(2) wrapper
      n += 1
      warn "Accepted client=#{n}"
      begin
        # this is select(2)
        warn "Waiting on client=#{n} to become readable"
        IO.select([client], nil, nil, 5) or abort "BUG: #{client} not readable"

        # read the request, it should be "hihi"
        warn "Reading from client=#{n}"
        req = client.gets
        if req =~ /hihi/
          warn "sending infinite response to client=#{n}"
          client.sync = true # do not buffer output

          # just write the response in an infinite loop on the socket
          # The client will only read 4K (see dd below), disconnect,
          # and trigger Errno::EPIPE.
          # This just calls write(2) in a loop
          loop { client.write(response) }
        else
          warn "Client sent bad request: #{req}"
        end
      rescue => e
        warn "Got #{e.class} #{e.message} error for client=#{n}"

        # this is expected, the client will only read 4K of our infinite
        # response and drop the socket. We write to the fifo the
        # client is running: "cat #{fifos[0]} &" on
        fifo = fifos[0]
        File.open(fifo, "w") do |fp|
          warn "writing message to #{fifo} for client=#{n}"
          fp.write("CLOSING #{n}")
          warn "done writing message to #{fifo} for client=#{n}"
        end
      ensure
        warn "Done dealing with client=#{n}"
        client.close
      end
    end
  ensure
    warn "Server exiting"
  end
end

# close the server port in the main process, server is running in child
srv.close

# ensure we shut the server down at exit
at_exit do
  Process.kill(:TERM, pid)
  fifos.each { |fifopath| File.unlink(fifopath) }
  _, status = Process.waitpid2(pid)
  puts "Server exited: #{status.inspect}"
end

# inline shell script here
x = <<SH
set -e
# wait for the server to write "CLOSING" above
# After enough iterations, this can get hung up on open():
cat #{fifos[0]} > #{out} &
(
  # send a request to the server
  echo hihi

  # read 4K of the "byebye!" response
  dd bs=4096 count=1 < #{fifos[1]} > /dev/null

# socat reads the stdout of the above ('hihi') and writes
# it to the TCP:#{addr}:#{port}, the server response goes to fifos[1],
# which the above dd(1) invocation reads the first 4K of.
# This socat is expected to error out with EPIPE here
) | socat - TCP:#{addr}:#{port} > #{fifos[1]} || :

echo "Waiting on #{fifos[0]} for client=$client"
wait # for the cat fifo[0] above
grep CLOSING #{out}
> #{out}
SH

# run the above shell script, assign the client= variable to the
# iteration number
iterations.times do |i|
  system("client=#{i}\n#{x}") or abort "client #{i} failed: #$?"
end
puts "All done!"
--
Eric Wong
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket
From: Eric Dumazet @ 2013-01-11 2:01 UTC
To: Eric Wong
Cc: netdev, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
Minchan Kim, Andrew Morton, Linus Torvalds
On Fri, 2013-01-11 at 00:49 +0000, Eric Wong wrote:
> The below Ruby script reproduces the issue for me with write()
> getting stuck, usually within a few iterations (sometimes up to 100).
>
> I've reproduced this with 3.8-rc2 and rc3, even with Mel's partial
> revert patch in <20130110194212.GJ13304@suse.de> applied.
>
> I can not reproduce this with 3.7.1+
> stable-queue 2afd72f59c518da18853192ceeebead670ced5ea
> So this seems to be a new bug from the 3.8 cycle...
>
> Fortunately, this bug is far easier for me to reproduce than the ppoll+send
> (toosleepy) failures.
>
> Both socat and ruby (Ruby 1.8, 1.9, 2.0 should all work), along with
> common shell tools (dd, sh, cat) are required for testing this:
>
> # 100 iterations, raise/lower the number if needed
> ruby the_script_below.rb 100
>
> lsof -p 15236 reveals this:
> ruby 15236 ew 5u IPv4 23066 0t0 TCP localhost:33728->localhost:38658 (CLOSE_WAIT)

Hmm, it might be commit c3ae62af8e755ea68380fb5ce682e60079a4c388
tcp: should drop incoming frames without ACK flag set

It seems RST should be allowed to not have ACK set.

I'll send a fix, thanks!
* Re: 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket
From: Eric Dumazet @ 2013-01-11 2:18 UTC
To: Eric Wong, David Miller
Cc: netdev, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
Minchan Kim, Andrew Morton, Linus Torvalds
From: Eric Dumazet <edumazet@google.com>
On Thu, 2013-01-10 at 18:01 -0800, Eric Dumazet wrote:
> Hmm, it might be commit c3ae62af8e755ea68380fb5ce682e60079a4c388
> tcp: should drop incoming frames without ACK flag set
>
> It seems RST should be allowed to not have ACK set.
>
> I'll send a fix, thanks !

Yes, that's definitely the problem, sorry for that.

[PATCH] tcp: accept RST without ACK flag

commit c3ae62af8e755 (tcp: should drop incoming frames without ACK flag
set) added a regression on the handling of RST messages.

RST should be allowed to come even without ACK bit set. We validate
the RST by checking the exact sequence, as requested by RFC 793 and
5961 3.2, in tcp_validate_incoming()

Reported-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 38e1184..0905997 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5541,7 +5541,7 @@ slow_path:
 	if (len < (th->doff << 2) || tcp_checksum_complete_user(sk, skb))
 		goto csum_error;
 
-	if (!th->ack)
+	if (!th->ack && !th->rst)
 		goto discard;
 
 	/*
@@ -5986,7 +5986,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 			goto discard;
 	}
 
-	if (!th->ack)
+	if (!th->ack && !th->rst)
 		goto discard;
 
 	if (!tcp_validate_incoming(sk, skb, th, 0))
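
As a rough illustration of the two checks changed above, here is a toy model in plain Ruby (the language of the reproducer). It is not kernel code: the Segment struct and both method names are made up for this sketch, and it models only the ACK/RST flag test, nothing else in the receive path.

```ruby
# Toy model of the flag check changed by the patch. Not kernel code:
# Segment and both method names are hypothetical.
Segment = Struct.new(:ack, :rst)

# Check as introduced by commit c3ae62af8e755: no ACK flag means discard.
def discard_before_patch?(seg)
  !seg.ack
end

# Check after the patch: a segment without ACK is discarded
# unless it carries the RST flag.
def discard_after_patch?(seg)
  !seg.ack && !seg.rst
end

bare_rst = Segment.new(false, true)  # RST without ACK
no_flags = Segment.new(false, false) # neither ACK nor RST

[["RST without ACK", bare_rst], ["no ACK, no RST", no_flags]].each do |name, seg|
  before = discard_before_patch?(seg) ? "discard" : "continue"
  after  = discard_after_patch?(seg) ? "discard" : "continue"
  puts "#{name}: before patch=#{before}, after patch=#{after}"
end
```

A RST that gets past this check is still subject to the sequence validation in tcp_validate_incoming() mentioned in the commit message; the model does not attempt to cover that step.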
* Re: 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket
From: Neal Cardwell @ 2013-01-11 2:40 UTC
To: Eric Dumazet
Cc: Eric Wong, David Miller, Netdev, Mel Gorman, linux-mm, LKML,
Rik van Riel, Minchan Kim, Andrew Morton, Linus Torvalds
On Thu, Jan 10, 2013 at 9:18 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> On Thu, 2013-01-10 at 18:01 -0800, Eric Dumazet wrote:
>
>> Hmm, it might be commit c3ae62af8e755ea68380fb5ce682e60079a4c388
>> tcp: should drop incoming frames without ACK flag set
>>
>> It seems RST should be allowed to not have ACK set.
>>
>> I'll send a fix, thanks !
>
> Yes, that's definitely the problem, sorry for that.
>
>
> [PATCH] tcp: accept RST without ACK flag
>
> commit c3ae62af8e755 (tcp: should drop incoming frames without ACK flag
> set) added a regression on the handling of RST messages.
>
> RST should be allowed to come even without ACK bit set. We validate
> the RST by checking the exact sequence, as requested by RFC 793 and
> 5961 3.2, in tcp_validate_incoming()
>
> Reported-by: Eric Wong <normalperson@yhbt.net>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
* Re: 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket
From: Eric Wong @ 2013-01-11 2:50 UTC
To: Eric Dumazet
Cc: David Miller, netdev, Mel Gorman, linux-mm, linux-kernel,
Rik van Riel, Minchan Kim, Andrew Morton, Linus Torvalds
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Yes, that's definitely the problem, sorry for that.
>
>
> [PATCH] tcp: accept RST without ACK flag
>
> commit c3ae62af8e755 (tcp: should drop incoming frames without ACK flag
> set) added a regression on the handling of RST messages.
>
> RST should be allowed to come even without ACK bit set. We validate
> the RST by checking the exact sequence, as requested by RFC 793 and
> 5961 3.2, in tcp_validate_incoming()
>
> Reported-by: Eric Wong <normalperson@yhbt.net>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
All good here, thanks for the quick turnaround!
Tested-by: Eric Wong <normalperson@yhbt.net>
(I originally thought the FIFOs were part of the problem, so I left
that in my test case)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 3.8-rc2/rc3 write() blocked on CLOSE_WAIT TCP socket
From: David Miller @ 2013-01-11 6:49 UTC
To: eric.dumazet
Cc: normalperson, netdev, mgorman, linux-mm, linux-kernel, riel,
minchan, akpm, torvalds
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 10 Jan 2013 18:18:47 -0800
> [PATCH] tcp: accept RST without ACK flag
>
> commit c3ae62af8e755 (tcp: should drop incoming frames without ACK flag
> set) added a regression on the handling of RST messages.
>
> RST should be allowed to come even without ACK bit set. We validate
> the RST by checking the exact sequence, as requested by RFC 793 and
> 5961 3.2, in tcp_validate_incoming()
>
> Reported-by: Eric Wong <normalperson@yhbt.net>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks Eric.