* TCP stalls in current git, possibly splice related
@ 2007-07-12 20:12 James Morris
2007-07-12 20:52 ` David Miller
2007-07-13 10:02 ` Johannes Berg
0 siblings, 2 replies; 16+ messages in thread
From: James Morris @ 2007-07-12 20:12 UTC (permalink / raw)
To: Jens Axboe; +Cc: David S. Miller, netdev
I'm seeing TCP connection stalls with current git, and a bisect found the
following as a possible cause:
534f2aaa6ab07cd71164180bc958a7dcde41db11 is first bad commit
commit 534f2aaa6ab07cd71164180bc958a7dcde41db11
Author: Jens Axboe <jens.axboe@oracle.com>
Date: Fri Jun 1 14:52:37 2007 +0200
sys_sendfile: switch to using ->splice_read, if available
This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's a large patch, and not clear if it's the patch itself or coincidental
to it.
I've looked at some tcpdumps, but may not be able to get back to them
until tomorrow or the weekend, & thought it might be useful to get the
report out now.
- James
--
James Morris
<jmorris@namei.org>
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: TCP stalls in current git, possibly splice related 2007-07-12 20:12 TCP stalls in current git, possibly splice related James Morris @ 2007-07-12 20:52 ` David Miller 2007-07-13 2:26 ` James Morris 2007-07-13 10:02 ` Johannes Berg 1 sibling, 1 reply; 16+ messages in thread From: David Miller @ 2007-07-12 20:52 UTC (permalink / raw) To: jmorris; +Cc: jens.axboe, netdev From: James Morris <jmorris@namei.org> Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT) > I'm seeing TCP connection stalls with current git, and a bisect found the > following as a possible cause: To add to this James is seeing this with distcc I believe. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-12 20:52 ` David Miller @ 2007-07-13 2:26 ` James Morris 2007-07-13 5:36 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: James Morris @ 2007-07-13 2:26 UTC (permalink / raw) To: David Miller; +Cc: jens.axboe, netdev On Thu, 12 Jul 2007, David Miller wrote: > From: James Morris <jmorris@namei.org> > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT) > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > following as a possible cause: > > To add to this James is seeing this with distcc I believe. Correct. -- James Morris <jmorris@namei.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 2:26 ` James Morris @ 2007-07-13 5:36 ` Jens Axboe 2007-07-13 6:26 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2007-07-13 5:36 UTC (permalink / raw) To: James Morris; +Cc: David Miller, netdev On Thu, Jul 12 2007, James Morris wrote: > On Thu, 12 Jul 2007, David Miller wrote: > > > From: James Morris <jmorris@namei.org> > > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT) > > > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > following as a possible cause: > > > > To add to this James is seeing this with distcc I believe. > > Correct. I'll try and reproduce. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 5:36 ` Jens Axboe @ 2007-07-13 6:26 ` Jens Axboe 2007-07-13 9:25 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2007-07-13 6:26 UTC (permalink / raw) To: James Morris; +Cc: David Miller, netdev On Fri, Jul 13 2007, Jens Axboe wrote: > On Thu, Jul 12 2007, James Morris wrote: > > On Thu, 12 Jul 2007, David Miller wrote: > > > > > From: James Morris <jmorris@namei.org> > > > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT) > > > > > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > > following as a possible cause: > > > > > > To add to this James is seeing this with distcc I believe. > > > > Correct. > > I'll try and reproduce. You didn't happen to get a sysrq-t backtrace of that distcc being hung, did you? -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 6:26 ` Jens Axboe @ 2007-07-13 9:25 ` Jens Axboe 0 siblings, 0 replies; 16+ messages in thread From: Jens Axboe @ 2007-07-13 9:25 UTC (permalink / raw) To: James Morris; +Cc: David Miller, netdev On Fri, Jul 13 2007, Jens Axboe wrote: > On Fri, Jul 13 2007, Jens Axboe wrote: > > On Thu, Jul 12 2007, James Morris wrote: > > > On Thu, 12 Jul 2007, David Miller wrote: > > > > > > > From: James Morris <jmorris@namei.org> > > > > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT) > > > > > > > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > > > following as a possible cause: > > > > > > > > To add to this James is seeing this with distcc I believe. > > > > > > Correct. > > > > I'll try and reproduce. > > You didn't happen to get a sysrq-t backtrace of that distcc being hung, > did you? Does this work for you? diff --git a/fs/splice.c b/fs/splice.c index ed2ce99..92646aa 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -491,7 +491,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos, ret = 0; spliced = 0; - while (len) { + while (len && !spliced) { ret = __generic_file_splice_read(in, ppos, pipe, len, flags); if (ret < 0) @@ -1051,15 +1051,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, sd->flags &= ~SPLICE_F_NONBLOCK; while (len) { - size_t read_len, max_read_len; - - /* - * Do at most PIPE_BUFFERS pages worth of transfer: - */ - max_read_len = min(len, (size_t)(PIPE_BUFFERS*PAGE_SIZE)); + size_t read_len; - ret = do_splice_to(in, &sd->pos, pipe, max_read_len, flags); - if (unlikely(ret < 0)) + ret = do_splice_to(in, &sd->pos, pipe, len, flags); + if (unlikely(ret <= 0)) goto out_release; read_len = ret; @@ -1071,26 +1066,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, * could get stuck data in the internal pipe: */ ret = actor(pipe, sd); - if (unlikely(ret < 0)) + if (unlikely(ret <= 0)) goto out_release; bytes += ret; len -= ret; - /* - * In nonblocking mode, if we got back a short read then - * that was due to either an IO error or due to the - * pagecache entry not being there. In the IO error case - * the _next_ splice attempt will produce a clean IO error - * return value (not a short read), so in both cases it's - * correct to break out of the loop here: - */ - if ((flags & SPLICE_F_NONBLOCK) && (read_len < max_read_len)) - break; + if (ret < read_len) + goto out_release; } pipe->nrbufs = pipe->curbuf = 0; - return bytes; out_release: @@ -1152,10 +1138,12 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, .pos = *ppos, .u.file = out, }; - size_t ret; + long ret; ret = splice_direct_to_actor(in, &sd, direct_splice_actor); - *ppos = sd.pos; + if (ret > 0) + *ppos += ret; + return ret; } -- Jens Axboe ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-12 20:12 TCP stalls in current git, possibly splice related James Morris 2007-07-12 20:52 ` David Miller @ 2007-07-13 10:02 ` Johannes Berg 2007-07-13 11:05 ` Jens Axboe 1 sibling, 1 reply; 16+ messages in thread From: Johannes Berg @ 2007-07-13 10:02 UTC (permalink / raw) To: James Morris; +Cc: Jens Axboe, David S. Miller, netdev [-- Attachment #1: Type: text/plain, Size: 220 bytes --] On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote: > I'm seeing TCP connection stalls with current git, and a bisect found the > following as a possible cause: I'm also seeing stalls with synergy. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 190 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 10:02 ` Johannes Berg @ 2007-07-13 11:05 ` Jens Axboe 2007-07-13 12:23 ` James Morris ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Jens Axboe @ 2007-07-13 11:05 UTC (permalink / raw) To: Johannes Berg; +Cc: James Morris, David S. Miller, netdev On Fri, Jul 13 2007, Johannes Berg wrote: > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote: > > I'm seeing TCP connection stalls with current git, and a bisect found the > > following as a possible cause: > > I'm also seeing stalls with synergy. Please try the below. diff --git a/fs/splice.c b/fs/splice.c index ed2ce99..92646aa 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -491,7 +491,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos, ret = 0; spliced = 0; - while (len) { + while (len && !spliced) { ret = __generic_file_splice_read(in, ppos, pipe, len, flags); if (ret < 0) @@ -1051,15 +1051,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, sd->flags &= ~SPLICE_F_NONBLOCK; while (len) { - size_t read_len, max_read_len; - - /* - * Do at most PIPE_BUFFERS pages worth of transfer: - */ - max_read_len = min(len, (size_t)(PIPE_BUFFERS*PAGE_SIZE)); + size_t read_len; - ret = do_splice_to(in, &sd->pos, pipe, max_read_len, flags); - if (unlikely(ret < 0)) + ret = do_splice_to(in, &sd->pos, pipe, len, flags); + if (unlikely(ret <= 0)) goto out_release; read_len = ret; @@ -1071,26 +1066,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd, * could get stuck data in the internal pipe: */ ret = actor(pipe, sd); - if (unlikely(ret < 0)) + if (unlikely(ret <= 0)) goto out_release; bytes += ret; len -= ret; - /* - * In nonblocking mode, if we got back a short read then - * that was due to either an IO error or due to the - * pagecache entry not being there. In the IO error case - * the _next_ splice attempt will produce a clean IO error - * return value (not a short read), so in both cases it's - * correct to break out of the loop here: - */ - if ((flags & SPLICE_F_NONBLOCK) && (read_len < max_read_len)) - break; + if (ret < read_len) + goto out_release; } pipe->nrbufs = pipe->curbuf = 0; - return bytes; out_release: @@ -1152,10 +1138,12 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, .pos = *ppos, .u.file = out, }; - size_t ret; + long ret; ret = splice_direct_to_actor(in, &sd, direct_splice_actor); - *ppos = sd.pos; + if (ret > 0) + *ppos += ret; + return ret; } -- Jens Axboe ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 11:05 ` Jens Axboe @ 2007-07-13 12:23 ` James Morris 2007-07-13 21:55 ` Johannes Berg 2007-07-16 8:45 ` Johannes Berg 2 siblings, 0 replies; 16+ messages in thread From: James Morris @ 2007-07-13 12:23 UTC (permalink / raw) To: Jens Axboe; +Cc: Johannes Berg, David S. Miller, netdev On Fri, 13 Jul 2007, Jens Axboe wrote: > On Fri, Jul 13 2007, Johannes Berg wrote: > > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote: > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > following as a possible cause: > > > > I'm also seeing stalls with synergy. > > Please try the below. Works for me. -- James Morris <jmorris@namei.org> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 11:05 ` Jens Axboe 2007-07-13 12:23 ` James Morris @ 2007-07-13 21:55 ` Johannes Berg 2007-07-16 8:45 ` Johannes Berg 2 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2007-07-13 21:55 UTC (permalink / raw) To: Jens Axboe; +Cc: James Morris, David S. Miller, netdev [-- Attachment #1: Type: text/plain, Size: 552 bytes --] On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote: > On Fri, Jul 13 2007, Johannes Berg wrote: > > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote: > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > following as a possible cause: > > > > I'm also seeing stalls with synergy. > > Please try the below. Thanks. I already had your mail in the other part of the thread marked, I'll give it a try on Monday or Tuesday; I'm away from the setup where I always see the problem right now. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 190 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-13 11:05 ` Jens Axboe 2007-07-13 12:23 ` James Morris 2007-07-13 21:55 ` Johannes Berg @ 2007-07-16 8:45 ` Johannes Berg 2007-07-16 12:02 ` Jens Axboe 2 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2007-07-16 8:45 UTC (permalink / raw) To: Jens Axboe; +Cc: James Morris, David S. Miller, netdev [-- Attachment #1: Type: text/plain, Size: 922 bytes --] On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote: > On Fri, Jul 13 2007, Johannes Berg wrote: > > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote: > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > following as a possible cause: > > > > I'm also seeing stalls with synergy. > > Please try the below. I usually see stalls sending data from machine A to machine B. I just upgraded machine B to commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9 (Fri Jul 13 16:53:18 2007 -0700) and the tree includes this patch. This did not, however, fix the problem so I guess the problem is on the sender side. Due to wireless-dev lagging behind I there still have 2.6.22-something where this patch doesn't apply. I hope John will merge again soon and then I'll report back whether I still see the stalling behaviour with a current tree that includes this patch. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 190 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-16 8:45 ` Johannes Berg @ 2007-07-16 12:02 ` Jens Axboe 2007-07-17 22:36 ` Johannes Berg 0 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2007-07-16 12:02 UTC (permalink / raw) To: Johannes Berg; +Cc: James Morris, David S. Miller, netdev On Mon, Jul 16 2007, Johannes Berg wrote: > On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote: > > On Fri, Jul 13 2007, Johannes Berg wrote: > > > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote: > > > > I'm seeing TCP connection stalls with current git, and a bisect found the > > > > following as a possible cause: > > > > > > I'm also seeing stalls with synergy. > > > > Please try the below. > > I usually see stalls sending data from machine A to machine B. I just > upgraded machine B to commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9 > (Fri Jul 13 16:53:18 2007 -0700) and the tree includes this patch. This > did not, however, fix the problem so I guess the problem is on the > sender side. Due to wireless-dev lagging behind I there still have > 2.6.22-something where this patch doesn't apply. I hope John will merge > again soon and then I'll report back whether I still see the stalling > behaviour with a current tree that includes this patch. Yep, it's a sender thing, so upgrading the receiver will not change anything. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-16 12:02 ` Jens Axboe @ 2007-07-17 22:36 ` Johannes Berg 2007-07-18 10:49 ` Jens Axboe 0 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2007-07-17 22:36 UTC (permalink / raw) To: Jens Axboe; +Cc: James Morris, David S. Miller, netdev [-- Attachment #1: Type: text/plain, Size: 847 bytes --] On Mon, 2007-07-16 at 14:02 +0200, Jens Axboe wrote: > Yep, it's a sender thing, so upgrading the receiver will not change > anything. Ok, I upgraded, but that didn't help. And in fact, I don't see how it could have since synergy doesn't use splice or sendfile. I should've thought of that right away, sorry. It seems that packets are actually coming in during the time that my mouse hangs though (ran wireshark in parallel and saw no pauses in the timeline.) Hence, it actually seems to be on the receiver side, and running the synergy client under strace reveals that during the time my mouse hangs it's in poll() waiting for input on the tcp socket. sysrg-t doesn't show anything useful, it's just scheduling waiting for data. According to wireshark data is sent, but it never shows up at the application layer. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 190 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-17 22:36 ` Johannes Berg @ 2007-07-18 10:49 ` Jens Axboe 2007-07-18 10:58 ` Johannes Berg 0 siblings, 1 reply; 16+ messages in thread From: Jens Axboe @ 2007-07-18 10:49 UTC (permalink / raw) To: Johannes Berg; +Cc: James Morris, David S. Miller, netdev On Wed, Jul 18 2007, Johannes Berg wrote: > On Mon, 2007-07-16 at 14:02 +0200, Jens Axboe wrote: > > > Yep, it's a sender thing, so upgrading the receiver will not change > > anything. > > Ok, I upgraded, but that didn't help. And in fact, I don't see how it > could have since synergy doesn't use splice or sendfile. I should've > thought of that right away, sorry. > > It seems that packets are actually coming in during the time that my > mouse hangs though (ran wireshark in parallel and saw no pauses in the > timeline.) Hence, it actually seems to be on the receiver side, and > running the synergy client under strace reveals that during the time my > mouse hangs it's in poll() waiting for input on the tcp socket. sysrg-t > doesn't show anything useful, it's just scheduling waiting for data. > According to wireshark data is sent, but it never shows up at the > application layer. OK, then we can put splice off the hook at least :-) If it's easily reproducible (and it sounds like it), then a git bisect might be the easiest way forward. -- Jens Axboe ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-18 10:49 ` Jens Axboe @ 2007-07-18 10:58 ` Johannes Berg 2007-07-18 21:24 ` Johannes Berg 0 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2007-07-18 10:58 UTC (permalink / raw) To: Jens Axboe; +Cc: James Morris, David S. Miller, netdev [-- Attachment #1: Type: text/plain, Size: 399 bytes --] On Wed, 2007-07-18 at 12:49 +0200, Jens Axboe wrote: > OK, then we can put splice off the hook at least :-) :) > If it's easily reproducible (and it sounds like it), then a git bisect > might be the easiest way forward. Yeah, I only need to try to remember when I last *didn't* see the problem, and hope that my analysis about it being on the receiver side is correct :) johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 190 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: TCP stalls in current git, possibly splice related 2007-07-18 10:58 ` Johannes Berg @ 2007-07-18 21:24 ` Johannes Berg 0 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2007-07-18 21:24 UTC (permalink / raw) To: Jens Axboe; +Cc: James Morris, David S. Miller, netdev [-- Attachment #1: Type: text/plain, Size: 572 bytes --] On Wed, 2007-07-18 at 12:58 +0200, Johannes Berg wrote: > Yeah, I only need to try to remember when I last *didn't* see the > problem, and hope that my analysis about it being on the receiver side > is correct :) Hm. I went back to 2.6.20 on the receiver side and still saw it. Maybe I need to go back further? With strace (the synergyc process) and tcpdump (on the interface) I continually see packets coming in but the read sometimes pauses for 8 seconds... Not sure why. Very weird, but right now I don't have enough time for investigating. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 190 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2007-07-19 10:05 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-12 20:12 TCP stalls in current git, possibly splice related James Morris 2007-07-12 20:52 ` David Miller 2007-07-13 2:26 ` James Morris 2007-07-13 5:36 ` Jens Axboe 2007-07-13 6:26 ` Jens Axboe 2007-07-13 9:25 ` Jens Axboe 2007-07-13 10:02 ` Johannes Berg 2007-07-13 11:05 ` Jens Axboe 2007-07-13 12:23 ` James Morris 2007-07-13 21:55 ` Johannes Berg 2007-07-16 8:45 ` Johannes Berg 2007-07-16 12:02 ` Jens Axboe 2007-07-17 22:36 ` Johannes Berg 2007-07-18 10:49 ` Jens Axboe 2007-07-18 10:58 ` Johannes Berg 2007-07-18 21:24 ` Johannes Berg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).