* tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
@ 2004-07-30 2:19 Robert White
2004-07-30 2:36 ` David S. Miller
0 siblings, 1 reply; 9+ messages in thread
From: Robert White @ 2004-07-30 2:19 UTC (permalink / raw)
To: linux-kernel
Greetings,
I have several environments where I have two or more boxes tied together with
short/private Ethernet segments. The applications on these boxes toss events back
and forth using that network. Reliability is most important so I use TCP, but
performance is also important as a close second. The events are often quite short
and are, as often as not, assembled on the fly in the code via multiple writes.
The protocol(s) also have "known complete" moments where the sender knows that a
complete event has been written.
If I turn Nagle off, then the segmented write-on-assembly generates a lot of very
short (3 to 30 bytes or so) packets; if I leave it on then Nagle tends to delay the
complete events (because, of course, the stack isn't psychic 8-).
I currently flush these event boundaries by turning Nagle off and then back on using
back-to-back calls to setsockopt(). The extra syscalls seem like a waste. To that
end I am looking into a patch to add a SIOCFLUSH ioctl or similar.
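For reference, the two-call workaround described above looks roughly like this in userspace C. This is a minimal sketch assuming a Linux TCP socket; the helper name flush_by_nodelay_toggle() is hypothetical and does not come from any existing code.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* setsockopt() */

/*
 * Hypothetical helper: push whatever Nagle is holding back by setting
 * TCP_NODELAY and then clearing it again.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
int flush_by_nodelay_toggle(int fd)
{
	int on = 1, off = 0;

	/* First syscall: Nagle off. On Linux this also pushes pending data. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
		return -1;

	/* Second syscall: Nagle back on for the next event. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &off, sizeof(off)) < 0)
		return -1;

	return 0;
}
```

The sender would call this once at each "known complete" moment, which is where the two syscalls per event come from.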
The [untested] patch below is my first take on the question. I am interested in
knowing whether this looks useful to others (or ill-conceived, etc. 8-) before I try to
add this to the tree pervasively.
[Sorry about the mis-indented line to stifle outlook word-wrap. I "have to" use this
bloody program here at work... /sigh 8-)]
==== Begin Patch ====
--- linux.orig/include/asm-i386/sockios.h 2004-06-15 22:19:02.000000000 -0700
+++ linux/include/asm-i386/sockios.h 2004-07-29 18:37:22.000000000 -0700
@@ -8,5 +8,6 @@
 #define SIOCGPGRP 0x8904
 #define SIOCATMARK 0x8905
 #define SIOCGSTAMP 0x8906 /* Get stamp */
+#define SIOCFLUSH 0x8907
 #endif
diff --recursive -u linux.orig/net/ipv4/tcp.c linux/net/ipv4/tcp.c
--- linux.orig/net/ipv4/tcp.c 2004-06-15 22:19:03.000000000 -0700
+++ linux/net/ipv4/tcp.c 2004-07-29 19:09:07.000000000 -0700
@@ -526,6 +526,14 @@
 		else
 			answ = tp->write_seq - tp->snd_una;
 		break;
+	case SIOCFLUSH:
+	{
+		__u8 scratch = tp->nonagle;
+		tp->nonagle = (scratch & ~TCP_NAGLE_CORK) | TCP_NAGLE_OFF | TCP_NAGLE_PUSH;
+		tcp_push_pending_frames(sk, tp);
+		tp->nonagle = scratch;
+	}
+		return 0;
 	default:
 		return -ENOIOCTLCMD;
 	};
==== End Patch ====
Robert White,
Casabyte, Inc.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-07-30 2:19 tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY Robert White
@ 2004-07-30 2:36 ` David S. Miller
2004-07-30 22:02 ` Robert White
0 siblings, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-07-30 2:36 UTC (permalink / raw)
To: Robert White; +Cc: linux-kernel
Turn off NAGLE, and flip TCP_CORK on and off around the sequences.
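A minimal sketch of that suggestion in userspace C, assuming Linux (TCP_CORK is Linux-specific) and an event made of two pieces. The names setup_event_socket() and send_event_corked() are hypothetical, and short writes are treated as errors to keep the example small.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY, TCP_CORK (Linux-specific) */
#include <sys/socket.h>   /* setsockopt(), send() */
#include <sys/types.h>    /* ssize_t */

static int set_tcp_flag(int fd, int opt, int value)
{
	return setsockopt(fd, IPPROTO_TCP, opt, &value, sizeof(value));
}

/* Done once per connection: turn Nagle off. */
int setup_event_socket(int fd)
{
	return set_tcp_flag(fd, TCP_NODELAY, 1);
}

/*
 * Hypothetical helper: cork, write the pieces, uncork.
 * Returns 0 on success, -1 on error. The cork is released on every path.
 */
int send_event_corked(int fd, const void *hdr, size_t hdr_len,
		      const void *body, size_t body_len)
{
	if (set_tcp_flag(fd, TCP_CORK, 1) < 0)
		return -1;

	/* While corked, partial frames are queued instead of sent. */
	if (send(fd, hdr, hdr_len, MSG_NOSIGNAL) != (ssize_t)hdr_len ||
	    send(fd, body, body_len, MSG_NOSIGNAL) != (ssize_t)body_len) {
		set_tcp_flag(fd, TCP_CORK, 0);
		return -1;
	}

	/* Clearing the cork lets the queued data go out. */
	return set_tcp_flag(fd, TCP_CORK, 0);
}
```

Note that this still costs two setsockopt() calls per event, in addition to the writes themselves.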
* RE: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-07-30 2:36 ` David S. Miller
@ 2004-07-30 22:02 ` Robert White
2004-07-30 22:37 ` David S. Miller
0 siblings, 1 reply; 9+ messages in thread
From: Robert White @ 2004-07-30 22:02 UTC (permalink / raw)
To: 'David S. Miller'; +Cc: linux-kernel
I thought about turning cork on and off. While it does essentially burp the transfer
into larger frames it is still flawed for my usage for several reasons.
Omitting the uninteresting parts and giving a real-but-partial example...
In one of the applications I am stuck with I have a USB device on one machine that I
am tunneling onto another. The reasons I can't just plug the USB device into the
other machine are tedious. The USB device "bursts" 32 characters at a time maximum.
I also send monitoring information "on the side band" (on the same socket in sequence
with the data but marked as control information) on occasion, which is where my
application-level framing comes into play. Normally it is fine to just use TCP as-is
and let things Nagle normally and live with the induced delay.
Now consider running PPP on that device over that tunnel. Again it is uninteresting
why I can't run pppd on the computer local to the USB device. If the TCP stream is
flushed when any 32-or-less byte burst ends with the PPP framing character, the data
throughput on the ppp interface abstracted across the tunnel rises considerably.
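The flush condition itself is cheap to test. A small sketch, assuming the "PPP framing character" is the 0x7E flag byte of PPP's HDLC-like framing; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the PPP framing character is the HDLC-like flag byte 0x7E. */
#define PPP_FLAG 0x7E

/*
 * Hypothetical predicate: true when a burst read from the device ends
 * on a PPP frame boundary, i.e. a good moment to flush the TCP stream.
 */
bool burst_ends_ppp_frame(const unsigned char *burst, size_t len)
{
	return len > 0 && burst[len - 1] == PPP_FLAG;
}
```

The tunnel would write every burst as usual and request a flush only when this returns true, without needing to know whether the user is actually in ppp mode.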
In the current implementation, the only flush available in TCP is to turn nagle off
and then back on, which over the course of a tunnel lifespan adds a significant
number of syscalls.
1) I am not really interested in delaying the front of the frames as much as I am
interested in expediting the end of the frames. So I don't actually want to cork the
stream, I just want to flush it under easily detectable conditions.
2) There are times when the frames can be larger than the Ethernet MTU so corking
starts getting counter-productive.
3) There are times when the right moment to release a cork isn't determinable (as I
don't want the tunnel in the business of guessing whether the overlying user is or
isn't in ppp mode).
4) Cork-then-uncork would still end up with two syscalls instead of one.
5) Just turning Nagle off completely begins to scale rather poorly as device count
increases because of the large number of 32 byte payloads.
I can imagine other protocols and applications (near-real-time signaling or games?)
that could benefit from a flush operation on the TCP stack; and indeed I have another
one in force that is less easy to explain (without getting really boring 8-) than the
above.
Rob White,
Casabyte, Inc.
* Re: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-07-30 22:02 ` Robert White
@ 2004-07-30 22:37 ` David S. Miller
2004-07-31 8:10 ` Arjan van de Ven
0 siblings, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-07-30 22:37 UTC (permalink / raw)
To: Robert White; +Cc: linux-kernel
On Fri, 30 Jul 2004 15:02:33 -0700
"Robert White" <rwhite@casabyte.com> wrote:
> 4) Cork-then-uncork would still end up with two syscalls instead of one.
Syscalls are incredibly cheap, this is not an argument for not
using cork'ing.
* Re: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-07-30 22:37 ` David S. Miller
@ 2004-07-31 8:10 ` Arjan van de Ven
2004-08-02 2:54 ` David S. Miller
0 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2004-07-31 8:10 UTC (permalink / raw)
To: David S. Miller; +Cc: Robert White, linux-kernel
On Sat, 2004-07-31 at 00:37, David S. Miller wrote:
> On Fri, 30 Jul 2004 15:02:33 -0700
> "Robert White" <rwhite@casabyte.com> wrote:
>
> > 4) Cork-then-uncork would still end up with two syscalls instead of one.
>
> Syscalls are incredibly cheap, this is not an argument for not
> using cork'ing.
btw do we export MSG_MORE functionality to userspace ? That might be a
solution as well...
* Re: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-07-31 8:10 ` Arjan van de Ven
@ 2004-08-02 2:54 ` David S. Miller
2004-08-02 19:44 ` Robert White
0 siblings, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-08-02 2:54 UTC (permalink / raw)
To: arjanv; +Cc: rwhite, linux-kernel
On Sat, 31 Jul 2004 10:10:06 +0200
Arjan van de Ven <arjanv@redhat.com> wrote:
> btw do we export MSG_MORE functionality to userspace ? That might be a
> solution as well...
Yes, we do.
* RE: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-08-02 2:54 ` David S. Miller
@ 2004-08-02 19:44 ` Robert White
2004-08-02 19:53 ` Arjan van de Ven
0 siblings, 1 reply; 9+ messages in thread
From: Robert White @ 2004-08-02 19:44 UTC (permalink / raw)
To: 'David S. Miller', arjanv; +Cc: linux-kernel
Is there an argument _against_ providing an explicit flush?
* Re: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-08-02 19:44 ` Robert White
@ 2004-08-02 19:53 ` Arjan van de Ven
2004-08-02 20:18 ` Robert White
0 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2004-08-02 19:53 UTC (permalink / raw)
To: Robert White; +Cc: 'David S. Miller', linux-kernel
On Mon, Aug 02, 2004 at 12:44:41PM -0700, Robert White wrote:
> Is there an argument _against_ providing an explicit flush?
well MSG_MORE is equivalent, it's an explicit non-flush...
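A minimal sketch of the MSG_MORE approach, assuming Linux and an event made of two pieces. The name send_event_more() is hypothetical, and short writes are treated as errors for brevity.

```c
#include <sys/socket.h>   /* send(), MSG_MORE, MSG_NOSIGNAL */
#include <sys/types.h>    /* ssize_t */

/*
 * Hypothetical helper: mark every piece except the last with MSG_MORE,
 * so only the final piece signals that the event is complete.
 * Returns 0 on success, -1 on error.
 */
int send_event_more(int fd, const void *hdr, size_t hdr_len,
		    const void *body, size_t body_len)
{
	/* More data follows, so the stack may hold this piece back. */
	if (send(fd, hdr, hdr_len, MSG_MORE | MSG_NOSIGNAL) != (ssize_t)hdr_len)
		return -1;

	/* Last piece: no MSG_MORE, nothing further is promised. */
	if (send(fd, body, body_len, MSG_NOSIGNAL) != (ssize_t)body_len)
		return -1;

	return 0;
}
```

This needs no extra syscalls, but the sender has to know at each write whether more data follows, which is the reverse of knowing only afterwards that an event was complete.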
* RE: tcp_push_pending_frames() without TCP_CORK or TCP_NODELAY
2004-08-02 19:53 ` Arjan van de Ven
@ 2004-08-02 20:18 ` Robert White
0 siblings, 0 replies; 9+ messages in thread
From: Robert White @ 2004-08-02 20:18 UTC (permalink / raw)
To: 'Arjan van de Ven'; +Cc: 'David S. Miller', linux-kernel
I am after explicit flushing of a "normal" (i.e. Nagle-mode) stream. The TCP_CORK
stuff is the opposite of what I am after: I am only sometimes in a position to
know that a flush will increase performance, but I am often not in a position to know
that a CORK won't hurt it.
Right now I am performing the flush operation by calling setsockopt() twice, turning
NODELAY on and then back off.
Using MSG_MORE will improve some of the small-frame preamble stuff, even with
Nagle on (if I am understanding all of the ramifications). Does MSG_MORE work with
Nagle off (NODELAY set)?
The original candidate patch is designed to flush a CORKed or normally Nagled (is
that a word?) socket in one call without losing the actual options. It does that by
saving the cork/nagle flag, doing the same flush stuff that turning off nagle does,
and then restoring the flag. It should be safely applicable to each of the three
modes (though kind of pointless for the NODELAY mode 8-).
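For comparison, a rough userspace approximation of that save/flush/restore behaviour, assuming Linux's TCP_CORK and TCP_NODELAY socket options. The name tcp_flush_preserving() is hypothetical. Unlike the in-kernel version it takes several syscalls, and it is not atomic with respect to other threads using the same socket.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY, TCP_CORK (Linux-specific) */
#include <sys/socket.h>   /* getsockopt(), setsockopt() */

static int get_tcp_flag(int fd, int opt, int *value)
{
	socklen_t len = sizeof(*value);

	return getsockopt(fd, IPPROTO_TCP, opt, value, &len);
}

static int set_tcp_flag(int fd, int opt, int value)
{
	return setsockopt(fd, IPPROTO_TCP, opt, &value, sizeof(value));
}

/*
 * Hypothetical helper: flush pending data, then put TCP_CORK and
 * TCP_NODELAY back the way the caller had them.
 * Returns 0 on success, -1 on error; on error the options may be left
 * partially changed.
 */
int tcp_flush_preserving(int fd)
{
	int cork, nodelay;

	/* Save the caller's settings. */
	if (get_tcp_flag(fd, TCP_CORK, &cork) < 0 ||
	    get_tcp_flag(fd, TCP_NODELAY, &nodelay) < 0)
		return -1;

	/* Lift the cork first so it cannot hold the data back. */
	if (cork && set_tcp_flag(fd, TCP_CORK, 0) < 0)
		return -1;

	/* Setting TCP_NODELAY is the call that pushes pending data. */
	if (set_tcp_flag(fd, TCP_NODELAY, 1) < 0)
		return -1;

	/* Restore the saved settings. */
	if (!nodelay && set_tcp_flag(fd, TCP_NODELAY, 0) < 0)
		return -1;
	if (cork && set_tcp_flag(fd, TCP_CORK, 1) < 0)
		return -1;

	return 0;
}
```

A library could call this without knowing which mode the application chose, at a cost of three to six syscalls per flush instead of one.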
Arguments for are:
-- The flush is one call.
-- The flush is "safe" to call from a library because it has no net-effect on the
persistent state of the socket. (This is a new argument in this message 8-)
Arguments against are:
-- Two calls are not significantly expensive.
-- Have to add the ioctl info to lots of places and the docs.