From: Andrea Arcangeli <andrea@suse.de>
To: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@transmeta.com>,
Rick Jones <raj@cup.hp.com>,
Linux Kernel List <linux-kernel@vger.kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
"David S. Miller" <davem@redhat.com>
Subject: Re: [Fwd: [Fwd: Is sendfile all that sexy? (fwd)]]
Date: Fri, 19 Jan 2001 16:25:52 +0100
Message-ID: <20010119162552.D3447@athlon.random>
In-Reply-To: <20010118231651.L28276@athlon.random> <Pine.LNX.4.30.0101182317420.3437-100000@elte.hu>
In-Reply-To: <Pine.LNX.4.30.0101182317420.3437-100000@elte.hu>; from mingo@elte.hu on Thu, Jan 18, 2001 at 11:18:48PM +0100
On Thu, Jan 18, 2001 at 11:18:48PM +0100, Ingo Molnar wrote:
>
> On Thu, 18 Jan 2001, Andrea Arcangeli wrote:
>
> > This is a possible slow (but userspace based) implementation of SIOCPUSH:
>
> of course this is what i meant. Lets stop wasting time on this, ok?
We were both wrong. Not even my pseudocode was equivalent to SIOCPUSH 8). In fact
this example 1):
val = 1
setsockopt(TCP_CORK, &val)
sendfile()
sendfile()
write()
write()
whatever()
val = 0
setsockopt(TCP_CORK, &val)
is _not_ equivalent to this example 2):
val = 1
setsockopt(TCP_CORK, &val)
sendfile()
sendfile()
write()
write()
whatever()
val = 0
setsockopt(TCP_CORK, &val)
ioctl(SIOCPUSH)
We were wrong because the "uncork" doesn't do what we all expected from it.
The "uncork" won't push the last skb onto the wire if there is unacknowledged
data in the write_queue and the payload of the last skb in the write_queue
is smaller than the MSS. This is because the "uncork" only re-evaluates the
write_queue according to the _nagle_ algorithm, quite correctly, because the
"uncork" moves from "cork" to "nagle" (not from "cork" to "nodelay").
In fact the ugly 3) below is finally equivalent to 2), and it is what we
all expected from 1):
val = 1
setsockopt(TCP_CORK, &val)
sendfile()
sendfile()
write()
write()
whatever()
val = 0
setsockopt(TCP_CORK, &val)
val = 1
setsockopt(TCP_NODELAY, &val)
val = 0
setsockopt(TCP_NODELAY, &val)
The uncork didn't do what we all expected, but I don't consider it a bug in the
uncork itself, because reprocessing the write queue with nagle makes perfect
sense (the socket effectively is in nagle mode, not in nodelay mode). With
SIOCPUSH we don't need to hack the uncork semantics to avoid having to enter
nodelay mode just to push the last skb in the write_queue onto the wire, so
there's no practical problem (the socket will remain corked or nagled all the
time anyway).
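For reference, sequence 3) could be spelled out with the real socket options
roughly as follows. This is only a minimal sketch: the helper names
set_tcp_flag(), cork() and uncork_and_push() are made up for illustration, fd
is assumed to be a TCP socket, and TCP_CORK is Linux-specific. SIOCPUSH does
not appear because it is only a proposal in this thread. Whether the
TCP_NODELAY toggle is actually needed depends on how a given kernel handles
the uncork.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: set one boolean TCP-level option on fd. */
static int set_tcp_flag(int fd, int optname, int on)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, optname, &on, sizeof(on));
}

/* Start batching: small segments are held back while corked. */
int cork(int fd)
{
    return set_tcp_flag(fd, TCP_CORK, 1);
}

/*
 * Sequence 3) from the mail: uncork (back to nagle mode), then enter and
 * leave nodelay mode so that a trailing sub-MSS segment is not held back.
 * Returns 0 on success, -1 if any setsockopt() fails (errno is set).
 */
int uncork_and_push(int fd)
{
    if (set_tcp_flag(fd, TCP_CORK, 0) < 0)
        return -1;
    if (set_tcp_flag(fd, TCP_NODELAY, 1) < 0)
        return -1;
    return set_tcp_flag(fd, TCP_NODELAY, 0);
}
```

A caller would do cork(fd), then its sendfile()/write() calls, then
uncork_and_push(fd). That is three extra syscalls at the end of each batch,
which is the cost a single SIOCPUSH is meant to avoid.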
So in short I just wanted to clarify that the "uncorking" doesn't "push" the
write_queue onto the wire ASAP as SIOCPUSH does; it only ensures that the
queued data will arrive at the other end "eventually". SIOCPUSH is instead the
right way to notify the stack that it isn't worth waiting for new outgoing
packets coming from userspace.
However, my implementation of SIOCPUSH from yesterday isn't completely right
yet: it works fine only if we can send the last fragment immediately (our cwnd
and the receiver's advertised window may forbid that).
The right way to implement SIOCPUSH is to add a flag, tp->push, to `struct
tcp_opt'. It works this way: tp->push is set by SIOCPUSH if something
is left in the write_queue after re-evaluating the write_queue in nodelay mode,
as my patch already does. Then the checks on the write queue triggered by the
incoming acks will also check tp->push, and they will treat tp->nonagle as
1 (nodelay) if tp->push is set. The first send on the socket that arrives
from userspace will clear tp->push. Then SIOCPUSH will work fine with the
semantics we expect.
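The tp->push idea above can be illustrated with a toy userspace model. This
is a sketch under loud assumptions, not kernel code: struct toy_tp and all
toy_* functions are invented here, the write queue is reduced to byte
counters, and cwnd plus the receiver window are collapsed into a single
window_open flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the relevant bits of struct tcp_opt (all hypothetical). */
struct toy_tp {
    int nonagle;       /* 0 = nagle, 1 = nodelay */
    bool push;         /* the flag proposed in the mail */
    size_t mss;
    size_t queued;     /* bytes waiting in the write queue */
    size_t unacked;    /* bytes sent but not yet acknowledged */
    bool window_open;  /* cwnd and receiver window allow sending */
};

/* Send what the window and the Nagle test allow. */
static void toy_push_pending(struct toy_tp *tp, bool nodelay)
{
    assert(tp->mss > 0);
    while (tp->queued > 0 && tp->window_open) {
        size_t seg = tp->queued < tp->mss ? tp->queued : tp->mss;
        if (seg < tp->mss && !nodelay && tp->unacked > 0)
            break;     /* Nagle: hold a small tail while data is in flight */
        tp->queued -= seg;
        tp->unacked += seg;
    }
}

/* A send from userspace clears tp->push, then normal rules apply. */
void toy_user_write(struct toy_tp *tp, size_t len)
{
    tp->push = false;
    tp->queued += len;
    toy_push_pending(tp, tp->nonagle == 1);
}

/* SIOCPUSH: re-evaluate as nodelay; remember the request if data remains. */
void toy_siocpush(struct toy_tp *tp)
{
    toy_push_pending(tp, true);
    tp->push = tp->queued > 0;
}

/* Incoming ack: treat the socket as nodelay while tp->push is set. */
void toy_ack(struct toy_tp *tp, size_t acked, bool window_open)
{
    tp->unacked -= acked < tp->unacked ? acked : tp->unacked;
    tp->window_open = window_open;
    toy_push_pending(tp, tp->nonagle == 1 || tp->push);
}
```

In this model, a 100-byte tail queued behind a closed window stays queued
after toy_siocpush(), but it goes out on the first ack that reopens the
window even though older data is still in flight. Without the flag the
Nagle test would keep holding it.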
I'm too busy with other stuff to spend more time than this on the TCP stack
right now, so if one of the TCP folks wants to reimplement SIOCPUSH
correctly (as described above), that's fine with me. If you can wait I will do
it myself in a few weeks.
Andrea