From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: Florian Westphal <fw@strlen.de>
Cc: David Miller <davem@davemloft.net>,
tom@herbertland.com, hannes@stressinduktion.org,
netdev@vger.kernel.org, kernel-team@fb.com, davejwatson@fb.com,
alexei.starovoitov@gmail.com
Subject: Re: [PATCH net-next 0/6] kcm: Kernel Connection Multiplexor (KCM)
Date: Wed, 25 Nov 2015 11:26:21 -0500
Message-ID: <20151125162621.GA13985@oracle.com>
In-Reply-To: <20151124162515.GA22266@breakpoint.cc>
On (11/24/15 17:25), Florian Westphal wrote:
> Its a well-written document, but I don't see how moving the burden of
> locking a single logical tcp connection (to prevent threads from
> reading a partial record) from userspace to kernel is an improvement.
>
> If you really have 100 threads and must use a single tcp connection
> to multiplex some arbitrarily complex record-format in atomic fashion,
> then your requirements suck.
In the interest of providing some context from the rds-tcp use-case
here (without drifting into hyperbole): RDS-TCP, like KCM,
provides a dgram-over-stream socket with SEQPACKET semantics,
and an upper-bounded record size, per POSIX SEQPACKET semantics.
The major difference from KCM is that it does not use BPF, but
instead has its own protocol header for each datagram.
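
To make "own protocol header for each datagram" concrete, here is a
minimal userspace sketch in Python of datagram-over-stream framing.
The 4-byte length-only header, the MAX_RECORD bound and the names
frame() and Deframer are assumptions made up for illustration; this is
not the actual RDS header layout or the rds-tcp code.

```python
import struct

# Hypothetical header: payload length only, 4 bytes, network byte order.
# NOT the real RDS header; it only illustrates per-datagram framing.
HDR = struct.Struct("!I")
MAX_RECORD = 64 * 1024  # assumed upper bound on record size


def frame(payload: bytes) -> bytes:
    """Prefix one datagram with its header so it can ride on a byte stream."""
    if len(payload) > MAX_RECORD:
        raise ValueError("record exceeds the upper bound")
    return HDR.pack(len(payload)) + payload


class Deframer:
    """Reassembles whole datagrams from arbitrarily sized stream chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add stream bytes; return every datagram that is now complete."""
        self._buf += chunk
        out = []
        while len(self._buf) >= HDR.size:
            (length,) = HDR.unpack_from(self._buf)
            if length > MAX_RECORD:
                raise ValueError("peer sent an oversized record")
            end = HDR.size + length
            if len(self._buf) < end:
                break  # partial record: wait for more bytes
            out.append(bytes(self._buf[HDR.size:end]))
            del self._buf[:end]
        return out
```

The receiver never hands up a partial record: feed() only returns a
datagram once the header and the full payload have arrived, whatever
segment boundaries the stream happened to deliver.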
There seems to be some misconception in this thread that this model
is about allowing the application to be "lazy" and do a 1:1 mapping between
streams; that's not the case for RDS.
In the case of cluster apps, we have DB apps that want to have a single
dgram socket to talk to multiple peers (i.e., a star network, with the
node in the center of the star wanting to have dgram sockets to everyone
else). The scale is more than a mere 100 threads.
If that central node wants reliable, ordered, congestion-managed
delivery, it would have to use UDP plus a bunch of its own code for
seq#, rexmit etc. They are doing that today, but don't want
to reinvent TCP's congestion avoidance. (In fact, in the absence of
congestion, one complaint is that UDP latency is 2x-3x better than
rds-tcp for the 512-byte req, 8K resp that is typical for DB workloads.
I'm still investigating.)
From the TCP standpoint of rds-tcp, we have a many-to-one mapping:
multiple RDS sockets funneling to a single tcp connection, sharing
a single congestion state-machine.
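
As a rough illustration of that many-to-one funnel, here is a small
Python sketch in which several logical sockets share one byte stream
and the receiver demultiplexes by a port number carried in each
record's header. The (port, length) header, SharedStream and demux()
are hypothetical names for this example only; the in-memory buffer
stands in for the single tcp connection and the lock stands in for
whatever serialization the transport applies. It does not show how
rds-tcp itself is implemented.

```python
import struct
import threading
from collections import defaultdict

# Hypothetical header: (destination port, payload length), network order.
HDR = struct.Struct("!HI")


class SharedStream:
    """Stands in for the single tcp connection (an in-memory byte buffer)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.wire = bytearray()

    def send_record(self, port: int, payload: bytes) -> None:
        # Header and payload go out under one lock, so records from
        # different senders never interleave on the shared stream.
        with self._lock:
            self.wire += HDR.pack(port, len(payload)) + payload


def demux(wire: bytes) -> dict:
    """Split the shared stream back into per-port lists of records."""
    queues = defaultdict(list)
    off = 0
    while off + HDR.size <= len(wire):
        port, length = HDR.unpack_from(wire, off)
        start = off + HDR.size
        if start + length > len(wire):
            break  # trailing partial record
        queues[port].append(bytes(wire[start:start + length]))
        off = start + length
    return dict(queues)
```

All senders share one stream, and hence one congestion state, while
each port still sees its own records whole and in the order sent.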
I don't know if this is a "poorly designed application". I'm sure
it's not perfect, but we have a ton of Oracle clustering s/w that's
already doing this with IB, so extending this with rds-tcp made
sense for us at this point.
--Sowmini