From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, rds-devel@oss.oracle.com,
	ajaykumar.hotchandani@oracle.com, santosh.shilimkar@oracle.com,
	sowmini.varadhan@oracle.com
Subject: [PATCH net-next 00/17] RDS: multiple connection paths for scaling
Date: Mon, 13 Jun 2016 09:44:25 -0700
Message-ID: <cover.1465829626.git.sowmini.varadhan@oracle.com>

Today RDS-over-TCP is implemented by demux-ing multiple PF_RDS sockets
between any 2 endpoints (where endpoint == [IP address, port]) over a
single TCP socket between the 2 IP addresses involved. This has the
limitation that it ends up funneling multiple RDS flows over a single
TCP flow, thus the rds/tcp connection
   (a) is upper-bounded by the single-flow bandwidth, and
   (b) suffers from head-of-line blocking across the RDS sockets.

Better throughput (for a fixed, small packet size, i.e., the MTU) can be achieved
by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed
RDS (mprds).  Each such TCP/IP flow constitutes a path for the rds/tcp
connection. RDS sockets will be attached to a path based on some hash
(e.g., of local address and RDS port number) and packets for that RDS
socket will be sent over the attached path using TCP to segment/reassemble
RDS datagrams on that path.
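
As an illustration of the path-attachment idea above, here is a minimal
userspace sketch, not the code from this series. It assumes an IPv4 local
address, and every name in it (DEMO_NPATHS, demo_mix32, demo_select_path)
as well as the choice of mixing function is hypothetical; the cover letter
only says "some hash (e.g., of local address and RDS port number)".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of TCP paths per rds/tcp connection. */
#define DEMO_NPATHS 8

/*
 * Illustrative 32-bit mixer (a Murmur3-style finalizer). The hash
 * actually used for mprds is not specified in this cover letter.
 */
static uint32_t demo_mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x85ebca6bu;
	x ^= x >> 13;
	x *= 0xc2b2ae35u;
	x ^= x >> 16;
	return x;
}

/*
 * Map (local IPv4 address, RDS port) to a path index in [0, npaths).
 * The same socket always maps to the same path, so its datagrams
 * stay ordered on one TCP flow.
 */
static unsigned int demo_select_path(uint32_t laddr, uint16_t rds_port,
				     unsigned int npaths)
{
	uint32_t key;

	assert(npaths > 0);
	key = laddr ^ (((uint32_t)rds_port << 16) | rds_port);
	return demo_mix32(key) % npaths;
}
```

A caller would compute demo_select_path(laddr, port, DEMO_NPATHS) once when
the RDS socket is attached and reuse that index for every send on the
socket.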

The table below, generated using a prototype that implements mprds,
shows that the gain is significant for scaling to 40G.  Packet sizes
used were: 8K byte req, 256 byte resp. MTU: 1500.  The parameters for
RDS-concurrency used below are described in the rds-stress(1) man page;
the number listed is proportional to the number of threads at which max
throughput was attained.

  -------------------------------------------------------------------
     RDS-concurrency   Num of       tx+rx K/s (iops)       throughput
     (-t N -d N)       TCP paths
  -------------------------------------------------------------------
        16             1             600K -  700K            4 Gbps
        28             8            5000K - 6000K           32 Gbps
  -------------------------------------------------------------------

FAQ: what is the relation between mprds and mptcp?
  mprds is orthogonal to mptcp. Whereas mptcp creates
  sub-flows for a single TCP connection, mprds parallelizes tx/rx
  at the RDS layer. mprds with N paths will allow N datagrams to
  be sent in parallel; each path will continue to send one
  datagram at a time, with sender and receiver keeping track of
  the retransmit and dgram-assembly state based on the RDS header.
  If desired, mptcp can additionally be used to speed up each TCP
  path. That acceleration is orthogonal to the parallelization benefits
  of mprds.

This patch series lays down the foundational data-structures to support
mprds in the kernel. It implements the changes to split up the
rds_connection structure into a common (to all paths) part,
and a per-path rds_conn_path. All I/O workqs are driven from
the rds_conn_path. 
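
The split described above can be pictured with a small userspace sketch.
This is only an illustration under stated assumptions: the struct and
field names (demo_conn, demo_conn_path, cp_index, and so on) and the value
of DEMO_MPATH_WORKERS are hypothetical and are not taken from the patches;
the real definitions live in net/rds/rds.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RDS_MPATH_WORKERS; the value is assumed. */
#define DEMO_MPATH_WORKERS 8

struct demo_conn;

/* Per-path state: everything that differs from one path to the next. */
struct demo_conn_path {
	struct demo_conn *cp_conn;	/* back-pointer to the shared part */
	unsigned int cp_index;		/* position in the owner's path array */
	uint64_t cp_next_tx_seq;	/* illustrative per-path send state */
	int cp_up;			/* illustrative per-path link state */
};

/* Common state: shared by all paths of one connection. */
struct demo_conn {
	uint32_t c_laddr;		/* local IPv4 address (assumed) */
	uint32_t c_faddr;		/* foreign IPv4 address (assumed) */
	unsigned int c_npaths;		/* paths actually in use */
	struct demo_conn_path c_path[DEMO_MPATH_WORKERS];
};

/*
 * Initialize every path slot up front, but use only one path, which
 * mirrors the single-path behaviour this series keeps for all
 * transports.
 */
static void demo_conn_init(struct demo_conn *conn, uint32_t laddr,
			   uint32_t faddr)
{
	unsigned int i;

	assert(conn != NULL);
	memset(conn, 0, sizeof(*conn));
	conn->c_laddr = laddr;
	conn->c_faddr = faddr;
	conn->c_npaths = 1;
	for (i = 0; i < DEMO_MPATH_WORKERS; i++) {
		conn->c_path[i].cp_conn = conn;
		conn->c_path[i].cp_index = i;
	}
}
```

With this layout, code that used to take the connection can instead take a
pointer to one demo_conn_path and reach the shared fields through cp_conn,
which is the shape of change most of the patches in the series make.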

Note that this patchset does not (yet) actually enable multipathing
for any of the transports; all transports will continue to use a
single path with the refactored data-structures. A subsequent patchset
will add the changes to the rds-tcp module to actually use mprds.

Sowmini Varadhan (17):
  RDS: split out connection specific state from rds_connection to
    rds_conn_path
  RDS: add t_mp_capable bit to be set by MP capable transports
  RDS: recv path gets the conn_path from rds_incoming for MP capable
    transports
  RDS: rds_inc_path_init() helper function for MP capable transports
  RDS: Add rds_send_path_reset()
  RDS: Add rds_send_path_drop_acked()
  RDS: Remove stale function rds_send_get_message()
  RDS: Make rds_send_queue_rm() rds_conn_path aware
  RDS: Pass rds_conn_path to rds_send_xmit()
  RDS: Extract rds_conn_path from i_conn_path in rds_send_drop_to() for
    MP-capable transports
  RDS: Make rds_send_pong() take a rds_conn_path argument
  RDS: Add rds_conn_path_connect_if_down() for MP-aware callers
  RDS: update rds-info related functions to traverse multiple
    conn_paths
  RDS: Add rds_conn_path_error()
  RDS: Initialize all RDS_MPATH_WORKERS in __rds_conn_create
  RDS: Update rds_conn_shutdown to work with rds_conn_path
  RDS: Update rds_conn_destroy to be MP capable

 net/rds/cong.c            |    3 +-
 net/rds/connection.c      |  329 +++++++++++++++++++++++++++++++-------------
 net/rds/ib.c              |    1 +
 net/rds/ib_cm.c           |    3 +-
 net/rds/ib_rdma.c         |    1 +
 net/rds/ib_recv.c         |    1 +
 net/rds/ib_send.c         |    1 +
 net/rds/loop.c            |    1 +
 net/rds/rdma_transport.c  |    1 +
 net/rds/rds.h             |  152 ++++++++++++++-------
 net/rds/rds_single_path.h |   30 ++++
 net/rds/recv.c            |   27 +++-
 net/rds/send.c            |  293 ++++++++++++++++++++--------------------
 net/rds/tcp.c             |    3 +-
 net/rds/tcp_connect.c     |    4 +-
 net/rds/tcp_listen.c      |   11 +-
 net/rds/tcp_recv.c        |    1 +
 net/rds/tcp_send.c        |    1 +
 net/rds/threads.c         |   95 ++++++++------
 19 files changed, 611 insertions(+), 347 deletions(-)
 create mode 100644 net/rds/rds_single_path.h
