netdev.vger.kernel.org archive mirror
From: Tom Herbert <tom@herbertland.com>
To: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next] rds-tcp: Add module parameters to control sndbuf/rcvbuf size of RDS-TCP socket
Date: Fri, 11 Mar 2016 19:21:26 -0800
Message-ID: <CALx6S37+Zj-OmApDHzb2hq0jSLsbat4M9Xv6ASzB5VGP9HrQqw@mail.gmail.com>
In-Reply-To: <20160312024341.GA26486@oracle.com>

On Fri, Mar 11, 2016 at 6:43 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (03/11/16 11:09), Stephen Hemminger wrote:
>> Module parameters are a problem for distributions and should only be used
>> as a last resort.
>
> I don't know the history of what the distribution problem is, but I
> did try to use sysctl as an alternative for this. I'm starting to
> believe that this is one case where module params, with all their
> problems, are the least evil option. Here's what I find if I use sysctl:
>
> - being able to tune the sndbuf and rcvbuf actually gives me a noticeable
>   2X perf improvement over the default for DB/Cluster request/response
>   transactions, where the classic req size is 8K bytes, response is 256
>   bytes, and there are a large number of such concurrent transactions
>   queued up on the kernel tcp socket. (The defaults work well for
>   larger packet sizes, but as I noted in the commit, packet sizes vary
>   in actual deployment).
>
> Assuming I use sysctl:
>
> - by the time the admin gets to execute the sysctl, the kernel listen
>   socket for RDS_TCP_PORT would already have been created, and an
>   arbitrary number of accept/connect (kernel) endpoints may have been
>   created.  According to tcp(7), you should be setting the buf sizes before
>   connect/listen. So using sysctl means you have to reset a variable
>   number of existing cluster connections. All this can be done, but
>   adds a large mass of confusing code to reset kernel sockets and
>   also get the cluster HA/failover right.
>
> - at first I thought sysctl was attractive because it was netns aware,
>   but found that it is only superficially so. The ->proc_handler does
>   not pass in the struct net *, and setting up the ctl_table's ->data
>   to a per-net var is not a simple thing to do. Thus, even though
>   register_net_sysctl() takes a net * pointer, my handler has to do
>   extra ugly things to get to per-net vars.
>
> I don't know if there is a better alternative than sysctl/module_param
> for achieving what I'm trying to do, which is to set up the params
> for the kernel sockets before creating them. If yes, some
> hints/rtfms would be helpful.
>
You could consider an alternate model where connection management
(connect, listen, etc.) is performed in a userspace daemon and the
sockets are then attached to the kernel (pretty much like KCM does). This
moves complexity out of the kernel and gives you the flexibility to
arbitrarily configure TCP sockets (e.g. the 4 setsockopts in
rds_tcp_keepalive could be made configurable). The other cool thing this
would allow is switching an existing TCP connection into RDS mode
(as is done with several protocols on an http port).
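
A minimal userspace sketch of the socket setup such a daemon could do
before connect() or listen(), in line with the tcp(7) ordering quoted
above. The names struct tcp_tuning and make_tuned_tcp_socket() are
hypothetical, and treating the four setsockopts as SO_KEEPALIVE plus the
three TCP_KEEP* options is an assumption. The step that hands the socket
over to the kernel is deliberately left out, since no such RDS interface
is described in this thread.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical bundle of per-socket settings a daemon might read
 * from its own config instead of from module parameters. */
struct tcp_tuning {
    int sndbuf;     /* requested SO_SNDBUF, in bytes */
    int rcvbuf;     /* requested SO_RCVBUF, in bytes */
    int keepcnt;    /* TCP_KEEPCNT: probes before giving up */
    int keepidle;   /* TCP_KEEPIDLE: idle seconds before first probe */
    int keepintvl;  /* TCP_KEEPINTVL: seconds between probes */
};

static int set_int_opt(int fd, int level, int name, int val)
{
    return setsockopt(fd, level, name, &val, sizeof(val));
}

/* Create a TCP socket and apply all tuning before it is connected or
 * put into the listen state. Returns the fd, or -1 on failure. */
int make_tuned_tcp_socket(const struct tcp_tuning *t)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Linux doubles the requested buffer size for bookkeeping and
     * caps it at net.core.wmem_max / net.core.rmem_max. */
    if (set_int_opt(fd, SOL_SOCKET, SO_SNDBUF, t->sndbuf) < 0 ||
        set_int_opt(fd, SOL_SOCKET, SO_RCVBUF, t->rcvbuf) < 0 ||
        set_int_opt(fd, SOL_SOCKET, SO_KEEPALIVE, 1) < 0 ||
        set_int_opt(fd, IPPROTO_TCP, TCP_KEEPCNT, t->keepcnt) < 0 ||
        set_int_opt(fd, IPPROTO_TCP, TCP_KEEPIDLE, t->keepidle) < 0 ||
        set_int_opt(fd, IPPROTO_TCP, TCP_KEEPINTVL, t->keepintvl) < 0) {
        close(fd);
        return -1;
    }

    return fd;  /* caller would now bind()/listen() or connect() */
}
```

Since the daemon owns the socket until it is handed over, changing any
of these values only affects sockets created afterwards, and no reset of
existing kernel sockets is needed.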

Tom

> --Sowmini
>
