From: Vlad Yasevich <vyasevich@gmail.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org, Sridhar Samudrala <sri@us.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
linux-sctp@vger.kernel.org
Subject: Re: [PATCH v2] sctp: Implement quick failover draft from tsvwg
Date: Wed, 18 Jul 2012 17:23:05 -0400 [thread overview]
Message-ID: <50072939.5030600@gmail.com> (raw)
In-Reply-To: <1342634466-17930-1-git-send-email-nhorman@tuxdriver.com>
On 07/18/2012 02:01 PM, Neil Horman wrote:
> I've seen several attempts recently made to do quick failover of sctp transports
> by reducing various retransmit timers and counters. While its possible to
> implement a faster failover on multihomed sctp associations, its not
> particularly robust, in that it can lead to unneeded retransmits, as well as
> false connection failures due to intermittent latency on a network.
>
> Instead, lets implement the new ietf quick failover draft found here:
> http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
> This will let the sctp stack identify transports that have had a small number of
> errors, and avoid using them quickly until their reliability can be
> re-established. I've tested this out on two virt guests connected via multiple
> isolated virt networks and believe its in compliance with the above draft and
> works well.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: Sridhar Samudrala <sri@us.ibm.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: linux-sctp@vger.kernel.org
>
> ---
> Change notes:
>
> V2)
> - Added socket option API from section 6.1 of the specification, as per
> request from Vlad. Adding this socket option allows us to alter both the path
> maximum retransmit value and the path partial failure threshold for each
> transport and the association as a whole.
>
> - Added a per transport pf_retrans value, and initialized it from the
> association value. This makes each transport independently configurable as per
> the socket option above, and prevents changes in the sysctl from bleeding into
> an already created association.
> ---
> Documentation/networking/ip-sysctl.txt | 14 +++++
> include/net/sctp/constants.h | 1 +
> include/net/sctp/structs.h | 11 +++-
> include/net/sctp/user.h | 11 ++++
> net/sctp/associola.c | 36 ++++++++++--
> net/sctp/outqueue.c | 6 +-
> net/sctp/sm_sideeffect.c | 33 ++++++++++-
> net/sctp/socket.c | 96 ++++++++++++++++++++++++++++++++
> net/sctp/sysctl.c | 9 +++
> net/sctp/transport.c | 4 +-
> 10 files changed, 206 insertions(+), 15 deletions(-)
>
[ snip ]
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index b3b8a8d..dfffece 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -3470,6 +3470,52 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
> }
>
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to alter the partially failed threshold for one or all
> + * transports in an association. See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
> + char __user *optval,
> + unsigned int optlen)
> +{
> + struct sctp_paddrthlds val;
> + struct sctp_transport *trans;
> + struct sctp_association *asoc;
> +
> + if (optlen < sizeof(struct sctp_paddrthlds))
> + return -EINVAL;
> + if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
> + optlen))
> + return -EFAULT;
What if optlen is bigger? You going to trash the stack.
> +
> + if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> + asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> + if (!asoc)
> + return -ENOENT;
> + list_for_each_entry(trans, &asoc->peer.transport_addr_list,
> + transports) {
> + trans->pathmaxrxt = val.spt_pathmaxrxt;
> + trans->pf_retrans = val.spt_pathpfthld;
You want to make sure that the values aren't 0. Otherwise, you'll set
the pathmaxrxt to 0 and that would be bad.
> + }
> +
> + asoc->pf_retrans = val.spt_pathpfthld;
> + asoc->pathmaxrxt = val.spt_pathmaxrxt;
Ditto.
> + } else {
> + trans = sctp_addr_id2transport(sk, &val.spt_address,
> + val.spt_assoc_id);
> + if (!trans)
> + return -ENOENT;
> +
> + trans->pathmaxrxt = val.spt_pathmaxrxt;
> + trans->pf_retrans = val.spt_pathpfthld;
Ditto.
> + }
> +
> + return 0;
> +}
> +
> /* API 6.2 setsockopt(), getsockopt()
> *
> * Applications use setsockopt() and getsockopt() to set or retrieve
> @@ -3619,6 +3665,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
> case SCTP_AUTO_ASCONF:
> retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
> break;
> + case SCTP_PEER_ADDR_THLDS:
> + retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
> + break;
> default:
> retval = -ENOPROTOOPT;
> break;
> @@ -5490,6 +5539,50 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len,
> return 0;
> }
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to fetch the partially failed threshold for one or all
> + * transports in an association. See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
> + char __user *optval,
> + int optlen)
> +{
> + struct sctp_paddrthlds val;
> + struct sctp_transport *trans;
> + struct sctp_association *asoc;
> +
> + if (optlen < sizeof(struct sctp_paddrthlds))
> + return -EINVAL;
> + if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, optlen))
> + return -EFAULT;
Again, trashing the stack if optlen and optval are bigger.
-vlad
> +
> + if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> + val.spt_assoc_id);
> + asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> + if (!asoc)
> + return -ENOENT;
> +
> + val.spt_pathpfthld = asoc->pf_retrans;
> + val.spt_pathmaxrxt = asoc->pathmaxrxt;
> + } else {
> + trans = sctp_addr_id2transport(sk, &val.spt_address,
> + val.spt_assoc_id);
> + if (!trans)
> + return -ENOENT;
> +
> + val.spt_pathmaxrxt = trans->pathmaxrxt;
> + val.spt_pathpfthld = trans->pf_retrans;
> + }
> +
> + if (copy_to_user(optval, &val, optlen))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
> char __user *optval, int __user *optlen)
> {
> @@ -5628,6 +5721,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
> case SCTP_AUTO_ASCONF:
> retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
> break;
> + case SCTP_PEER_ADDR_THLDS:
> + retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
> + break;
> default:
> retval = -ENOPROTOOPT;
> break;
> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> index e5fe639..2b2bfe9 100644
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> .extra2 = &int_max
> },
> {
> + .procname = "pf_retrans",
> + .data = &sctp_pf_retrans,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = &zero,
> + .extra2 = &int_max
> + },
> + {
> .procname = "max_init_retransmits",
> .data = &sctp_max_retrans_init,
> .maxlen = sizeof(int),
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index b026ba0..194d0f3 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -85,6 +85,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
>
> /* Initialize the default path max_retrans. */
> peer->pathmaxrxt = sctp_max_retrans_path;
> + peer->pf_retrans = sctp_pf_retrans;
>
> INIT_LIST_HEAD(&peer->transmitted);
> INIT_LIST_HEAD(&peer->send_ready);
> @@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
> {
> unsigned long timeout;
> timeout = t->rto + sctp_jitter(t->rto);
> - if (t->state != SCTP_UNCONFIRMED)
> + if ((t->state != SCTP_UNCONFIRMED) &&
> + (t->state != SCTP_PF))
> timeout += t->hbinterval;
> timeout += jiffies;
> return timeout;
>
next prev parent reply other threads:[~2012-07-18 21:23 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-13 18:26 [PATCH] sctp: Implement quick failover draft from tsvwg Neil Horman
2012-07-14 18:12 ` Vlad Yasevich
2012-07-14 21:22 ` Neil Horman
2012-07-18 18:01 ` [PATCH v2] " Neil Horman
2012-07-18 20:30 ` Joe Perches
2012-07-19 10:45 ` Neil Horman
2012-07-19 16:54 ` Joe Perches
2012-07-18 21:23 ` Vlad Yasevich [this message]
2012-07-19 10:46 ` Neil Horman
2012-07-19 16:51 ` [PATCH v3] " Neil Horman
2012-07-20 16:51 ` Flavio Leitner
2012-07-20 17:19 ` [PATCH v4] " Neil Horman
2012-07-20 17:55 ` Vlad Yasevich
2012-07-20 18:36 ` Neil Horman
2012-07-20 18:51 ` [PATCH v5] " Neil Horman
2012-07-20 19:10 ` Flavio Leitner
2012-07-20 19:31 ` David Miller
2012-07-23 14:28 ` Flavio Leitner
2012-07-21 6:45 ` Vlad Yasevich
2012-07-21 11:03 ` Neil Horman
2012-07-21 17:56 ` [PATCH v6] " Neil Horman
2012-07-22 18:18 ` Vlad Yasevich
2012-07-22 19:14 ` David Miller
2012-07-22 19:18 ` Neil Horman <nhorman@tuxdriver.com>
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50072939.5030600@gmail.com \
--to=vyasevich@gmail.com \
--cc=davem@davemloft.net \
--cc=linux-sctp@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=sri@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).