Re: [PATCH v2] sctp: Implement quick failover draft from tsvwg

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vlad Yasevich <vyasevich@gmail.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org, Sridhar Samudrala <sri@us.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	linux-sctp@vger.kernel.org
Subject: Re: [PATCH v2] sctp: Implement quick failover draft from tsvwg
Date: Wed, 18 Jul 2012 21:23:05 +0000	[thread overview]
Message-ID: <50072939.5030600@gmail.com> (raw)
In-Reply-To: <1342634466-17930-1-git-send-email-nhorman@tuxdriver.com>

On 07/18/2012 02:01 PM, Neil Horman wrote:
> I've seen several attempts recently made to do quick failover of sctp transports
> by reducing various retransmit timers and counters.  While its possible to
> implement a faster failover on multihomed sctp associations, its not
> particularly robust, in that it can lead to unneeded retransmits, as well as
> false connection failures due to intermittent latency on a network.
>
> Instead, lets implement the new ietf quick failover draft found here:
> http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
> This will let the sctp stack identify transports that have had a small number of
> errors, and avoid using them quickly until their reliability can be
> re-established.  I've tested this out on two virt guests connected via multiple
> isolated virt networks and believe its in compliance with the above draft and
> works well.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: Sridhar Samudrala <sri@us.ibm.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: linux-sctp@vger.kernel.org
>
> ---
> Change notes:
>
> V2)
> - Added socket option API from section 6.1 of the specification, as per
> request from Vlad. Adding this socket option allows us to alter both the path
> maximum retransmit value and the path partial failure threshold for each
> transport and the association as a whole.
>
> - Added a per transport pf_retrans value, and initialized it from the
> association value.  This makes each transport independently configurable as per
> the socket option above, and prevents changes in the sysctl from bleeding into
> an already created association.
> ---
>   Documentation/networking/ip-sysctl.txt |   14 +++++
>   include/net/sctp/constants.h           |    1 +
>   include/net/sctp/structs.h             |   11 +++-
>   include/net/sctp/user.h                |   11 ++++
>   net/sctp/associola.c                   |   36 ++++++++++--
>   net/sctp/outqueue.c                    |    6 +-
>   net/sctp/sm_sideeffect.c               |   33 ++++++++++-
>   net/sctp/socket.c                      |   96 ++++++++++++++++++++++++++++++++
>   net/sctp/sysctl.c                      |    9 +++
>   net/sctp/transport.c                   |    4 +-
>   10 files changed, 206 insertions(+), 15 deletions(-)
>

[ snip ]

> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index b3b8a8d..dfffece 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -3470,6 +3470,52 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
>   }
>
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to alter the partially failed threshold for one or all
> + * transports in an association.  See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
> +					    char __user *optval,
> +					    unsigned int optlen)
> +{
> +	struct sctp_paddrthlds val;
> +	struct sctp_transport *trans;
> +	struct sctp_association *asoc;
> +
> +	if (optlen < sizeof(struct sctp_paddrthlds))
> +		return -EINVAL;
> +	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
> +			   optlen))
> +		return -EFAULT;

What if optlen is bigger?  You going to trash the stack.

> +
> +	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> +		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> +		if (!asoc)
> +			return -ENOENT;
> +		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
> +				    transports) {
> +			trans->pathmaxrxt = val.spt_pathmaxrxt;
> +			trans->pf_retrans = val.spt_pathpfthld;

You want to make sure that the values aren't 0.  Otherwise, you'll set 
the pathmaxrxt to 0 and that would be bad.

> +		}
> +
> +		asoc->pf_retrans = val.spt_pathpfthld;
> +		asoc->pathmaxrxt = val.spt_pathmaxrxt;

Ditto.

> +	} else {
> +		trans = sctp_addr_id2transport(sk, &val.spt_address,
> +					       val.spt_assoc_id);
> +		if (!trans)
> +			return -ENOENT;
> +
> +		trans->pathmaxrxt = val.spt_pathmaxrxt;
> +		trans->pf_retrans = val.spt_pathpfthld;

Ditto.

> +	}
> +
> +	return 0;
> +}
> +
>   /* API 6.2 setsockopt(), getsockopt()
>    *
>    * Applications use setsockopt() and getsockopt() to set or retrieve
> @@ -3619,6 +3665,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
>   	case SCTP_AUTO_ASCONF:
>   		retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
>   		break;
> +	case SCTP_PEER_ADDR_THLDS:
> +		retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
> +		break;
>   	default:
>   		retval = -ENOPROTOOPT;
>   		break;
> @@ -5490,6 +5539,50 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len,
>   	return 0;
>   }
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to fetch the partially failed threshold for one or all
> + * transports in an association.  See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
> +					    char __user *optval,
> +					    int optlen)
> +{
> +	struct sctp_paddrthlds val;
> +	struct sctp_transport *trans;
> +	struct sctp_association *asoc;
> +
> +	if (optlen < sizeof(struct sctp_paddrthlds))
> +		return -EINVAL;
> +	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, optlen))
> +		return -EFAULT;

Again, trashing the stack if optlen and optval are bigger.

-vlad
> +
> +	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> +			val.spt_assoc_id);
> +		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> +		if (!asoc)
> +			return -ENOENT;
> +
> +		val.spt_pathpfthld = asoc->pf_retrans;
> +		val.spt_pathmaxrxt = asoc->pathmaxrxt;
> +	} else {
> +		trans = sctp_addr_id2transport(sk, &val.spt_address,
> +					       val.spt_assoc_id);
> +		if (!trans)
> +			return -ENOENT;
> +
> +		val.spt_pathmaxrxt = trans->pathmaxrxt;
> +		val.spt_pathpfthld = trans->pf_retrans;
> +	}
> +
> +	if (copy_to_user(optval, &val, optlen))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>   SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>   				char __user *optval, int __user *optlen)
>   {
> @@ -5628,6 +5721,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>   	case SCTP_AUTO_ASCONF:
>   		retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
>   		break;
> +	case SCTP_PEER_ADDR_THLDS:
> +		retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
> +		break;
>   	default:
>   		retval = -ENOPROTOOPT;
>   		break;
> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> index e5fe639..2b2bfe9 100644
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
>   		.extra2		= &int_max
>   	},
>   	{
> +		.procname	= "pf_retrans",
> +		.data		= &sctp_pf_retrans,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &int_max
> +	},
> +	{
>   		.procname	= "max_init_retransmits",
>   		.data		= &sctp_max_retrans_init,
>   		.maxlen		= sizeof(int),
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index b026ba0..194d0f3 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -85,6 +85,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
>
>   	/* Initialize the default path max_retrans.  */
>   	peer->pathmaxrxt  = sctp_max_retrans_path;
> +	peer->pf_retrans  = sctp_pf_retrans;
>
>   	INIT_LIST_HEAD(&peer->transmitted);
>   	INIT_LIST_HEAD(&peer->send_ready);
> @@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
>   {
>   	unsigned long timeout;
>   	timeout = t->rto + sctp_jitter(t->rto);
> -	if (t->state != SCTP_UNCONFIRMED)
> +	if ((t->state != SCTP_UNCONFIRMED) &&
> +	    (t->state != SCTP_PF))
>   		timeout += t->hbinterval;
>   	timeout += jiffies;
>   	return timeout;
>

WARNING: multiple messages have this Message-ID (diff)

From: Vlad Yasevich <vyasevich@gmail.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org, Sridhar Samudrala <sri@us.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	linux-sctp@vger.kernel.org
Subject: Re: [PATCH v2] sctp: Implement quick failover draft from tsvwg
Date: Wed, 18 Jul 2012 17:23:05 -0400	[thread overview]
Message-ID: <50072939.5030600@gmail.com> (raw)
In-Reply-To: <1342634466-17930-1-git-send-email-nhorman@tuxdriver.com>

On 07/18/2012 02:01 PM, Neil Horman wrote:
> I've seen several attempts recently made to do quick failover of sctp transports
> by reducing various retransmit timers and counters.  While its possible to
> implement a faster failover on multihomed sctp associations, its not
> particularly robust, in that it can lead to unneeded retransmits, as well as
> false connection failures due to intermittent latency on a network.
>
> Instead, lets implement the new ietf quick failover draft found here:
> http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
> This will let the sctp stack identify transports that have had a small number of
> errors, and avoid using them quickly until their reliability can be
> re-established.  I've tested this out on two virt guests connected via multiple
> isolated virt networks and believe its in compliance with the above draft and
> works well.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: Sridhar Samudrala <sri@us.ibm.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: linux-sctp@vger.kernel.org
>
> ---
> Change notes:
>
> V2)
> - Added socket option API from section 6.1 of the specification, as per
> request from Vlad. Adding this socket option allows us to alter both the path
> maximum retransmit value and the path partial failure threshold for each
> transport and the association as a whole.
>
> - Added a per transport pf_retrans value, and initialized it from the
> association value.  This makes each transport independently configurable as per
> the socket option above, and prevents changes in the sysctl from bleeding into
> an already created association.
> ---
>   Documentation/networking/ip-sysctl.txt |   14 +++++
>   include/net/sctp/constants.h           |    1 +
>   include/net/sctp/structs.h             |   11 +++-
>   include/net/sctp/user.h                |   11 ++++
>   net/sctp/associola.c                   |   36 ++++++++++--
>   net/sctp/outqueue.c                    |    6 +-
>   net/sctp/sm_sideeffect.c               |   33 ++++++++++-
>   net/sctp/socket.c                      |   96 ++++++++++++++++++++++++++++++++
>   net/sctp/sysctl.c                      |    9 +++
>   net/sctp/transport.c                   |    4 +-
>   10 files changed, 206 insertions(+), 15 deletions(-)
>

[ snip ]

> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index b3b8a8d..dfffece 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -3470,6 +3470,52 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
>   }
>
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to alter the partially failed threshold for one or all
> + * transports in an association.  See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
> +					    char __user *optval,
> +					    unsigned int optlen)
> +{
> +	struct sctp_paddrthlds val;
> +	struct sctp_transport *trans;
> +	struct sctp_association *asoc;
> +
> +	if (optlen < sizeof(struct sctp_paddrthlds))
> +		return -EINVAL;
> +	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
> +			   optlen))
> +		return -EFAULT;

What if optlen is bigger?  You going to trash the stack.

> +
> +	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> +		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> +		if (!asoc)
> +			return -ENOENT;
> +		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
> +				    transports) {
> +			trans->pathmaxrxt = val.spt_pathmaxrxt;
> +			trans->pf_retrans = val.spt_pathpfthld;

You want to make sure that the values aren't 0.  Otherwise, you'll set 
the pathmaxrxt to 0 and that would be bad.

> +		}
> +
> +		asoc->pf_retrans = val.spt_pathpfthld;
> +		asoc->pathmaxrxt = val.spt_pathmaxrxt;

Ditto.

> +	} else {
> +		trans = sctp_addr_id2transport(sk, &val.spt_address,
> +					       val.spt_assoc_id);
> +		if (!trans)
> +			return -ENOENT;
> +
> +		trans->pathmaxrxt = val.spt_pathmaxrxt;
> +		trans->pf_retrans = val.spt_pathpfthld;

Ditto.

> +	}
> +
> +	return 0;
> +}
> +
>   /* API 6.2 setsockopt(), getsockopt()
>    *
>    * Applications use setsockopt() and getsockopt() to set or retrieve
> @@ -3619,6 +3665,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
>   	case SCTP_AUTO_ASCONF:
>   		retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
>   		break;
> +	case SCTP_PEER_ADDR_THLDS:
> +		retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
> +		break;
>   	default:
>   		retval = -ENOPROTOOPT;
>   		break;
> @@ -5490,6 +5539,50 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len,
>   	return 0;
>   }
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to fetch the partially failed threshold for one or all
> + * transports in an association.  See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
> +					    char __user *optval,
> +					    int optlen)
> +{
> +	struct sctp_paddrthlds val;
> +	struct sctp_transport *trans;
> +	struct sctp_association *asoc;
> +
> +	if (optlen < sizeof(struct sctp_paddrthlds))
> +		return -EINVAL;
> +	if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, optlen))
> +		return -EFAULT;

Again, trashing the stack if optlen and optval are bigger.

-vlad
> +
> +	if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> +			val.spt_assoc_id);
> +		asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> +		if (!asoc)
> +			return -ENOENT;
> +
> +		val.spt_pathpfthld = asoc->pf_retrans;
> +		val.spt_pathmaxrxt = asoc->pathmaxrxt;
> +	} else {
> +		trans = sctp_addr_id2transport(sk, &val.spt_address,
> +					       val.spt_assoc_id);
> +		if (!trans)
> +			return -ENOENT;
> +
> +		val.spt_pathmaxrxt = trans->pathmaxrxt;
> +		val.spt_pathpfthld = trans->pf_retrans;
> +	}
> +
> +	if (copy_to_user(optval, &val, optlen))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>   SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>   				char __user *optval, int __user *optlen)
>   {
> @@ -5628,6 +5721,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
>   	case SCTP_AUTO_ASCONF:
>   		retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
>   		break;
> +	case SCTP_PEER_ADDR_THLDS:
> +		retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
> +		break;
>   	default:
>   		retval = -ENOPROTOOPT;
>   		break;
> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> index e5fe639..2b2bfe9 100644
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
>   		.extra2		= &int_max
>   	},
>   	{
> +		.procname	= "pf_retrans",
> +		.data		= &sctp_pf_retrans,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &int_max
> +	},
> +	{
>   		.procname	= "max_init_retransmits",
>   		.data		= &sctp_max_retrans_init,
>   		.maxlen		= sizeof(int),
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index b026ba0..194d0f3 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -85,6 +85,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
>
>   	/* Initialize the default path max_retrans.  */
>   	peer->pathmaxrxt  = sctp_max_retrans_path;
> +	peer->pf_retrans  = sctp_pf_retrans;
>
>   	INIT_LIST_HEAD(&peer->transmitted);
>   	INIT_LIST_HEAD(&peer->send_ready);
> @@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
>   {
>   	unsigned long timeout;
>   	timeout = t->rto + sctp_jitter(t->rto);
> -	if (t->state != SCTP_UNCONFIRMED)
> +	if ((t->state != SCTP_UNCONFIRMED) &&
> +	    (t->state != SCTP_PF))
>   		timeout += t->hbinterval;
>   	timeout += jiffies;
>   	return timeout;
>

next prev parent reply	other threads:[~2012-07-18 21:23 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-13 18:26 [PATCH] sctp: Implement quick failover draft from tsvwg Neil Horman
2012-07-13 18:26 ` Neil Horman
2012-07-14 18:12 ` Vlad Yasevich
2012-07-14 18:12   ` Vlad Yasevich
2012-07-14 21:22   ` Neil Horman
2012-07-14 21:22     ` Neil Horman
2012-07-18 18:01 ` [PATCH v2] " Neil Horman
2012-07-18 18:01   ` Neil Horman
2012-07-18 20:30   ` Joe Perches
2012-07-18 20:30     ` Joe Perches
2012-07-19 10:45     ` Neil Horman
2012-07-19 10:45       ` Neil Horman
2012-07-19 16:54       ` Joe Perches
2012-07-19 16:54         ` Joe Perches
2012-07-18 21:23   ` Vlad Yasevich [this message]
2012-07-18 21:23     ` Vlad Yasevich
2012-07-19 10:46     ` Neil Horman
2012-07-19 10:46       ` Neil Horman
2012-07-19 16:51 ` [PATCH v3] " Neil Horman
2012-07-19 16:51   ` Neil Horman
2012-07-20 16:51   ` Flavio Leitner
2012-07-20 16:51     ` Flavio Leitner
2012-07-20 17:19 ` [PATCH v4] " Neil Horman
2012-07-20 17:19   ` Neil Horman
2012-07-20 17:55   ` Vlad Yasevich
2012-07-20 17:55     ` Vlad Yasevich
2012-07-20 18:36     ` Neil Horman
2012-07-20 18:36       ` Neil Horman
2012-07-20 18:51 ` [PATCH v5] " Neil Horman
2012-07-20 18:51   ` Neil Horman
2012-07-20 19:10   ` Flavio Leitner
2012-07-20 19:10     ` Flavio Leitner
2012-07-20 19:31     ` David Miller
2012-07-20 19:31       ` David Miller
2012-07-23 14:28       ` Flavio Leitner
2012-07-23 14:28         ` Flavio Leitner
2012-07-21  6:45   ` Vlad Yasevich
2012-07-21  6:45     ` Vlad Yasevich
2012-07-21 11:03     ` Neil Horman
2012-07-21 11:03       ` Neil Horman
2012-07-21 17:56 ` [PATCH v6] " Neil Horman
2012-07-21 17:56   ` Neil Horman
2012-07-22 18:18   ` Vlad Yasevich
2012-07-22 18:18     ` Vlad Yasevich
2012-07-22 19:14     ` David Miller
2012-07-22 19:14       ` David Miller
2012-07-22 19:18       ` Neil Horman <nhorman@tuxdriver.com>
2012-07-22 19:18         ` Neil Horman <nhorman@tuxdriver.com>

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50072939.5030600@gmail.com \
    --to=vyasevich@gmail.com \
    --cc=davem@davemloft.net \
    --cc=linux-sctp@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=sri@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.