* [PATCHv3 2/6] skbuff: convert to skb_orphan_frags
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <cover.1342812067.git.mst@redhat.com>
Reduce code duplication a bit using the new helper.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
net/core/skbuff.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ccfcb7d..438bbc5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -804,10 +804,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
{
struct sk_buff *n;
- if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
- if (skb_copy_ubufs(skb, gfp_mask))
- return NULL;
- }
+ if (skb_orphan_frags(skb, gfp_mask))
+ return NULL;
n = skb + 1;
if (skb->fclone == SKB_FCLONE_ORIG &&
@@ -927,12 +925,10 @@ struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom, gfp_t gfp_mask)
if (skb_shinfo(skb)->nr_frags) {
int i;
- if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
- if (skb_copy_ubufs(skb, gfp_mask)) {
- kfree_skb(n);
- n = NULL;
- goto out;
- }
+ if (skb_orphan_frags(skb, gfp_mask)) {
+ kfree_skb(n);
+ n = NULL;
+ goto out;
}
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
@@ -1005,10 +1001,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
*/
if (skb_cloned(skb)) {
/* copy this zero copy skb frags */
- if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
- if (skb_copy_ubufs(skb, gfp_mask))
- goto nofrags;
- }
+ if (skb_orphan_frags(skb, gfp_mask))
+ goto nofrags;
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
skb_frag_ref(skb, i);
--
MST
^ permalink raw reply related
* [PATCHv3 1/6] skbuff: add an api to orphan frags
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
In-Reply-To: <cover.1342812067.git.mst@redhat.com>
Many places do
if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
include/linux/skbuff.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 642cb73..d205c4b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1667,6 +1667,22 @@ static inline void skb_orphan(struct sk_buff *skb)
}
/**
+ * skb_orphan_frags - orphan the frags contained in a buffer
+ * @skb: buffer to orphan frags from
+ * @gfp_mask: allocation mask for replacement pages
+ *
+ * For each frag in the SKB which needs a destructor (i.e. has an
+ * owner) create a copy of that frag and release the original
+ * page by calling the destructor.
+ */
+static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
+{
+ if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
+ return 0;
+ return skb_copy_ubufs(skb, gfp_mask);
+}
+
+/**
* __skb_queue_purge - empty a list
* @list: list to empty
*
--
MST
^ permalink raw reply related
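
The control flow of the helper above can be tried out in user space with stand-in types. This is only a minimal sketch: `struct mock_skb`, `MOCK_TX_ZEROCOPY`, `mock_copy_ubufs()` and `mock_orphan_frags()` are hypothetical names that do not exist in the kernel, and the mocked copy routine is assumed to clear the zerocopy flag once it succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for SKBTX_DEV_ZEROCOPY; not the kernel value. */
#define MOCK_TX_ZEROCOPY 0x1u

/* Hypothetical stand-in for the parts of struct sk_buff used here. */
struct mock_skb {
	unsigned int tx_flags;	/* models skb_shinfo(skb)->tx_flags */
	int copy_calls;		/* how often the slow path ran */
	int copy_result;	/* what the mocked copy returns: 0 or -ENOMEM */
};

/* Models skb_copy_ubufs(). Assumption: a successful copy clears the
 * zerocopy flag, so later calls take the fast path. */
static int mock_copy_ubufs(struct mock_skb *skb)
{
	skb->copy_calls++;
	if (skb->copy_result == 0)
		skb->tx_flags &= ~MOCK_TX_ZEROCOPY;
	return skb->copy_result;
}

/* Same shape as skb_orphan_frags() in the patch: return 0 right away
 * unless the buffer carries zerocopy frags, otherwise copy them. */
static int mock_orphan_frags(struct mock_skb *skb)
{
	assert(skb != NULL);
	if (!(skb->tx_flags & MOCK_TX_ZEROCOPY))
		return 0;
	return mock_copy_ubufs(skb);
}
```

Callers only need to test the return value: a non-zero result means the copy failed, and the caller should bail out the way the converted call sites in patch 2/6 do.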
* [PATCHv3 0/6] tun zerocopy support
From: Michael S. Tsirkin @ 2012-07-20 19:23 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Jason Wang, eric.dumazet, netdev, linux-kernel, ebiederm, davem
This adds support for experimental zero copy transmit to tun.
This includes some patches from Ian's patchset to support zerocopy with tun,
so it should help that work progress: we are still trying to figure out
how to make everything work properly with tcp but tun seems easier, and
it's helpful by itself since not everyone can use macvtap.
As with macvtap, I get single-digit percentage wins in CPU utilization
on guest to external from this patchset, and a performance regression on
guest to host, so more work is needed until this feature can move out of
experimental status, but I think it's useful for some people already.
Please review and consider for 3.6.
There's some code duplication between tun and macvtap now: common code
could move to net/core/datagram.c, this patch does not do this yet.
Changes from v2:
Fixed some bugs so it's stable now
Michael S. Tsirkin (6):
skbuff: add an api to orphan frags
skbuff: convert to skb_orphan_frags
skbuff: export skb_copy_ubufs
tun: orphan frags on xmit
net: orphan frags on receive
tun: experimental zero copy tx support
drivers/net/tun.c | 148 +++++++++++++++++++++++++++++++++++++++++++++----
include/linux/skbuff.h | 16 ++++++
net/core/dev.c | 7 ++-
net/core/skbuff.c | 24 +++-----
4 files changed, 167 insertions(+), 28 deletions(-)
--
MST
^ permalink raw reply
* Re: [PATCH v5] sctp: Implement quick failover draft from tsvwg
From: Flavio Leitner @ 2012-07-20 19:10 UTC (permalink / raw)
To: Neil Horman
Cc: netdev, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
linux-sctp, joe
In-Reply-To: <1342810319-27457-1-git-send-email-nhorman@tuxdriver.com>
On Fri, 20 Jul 2012 14:51:59 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:
> I've seen several attempts recently made to do quick failover of sctp transports
> by reducing various retransmit timers and counters. While it's possible to
> implement a faster failover on multihomed sctp associations, it's not
> particularly robust, in that it can lead to unneeded retransmits, as well as
> false connection failures due to intermittent latency on a network.
>
> Instead, let's implement the new ietf quick failover draft found here:
> http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
>
> This will let the sctp stack quickly identify transports that have had a small
> number of errors, and avoid using them until their reliability can be
> re-established. I've tested this out on two virt guests connected via multiple
> isolated virt networks and believe it's in compliance with the above draft and
> works well.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Vlad Yasevich <vyasevich@gmail.com>
> CC: Sridhar Samudrala <sri@us.ibm.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: linux-sctp@vger.kernel.org
> CC: joe@perches.com
>
> ---
> Change notes:
>
> V2)
> - Added socket option API from section 6.1 of the specification, as per
> request from Vlad. Adding this socket option allows us to alter both the path
> maximum retransmit value and the path partial failure threshold for each
> transport and the association as a whole.
>
> - Added a per transport pf_retrans value, and initialized it from the
> association value. This makes each transport independently configurable as per
> the socket option above, and prevents changes in the sysctl from bleeding into
> an already created association.
>
> V3)
> - Cleaned up some line spacing (Joe Perches)
> - Fixed some socket option user data sanitization (Vlad Yasevich)
>
> V4)
> - Added additional documentation (Flavio Leitner)
>
> V5)
> - Modified setsockopt option to ignore 0 pathmaxrxt rather than return
> error (Vlad Yasevich)
> - Modified getsockopt to return option length written (Vlad Y.)
> ---
> Documentation/networking/ip-sysctl.txt | 14 +++++
> include/net/sctp/constants.h | 1 +
> include/net/sctp/structs.h | 20 ++++++-
> include/net/sctp/user.h | 11 ++++
> net/sctp/associola.c | 37 ++++++++++--
> net/sctp/outqueue.c | 6 +-
> net/sctp/sm_sideeffect.c | 33 +++++++++-
> net/sctp/socket.c | 100 ++++++++++++++++++++++++++++++++
> net/sctp/sysctl.c | 9 +++
> net/sctp/transport.c | 4 +-
> 10 files changed, 220 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 47b6c79..c636f9c 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
>
> Default: 5
>
> +pf_retrans - INTEGER
> + The number of retransmissions that will be attempted on a given path
> + before traffic is redirected to an alternate transport (should one
> + exist). Note this is distinct from path_max_retrans, as a path that
> + passes the pf_retrans threshold can still be used. It's only
> + deprioritized when a transmission path is selected by the stack. This
> + setting is primarily used to enable fast failover mechanisms without
> + having to reduce path_max_retrans to a very low value. See:
> + http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + for details. Note also that a value of pf_retrans > path_max_retrans
> + disables this feature.
> +
> + Default: 0
> +
> rto_initial - INTEGER
> The initial round trip timeout value in milliseconds that will be used
> in calculating round trip times. This is the initial time interval
> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
> index 942b864..d053d2e 100644
> --- a/include/net/sctp/constants.h
> +++ b/include/net/sctp/constants.h
> @@ -334,6 +334,7 @@ typedef enum {
> typedef enum {
> SCTP_TRANSPORT_UP,
> SCTP_TRANSPORT_DOWN,
> + SCTP_TRANSPORT_PF,
> } sctp_transport_cmd_t;
>
> /* These are the address scopes defined mainly for IPv4 addresses
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index e4652fe..cee0678 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -161,6 +161,12 @@ extern struct sctp_globals {
> int max_retrans_path;
> int max_retrans_init;
>
> + /* Potentially-Failed.Max.Retrans sysctl value
> + * taken from:
> + * http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
> + */
> + int pf_retrans;
> +
> /*
> * Policy for preforming sctp/socket accounting
> * 0 - do socket level accounting, all assocs share sk_sndbuf
> @@ -258,6 +264,7 @@ extern struct sctp_globals {
> #define sctp_sndbuf_policy (sctp_globals.sndbuf_policy)
> #define sctp_rcvbuf_policy (sctp_globals.rcvbuf_policy)
> #define sctp_max_retrans_path (sctp_globals.max_retrans_path)
> +#define sctp_pf_retrans (sctp_globals.pf_retrans)
> #define sctp_max_retrans_init (sctp_globals.max_retrans_init)
> #define sctp_sack_timeout (sctp_globals.sack_timeout)
> #define sctp_hb_interval (sctp_globals.hb_interval)
> @@ -987,10 +994,15 @@ struct sctp_transport {
>
> /* This is the max_retrans value for the transport and will
> * be initialized from the assocs value. This can be changed
> - * using SCTP_SET_PEER_ADDR_PARAMS socket option.
> + * using the SCTP_SET_PEER_ADDR_PARAMS socket option.
> */
> __u16 pathmaxrxt;
>
> + /* This is the partially failed retrans value for the transport
> + * and will be initialized from the assocs value. This can be changed
> + * using the SCTP_PEER_ADDR_THLDS socket option
> + */
> + int pf_retrans;
> /* PMTU : The current known path MTU. */
> __u32 pathmtu;
>
> @@ -1660,6 +1672,12 @@ struct sctp_association {
> */
> int max_retrans;
>
> + /* This is the partially failed retrans value for the association
> + * and will be initialized from the sysctl value. This can be
> + * changed using the SCTP_PEER_ADDR_THLDS socket option
> + */
> + int pf_retrans;
> +
> /* Maximum number of times the endpoint will retransmit INIT */
> __u16 max_init_attempts;
>
> diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
> index 0842ef0..1b02d7a 100644
> --- a/include/net/sctp/user.h
> +++ b/include/net/sctp/user.h
> @@ -93,6 +93,7 @@ typedef __s32 sctp_assoc_t;
> #define SCTP_GET_ASSOC_NUMBER 28 /* Read only */
> #define SCTP_GET_ASSOC_ID_LIST 29 /* Read only */
> #define SCTP_AUTO_ASCONF 30
> +#define SCTP_PEER_ADDR_THLDS 31
>
> /* Internal Socket Options. Some of the sctp library functions are
> * implemented using these socket options.
> @@ -649,6 +650,7 @@ struct sctp_paddrinfo {
> */
> enum sctp_spinfo_state {
> SCTP_INACTIVE,
> + SCTP_PF,
> SCTP_ACTIVE,
> SCTP_UNCONFIRMED,
> SCTP_UNKNOWN = 0xffff /* Value used for transport state unknown */
> @@ -741,4 +743,13 @@ typedef struct {
> int sd;
> } sctp_peeloff_arg_t;
>
> +/*
> + * Peer Address Thresholds socket option
> + */
> +struct sctp_paddrthlds {
> + sctp_assoc_t spt_assoc_id;
> + struct sockaddr_storage spt_address;
> + __u16 spt_pathmaxrxt;
> + __u16 spt_pathpfthld;
> +};
> #endif /* __net_sctp_user_h__ */
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 5bc9ab1..90fe36b 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -124,6 +124,8 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
> * socket values.
> */
> asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
> + asoc->pf_retrans = sctp_pf_retrans;
> +
> asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
> asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
> asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
> @@ -685,6 +687,9 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
> /* Set the path max_retrans. */
> peer->pathmaxrxt = asoc->pathmaxrxt;
>
> + /* And the partial failure retrans threshold */
> + peer->pf_retrans = asoc->pf_retrans;
> +
> /* Initialize the peer's SACK delay timeout based on the
> * association configured value.
> */
> @@ -840,6 +845,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
> struct sctp_ulpevent *event;
> struct sockaddr_storage addr;
> int spc_state = 0;
> + bool ulp_notify = true;
>
> /* Record the transition on the transport. */
> switch (command) {
> @@ -853,6 +859,14 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
> spc_state = SCTP_ADDR_CONFIRMED;
> else
> spc_state = SCTP_ADDR_AVAILABLE;
> + /* Don't inform ULP about transition from PF to
> + * active state and set cwnd to 1, see SCTP
> + * Quick failover draft section 5.1, point 5
> + */
> + if (transport->state == SCTP_PF) {
> + ulp_notify = false;
> + transport->cwnd = 1;
> + }
> transport->state = SCTP_ACTIVE;
> break;
>
> @@ -871,6 +885,11 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
> spc_state = SCTP_ADDR_UNREACHABLE;
> break;
>
> + case SCTP_TRANSPORT_PF:
> + transport->state = SCTP_PF;
> + ulp_notify = false;
> + break;
> +
> default:
> return;
> }
> @@ -878,12 +897,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
> /* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
> * user.
> */
> - memset(&addr, 0, sizeof(struct sockaddr_storage));
> - memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len);
> - event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
> - 0, spc_state, error, GFP_ATOMIC);
> - if (event)
> - sctp_ulpq_tail_event(&asoc->ulpq, event);
> + if (ulp_notify) {
> + memset(&addr, 0, sizeof(struct sockaddr_storage));
> + memcpy(&addr, &transport->ipaddr,
> + transport->af_specific->sockaddr_len);
> + event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
> + 0, spc_state, error, GFP_ATOMIC);
> + if (event)
> + sctp_ulpq_tail_event(&asoc->ulpq, event);
> + }
>
> /* Select new active and retran paths. */
>
> @@ -899,7 +921,8 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
> transports) {
>
> if ((t->state == SCTP_INACTIVE) ||
> - (t->state == SCTP_UNCONFIRMED))
> + (t->state == SCTP_UNCONFIRMED) ||
> + (t->state == SCTP_PF))
> continue;
> if (!first || t->last_time_heard > first->last_time_heard) {
> second = first;
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index a0fa19f..e7aa177c 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
> if (!new_transport)
> new_transport = asoc->peer.active_path;
> } else if ((new_transport->state == SCTP_INACTIVE) ||
> - (new_transport->state == SCTP_UNCONFIRMED)) {
> + (new_transport->state == SCTP_UNCONFIRMED) ||
> + (new_transport->state == SCTP_PF)) {
> /* If the chunk is Heartbeat or Heartbeat Ack,
> * send it to chunk->transport, even if it's
> * inactive.
> @@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
> new_transport = chunk->transport;
> if (!new_transport ||
> ((new_transport->state == SCTP_INACTIVE) ||
> - (new_transport->state == SCTP_UNCONFIRMED)))
> + (new_transport->state == SCTP_UNCONFIRMED) ||
> + (new_transport->state == SCTP_PF)))
> new_transport = asoc->peer.active_path;
> if (new_transport->state == SCTP_UNCONFIRMED)
> continue;
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index c96d1a8..285e26a 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type, sctp_subtype_t subtype,
> sctp_cmd_seq_t *commands,
> gfp_t gfp);
>
> +static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
> + struct sctp_transport *t);
> /********************************************************************
> * Helper functions
> ********************************************************************/
> @@ -470,7 +472,8 @@ sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
> * notification SHOULD be sent to the upper layer.
> *
> */
> -static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
> +static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
> + struct sctp_association *asoc,
> struct sctp_transport *transport,
> int is_hb)
> {
> @@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
> transport->error_count++;
> }
>
> + /* If the transport error count is greater than the pf_retrans
> + * threshold, and less than pathmaxrxt, then mark this transport
> + * as Partially Failed, see SCTP Quick Failover Draft, section 5.1,
> + * point 1
> + */
> + if ((transport->state != SCTP_PF) &&
> + (asoc->pf_retrans < transport->pathmaxrxt) &&
> + (transport->error_count > asoc->pf_retrans)) {
> +
> + sctp_assoc_control_transport(asoc, transport,
> + SCTP_TRANSPORT_PF,
> + 0);
> +
> + /* Update the hb timer to resend a heartbeat every rto */
> + sctp_cmd_hb_timer_update(commands, transport);
> + }
> +
> if (transport->state != SCTP_INACTIVE &&
> (transport->error_count > transport->pathmaxrxt)) {
> SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
> @@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
> SCTP_HEARTBEAT_SUCCESS);
> }
>
> + if (t->state == SCTP_PF)
> + sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
> + SCTP_HEARTBEAT_SUCCESS);
> +
> /* The receiver of the HEARTBEAT ACK should also perform an
> * RTT measurement for that destination transport address
> * using the time value carried in the HEARTBEAT ACK chunk.
> @@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>
> case SCTP_CMD_STRIKE:
> /* Mark one strike against a transport. */
> - sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
> - 0);
> + sctp_do_8_2_transport_strike(commands, asoc,
> + cmd->obj.transport, 0);
> break;
>
> case SCTP_CMD_TRANSPORT_IDLE:
> @@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
>
> case SCTP_CMD_TRANSPORT_HB_SENT:
> t = cmd->obj.transport;
> - sctp_do_8_2_transport_strike(asoc, t, 1);
> + sctp_do_8_2_transport_strike(commands, asoc,
> + t, 1);
> t->hb_sent = 1;
> break;
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index b3b8a8d..bba551f 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -3470,6 +3470,56 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
> }
>
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to alter the partially failed threshold for one or all
> + * transports in an association. See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
> + char __user *optval,
> + unsigned int optlen)
> +{
> + struct sctp_paddrthlds val;
> + struct sctp_transport *trans;
> + struct sctp_association *asoc;
> +
> + if (optlen < sizeof(struct sctp_paddrthlds))
> + return -EINVAL;
> + if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
> + sizeof(struct sctp_paddrthlds)))
> + return -EFAULT;
> +
> +
> + if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> + asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> + if (!asoc)
> + return -ENOENT;
> + list_for_each_entry(trans, &asoc->peer.transport_addr_list,
> + transports) {
> + if (val.spt_pathmaxrxt)
> + trans->pathmaxrxt = val.spt_pathmaxrxt;
> + trans->pf_retrans = val.spt_pathpfthld;
> + }
> +
> + if (val.spt_pathmaxrxt)
> + asoc->pathmaxrxt = val.spt_pathmaxrxt;
> + asoc->pf_retrans = val.spt_pathpfthld;
> + } else {
> + trans = sctp_addr_id2transport(sk, &val.spt_address,
> + val.spt_assoc_id);
> + if (!trans)
> + return -ENOENT;
> +
> + if (val.spt_pathmaxrxt)
> + trans->pathmaxrxt = val.spt_pathmaxrxt;
> + trans->pf_retrans = val.spt_pathpfthld;
> + }
> +
> + return 0;
> +}
> +
> /* API 6.2 setsockopt(), getsockopt()
> *
> * Applications use setsockopt() and getsockopt() to set or retrieve
> @@ -3619,6 +3669,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
> case SCTP_AUTO_ASCONF:
> retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
> break;
> + case SCTP_PEER_ADDR_THLDS:
> + retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
> + break;
> default:
> retval = -ENOPROTOOPT;
> break;
> @@ -5490,6 +5543,50 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len,
> return 0;
> }
>
> +/*
> + * SCTP_PEER_ADDR_THLDS
> + *
> + * This option allows us to fetch the partially failed threshold for one or all
> + * transports in an association. See Section 6.1 of:
> + * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> + */
> +static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
> + char __user *optval,
> + int optlen)
> +{
> + struct sctp_paddrthlds val;
> + struct sctp_transport *trans;
> + struct sctp_association *asoc;
> +
> + if (optlen < sizeof(struct sctp_paddrthlds))
> + return -EINVAL;
> + optlen = sizeof(struct sctp_paddrthlds);
> + if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, optlen))
> + return -EFAULT;
> +
> + if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
> + asoc = sctp_id2assoc(sk, val.spt_assoc_id);
> + if (!asoc)
> + return -ENOENT;
> +
> + val.spt_pathpfthld = asoc->pf_retrans;
> + val.spt_pathmaxrxt = asoc->pathmaxrxt;
> + } else {
> + trans = sctp_addr_id2transport(sk, &val.spt_address,
> + val.spt_assoc_id);
> + if (!trans)
> + return -ENOENT;
> +
> + val.spt_pathmaxrxt = trans->pathmaxrxt;
> + val.spt_pathpfthld = trans->pf_retrans;
> + }
> +
> + if (copy_to_user(optval, &val, optlen))
> + return -EFAULT;
> +
> + return optlen;
> +}
> +
> SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
> char __user *optval, int __user *optlen)
> {
> @@ -5628,6 +5725,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
> case SCTP_AUTO_ASCONF:
> retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
> break;
> + case SCTP_PEER_ADDR_THLDS:
> + retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
> + break;
> default:
> retval = -ENOPROTOOPT;
> break;
> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
> index e5fe639..2b2bfe9 100644
> --- a/net/sctp/sysctl.c
> +++ b/net/sctp/sysctl.c
> @@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
> .extra2 = &int_max
> },
> {
> + .procname = "pf_retrans",
> + .data = &sctp_pf_retrans,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = &zero,
> + .extra2 = &int_max
> + },
> + {
> .procname = "max_init_retransmits",
> .data = &sctp_max_retrans_init,
> .maxlen = sizeof(int),
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index b026ba0..194d0f3 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -85,6 +85,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
>
> /* Initialize the default path max_retrans. */
> peer->pathmaxrxt = sctp_max_retrans_path;
> + peer->pf_retrans = sctp_pf_retrans;
>
> INIT_LIST_HEAD(&peer->transmitted);
> INIT_LIST_HEAD(&peer->send_ready);
> @@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
> {
> unsigned long timeout;
> timeout = t->rto + sctp_jitter(t->rto);
> - if (t->state != SCTP_UNCONFIRMED)
> + if ((t->state != SCTP_UNCONFIRMED) &&
> + (t->state != SCTP_PF))
> timeout += t->hbinterval;
> timeout += jiffies;
> return timeout;
Reviewed-by: Flavio Leitner <fbl@redhat.com>
fbl
^ permalink raw reply
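
For readers following the sm_sideeffect.c hunk quoted above, the two threshold checks in sctp_do_8_2_transport_strike() can be modeled in user space roughly as below. This is a simplified, stateless sketch: `enum path_state` and `classify_path()` are hypothetical names, and the real code also checks the transport's current state and emits commands, which is left out here.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the transport states. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

/* Classify a path from its error count and the two thresholds.
 *
 * Mirrors the comparisons in the posted hunk:
 *   - more errors than pathmaxrxt            -> inactive
 *   - pf_retrans <  pathmaxrxt and
 *     more errors than pf_retrans            -> partially failed
 *   - otherwise                              -> active
 */
static enum path_state classify_path(int error_count, int pf_retrans,
				     int pathmaxrxt)
{
	assert(error_count >= 0 && pf_retrans >= 0 && pathmaxrxt >= 0);

	if (error_count > pathmaxrxt)
		return PATH_INACTIVE;
	if (pf_retrans < pathmaxrxt && error_count > pf_retrans)
		return PATH_PF;
	return PATH_ACTIVE;
}
```

Because the comparison against pathmaxrxt is a strict less-than, a pf_retrans equal to pathmaxrxt also skips the partially failed state in this sketch. With the default values from the documentation hunk (pf_retrans 0, path_max_retrans 5), the first error would already classify the path as partially failed.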
* Re: [PATCH] b44: add 64 bit stats
From: Kevin Groeneveld @ 2012-07-20 18:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1342761865.2626.5572.camel@edumazet-glaptop>
On Fri, Jul 20, 2012 at 1:24 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Absolutely. You can argue that probably nobody use this driver on a
>> 32bit UP machine, but technically speaking the current implementation is
>> racy.
>
> In fact all network drivers should use the _bh version.
Okay, thanks for clarifying.
> Could you send a patch for all of them, based on net-next tree ?
Sure, I can work on that. It should be a relatively easy thing to
update. I can probably send a patch within the next couple days.
Kevin
^ permalink raw reply
* [PATCH v5] sctp: Implement quick failover draft from tsvwg
From: Neil Horman @ 2012-07-20 18:51 UTC (permalink / raw)
To: netdev
Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
linux-sctp, joe
In-Reply-To: <1342203998-24037-1-git-send-email-nhorman@tuxdriver.com>
I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters. While it's possible to
implement a faster failover on multihomed sctp associations, it's not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.
Instead, let's implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
This will let the sctp stack quickly identify transports that have had a small
number of errors, and avoid using them until their reliability can be
re-established. I've tested this out on two virt guests connected via multiple
isolated virt networks and believe it's in compliance with the above draft and
works well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
---
Change notes:
V2)
- Added socket option API from section 6.1 of the specification, as per
request from Vlad. Adding this socket option allows us to alter both the path
maximum retransmit value and the path partial failure threshold for each
transport and the association as a whole.
- Added a per transport pf_retrans value, and initialized it from the
association value. This makes each transport independently configurable as per
the socket option above, and prevents changes in the sysctl from bleeding into
an already created association.
V3)
- Cleaned up some line spacing (Joe Perches)
- Fixed some socket option user data sanitization (Vlad Yasevich)
V4)
- Added additional documentation (Flavio Leitner)
V5)
- Modified setsockopt option to ignore 0 pathmaxrxt rather than return
error (Vlad Yasevich)
- Modified getsockopt to return option length written (Vlad Y.)
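
The V5 setsockopt behaviour (a zero pathmaxrxt is ignored instead of rejected) can be illustrated with a small user-space model. This is a sketch only: `struct path_thresholds` and `apply_thresholds()` are hypothetical names standing in for the per-transport fields and the update done in sctp_setsockopt_paddr_thresholds().

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the two per-transport thresholds. */
struct path_thresholds {
	uint16_t pathmaxrxt;	/* models trans->pathmaxrxt */
	int pf_retrans;		/* models trans->pf_retrans */
};

/* Apply the values a caller passed in spt_pathmaxrxt and
 * spt_pathpfthld. A zero spt_pathmaxrxt leaves the current value
 * untouched; the partial failure threshold is always overwritten. */
static void apply_thresholds(struct path_thresholds *t,
			     uint16_t spt_pathmaxrxt,
			     uint16_t spt_pathpfthld)
{
	assert(t != NULL);
	if (spt_pathmaxrxt)
		t->pathmaxrxt = spt_pathmaxrxt;
	t->pf_retrans = spt_pathpfthld;
}
```

One consequence worth noting: a caller that only wants to change pathmaxrxt still has to pass the current partial failure threshold, because a zero spt_pathpfthld is stored as-is rather than ignored.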
---
Documentation/networking/ip-sysctl.txt | 14 +++++
include/net/sctp/constants.h | 1 +
include/net/sctp/structs.h | 20 ++++++-
include/net/sctp/user.h | 11 ++++
net/sctp/associola.c | 37 ++++++++++--
net/sctp/outqueue.c | 6 +-
net/sctp/sm_sideeffect.c | 33 +++++++++-
net/sctp/socket.c | 100 ++++++++++++++++++++++++++++++++
net/sctp/sysctl.c | 9 +++
net/sctp/transport.c | 4 +-
10 files changed, 220 insertions(+), 15 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79..c636f9c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1408,6 +1408,20 @@ path_max_retrans - INTEGER
Default: 5
+pf_retrans - INTEGER
+ The number of retransmissions that will be attempted on a given path
+ before traffic is redirected to an alternate transport (should one
+ exist). Note this is distinct from path_max_retrans, as a path that
+ passes the pf_retrans threshold can still be used. It's only
+ deprioritized when a transmission path is selected by the stack. This
+ setting is primarily used to enable fast failover mechanisms without
+ having to reduce path_max_retrans to a very low value. See:
+ http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+ for details. Note also that a value of pf_retrans > path_max_retrans
+ disables this feature.
+
+ Default: 0
+
rto_initial - INTEGER
The initial round trip timeout value in milliseconds that will be used
in calculating round trip times. This is the initial time interval
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index 942b864..d053d2e 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -334,6 +334,7 @@ typedef enum {
typedef enum {
SCTP_TRANSPORT_UP,
SCTP_TRANSPORT_DOWN,
+ SCTP_TRANSPORT_PF,
} sctp_transport_cmd_t;
/* These are the address scopes defined mainly for IPv4 addresses
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index e4652fe..cee0678 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -161,6 +161,12 @@ extern struct sctp_globals {
int max_retrans_path;
int max_retrans_init;
+ /* Potentially-Failed.Max.Retrans sysctl value
+ * taken from:
+ * http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
+ */
+ int pf_retrans;
+
/*
* Policy for preforming sctp/socket accounting
* 0 - do socket level accounting, all assocs share sk_sndbuf
@@ -258,6 +264,7 @@ extern struct sctp_globals {
#define sctp_sndbuf_policy (sctp_globals.sndbuf_policy)
#define sctp_rcvbuf_policy (sctp_globals.rcvbuf_policy)
#define sctp_max_retrans_path (sctp_globals.max_retrans_path)
+#define sctp_pf_retrans (sctp_globals.pf_retrans)
#define sctp_max_retrans_init (sctp_globals.max_retrans_init)
#define sctp_sack_timeout (sctp_globals.sack_timeout)
#define sctp_hb_interval (sctp_globals.hb_interval)
@@ -987,10 +994,15 @@ struct sctp_transport {
/* This is the max_retrans value for the transport and will
* be initialized from the assocs value. This can be changed
- * using SCTP_SET_PEER_ADDR_PARAMS socket option.
+ * using the SCTP_SET_PEER_ADDR_PARAMS socket option.
*/
__u16 pathmaxrxt;
+ /* This is the partially failed retrans value for the transport
+ * and will be initialized from the assocs value. This can be changed
+ * using the SCTP_PEER_ADDR_THLDS socket option
+ */
+ int pf_retrans;
/* PMTU : The current known path MTU. */
__u32 pathmtu;
@@ -1660,6 +1672,12 @@ struct sctp_association {
*/
int max_retrans;
+ /* This is the partially failed retrans value for the association
+ * and will be initialized from the sysctl value. This can be
+ * changed using the SCTP_PEER_ADDR_THLDS socket option
+ */
+ int pf_retrans;
+
/* Maximum number of times the endpoint will retransmit INIT */
__u16 max_init_attempts;
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 0842ef0..1b02d7a 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -93,6 +93,7 @@ typedef __s32 sctp_assoc_t;
#define SCTP_GET_ASSOC_NUMBER 28 /* Read only */
#define SCTP_GET_ASSOC_ID_LIST 29 /* Read only */
#define SCTP_AUTO_ASCONF 30
+#define SCTP_PEER_ADDR_THLDS 31
/* Internal Socket Options. Some of the sctp library functions are
* implemented using these socket options.
@@ -649,6 +650,7 @@ struct sctp_paddrinfo {
*/
enum sctp_spinfo_state {
SCTP_INACTIVE,
+ SCTP_PF,
SCTP_ACTIVE,
SCTP_UNCONFIRMED,
SCTP_UNKNOWN = 0xffff /* Value used for transport state unknown */
@@ -741,4 +743,13 @@ typedef struct {
int sd;
} sctp_peeloff_arg_t;
+/*
+ * Peer Address Thresholds socket option
+ */
+struct sctp_paddrthlds {
+ sctp_assoc_t spt_assoc_id;
+ struct sockaddr_storage spt_address;
+ __u16 spt_pathmaxrxt;
+ __u16 spt_pathpfthld;
+};
#endif /* __net_sctp_user_h__ */
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 5bc9ab1..90fe36b 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -124,6 +124,8 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
* socket values.
*/
asoc->max_retrans = sp->assocparams.sasoc_asocmaxrxt;
+ asoc->pf_retrans = sctp_pf_retrans;
+
asoc->rto_initial = msecs_to_jiffies(sp->rtoinfo.srto_initial);
asoc->rto_max = msecs_to_jiffies(sp->rtoinfo.srto_max);
asoc->rto_min = msecs_to_jiffies(sp->rtoinfo.srto_min);
@@ -685,6 +687,9 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
/* Set the path max_retrans. */
peer->pathmaxrxt = asoc->pathmaxrxt;
+ /* And the partial failure retrans threshold */
+ peer->pf_retrans = asoc->pf_retrans;
+
/* Initialize the peer's SACK delay timeout based on the
* association configured value.
*/
@@ -840,6 +845,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
struct sctp_ulpevent *event;
struct sockaddr_storage addr;
int spc_state = 0;
+ bool ulp_notify = true;
/* Record the transition on the transport. */
switch (command) {
@@ -853,6 +859,14 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
spc_state = SCTP_ADDR_CONFIRMED;
else
spc_state = SCTP_ADDR_AVAILABLE;
+ /* Don't inform ULP about transition from PF to
+ * active state and set cwnd to 1, see SCTP
+ * Quick failover draft section 5.1, point 5
+ */
+ if (transport->state == SCTP_PF) {
+ ulp_notify = false;
+ transport->cwnd = 1;
+ }
transport->state = SCTP_ACTIVE;
break;
@@ -871,6 +885,11 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
spc_state = SCTP_ADDR_UNREACHABLE;
break;
+ case SCTP_TRANSPORT_PF:
+ transport->state = SCTP_PF;
+ ulp_notify = false;
+ break;
+
default:
return;
}
@@ -878,12 +897,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
/* Generate and send a SCTP_PEER_ADDR_CHANGE notification to the
* user.
*/
- memset(&addr, 0, sizeof(struct sockaddr_storage));
- memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len);
- event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
- 0, spc_state, error, GFP_ATOMIC);
- if (event)
- sctp_ulpq_tail_event(&asoc->ulpq, event);
+ if (ulp_notify) {
+ memset(&addr, 0, sizeof(struct sockaddr_storage));
+ memcpy(&addr, &transport->ipaddr,
+ transport->af_specific->sockaddr_len);
+ event = sctp_ulpevent_make_peer_addr_change(asoc, &addr,
+ 0, spc_state, error, GFP_ATOMIC);
+ if (event)
+ sctp_ulpq_tail_event(&asoc->ulpq, event);
+ }
/* Select new active and retran paths. */
@@ -899,7 +921,8 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
transports) {
if ((t->state == SCTP_INACTIVE) ||
- (t->state == SCTP_UNCONFIRMED))
+ (t->state == SCTP_UNCONFIRMED) ||
+ (t->state == SCTP_PF))
continue;
if (!first || t->last_time_heard > first->last_time_heard) {
second = first;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index a0fa19f..e7aa177c 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -792,7 +792,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
if (!new_transport)
new_transport = asoc->peer.active_path;
} else if ((new_transport->state == SCTP_INACTIVE) ||
- (new_transport->state == SCTP_UNCONFIRMED)) {
+ (new_transport->state == SCTP_UNCONFIRMED) ||
+ (new_transport->state == SCTP_PF)) {
/* If the chunk is Heartbeat or Heartbeat Ack,
* send it to chunk->transport, even if it's
* inactive.
@@ -987,7 +988,8 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
new_transport = chunk->transport;
if (!new_transport ||
((new_transport->state == SCTP_INACTIVE) ||
- (new_transport->state == SCTP_UNCONFIRMED)))
+ (new_transport->state == SCTP_UNCONFIRMED) ||
+ (new_transport->state == SCTP_PF)))
new_transport = asoc->peer.active_path;
if (new_transport->state == SCTP_UNCONFIRMED)
continue;
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index c96d1a8..285e26a 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -76,6 +76,8 @@ static int sctp_side_effects(sctp_event_t event_type, sctp_subtype_t subtype,
sctp_cmd_seq_t *commands,
gfp_t gfp);
+static void sctp_cmd_hb_timer_update(sctp_cmd_seq_t *cmds,
+ struct sctp_transport *t);
/********************************************************************
* Helper functions
********************************************************************/
@@ -470,7 +472,8 @@ sctp_timer_event_t *sctp_timer_events[SCTP_NUM_TIMEOUT_TYPES] = {
* notification SHOULD be sent to the upper layer.
*
*/
-static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
+static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
+ struct sctp_association *asoc,
struct sctp_transport *transport,
int is_hb)
{
@@ -495,6 +498,23 @@ static void sctp_do_8_2_transport_strike(struct sctp_association *asoc,
transport->error_count++;
}
+ /* If the transport error count is greater than the pf_retrans
+ * threshold, and less than pathmaxrxt, then mark this transport
+ * as Partially Failed, see SCTP Quick Failover Draft, section 5.1,
+ * point 1
+ */
+ if ((transport->state != SCTP_PF) &&
+ (asoc->pf_retrans < transport->pathmaxrxt) &&
+ (transport->error_count > asoc->pf_retrans)) {
+
+ sctp_assoc_control_transport(asoc, transport,
+ SCTP_TRANSPORT_PF,
+ 0);
+
+ /* Update the hb timer to resend a heartbeat every rto */
+ sctp_cmd_hb_timer_update(commands, transport);
+ }
+
if (transport->state != SCTP_INACTIVE &&
(transport->error_count > transport->pathmaxrxt)) {
SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
@@ -699,6 +719,10 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
SCTP_HEARTBEAT_SUCCESS);
}
+ if (t->state == SCTP_PF)
+ sctp_assoc_control_transport(asoc, t, SCTP_TRANSPORT_UP,
+ SCTP_HEARTBEAT_SUCCESS);
+
/* The receiver of the HEARTBEAT ACK should also perform an
* RTT measurement for that destination transport address
* using the time value carried in the HEARTBEAT ACK chunk.
@@ -1565,8 +1589,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
case SCTP_CMD_STRIKE:
/* Mark one strike against a transport. */
- sctp_do_8_2_transport_strike(asoc, cmd->obj.transport,
- 0);
+ sctp_do_8_2_transport_strike(commands, asoc,
+ cmd->obj.transport, 0);
break;
case SCTP_CMD_TRANSPORT_IDLE:
@@ -1576,7 +1600,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
case SCTP_CMD_TRANSPORT_HB_SENT:
t = cmd->obj.transport;
- sctp_do_8_2_transport_strike(asoc, t, 1);
+ sctp_do_8_2_transport_strike(commands, asoc,
+ t, 1);
t->hb_sent = 1;
break;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index b3b8a8d..bba551f 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3470,6 +3470,56 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
}
+/*
+ * SCTP_PEER_ADDR_THLDS
+ *
+ * This option allows us to alter the partially failed threshold for one or all
+ * transports in an association. See Section 6.1 of:
+ * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+ */
+static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
+ char __user *optval,
+ unsigned int optlen)
+{
+ struct sctp_paddrthlds val;
+ struct sctp_transport *trans;
+ struct sctp_association *asoc;
+
+ if (optlen < sizeof(struct sctp_paddrthlds))
+ return -EINVAL;
+ if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
+ sizeof(struct sctp_paddrthlds)))
+ return -EFAULT;
+
+
+ if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
+ asoc = sctp_id2assoc(sk, val.spt_assoc_id);
+ if (!asoc)
+ return -ENOENT;
+ list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+ transports) {
+ if (val.spt_pathmaxrxt)
+ trans->pathmaxrxt = val.spt_pathmaxrxt;
+ trans->pf_retrans = val.spt_pathpfthld;
+ }
+
+ if (val.spt_pathmaxrxt)
+ asoc->pathmaxrxt = val.spt_pathmaxrxt;
+ asoc->pf_retrans = val.spt_pathpfthld;
+ } else {
+ trans = sctp_addr_id2transport(sk, &val.spt_address,
+ val.spt_assoc_id);
+ if (!trans)
+ return -ENOENT;
+
+ if (val.spt_pathmaxrxt)
+ trans->pathmaxrxt = val.spt_pathmaxrxt;
+ trans->pf_retrans = val.spt_pathpfthld;
+ }
+
+ return 0;
+}
+
/* API 6.2 setsockopt(), getsockopt()
*
* Applications use setsockopt() and getsockopt() to set or retrieve
@@ -3619,6 +3669,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
case SCTP_AUTO_ASCONF:
retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
break;
+ case SCTP_PEER_ADDR_THLDS:
+ retval = sctp_setsockopt_paddr_thresholds(sk, optval, optlen);
+ break;
default:
retval = -ENOPROTOOPT;
break;
@@ -5490,6 +5543,50 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len,
return 0;
}
+/*
+ * SCTP_PEER_ADDR_THLDS
+ *
+ * This option allows us to fetch the partially failed threshold for one or all
+ * transports in an association. See Section 6.1 of:
+ * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
+ */
+static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
+ char __user *optval,
+ int optlen)
+{
+ struct sctp_paddrthlds val;
+ struct sctp_transport *trans;
+ struct sctp_association *asoc;
+
+ if (optlen < sizeof(struct sctp_paddrthlds))
+ return -EINVAL;
+ optlen = sizeof(struct sctp_paddrthlds);
+ if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, optlen))
+ return -EFAULT;
+
+ if (sctp_is_any(sk, (const union sctp_addr *)&val.spt_address)) {
+ asoc = sctp_id2assoc(sk, val.spt_assoc_id);
+ if (!asoc)
+ return -ENOENT;
+
+ val.spt_pathpfthld = asoc->pf_retrans;
+ val.spt_pathmaxrxt = asoc->pathmaxrxt;
+ } else {
+ trans = sctp_addr_id2transport(sk, &val.spt_address,
+ val.spt_assoc_id);
+ if (!trans)
+ return -ENOENT;
+
+ val.spt_pathmaxrxt = trans->pathmaxrxt;
+ val.spt_pathpfthld = trans->pf_retrans;
+ }
+
+ if (copy_to_user(optval, &val, optlen))
+ return -EFAULT;
+
+ return optlen;
+}
+
SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
@@ -5628,6 +5725,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
case SCTP_AUTO_ASCONF:
retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
break;
+ case SCTP_PEER_ADDR_THLDS:
+ retval = sctp_getsockopt_paddr_thresholds(sk, optval, len);
+ break;
default:
retval = -ENOPROTOOPT;
break;
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index e5fe639..2b2bfe9 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -141,6 +141,15 @@ static ctl_table sctp_table[] = {
.extra2 = &int_max
},
{
+ .procname = "pf_retrans",
+ .data = &sctp_pf_retrans,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &int_max
+ },
+ {
.procname = "max_init_retransmits",
.data = &sctp_max_retrans_init,
.maxlen = sizeof(int),
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index b026ba0..194d0f3 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -85,6 +85,7 @@ static struct sctp_transport *sctp_transport_init(struct sctp_transport *peer,
/* Initialize the default path max_retrans. */
peer->pathmaxrxt = sctp_max_retrans_path;
+ peer->pf_retrans = sctp_pf_retrans;
INIT_LIST_HEAD(&peer->transmitted);
INIT_LIST_HEAD(&peer->send_ready);
@@ -585,7 +586,8 @@ unsigned long sctp_transport_timeout(struct sctp_transport *t)
{
unsigned long timeout;
timeout = t->rto + sctp_jitter(t->rto);
- if (t->state != SCTP_UNCONFIRMED)
+ if ((t->state != SCTP_UNCONFIRMED) &&
+ (t->state != SCTP_PF))
timeout += t->hbinterval;
timeout += jiffies;
return timeout;
--
1.7.7.6
^ permalink raw reply related
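The strike handling in the patch above can be read as a small state machine. What follows is a minimal user-space sketch of it, not the kernel code: `struct path`, `enum path_state`, `path_strike()` and `path_usable_for_data()` are hypothetical names made up for illustration, and the sketch ignores heartbeat timers and ULP notifications entirely. It only shows how the two thresholds interact, assuming that a partially failed path is skipped for new data in the same way the outqueue changes skip `SCTP_PF` transports.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's transport states. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
	enum path_state state;
	int error_count;	/* consecutive failed transmissions or heartbeats */
	int pathmaxrxt;		/* errors tolerated before the path is failed */
	int pf_retrans;		/* errors tolerated before the path is partially failed */
};

/* Record one failed transmission and re-evaluate the path state. */
static enum path_state path_strike(struct path *p)
{
	p->error_count++;

	/* Partial failure only applies when its threshold is below pathmaxrxt. */
	if (p->state != PATH_PF &&
	    p->pf_retrans < p->pathmaxrxt &&
	    p->error_count > p->pf_retrans)
		p->state = PATH_PF;

	/* Exceeding pathmaxrxt marks the path as failed outright. */
	if (p->state != PATH_INACTIVE && p->error_count > p->pathmaxrxt)
		p->state = PATH_INACTIVE;

	return p->state;
}

/* In this sketch, only a fully active path is chosen for new data. */
static bool path_usable_for_data(const struct path *p)
{
	return p->state == PATH_ACTIVE;
}
```

With `pathmaxrxt = 5` and `pf_retrans = 2`, the third consecutive error moves the path to the partially failed state and the sixth marks it inactive. Setting `pf_retrans` equal to or above `pathmaxrxt` disables the intermediate state in this sketch.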
* netfilter,rcu: hang in nf_conntrack_net_exit
From: Sasha Levin @ 2012-07-20 18:43 UTC (permalink / raw)
To: pablo, kaber, davem
Cc: netfilter-devel, netdev, coreteam, linux-kernel@vger.kernel.org
Hi all,
I've stumbled on the following while fuzzing with trinity inside a KVM tools guest, using the latest -next kernel.
[ 483.990135] INFO: task kworker/u:0:6 blocked for more than 120 seconds.
[ 483.991328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 483.992500] kworker/u:0 D ffff88000adfa4c8 3160 6 2 0x00000000
[ 483.993527] ffff88000d655a60 0000000000000086 ffff880035bd7640 00000000001d7640
[ 483.994667] ffff88000d655fd8 ffff88000d655fd8 ffff88000d655fd8 ffff88000d655fd8
[ 483.995945] ffff880008448000 ffff88000d65b000 ffff88000d655a50 ffff88000d655c18
[ 483.997202] Call Trace:
[ 483.997653] [<ffffffff83788d25>] schedule+0x55/0x60
[ 483.998449] [<ffffffff83786d7a>] schedule_timeout+0x3a/0x370
[ 483.999461] [<ffffffff8115d4a9>] ? mark_held_locks+0xf9/0x130
[ 484.000016] [<ffffffff83789443>] ? wait_for_common+0xf3/0x170
[ 484.000016] [<ffffffff8378a44b>] ? _raw_spin_unlock_irq+0x2b/0x80
[ 484.000016] [<ffffffff8115d758>] ? trace_hardirqs_on_caller+0x118/0x140
[ 484.000016] [<ffffffff8378944b>] wait_for_common+0xfb/0x170
[ 484.000016] [<ffffffff8112fd90>] ? try_to_wake_up+0x290/0x290
[ 484.000016] [<ffffffff811a7960>] ? __call_rcu+0x390/0x390
[ 484.000016] [<ffffffff83789568>] wait_for_completion+0x18/0x20
[ 484.000016] [<ffffffff8111548f>] wait_rcu_gp+0x6f/0xa0
[ 484.000016] [<ffffffff811130f0>] ? perf_trace_rcu_utilization+0xd0/0xd0
[ 484.000016] [<ffffffff83789394>] ? wait_for_common+0x44/0x170
[ 484.000016] [<ffffffff811aa126>] synchronize_rcu+0x86/0x90
[ 484.000016] [<ffffffff83131455>] synchronize_net+0x35/0x40
[ 484.000016] [<ffffffff83194125>] nf_conntrack_cleanup+0x55/0x70
[ 484.000016] [<ffffffff831948a2>] nf_conntrack_net_exit+0x72/0x80
[ 484.000016] [<ffffffff83125285>] ops_exit_list.isra.0+0x35/0x70
[ 484.000016] [<ffffffff83125c70>] cleanup_net+0x100/0x1a0
[ 484.000016] [<ffffffff8110c926>] process_one_work+0x3e6/0x790
[ 484.000016] [<ffffffff8110c7e0>] ? process_one_work+0x2a0/0x790
[ 484.000016] [<ffffffff83125b70>] ? net_drop_ns+0x40/0x40
[ 484.000016] [<ffffffff8110d568>] ? worker_thread+0x48/0x380
[ 484.000016] [<ffffffff8110d719>] worker_thread+0x1f9/0x380
[ 484.000016] [<ffffffff8110d520>] ? manage_workers.isra.8+0x110/0x110
[ 484.000016] [<ffffffff8111823d>] kthread+0xad/0xc0
[ 484.000016] [<ffffffff8378d134>] kernel_thread_helper+0x4/0x10
[ 484.000016] [<ffffffff8378b234>] ? retint_restore_args+0x13/0x13
[ 484.000016] [<ffffffff81118190>] ? insert_kthread_work+0x40/0x40
[ 484.000016] [<ffffffff8378d130>] ? gs_change+0x13/0x13
[ 484.000016] 3 locks held by kworker/u:0/6:
[ 484.000016] #0: (netns){.+.+.+}, at: [<ffffffff8110c7e0>] process_one_work+0x2a0/0x790
[ 484.000016] #1: (net_cleanup_work){+.+.+.}, at: [<ffffffff8110c7e0>] process_one_work+0x2a0/0x790
[ 484.000016] #2: (net_mutex){+.+.+.}, at: [<ffffffff83125bf0>] cleanup_net+0x80/0x1a0
^ permalink raw reply
* Re: [PATCH v4] sctp: Implement quick failover draft from tsvwg
From: Neil Horman @ 2012-07-20 18:36 UTC (permalink / raw)
To: Vlad Yasevich; +Cc: netdev, Sridhar Samudrala, David S. Miller, linux-sctp, joe
In-Reply-To: <50099B97.9070701@gmail.com>
On Fri, Jul 20, 2012 at 01:55:35PM -0400, Vlad Yasevich wrote:
> On 07/20/2012 01:19 PM, Neil Horman wrote:
> >I've seen several attempts recently made to do quick failover of sctp transports
> >by reducing various retransmit timers and counters. While it's possible to
> >implement a faster failover on multihomed sctp associations, it's not
> >particularly robust, in that it can lead to unneeded retransmits, as well as
> >false connection failures due to intermittent latency on a network.
> >
> >Instead, let's implement the new IETF quick failover draft found here:
> >http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
> >
> >This will let the sctp stack identify transports that have had a small number of
> >errors, and avoid using them quickly until their reliability can be
> >re-established. I've tested this out on two virt guests connected via multiple
> >isolated virt networks and believe it's in compliance with the above draft and
> >works well.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: Vlad Yasevich <vyasevich@gmail.com>
> >CC: Sridhar Samudrala <sri@us.ibm.com>
> >CC: "David S. Miller" <davem@davemloft.net>
> >CC: linux-sctp@vger.kernel.org
> >CC: joe@perches.com
> >
> >---
> >Change notes:
> >
> >V2)
> >- Added socket option API from section 6.1 of the specification, as per
> >request from Vlad. Adding this socket option allows us to alter both the path
> >maximum retransmit value and the path partial failure threshold for each
> >transport and the association as a whole.
> >
> >- Added a per transport pf_retrans value, and initialized it from the
> >association value. This makes each transport independently configurable as per
> >the socket option above, and prevents changes in the sysctl from bleeding into
> >an already created association.
> >
> >V3)
> >- Cleaned up some line spacing (Joe Perches)
> >- Fixed some socket option user data sanitization (Vlad Yasevich)
> >
> >V4)
> >- Added additional documentation (Flavio Leitner)
> >---
> > Documentation/networking/ip-sysctl.txt | 14 +++++
> > include/net/sctp/constants.h | 1 +
> > include/net/sctp/structs.h | 20 ++++++-
> > include/net/sctp/user.h | 11 ++++
> > net/sctp/associola.c | 37 ++++++++++--
> > net/sctp/outqueue.c | 6 +-
> > net/sctp/sm_sideeffect.c | 33 +++++++++-
> > net/sctp/socket.c | 100 ++++++++++++++++++++++++++++++++
> > net/sctp/sysctl.c | 9 +++
> > net/sctp/transport.c | 4 +-
> > 10 files changed, 220 insertions(+), 15 deletions(-)
> >
>
> [ snip ]
>
> >
> >diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> >index b3b8a8d..fef9bfa 100644
> >--- a/net/sctp/socket.c
> >+++ b/net/sctp/socket.c
> >@@ -3470,6 +3470,56 @@ static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
> > }
> >
> >
> >+/*
> >+ * SCTP_PEER_ADDR_THLDS
> >+ *
> >+ * This option allows us to alter the partially failed threshold for one or all
> >+ * transports in an association. See Section 6.1 of:
> >+ * http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt
> >+ */
> >+static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
> >+ char __user *optval,
> >+ unsigned int optlen)
> >+{
> >+ struct sctp_paddrthlds val;
> >+ struct sctp_transport *trans;
> >+ struct sctp_association *asoc;
> >+
> >+ if (optlen < sizeof(struct sctp_paddrthlds))
> >+ return -EINVAL;
> >+ if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
> >+ sizeof(struct sctp_paddrthlds)))
> >+ return -EFAULT;
> >+
> >+ /* path_max_retrans shouldn't ever be zero */
> >+ if (!val.spt_pathmaxrxt)
> >+ return -EINVAL;
>
> I am not sure I like this solution. This means that the application
> must fetch the pathmaxrxt and then write the same value back here.
> Why not simply ignore the pathmaxrxt if it's 0? That way someone
> can just tweak the pf value without changing the pathmaxrxt.
>
>
Yeah, I can make that change.
Neil
^ permalink raw reply
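The behaviour Vlad asks for, where a zero pathmaxrxt means "leave the current value alone", can be sketched in isolation as below. This is an illustrative user-space fragment under assumed names (`struct path_thresholds` and `apply_thresholds()` are hypothetical), not the socket option handler itself.

```c
#include <stdint.h>

/* Hypothetical holder for the two per-path thresholds. */
struct path_thresholds {
	uint16_t pathmaxrxt;
	uint16_t pf_retrans;
};

/*
 * Apply user-supplied thresholds. A zero new_pathmaxrxt is treated as
 * "unchanged", so a caller can adjust only the partial failure threshold
 * without first reading back the current pathmaxrxt.
 */
static void apply_thresholds(struct path_thresholds *cur,
			     uint16_t new_pathmaxrxt, uint16_t new_pfthld)
{
	if (new_pathmaxrxt)
		cur->pathmaxrxt = new_pathmaxrxt;
	cur->pf_retrans = new_pfthld;
}
```

One consequence of this design is that the partial failure threshold is always written, so a caller that wants to change only pathmaxrxt still has to pass the current pf value.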
* [patch iproute2 v2] iplink: add support for num[tr]xqueues
From: Jiri Pirko @ 2012-07-20 18:27 UTC (permalink / raw)
To: netdev; +Cc: davem, edumazet, shemminger
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
include/linux/if_link.h | 2 ++
ip/iplink.c | 20 ++++++++++++++++++++
man/man8/ip-link.8.in | 13 +++++++++++++
3 files changed, 35 insertions(+)
V1->V2 - Use "matches" instead of strcmp
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 00e5868..46f03db 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -140,6 +140,8 @@ enum {
IFLA_EXT_MASK, /* Extended info mask, VFs, etc */
IFLA_PROMISCUITY, /* Promiscuity count: > 0 means acts PROMISC */
#define IFLA_PROMISCUITY IFLA_PROMISCUITY
+ IFLA_NUM_TX_QUEUES,
+ IFLA_NUM_RX_QUEUES,
__IFLA_MAX
};
diff --git a/ip/iplink.c b/ip/iplink.c
index 679091e..4111871 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -48,6 +48,8 @@ void iplink_usage(void)
fprintf(stderr, " [ address LLADDR ]\n");
fprintf(stderr, " [ broadcast LLADDR ]\n");
fprintf(stderr, " [ mtu MTU ]\n");
+ fprintf(stderr, " [ numtxqueues QUEUE_COUNT ]\n");
+ fprintf(stderr, " [ numrxqueues QUEUE_COUNT ]\n");
fprintf(stderr, " type TYPE [ ARGS ]\n");
fprintf(stderr, " ip link delete DEV type TYPE [ ARGS ]\n");
fprintf(stderr, "\n");
@@ -279,6 +281,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
int mtu = -1;
int netns = -1;
int vf = -1;
+ int numtxqueues = -1;
+ int numrxqueues = -1;
*group = -1;
ret = argc;
@@ -445,6 +449,22 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
invarg("Invalid operstate\n", *argv);
addattr8(&req->n, sizeof(*req), IFLA_OPERSTATE, state);
+ } else if (matches(*argv, "numtxqueues") == 0) {
+ NEXT_ARG();
+ if (numtxqueues != -1)
+ duparg("numtxqueues", *argv);
+ if (get_integer(&numtxqueues, *argv, 0))
+ invarg("Invalid \"numtxqueues\" value\n", *argv);
+ addattr_l(&req->n, sizeof(*req), IFLA_NUM_TX_QUEUES,
+ &numtxqueues, 4);
+ } else if (matches(*argv, "numrxqueues") == 0) {
+ NEXT_ARG();
+ if (numrxqueues != -1)
+ duparg("numrxqueues", *argv);
+ if (get_integer(&numrxqueues, *argv, 0))
+ invarg("Invalid \"numrxqueues\" value\n", *argv);
+ addattr_l(&req->n, sizeof(*req), IFLA_NUM_RX_QUEUES,
+ &numrxqueues, 4);
} else {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 9386cc6..8a24e51 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -40,6 +40,11 @@ ip-link \- network device configuration
.RB "[ " mtu
.IR MTU " ]"
.br
+.RB "[ " numtxqueues
+.IR QUEUE_COUNT " ]"
+.RB "[ " numrxqueues
+.IR QUEUE_COUNT " ]"
+.br
.BR type " TYPE"
.RI "[ " ARGS " ]"
@@ -156,6 +161,14 @@ Link types:
- Ethernet Bridge device
.in -8
+.TP
+.BI numtxqueues " QUEUE_COUNT "
+specifies the number of transmit queues for the new device.
+
+.TP
+.BI numrxqueues " QUEUE_COUNT "
+specifies the number of receive queues for the new device.
+
.SS ip link delete - delete virtual link
.I DEVICE
specifies the virtual device to act operate on.
--
1.7.10.4
^ permalink raw reply related
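The V1->V2 change swaps an exact strcmp comparison for prefix matching, so that abbreviated keywords are accepted. The sketch below is an illustrative re-implementation of that idea and is not copied from iproute2; `prefix_matches()` is a hypothetical name, and the exact semantics of the real `matches()` helper should be checked against the iproute2 source.

```c
#include <string.h>

/*
 * Return 0 when arg is a prefix of keyword (including an exact match),
 * and non-zero otherwise. An argument longer than the keyword never matches.
 */
static int prefix_matches(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	if (len > strlen(keyword))
		return -1;
	return memcmp(keyword, arg, len);
}
```

Under this scheme `numtx` selects `numtxqueues`, while an exact comparison would reject it. In a chain of else-if tests, an abbreviation shared by two keywords (for example `num`) resolves to whichever keyword is tested first.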
* Re: [PATCH] Crash in tun
From: David Miller @ 2012-07-20 18:23 UTC (permalink / raw)
To: mikulas; +Cc: eric.dumazet, maxk, vtun, netdev
In-Reply-To: <alpine.DEB.2.00.1207191746170.7550@artax.karlin.mff.cuni.cz>
From: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Date: Thu, 19 Jul 2012 18:13:36 +0200 (CEST)
> tun: fix a crash bug and a memory leak
>
> This patch fixes a crash
> tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
> sock_release -> iput(SOCK_INODE(sock))
> introduced by commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
>
> The problem is that this socket is embedded in struct tun_struct, it has
> no inode, iput is called on invalid inode, which modifies invalid memory
> and optionally causes a crash.
>
> sock_release also decrements sockets_in_use, this causes a bug that
> "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
> creating and closing tun devices.
>
> This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
> sock_release to not free the inode and not decrement sockets_in_use,
> fixing both memory corruption and sockets_in_use underflow.
>
> It should be backported to 3.3 and 3.4 stable.
>
> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
> Cc: stable@kernel.org
Applied.
^ permalink raw reply
* Re: [PATCH] atl1c: fix issue of io access mode for AR8152 v2.1
From: David Miller @ 2012-07-20 18:23 UTC (permalink / raw)
To: cjren; +Cc: netdev, linux-kernel, qca-linux-team, nic-devel
In-Reply-To: <1342753318-4507-1-git-send-email-cjren@qca.qualcomm.com>
From: <cjren@qca.qualcomm.com>
Date: Fri, 20 Jul 2012 11:01:58 +0800
> When io access mode is enabled by BOOTROM or BIOS for AR8152 v2.1,
> the registers can't be read or written in memory access mode.
> Clearing bit 8 of register 0x21c fixes the issue.
>
> Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com>
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: xiong <xiong@qca.qualcomm.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: ethernet: davinci_emac: add pm_runtime support
From: David Miller @ 2012-07-20 18:23 UTC (permalink / raw)
To: mgreer; +Cc: netdev, linux-omap, linux-arm-kernel, nsekhar, khilman
In-Reply-To: <1342736577-30477-1-git-send-email-mgreer@animalcreek.com>
From: "Mark A. Greer" <mgreer@animalcreek.com>
Date: Thu, 19 Jul 2012 15:22:57 -0700
> From: "Mark A. Greer" <mgreer@animalcreek.com>
>
> Add pm_runtime support to the TI Davinci EMAC driver.
>
> CC: Sekhar Nori <nsekhar@ti.com>
> CC: Kevin Hilman <khilman@ti.com>
> Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
This patch doesn't apply at all to net-next
^ permalink raw reply
* Re: [PATCH V2 resend] ipv6: fix incorrect route 'expires' value passed to userspace
From: David Miller @ 2012-07-20 18:22 UTC (permalink / raw)
To: David.Laight; +Cc: lw, netdev, shemminger
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B6F9C@saturn3.aculab.com>
From: "David Laight" <David.Laight@ACULAB.COM>
Date: Fri, 20 Jul 2012 11:32:05 +0100
>> - else if (rt->dst.expires - jiffies < INT_MAX)
>> - expires = rt->dst.expires - jiffies;
>> + else if ((long)rt->dst.expires - (long)jiffies > INT_MIN
>> + && (long)rt->dst.expires - (long)jiffies < INT_MAX)
>> + expires = (long)rt->dst.expires - (long)jiffies;
>> else
>> - expires = INT_MAX;
>> + expires = time_is_after_jiffies(rt->dst.expires) ? INT_MAX : INT_MIN;
>
> I can't help feeling there is a better way to do this.
> Maybe:
> long expires = rt->dst.expires - jiffies;
> if (expires != (int)expires)
> expires = expires > 0 ? INT_MAX : INT_MIN;
> Although maybe -INT_MAX instead of INT_MIN.
This patch also does not apply at all to net-next, so needs to be
redone regardless.
^ permalink raw reply
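The simplification David Laight suggests, computing the signed difference once and clamping it to the range of an int, can be sketched as a stand-alone function. This is a variant written with explicit range comparisons instead of his cast-and-compare test, under assumed names: `clamp_expires()` and its `now` parameter (standing in for jiffies) are hypothetical. It also assumes the common two's-complement behaviour when an unsigned difference is reinterpreted as signed.

```c
#include <limits.h>

/*
 * Return (expires - now) clamped to [INT_MIN, INT_MAX].
 * The subtraction is done in unsigned arithmetic so that counter
 * wraparound is harmless, then reinterpreted as a signed distance.
 */
static int clamp_expires(unsigned long expires, unsigned long now)
{
	long delta = (long)(expires - now);

	if (delta > INT_MAX)
		return INT_MAX;
	if (delta < INT_MIN)
		return INT_MIN;
	return (int)delta;
}
```

On platforms where long and int have the same width the two comparisons can never be true and the function reduces to the plain signed difference.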
* Re: [PATCH] gianfar: add support for wake-on-packet
From: David Miller @ 2012-07-20 18:22 UTC (permalink / raw)
To: chenhui.zhao; +Cc: netdev, scottwood, linux-kernel, leoli
In-Reply-To: <1342788723-27703-1-git-send-email-chenhui.zhao@freescale.com>
From: Zhao Chenhui <chenhui.zhao@freescale.com>
Date: Fri, 20 Jul 2012 20:52:03 +0800
> Note: The local ip/mac address is the ethernet primary IP/MAC address of
> the station. Do not support multiple IP/MAC addresses.
I'm not applying this.
There is no such concept of "primary IP address" for interfaces,
just picking the first interface address you find is completely
bogus.
Someone might add 100 IP addresses, then the one that's "primary",
and vice versa.
^ permalink raw reply
* Re: [patch iproute2] iplink: add support for num[tr]xqueues
From: Jiri Pirko @ 2012-07-20 18:22 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev, edumazet, shemminger
In-Reply-To: <s6ijrlvetpfdqgjl8x20788e.1342807975041@email.android.com>
Fri, Jul 20, 2012 at 08:12:55PM CEST, stephen.hemminger@vyatta.com wrote:
>
>Or use matches("numtxqueues") rather than strcmp
Okay - I will repost with "matches"
^ permalink raw reply
* Re: [PATCH] ipv4: show pmtu in route list
From: David Miller @ 2012-07-20 18:16 UTC (permalink / raw)
To: ja; +Cc: netdev
In-Reply-To: <1342774928-29301-1-git-send-email-ja@ssi.bg>
From: Julian Anastasov <ja@ssi.bg>
Date: Fri, 20 Jul 2012 12:02:08 +0300
> Is this patch still useful if routing cache is removed?
It is, since this function still gets used for rtnetlink route
queries. So I'll apply this, thanks Julian!
Which reminds me that we don't have a way to inspect the new TCP metrics
cache. Would someone like to work on that?
^ permalink raw reply
* Re: [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-07-20 18:13 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1342747446-16132-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 19 Jul 2012 18:23:57 -0700
> This series contains updates to ixgbe.
>
> The following are changes since commit 769162e38b91e1d300752e666260fa6c7b203fbc:
> Merge branch 'net' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH v2 net-next] tcp: fix ABC in tcp_slow_start()
From: Neal Cardwell @ 2012-07-20 18:13 UTC (permalink / raw)
To: David Miller
Cc: stephen.hemminger, ycheng, eric.dumazet, netdev, therbert,
shemminger, johnwheffner, nanditad
In-Reply-To: <20120720.110234.810125972347182800.davem@davemloft.net>
On Fri, Jul 20, 2012 at 11:02 AM, David Miller <davem@davemloft.net> wrote:
> From: Stephen Hemminger <stephen.hemminger@vyatta.com>
> Date: Fri, 20 Jul 2012 11:01:43 -0700
>
>> TCP ABC was an experiment that failed. It solves a problem that does
>> not exist on Linux. I did the implementation mostly to prove that.
>>
>> Perhaps ABC should just be removed?
>
> This is my impression as well.
Removing ABC sounds good to me as well.
neal
^ permalink raw reply
* Re: [patch iproute2] iplink: add support for num[tr]xqueues
From: Stephen Hemminger @ 2012-07-20 18:12 UTC (permalink / raw)
To: David Miller; +Cc: jiri, netdev, edumazet, shemminger
Or use matches("numtxqueues") rather than strcmp
^ permalink raw reply
* Re: [GIT] net has been merged into net-next
From: David Miller @ 2012-07-20 18:11 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <1342744598.2616.15.camel@jtkirshe-mobl>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 19 Jul 2012 17:36:38 -0700
> On Thu, 2012-07-19 at 11:31 -0700, David Miller wrote:
>> Jeff, please double check my merge work on the ixgbevf driver, and send
>> me any necessary fixups.
>>
>> Thanks!
>
> Your conflict resolution was correct, so no fix-ups needed. Thanks
> Dave.
Thanks a lot Jeff.
^ permalink raw reply
* Re: [patch iproute2] iplink: add support for num[tr]xqueues
From: David Miller @ 2012-07-20 18:10 UTC (permalink / raw)
To: jiri; +Cc: stephen.hemminger, netdev, edumazet, shemminger
In-Reply-To: <20120720180813.GA1560@minipsycho.orion>
From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 20 Jul 2012 20:08:13 +0200
> Fri, Jul 20, 2012 at 05:52:18PM CEST, stephen.hemminger@vyatta.com wrote:
>>I like the option, but numtxqueue is too verbose for the syntax model
>>of iproute. Why not use txq and rxq?
>
> There is "txqueuelen" present already in iplink. I tried to be uniform here.
> Aren't "txq" and "rxq" rather too short? And after all, these parameters
> are not supposed to be used on a daily basis by anyone :)
Perhaps a good compromise is numtxq and numrxq?
^ permalink raw reply
* Re: [patch net-next 5/6] bond_sysfs: use ream_num_tx_queues rather than params.tx_queue
From: Jiri Pirko @ 2012-07-20 18:09 UTC (permalink / raw)
To: David Miller; +Cc: bhutchings, netdev, edumazet, shemminger, fubar, andy
In-Reply-To: <20120720.110453.200101053484551575.davem@davemloft.net>
Fri, Jul 20, 2012 at 08:04:53PM CEST, davem@davemloft.net wrote:
>From: Ben Hutchings <bhutchings@solarflare.com>
>Date: Fri, 20 Jul 2012 16:03:00 +0100
>
>> Typo in the subject line. :-)
Doh!
>
>I'll fix this up when I apply it :-)
Thanks Dave :) I will use clipboard next time...
^ permalink raw reply
* Re: [RFC PATCH] net: Add support for virtual machine device queues (VMDQ)
From: Ben Hutchings @ 2012-07-20 18:09 UTC (permalink / raw)
To: John Fastabend
Cc: or.gerlitz, davem, roland, netdev, ali, sean.hefty, shlomop
In-Reply-To: <20120718220544.22619.97136.stgit@i40e.jf1>
On Wed, 2012-07-18 at 18:05 -0400, John Fastabend wrote:
> This adds support to allow virtual net devices to be created. These
> devices can be managed independently of the physical function but
> use the same physical link.
>
> This is analogous to an offloaded macvlan device. The primary
> advantage to VMDQ net devices over virtual functions is they can
> be added and removed dynamically as needed.
Is VMDQ intended to become a generic name?
> Sending this for Or Gerlitz to take a peek at and see if this
> could be used for his ipoib bits. It's not pretty as is and
> likely needs some work. It's just an idea at this point; use at
> your own risk. I believe it compiles.
[...]
> +static int vmdq_newlink(struct net *src_net, struct net_device *dev,
> + struct nlattr *tb[], struct nlattr *data[])
> +{
> + struct net_device *lowerdev;
> + int err = -EOPNOTSUPP;
> +
> + if (!tb[IFLA_LINK])
> + return -EINVAL;
> +
> + lowerdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
> + if (!lowerdev)
> + return -ENODEV;
> +
> + if (!tb[IFLA_MTU])
> + dev->mtu = lowerdev->mtu;
> + else if (dev->mtu > lowerdev->mtu)
> + return -EINVAL;
> +
> + if (lowerdev->netdev_ops->ndo_add_vmdq)
> + err = lowerdev->netdev_ops->ndo_add_vmdq(lowerdev, dev);
Why isn't the device allocation left to the lower device driver? It
seems like this would simplify things quite a bit.
[...]
> +int vmdq_get_tx_queues(struct net *net, struct nlattr *tb[])
> +{
> + struct net_device *lowerdev;
> +
> + if (!tb[IFLA_LINK])
> + return -EINVAL;
> +
> + lowerdev = __dev_get_by_index(net, nla_get_u32(tb[IFLA_LINK]));
> + if (!lowerdev)
> + return -ENODEV;
> +
> + return lowerdev->num_tx_queues;
> +}
[...]
Why should this match the lower device? Is the assumption that it will
share the lower device's TX queues and only have its own RX queue(s)?
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [patch net-next 0/6] net: add team multiqueue support and do comple of thing on the way
From: David Miller @ 2012-07-20 18:08 UTC (permalink / raw)
To: jiri; +Cc: netdev, edumazet, shemminger, fubar, andy
In-Reply-To: <1342787331-1866-1-git-send-email-jiri@resnulli.us>
From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 20 Jul 2012 14:28:45 +0200
> This patchset represents the way I walked when I was adding multiqueue support for
> team driver.
>
> Jiri Pirko (6):
> net: honour netif_set_real_num_tx_queues() retval
> rtnl: allow to specify different num for rx and tx queue count
> rtnl: allow to specify number of rx and tx queues on device creation
> net: rename bond_queue_mapping to slave_dev_queue_mapping
> bond_sysfs: use ream_num_tx_queues rather than params.tx_queue
> team: add multiqueue support
All applied, thanks Jiri.
^ permalink raw reply
* Re: [patch iproute2] iplink: add support for num[tr]xqueues
From: Jiri Pirko @ 2012-07-20 18:08 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, David Miller, edumazet, shemminger
In-Reply-To: <lh9v01i5y1bg7t9j24kof0f8.1342799538355@email.android.com>
Fri, Jul 20, 2012 at 05:52:18PM CEST, stephen.hemminger@vyatta.com wrote:
>I like the option, but numtxqueue is too verbose for the syntax model
>of iproute. Why not use txq and rxq?
There is "txqueuelen" present already in iplink. I tried to be uniform here.
Aren't "txq" and "rxq" rather too short? And after all, these parameters
are not supposed to be used on a daily basis by anyone :)
Jirka
>
>Sent from my ASUS Pad
>
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>---
>> include/linux/if_link.h | 2 ++
>> ip/iplink.c | 20 ++++++++++++++++++++
>> man/man8/ip-link.8.in | 13 +++++++++++++
>> 3 files changed, 35 insertions(+)
>>
>>diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>index 00e5868..46f03db 100644
>>--- a/include/linux/if_link.h
>>+++ b/include/linux/if_link.h
>>@@ -140,6 +140,8 @@ enum {
>> IFLA_EXT_MASK, /* Extended info mask, VFs, etc */
>> IFLA_PROMISCUITY, /* Promiscuity count: > 0 means acts PROMISC */
>> #define IFLA_PROMISCUITY IFLA_PROMISCUITY
>>+ IFLA_NUM_TX_QUEUES,
>>+ IFLA_NUM_RX_QUEUES,
>> __IFLA_MAX
>> };
>>
>>diff --git a/ip/iplink.c b/ip/iplink.c
>>index 679091e..0baa128 100644
>>--- a/ip/iplink.c
>>+++ b/ip/iplink.c
>>@@ -48,6 +48,8 @@ void iplink_usage(void)
>> fprintf(stderr, " [ address LLADDR ]\n");
>> fprintf(stderr, " [ broadcast LLADDR ]\n");
>> fprintf(stderr, " [ mtu MTU ]\n");
>>+ fprintf(stderr, " [ numtxqueues QUEUE_COUNT ]\n");
>>+ fprintf(stderr, " [ numrxqueues QUEUE_COUNT ]\n");
>> fprintf(stderr, " type TYPE [ ARGS ]\n");
>> fprintf(stderr, " ip link delete DEV type TYPE [ ARGS ]\n");
>> fprintf(stderr, "\n");
>>@@ -279,6 +281,8 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
>> int mtu = -1;
>> int netns = -1;
>> int vf = -1;
>>+ int numtxqueues = -1;
>>+ int numrxqueues = -1;
>>
>> *group = -1;
>> ret = argc;
>>@@ -445,6 +449,22 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
>> invarg("Invalid operstate\n", *argv);
>>
>> addattr8(&req->n, sizeof(*req), IFLA_OPERSTATE, state);
>>+ } else if (strcmp(*argv, "numtxqueues") == 0) {
>>+ NEXT_ARG();
>>+ if (numtxqueues != -1)
>>+ duparg("numtxqueues", *argv);
>>+ if (get_integer(&numtxqueues, *argv, 0))
>>+ invarg("Invalid \"numtxqueues\" value\n", *argv);
>>+ addattr_l(&req->n, sizeof(*req), IFLA_NUM_TX_QUEUES,
>>+ &numtxqueues, 4);
>>+ } else if (strcmp(*argv, "numrxqueues") == 0) {
>>+ NEXT_ARG();
>>+ if (numrxqueues != -1)
>>+ duparg("numrxqueues", *argv);
>>+ if (get_integer(&numrxqueues, *argv, 0))
>>+ invarg("Invalid \"numrxqueues\" value\n", *argv);
>>+ addattr_l(&req->n, sizeof(*req), IFLA_NUM_RX_QUEUES,
>>+ &numrxqueues, 4);
>> } else {
>> if (strcmp(*argv, "dev") == 0) {
>> NEXT_ARG();
>>diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
>>index 9386cc6..8a24e51 100644
>>--- a/man/man8/ip-link.8.in
>>+++ b/man/man8/ip-link.8.in
>>@@ -40,6 +40,11 @@ ip-link \- network device configuration
>> .RB "[ " mtu
>> .IR MTU " ]"
>> .br
>>+.RB "[ " numtxqueues
>>+.IR QUEUE_COUNT " ]"
>>+.RB "[ " numrxqueues
>>+.IR QUEUE_COUNT " ]"
>>+.br
>> .BR type " TYPE"
>> .RI "[ " ARGS " ]"
>>
>>@@ -156,6 +161,14 @@ Link types:
>> - Ethernet Bridge device
>> .in -8
>>
>>+.TP
>>+.BI numtxqueues " QUEUE_COUNT "
>>+specifies the number of transmit queues for new device.
>>+
>>+.TP
>>+.BI numrxqueues " QUEUE_COUNT "
>>+specifies the number of receive queues for new device.
>>+
>> .SS ip link delete - delete virtual link
>> .I DEVICE
>> specifies the virtual device to act operate on.
>>--
>>1.7.10.4
>>
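For readers following the iplink hunk quoted above, the option handling can be sketched outside of iproute2 as a small standalone parser. This is only an illustration under stated assumptions: the names parse_count, parse_queue_opts, struct queue_opts and the PARSE_* codes are hypothetical and are not part of iproute2, and the sketch returns error codes where the patch calls duparg() and invarg(). It keeps the patch's convention of -1 meaning "option not given", and it rejects negative values so that -1 stays unambiguous as the sentinel.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; the quoted patch reports errors via
 * duparg()/invarg() instead of returning a value. */
enum parse_status {
	PARSE_OK = 0,
	PARSE_DUPLICATE = -1,	/* same option given twice */
	PARSE_INVALID = -2,	/* value is not a non-negative integer */
	PARSE_MISSING = -3	/* option keyword without a value */
};

/* Hypothetical container for the two counts; -1 means "not given",
 * mirroring the initial values used in the patch. */
struct queue_opts {
	int numtxqueues;
	int numrxqueues;
};

static void queue_opts_init(struct queue_opts *o)
{
	o->numtxqueues = -1;
	o->numrxqueues = -1;
}

/* Strictly parse a base-10, non-negative int. Returns 0 on success and
 * leaves *out untouched on failure. */
static int parse_count(const char *s, int *out)
{
	char *end;
	long v;

	assert(out != NULL);
	if (s == NULL || *s == '\0')
		return -1;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v < 0 || v > INT_MAX)
		return -1;

	*out = (int)v;
	return 0;
}

/* Walk argv-style words and pick up "numtxqueues N" / "numrxqueues N".
 * Words that are not one of the two keywords are skipped. */
static enum parse_status parse_queue_opts(int argc, char **argv,
					  struct queue_opts *o)
{
	int i;

	for (i = 0; i < argc; i++) {
		int *slot;

		if (strcmp(argv[i], "numtxqueues") == 0)
			slot = &o->numtxqueues;
		else if (strcmp(argv[i], "numrxqueues") == 0)
			slot = &o->numrxqueues;
		else
			continue;

		if (++i >= argc)
			return PARSE_MISSING;
		if (*slot != -1)
			return PARSE_DUPLICATE;
		if (parse_count(argv[i], slot) != 0)
			return PARSE_INVALID;
	}
	return PARSE_OK;
}
```

A caller would run queue_opts_init() first, pass the remaining command-line words to parse_queue_opts(), and only build a request from fields that are no longer -1. Whichever keyword spelling is finally chosen (numtxqueues, numtxq or txq), only the strcmp() strings would change.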
^ permalink raw reply