Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 -next 2/2] tcp: syncookies: reduce mss table to four values
From: David Miller @ 2013-09-24 14:40 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1379709176-1625-2-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 Sep 2013 22:32:56 +0200

> Halve mss table size to make blind cookie guessing more difficult.
> This is sad since the tables were already small, but there
> is little alternative except perhaps adding more precise mss information
> in the tcp timestamp.  Timestamps are unfortunately not ubiquitous.
> 
> Guessing all possible cookie values still has 8-in 2**32 chance.
> 
> Reported-by: Jakob Lell <jakob@jakoblell.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied.

Thanks for following up on all of my feedback.

^ permalink raw reply

* Re: [PATCH v3 -next 1/2] tcp: syncookies: reduce cookie lifetime to 128 seconds
From: David Miller @ 2013-09-24 14:40 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1379709176-1625-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 Sep 2013 22:32:55 +0200

> We currently accept cookies that were created less than 4 minutes ago
> (ie, cookies with counter delta 0-3).  Combined with the 8 mss table
> values, this yields 32 possible values (out of 2**32) that will be valid.
> 
> Reducing the lifetime to < 2 minutes halves the guessing chance while
> still providing a large enough period.
> 
> While at it, get rid of jiffies value -- they overflow too quickly on
> 32 bit platforms.
> 
> getnstimeofday is used to create a counter that increments every 64s.
> perf shows getnstimeofday cost is negible compared to sha_transform;
> normal tcp initial sequence number generation uses getnstimeofday, too.
> 
> Reported-by: Jakob Lell <jakob@jakoblell.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 0/4] cpsw: support for control module register
From: David Miller @ 2013-09-24 14:34 UTC (permalink / raw)
  To: mugunthanvnm
  Cc: netdev, zonque, bcousson, tony, devicetree, linux-omap,
	linux-arm-kernel
In-Reply-To: <1379704841-32693-1-git-send-email-mugunthanvnm@ti.com>

From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Sat, 21 Sep 2013 00:50:37 +0530

> This patch series adds the support for configuring GMII_SEL register
> of control module to select the phy mode type and also to configure
> the clock source for RMII phy mode whether to use internal clock or
> the external clock from the phy itself.
> 
> Till now CPSW works as this configuration is done in U-Boot and carried
> over to the kernel. But during suspend/resume Control module tends to
> lose its configured value for GMII_SEL register in AM33xx PG1.0, so
> if CPSW is used in RMII or RGMII mode, on resume cpsw is not working
> as GMII_SEL register lost its configuration values.
> 
> The initial version of the patch is done by Daniel Mack but as per
> Tony's comment he wants it as a seperate driver as it is done in USB
> control module. I have created a seperate driver for the same.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: Marc Kleine-Budde @ 2013-09-24 14:22 UTC (permalink / raw)
  To: David Miller; +Cc: joe, netdev, linux-kernel, wg, linux-can
In-Reply-To: <20130924.100209.1219862004963547693.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]

On 09/24/2013 04:02 PM, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Tue, 24 Sep 2013 09:37:12 +0200
> 
>> On 09/24/2013 12:11 AM, Joe Perches wrote:
>>> There are a mix of function prototypes with and without extern
>>> in the kernel sources.  Standardize on not using extern for
>>> function prototypes.
>>>
>>> Function prototypes don't need to be written with extern.
>>> extern is assumed by the compiler.  Its use is as unnecessary as
>>> using auto to declare automatic/local variables in a block.
>>>
>>> Signed-off-by: Joe Perches <joe@perches.com>
>>
>> Thx, added to linux-can-next. The patch will be included in the next
>> pull request to David.
> 
> Marc, I'm trying to just quickly apply these all into my tree directly.

Fine with me.

Tnx, Marc


-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply

* Re: [PATCH] skge: fix invalid value passed to pci_unmap_sigle
From: David Miller @ 2013-09-24 14:18 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, romieu, i.gnatenko.brain, stephen
In-Reply-To: <alpine.LRH.2.02.1309201352010.1763@file01.intranet.prod.int.rdu2.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Fri, 20 Sep 2013 13:53:22 -0400 (EDT)

> In my patch c194992cbe71c20bb3623a566af8d11b0bfaa721 I didn't fix the skge

Always refer to commits, not just by SHA ID, but also with the commit
header line text in parenthesis and double quotes, for this you'd say:

c194992cbe71c20bb3623a566af8d11b0bfaa721 ("skge: fix broken driver")

Using just the SHA ID is completely ambiguous, because the SHA ID will
be entirely different if this commit is added to a different tree, such
as -stable.

> bug correctly. The value of the new mapping (not old) was passed to
> pci_unmap_single.
> 
> If we enable CONFIG_DMA_API_DEBUG, it results in this warning:
> WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:986 check_sync+0x4c4/0x580()
> skge 0000:02:07.0: DMA-API: device driver tries to sync DMA memory it has
> not allocated [device address=0x000000023a0096c0] [size=1536 bytes]
> 
> This patch makes the skge driver pass the correct value to
> pci_unmap_single and fixes the warning. It copies the old descriptor to
> on-stack variable "ee" and unmaps it if mapping of the new descriptor
> succeeded.
> 
> This patch should be backported to 3.11-stable.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Mikulas Patocka <mpatocka@redhat.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH ] net: raw: do not report ICMP redirects to user space
From: David Miller @ 2013-09-24 14:17 UTC (permalink / raw)
  To: duanj.fnst; +Cc: netdev
In-Reply-To: <523C21A5.4050504@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 20 Sep 2013 18:21:25 +0800

> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> Redirect isn't an error condition, it should leave
> the error handler without touching the socket.
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH ] net: udp: do not report ICMP redirects to user space
From: David Miller @ 2013-09-24 14:17 UTC (permalink / raw)
  To: duanj.fnst; +Cc: netdev
In-Reply-To: <523C216C.90707@cn.fujitsu.com>

From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 20 Sep 2013 18:20:28 +0800

> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> 
> Redirect isn't an error condition, it should leave
> the error handler without touching the socket.
> 
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [net 5/6] i40e: better return values
From: David Miller @ 2013-09-24 14:12 UTC (permalink / raw)
  To: joe; +Cc: jeffrey.t.kirsher, jesse.brandeburg, netdev, gospo, sassmann
In-Reply-To: <1380022486.3575.74.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Tue, 24 Sep 2013 04:34:46 -0700

> On Tue, 2013-09-24 at 02:45 -0700, Jeff Kirsher wrote:
> 
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> []
>> @@ -3339,9 +3345,7 @@ static u8 i40e_dcb_get_num_tc(struct i40e_dcbx_config *dcbcfg)
>>  	/* Traffic class index starts from zero so
>>  	 * increment to return the actual count
>>  	 */
>> -	num_tc++;
>> -
>> -	return num_tc;
>> +	return num_tc++;
> 
> Ick.  post_increment problem.
> 
> 	return ++num_tc;
> 
> There's nothing wrong with the original code
> unless this is a bugfix which should be documented
> better than "better return values".

Agreed, this style of coding is asking for a bug.

If you want to return "num_tc PLUS ONE" just say that:

	return num_tc + 1;

Why even use pre/post increment when the variable has no other
use than as a return value?

^ permalink raw reply

* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: David Miller @ 2013-09-24 14:11 UTC (permalink / raw)
  To: joe; +Cc: netdev, linux-kernel, wg, mkl, linux-can
In-Reply-To: <5570169a078375fa8662adeb2a7f24c1ae718bfb.1379974101.git.joe@perches.com>


Series applied, thanks Joe.

^ permalink raw reply

* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: David Miller @ 2013-09-24 14:02 UTC (permalink / raw)
  To: mkl; +Cc: joe, netdev, linux-kernel, wg, linux-can
In-Reply-To: <52414128.8030706@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Tue, 24 Sep 2013 09:37:12 +0200

> On 09/24/2013 12:11 AM, Joe Perches wrote:
>> There are a mix of function prototypes with and without extern
>> in the kernel sources.  Standardize on not using extern for
>> function prototypes.
>> 
>> Function prototypes don't need to be written with extern.
>> extern is assumed by the compiler.  Its use is as unnecessary as
>> using auto to declare automatic/local variables in a block.
>> 
>> Signed-off-by: Joe Perches <joe@perches.com>
> 
> Thx, added to linux-can-next. The patch will be included in the next
> pull request to David.

Marc, I'm trying to just quickly apply these all into my tree directly.

Thanks.

^ permalink raw reply

* [PATCH net-next v3 2/2] ipv4: processing ancillary IP_TOS or IP_TTL
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1379944641.git.ffusco@redhat.com>

If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().

The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.

Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
 v1->v2
  - reworked the entire patch
  - modified the ttl field in the struct inet_cork from __s16 to __u8:
    0 means that the TTL is not specified
  - the tos field in the struct inet_cork is still __s16: 
    -1 means tha the tos is not set
  - modified the priority field in the struct inet_cork from __u32 to 
    char.
  - introduced the get_rttos and get_rtconn_flags functions
 v2->v3
  - no code changes, rebase to net-next

 include/net/inet_sock.h |  3 +++
 include/net/ip.h        | 11 +++++++++++
 include/net/route.h     |  1 +
 net/ipv4/icmp.c         |  5 +++++
 net/ipv4/ip_output.c    | 13 ++++++++++---
 net/ipv4/ping.c         |  4 +++-
 net/ipv4/raw.c          |  4 +++-
 net/ipv4/udp.c          |  4 +++-
 8 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 636d203..f314177 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -103,6 +103,9 @@ struct inet_cork {
 	int			length; /* Total length of all frames */
 	struct dst_entry	*dst;
 	u8			tx_flags;
+	__u8			ttl;
+	__s16			tos;
+	char			priority;
 };
 
 struct inet_cork_full {
diff --git a/include/net/ip.h b/include/net/ip.h
index 0135f38..77b4f9b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -28,6 +28,7 @@
 #include <linux/skbuff.h>
 
 #include <net/inet_sock.h>
+#include <net/route.h>
 #include <net/snmp.h>
 #include <net/flow.h>
 
@@ -140,6 +141,16 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
 	return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
 }
 
+static inline __u8 get_rttos(struct ipcm_cookie* ipc, struct inet_sock *inet)
+{
+	return (ipc->tos != -1) ? RT_TOS(ipc->tos) : RT_TOS(inet->tos);
+}
+
+static inline __u8 get_rtconn_flags(struct ipcm_cookie* ipc, struct sock* sk)
+{
+	return (ipc->tos != -1) ? RT_CONN_FLAGS_TOS(sk, ipc->tos) : RT_CONN_FLAGS(sk);
+}
+
 /* datagram.c */
 int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 
diff --git a/include/net/route.h b/include/net/route.h
index 6f572ca..0ad8e01 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -39,6 +39,7 @@
 #define RTO_ONLINK	0x01
 
 #define RT_CONN_FLAGS(sk)   (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
+#define RT_CONN_FLAGS_TOS(sk,tos)   (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
 
 struct fib_nh;
 struct fib_info;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5f7d11a..5c0e8bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -353,6 +353,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	saddr = fib_compute_spec_dst(skb);
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
+
 	if (icmp_param->replyopts.opt.opt.optlen) {
 		ipc.opt = &icmp_param->replyopts.opt;
 		if (ipc.opt->opt.srr)
@@ -608,6 +611,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	ipc.addr = iph->saddr;
 	ipc.opt = &icmp_param->replyopts.opt;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
 			       type, code, icmp_param);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a04d872..7d8357b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1060,6 +1060,9 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 			 rt->dst.dev->mtu : dst_mtu(&rt->dst);
 	cork->dst = &rt->dst;
 	cork->length = 0;
+	cork->ttl = ipc->ttl;
+	cork->tos = ipc->tos;
+	cork->priority = ipc->priority;
 	cork->tx_flags = ipc->tx_flags;
 
 	return 0;
@@ -1311,7 +1314,9 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	if (cork->flags & IPCORK_OPT)
 		opt = cork->opt;
 
-	if (rt->rt_type == RTN_MULTICAST)
+	if (cork->ttl != 0)
+		ttl = cork->ttl;
+	else if (rt->rt_type == RTN_MULTICAST)
 		ttl = inet->mc_ttl;
 	else
 		ttl = ip_select_ttl(inet, &rt->dst);
@@ -1319,7 +1324,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	iph = ip_hdr(skb);
 	iph->version = 4;
 	iph->ihl = 5;
-	iph->tos = inet->tos;
+	iph->tos = (cork->tos != -1) ? cork->tos : inet->tos;
 	iph->frag_off = df;
 	iph->ttl = ttl;
 	iph->protocol = sk->sk_protocol;
@@ -1331,7 +1336,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 		ip_options_build(skb, opt, cork->addr, rt, 0);
 	}
 
-	skb->priority = sk->sk_priority;
+	skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority;
 	skb->mark = sk->sk_mark;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
@@ -1481,6 +1486,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 	ipc.addr = daddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	if (replyopts.opt.opt.optlen) {
 		ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index d7d9882..706d108e 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -713,6 +713,8 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	ipc.opt = NULL;
 	ipc.oif = sk->sk_bound_dev_if;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	sock_tx_timestamp(sk, &ipc.tx_flags);
 
@@ -744,7 +746,7 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			return -EINVAL;
 		faddr = ipc.opt->opt.faddr;
 	}
-	tos = RT_TOS(inet->tos);
+	tos = get_rttos(&ipc, inet);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
 	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index bfec521..a3fe534 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -517,6 +517,8 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	ipc.addr = inet->inet_saddr;
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 	ipc.oif = sk->sk_bound_dev_if;
 
 	if (msg->msg_controllen) {
@@ -556,7 +558,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			daddr = ipc.opt->opt.faddr;
 		}
 	}
-	tos = RT_CONN_FLAGS(sk);
+	tos = get_rtconn_flags(&ipc, sk);
 	if (msg->msg_flags & MSG_DONTROUTE)
 		tos |= RTO_ONLINK;
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 74d2c95..22462d94 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -855,6 +855,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	ipc.opt = NULL;
 	ipc.tx_flags = 0;
+	ipc.ttl = 0;
+	ipc.tos = -1;
 
 	getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
 
@@ -938,7 +940,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		faddr = ipc.opt->opt.faddr;
 		connected = 0;
 	}
-	tos = RT_TOS(inet->tos);
+	tos = get_rttos(&ipc, inet);
 	if (sock_flag(sk, SOCK_LOCALROUTE) ||
 	    (msg->msg_flags & MSG_DONTROUTE) ||
 	    (ipc.opt && ipc.opt->opt.is_strictroute)) {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <cover.1379944641.git.ffusco@redhat.com>

This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:

- the TTL, expressed as __u8.
  The allowed values are in the [1-255].
  A value of 0 means that the TTL is not specified.

- the TOS, expressed as __s16.
  The allowed values are in the range [0,255].
  A value of -1 means that the TOS is not specified.

- the priority, expressed as a char and computed when
  handling the ancillary data.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
 v1->v2
  - changed the icmp_cookie ttl field from __s16 to __u8.
    A value of 0 means that the TTL has not been specified
  - to tos field is still __s16. The user can specify
    values in the range 0-255 included, therefore I use
    a value of -1 as a flag saying that the value has
    not been specified
  - the priority it is now a char instead of a __u32, 
    which is the return type of rt_tos2priority
  - improved commit message
 v1->v2
  - no code changes, rebase to net-next

 include/net/ip.h       |  3 +++
 net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index c1f192b..0135f38 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -56,6 +56,9 @@ struct ipcm_cookie {
 	int			oif;
 	struct ip_options_rcu	*opt;
 	__u8			tx_flags;
+	__u8			ttl;
+	__s16			tos;
+	char			priority;
 };
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d9c4f11..56e3445 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -189,7 +189,7 @@ EXPORT_SYMBOL(ip_cmsg_recv);
 
 int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
 {
-	int err;
+	int err, val;
 	struct cmsghdr *cmsg;
 
 	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
@@ -215,6 +215,24 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
 			ipc->addr = info->ipi_spec_dst.s_addr;
 			break;
 		}
+		case IP_TTL:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+				return -EINVAL;
+			val = *(int *)CMSG_DATA(cmsg);
+			if (val < 1 || val > 255)
+				return -EINVAL;
+			ipc->ttl = val;
+			break;
+		case IP_TOS:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+				return -EINVAL;
+			val = *(int *)CMSG_DATA(cmsg);
+			if (val < 0 || val > 255)
+				return -EINVAL;
+			ipc->tos = val;
+			ipc->priority = rt_tos2priority(ipc->tos);
+			break;
+
 		default:
 			return -EINVAL;
 		}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v3 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg()
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
  To: davem; +Cc: netdev

There is no way to set the IP_TOS field on a per-packet basis in IPv4, while
IPv6 has such a mechanism. Therefore one has to fall back to the setsockopt()
in case of IPv4. 

Using the existing per-socket option is not convenient particularly in the
situations where multiple threads have to use the same socket data requiring
per-thread TOS values. In fact this would involve calling setsockopt() before
sendmsg() every time.

Francesco Fusco (2):
  ipv4: IP_TOS and IP_TTL can be specified as ancillary data
  ipv4: processing ancillary IP_TOS or IP_TTL

 include/net/inet_sock.h |  3 +++
 include/net/ip.h        | 14 ++++++++++++++
 include/net/route.h     |  1 +
 net/ipv4/icmp.c         |  5 +++++
 net/ipv4/ip_output.c    | 13 ++++++++++---
 net/ipv4/ip_sockglue.c  | 20 +++++++++++++++++++-
 net/ipv4/ping.c         |  4 +++-
 net/ipv4/raw.c          |  4 +++-
 net/ipv4/udp.c          |  4 +++-
 9 files changed, 61 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-09-24 13:35 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <1380023107.3162.53.camel@ubuntu-vm-makita>

On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich wrote:
>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich wrote:
>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>> On Thu, 2013-09-12 at 16:00 -0400, David Miller wrote:
>>>>>> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>>>>>> Date: Tue, 10 Sep 2013 19:27:54 +0900
>>>>>>
>>>>>>> There seem to be some undesirable behaviors related with PVID.
>>>>>>> 1. It has no effect assigning PVID to a port. PVID cannot be applied
>>>>>>> to any frame regardless of whether we set it or not.
>>>>>>> 2. FDB entries learned via frames applied PVID are registered with
>>>>>>> VID 0 rather than VID value of PVID.
>>>>>>> 3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
>>>>>>> This leads interoperational problems such as sending frames with VID
>>>>>>> 4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
>>>>>>> 0 as they belong to VLAN 0, which is expected to be handled as they have
>>>>>>> no VID according to IEEE 802.1Q.
>>>>>>>
>>>>>>> Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
>>>>>>> is fixed, because we cannot activate PVID due to it.
>>>>>>
>>>>>> Please work out the issues in patch #2 with Vlad and resubmit this
>>>>>> series.
>>>>>>
>>>>>> Thank you.
>>>>>
>>>>> I'm hovering between whether we should fix the issue by changing vlan 0
>>>>> interface behavior in 8021q module or enabling a bridge port to sending
>>>>> priority-tagged frames, or another better way.
>>>>>
>>>>> If you could comment it, I'd appreciate it :)
>>>>>
>>>>>
>>>>> BTW, I think what is discussed in patch #2 is another problem about
>>>>> handling priority-tags, and it exists without this patch set applied.
>>>>> It looks like that we should prepare another patch set than this to fix
>>>>> that problem.
>>>>>
>>>>> Should I include patches that fix the priority-tags problem in this
>>>>> patch set and resubmit them all together?
>>>>>
>>>>
>>>> I am thinking that we might need to do it in bridge and it looks like
>>>> the simplest way to do it is to have default priority regeneration table
>>>> (table 6-5 from 802.1Q doc).
>>>>
>>>> That way I think we would conform to the spec.
>>>>
>>>> -vlad
>>>
>>> Unfortunately I don't think the default priority regeneration table
>>> resolves the problem because IEEE 802.1Q says that a VLAN-aware bridge
>>> can transmit untagged or VLAN-tagged frames only (the end of section 7.5
>>> and 8.1.7).
>>>
>>> No mechanism to send priority-tagged frames is found as far as I can see
>>> the standard. I think the regenerated priority is used for outgoing PCP
>>> field only if egress policy is not untagged (i.e. transmitting as
>>> VLAN-tagged), and unused if untagged (Section 6.9.2 3rd/4th Paragraph).
>>>
>>> If we want to transmit priority-tagged frames from a bridge port, I
>>> think we need to implement a new (optional) feature that is above the
>>> standard, as I stated previously.
>>>
>>> How do you feel about adding a per-port policy that enables a bridge to
>>> send priority-tagged frames instead of untagged frames when egress
>>> policy for the port is untagged?
>>> With this change, we can transmit frames for a given vlan as either all
>>> untagged, all priority-tagged or all VLAN-tagged.
>>
>> That would work.  What I am thinking is that we do it by special casing
>> the vid 0 egress policy specification.  Let it be untagged by default
>> and if it is tagged, then we preserve the priority field and forward
>> it on.
>>
>> This keeps the API stable and doesn't require user/admin from knowing
>> exactly what happens.  Default operation conforms to the spec and allows
>> simple change to make it backward-compatible.
>>
>> What do you think.  I've done a simple prototype of this an it seems to
>> work with the VMs I am testing with.
>
> Are you saying that
> - by default, set the 0th bit of untagged_bitmap; and
> - if we unset the 0th bit and set the "vid"th bit, we transmit frames
> classified as belonging to VLAN "vid" as priority-tagged?
>
> If so, though it's attractive to keep current API, I'm worried about if
> it could be a bit confusing and not intuitive for kernel/iproute2
> developers that VID 0 has a special meaning only in the egress policy.
> Wouldn't it be better to adding a new member to struct net_port_vlans
> instead of using VID 0 of untagged_bitmap?
>
> Or are you saying that we use a new flag in struct net_port_vlans but
> use the BRIDGE_VLAN_INFO_UNTAGGED bit with VID 0 in netlink to set the
> flag?
>
> Even in that case, I'm afraid that it might be confusing for developers
> for the same reason. We are going to prohibit to specify VID with 0 (and
> 4095) in adding/deleting a FDB entry or a vlan filtering entry, but it
> would allow us to use VID 0 only when a vlan filtering entry is
> configured.
> I am thinking a new nlattr is a straightforward approach to configure
> it.

By making this an explicit attribute it makes vid 0 a special case for
any automatic tool that would provision such filtering.  Seeing vid 0
would mean that these tools would have to know that this would have to
be translated to a different attribute instead of setting the policy
values.

How it is implemented internally in the kernel isn't as big of an issue.
We can do it as a separate flag or as part of existing policy.

-vlad

>
> Thanks,
>
> Toshiaki Makita
>
>>
>> -vlad
>>
>>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>
>

^ permalink raw reply

* [PATCH] net: net_secret should not depend on TCP
From: Eric Dumazet @ 2013-09-24 13:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <20130924054532.GA24446@order.stressinduktion.org>

From: Eric Dumazet <edumazet@google.com>

A host might need net_secret[] and never open a single socket. 

Problem added in commit aebda156a570782
("net: defer net_secret[] initialization")

Based on prior patch from Hannes Frederic Sowa.

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/secure_seq.h |    1 -
 net/core/secure_seq.c    |   27 ++++++++++++++++++++++++---
 net/ipv4/af_inet.c       |    4 +---
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 6ca975b..c2e542b 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -3,7 +3,6 @@
 
 #include <linux/types.h>
 
-extern void net_secret_init(void);
 extern __u32 secure_ip_id(__be32 daddr);
 extern __u32 secure_ipv6_id(const __be32 daddr[4]);
 extern u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 6a2f13c..3f1ec15 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -10,11 +10,24 @@
 
 #include <net/secure_seq.h>
 
-static u32 net_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-void net_secret_init(void)
+static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+
+static void net_secret_init(void)
 {
-	get_random_bytes(net_secret, sizeof(net_secret));
+	u32 tmp;
+	int i;
+
+	if (likely(net_secret[0]))
+		return;
+
+	for (i = NET_SECRET_SIZE; i > 0;) {
+		do {
+			get_random_bytes(&tmp, sizeof(tmp));
+		} while (!tmp);
+		cmpxchg(&net_secret[--i], 0, tmp);
+	}
 }
 
 #ifdef CONFIG_INET
@@ -42,6 +55,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 	u32 hash[MD5_DIGEST_WORDS];
 	u32 i;
 
+	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
 		secret[i] = net_secret[i] + (__force u32)daddr[i];
@@ -63,6 +77,7 @@ u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 	u32 hash[MD5_DIGEST_WORDS];
 	u32 i;
 
+	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
 		secret[i] = net_secret[i] + (__force u32) daddr[i];
@@ -82,6 +97,7 @@ __u32 secure_ip_id(__be32 daddr)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
+	net_secret_init();
 	hash[0] = (__force __u32) daddr;
 	hash[1] = net_secret[13];
 	hash[2] = net_secret[14];
@@ -96,6 +112,7 @@ __u32 secure_ipv6_id(const __be32 daddr[4])
 {
 	__u32 hash[4];
 
+	net_secret_init();
 	memcpy(hash, daddr, 16);
 	md5_transform(hash, net_secret);
 
@@ -107,6 +124,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
+	net_secret_init();
 	hash[0] = (__force u32)saddr;
 	hash[1] = (__force u32)daddr;
 	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
@@ -121,6 +139,7 @@ u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
+	net_secret_init();
 	hash[0] = (__force u32)saddr;
 	hash[1] = (__force u32)daddr;
 	hash[2] = (__force u32)dport ^ net_secret[14];
@@ -140,6 +159,7 @@ u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 	u32 hash[MD5_DIGEST_WORDS];
 	u64 seq;
 
+	net_secret_init();
 	hash[0] = (__force u32)saddr;
 	hash[1] = (__force u32)daddr;
 	hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
@@ -164,6 +184,7 @@ u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
 	u64 seq;
 	u32 i;
 
+	net_secret_init();
 	memcpy(hash, saddr, 16);
 	for (i = 0; i < 4; i++)
 		secret[i] = net_secret[i] + daddr[i];
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..cfeb85c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -263,10 +263,8 @@ void build_ehash_secret(void)
 		get_random_bytes(&rnd, sizeof(rnd));
 	} while (rnd == 0);
 
-	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) {
+	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0)
 		get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
-		net_secret_init();
-	}
 }
 EXPORT_SYMBOL(build_ehash_secret);
 

^ permalink raw reply related

* Re: [PATCH] ptp: add range check on n_samples
From: Richard Cochran @ 2013-09-24 12:50 UTC (permalink / raw)
  To: Dong Zhu; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20130924070557.GA28795@zhudong.nay.redhat.com>

On Tue, Sep 24, 2013 at 03:05:57PM +0800, Dong Zhu wrote:
> From d4eb97e8d5def76d46167c91059147e2c7d33433 Mon Sep 17 00:00:00 2001
> 
> When using PTP_SYS_OFFSET ioctl to measure the time offset between the
> PHC and system clock, we need to specify the number of measurements, the
> valid value of n_samples is between 1 to 25. If n_samples <= 0 or > 25
> it makes no sense, so this patch intends to add a range check.

The field, n_samples, is unsigned, so the check is not needed.

Thanks,
Richard
 
> Signed-off-by: Dong Zhu <bluezhudong@gmail.com>
> ---
>  drivers/ptp/ptp_chardev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 34a0c60..4e85b23 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -104,7 +104,8 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
>  			err = -EFAULT;
>  			break;
>  		}
> -		if (sysoff->n_samples > PTP_MAX_SAMPLES) {
> +		if (sysoff->n_samples <= 0 ||
> +		    sysoff->n_samples > PTP_MAX_SAMPLES) {
>  			err = -EINVAL;
>  			break;
>  		}
> -- 
> 1.7.11.7
> 
> -- 
> Best Regards,
> Dong Zhu

^ permalink raw reply

* Re: [pchecks v1 2/4] Use raw cpu ops for calls that would trigger with checks
From: Eric Dumazet @ 2013-09-24 12:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Tejun Heo, akpm, Steven Rostedt, linux-kernel,
	Peter Zijlstra, netdev
In-Reply-To: <20130924073250.GD28538@gmail.com>

On Tue, 2013-09-24 at 09:32 +0200, Ingo Molnar wrote:
> (netdev Cc:-ed)
> 
> * Christoph Lameter <cl@linux.com> wrote:
> 
> > These location triggered during testing with KVM.
> > 
> > These are fetches without preemption off where we judged that
> > to be more performance efficient or where other means of
> > providing synchronization (BH handling) are available.
> 
> > Index: linux/include/net/snmp.h
> > ===================================================================
> > --- linux.orig/include/net/snmp.h	2013-09-12 13:26:29.216103951 -0500
> > +++ linux/include/net/snmp.h	2013-09-12 13:26:29.208104037 -0500
> > @@ -126,7 +126,7 @@ struct linux_xfrm_mib {
> >  	extern __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
> >  
> >  #define SNMP_INC_STATS_BH(mib, field)	\
> > -			__this_cpu_inc(mib[0]->mibs[field])
> > +			raw_cpu_inc(mib[0]->mibs[field])
> >  
> >  #define SNMP_INC_STATS_USER(mib, field)	\
> >  			this_cpu_inc(mib[0]->mibs[field])
> > @@ -141,7 +141,7 @@ struct linux_xfrm_mib {
> >  			this_cpu_dec(mib[0]->mibs[field])
> >  
> >  #define SNMP_ADD_STATS_BH(mib, field, addend)	\
> > -			__this_cpu_add(mib[0]->mibs[field], addend)
> > +			raw_cpu_add(mib[0]->mibs[field], addend)
> 
> Are the networking folks fine with allowing unafe operations of SNMP stats 
> in preemptible sections, or should the kernel produce an optional warning 
> message if CONFIG_PREEMPT_DEBUG=y and these ops are used in preemptible 
> (non-bh, non-irq-handler, non-irqs-off, etc.) sections?
> 
> RAW_SNMP_*_STATS() ops could be used to annotate those places where that 
> kind of usage is safe.


I would rather not use RAW_ prefix in the macro, but add debugging
check to make sure we use _BH() variant in the right context.

BUG_ON(!in_softirq())

^ permalink raw reply

* SMSC 9303 support
From: Gary Thomas @ 2013-09-24 12:21 UTC (permalink / raw)
  To: netdev

I need to support the SMSC9303 in an embedded system.  I'm not
finding any [explicit] support for this device in the latest
mainline kernel.  Did I miss something?

To be clear, the SMSC9303 is a 3-port managed ethernet switch
capable of supporting 802.1D/802.1Q directly. This switch is
driven by a single MAC via MII/RMII and exposes the other two
ports via physical PHYs.  What I need it to do is behave like
two external, separate devices.  I was thinking that what I need
to do is treat these as VLAN devices since the switch can manage
the routing.

Does this seem like a reasonable approach?
How do I "hook up" my normal ethernet driver to it?  To the hardware
it just looks like any other MII/RMII PHY.  The device is managed
separately via I2C.  I can have that set up separately if necessary.

Thanks for any pointers/ideas

-- 
------------------------------------------------------------
Gary Thomas                 |  Consulting for the
MLB Associates              |    Embedded world
------------------------------------------------------------

^ permalink raw reply

* RE: [PATCH 1/2] net: Toeplitz library functions
From: Eric Dumazet @ 2013-09-24 12:24 UTC (permalink / raw)
  To: David Laight; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B7355@saturn3.aculab.com>

On Tue, 2013-09-24 at 09:32 +0100, David Laight wrote:
> > +static inline unsigned int
> > +toeplitz_hash(const unsigned char *bytes,
> > +	      struct toeplitz *toeplitz, int n)
> > +{
> > +	int i;
> > +	unsigned int result = 0;
> > +
> > +	for (i = 0; i < n; i++)
> > +		result ^= toeplitz->key_cache[i][bytes[i]];
> > +
> > +        return result;
> > +};
> 
> That is a horrid hash function to be calculating in software.
> 
> The code looks very much like a simple 32bit CRC.
> It isn't entirely clears exactly where the 'key' gets included,
> but I suspect it is just xored with the data bytes.
> 
> Using in it hardware is probably fine - the hardware can do
> it cheaply (in dedicated logic) as the frame arrives.
> The CRC polynomial probably collapses to a few XOR operations
> when done byte by byte (the hdlc crc16 collapses to 3 levels
> of xor).
> 
> IIRC jhash() works on 32bit quantities - so has far fewer
> maths operations and well as not having all the random data
> accesses (cache misses and displacing other parts of the
> working set from the cache).

Anyway, I really hope Toeplitz hash is not restricted to 0-255 range

Tom implementation is not Toeplitz, but a degenerated hash.

^ permalink raw reply

* [PATCH v4 net-next 14/27] bonding: rework bond_ab_arp_probe() to use bond_for_each_slave()
From: Veaceslav Falico @ 2013-09-24 11:46 UTC (permalink / raw)
  To: netdev; +Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>

Currently it uses the hard-to-rcuify bond_for_each_slave_from(), and also
it doesn't check every slave for disrepencies between the actual
IS_UP(slave) and the slave->link == BOND_LINK_UP, but only till we find the
next suitable slave.

Fix this by using bond_for_each_slave() and storing the first good slave in
*before till we find the current_arp_slave, after that we store the first good
slave in new_slave. If new_slave is empty - use the slave stored in before,
and if it's also empty - then we didn't find any suitable slave.

Also, in the meanwhile, check for each slave status.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v3  -> v4:
    No change.
    
    v2  -> v3:
    No change.
    
    v1  -> v2:
    No changes.
    
    RFC -> v1:
    New patch.

 drivers/net/bonding/bond_main.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6abbfac..3c96b1b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2761,8 +2761,9 @@ do_failover:
  */
 static void bond_ab_arp_probe(struct bonding *bond)
 {
-	struct slave *slave, *next_slave;
-	int i;
+	struct slave *slave, *before = NULL, *new_slave = NULL;
+	struct list_head *iter;
+	bool found = false;
 
 	read_lock(&bond->curr_slave_lock);
 
@@ -2792,18 +2793,12 @@ static void bond_ab_arp_probe(struct bonding *bond)
 
 	bond_set_slave_inactive_flags(bond->current_arp_slave);
 
-	/* search for next candidate */
-	next_slave = bond_next_slave(bond, bond->current_arp_slave);
-	bond_for_each_slave_from(bond, slave, i, next_slave) {
-		if (IS_UP(slave->dev)) {
-			slave->link = BOND_LINK_BACK;
-			bond_set_slave_active_flags(slave);
-			bond_arp_send_all(bond, slave);
-			slave->jiffies = jiffies;
-			bond->current_arp_slave = slave;
-			break;
-		}
+	bond_for_each_slave(bond, slave, iter) {
+		if (!found && !before && IS_UP(slave->dev))
+			before = slave;
 
+		if (found && !new_slave && IS_UP(slave->dev))
+			new_slave = slave;
 		/* if the link state is up at this point, we
 		 * mark it down - this can happen if we have
 		 * simultaneous link failures and
@@ -2811,7 +2806,7 @@ static void bond_ab_arp_probe(struct bonding *bond)
 		 * one the current slave so it is still marked
 		 * up when it is actually down
 		 */
-		if (slave->link == BOND_LINK_UP) {
+		if (!IS_UP(slave->dev) && slave->link == BOND_LINK_UP) {
 			slave->link = BOND_LINK_DOWN;
 			if (slave->link_failure_count < UINT_MAX)
 				slave->link_failure_count++;
@@ -2821,7 +2816,22 @@ static void bond_ab_arp_probe(struct bonding *bond)
 			pr_info("%s: backup interface %s is now down.\n",
 				bond->dev->name, slave->dev->name);
 		}
+		if (slave == bond->current_arp_slave)
+			found = true;
 	}
+
+	if (!new_slave && before)
+		new_slave = before;
+
+	if (!new_slave)
+		return;
+
+	new_slave->link = BOND_LINK_BACK;
+	bond_set_slave_active_flags(new_slave);
+	bond_arp_send_all(bond, new_slave);
+	new_slave->jiffies = jiffies;
+	bond->current_arp_slave = new_slave;
+
 }
 
 void bond_activebackup_arp_mon(struct work_struct *work)
-- 
1.8.4

^ permalink raw reply related

* [PATCH v4 net-next 27/27] net: create sysfs symlinks for neighbour devices
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
  To: netdev
  Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Alexander Duyck
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>

Also, remove the same functionality from bonding - it will be already done
for any device that links to its lower/upper neighbour.

The links will be created for dev's kobject, and will look like
lower_eth0 for lower device eth0 and upper_bridge0 for upper device
bridge0.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v3  -> v4:
    Convert the logic to the new functions, which don't have the bool upper
    argument.
    
    v2  -> v3:
    No change.
    
    v1  -> v2:
    No changes.
    
    RFC -> v1:
    Rename lower devices from slave_eth0 to lower_eth0, so that it meets
    the lower/upper naming. I've searched for any active users of bonding's
    slave_eth0, and seems that nobody's actively using it or relying on it.

 drivers/net/bonding/bond_main.c  | 11 +----------
 drivers/net/bonding/bond_sysfs.c | 21 ---------------------
 drivers/net/bonding/bonding.h    |  2 --
 net/core/dev.c                   | 35 ++++++++++++++++++++++++++++++++++-
 4 files changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d494045..d5c3153 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1612,15 +1612,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
 	read_unlock(&bond->lock);
 
-	res = bond_create_slave_symlinks(bond_dev, slave_dev);
-	if (res)
-		goto err_detach;
-
 	res = netdev_rx_handler_register(slave_dev, bond_handle_frame,
 					 new_slave);
 	if (res) {
 		pr_debug("Error %d calling netdev_rx_handler_register\n", res);
-		goto err_dest_symlinks;
+		goto err_detach;
 	}
 
 	res = bond_master_upper_dev_link(bond_dev, slave_dev, new_slave);
@@ -1642,9 +1638,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 err_unregister:
 	netdev_rx_handler_unregister(slave_dev);
 
-err_dest_symlinks:
-	bond_destroy_slave_symlinks(bond_dev, slave_dev);
-
 err_detach:
 	if (!USES_PRIMARY(bond->params.mode))
 		bond_hw_addr_flush(bond_dev, slave_dev);
@@ -1842,8 +1835,6 @@ static int __bond_release_one(struct net_device *bond_dev,
 			bond_dev->name, slave_dev->name, bond_dev->name);
 
 	/* must do this from outside any spinlocks */
-	bond_destroy_slave_symlinks(bond_dev, slave_dev);
-
 	vlan_vids_del_by_dev(slave_dev, bond_dev);
 
 	/* If the mode USES_PRIMARY, then this cases was handled above by
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 1c57246..e06c644 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -168,27 +168,6 @@ static const struct class_attribute class_attr_bonding_masters = {
 	.namespace = bonding_namespace,
 };
 
-int bond_create_slave_symlinks(struct net_device *master,
-			       struct net_device *slave)
-{
-	char linkname[IFNAMSIZ+7];
-
-	/* create a link from the master to the slave */
-	sprintf(linkname, "slave_%s", slave->name);
-	return sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
-				 linkname);
-}
-
-void bond_destroy_slave_symlinks(struct net_device *master,
-				 struct net_device *slave)
-{
-	char linkname[IFNAMSIZ+7];
-
-	sprintf(linkname, "slave_%s", slave->name);
-	sysfs_remove_link(&(master->dev.kobj), linkname);
-}
-
-
 /*
  * Show the slaves in the current bond.
  */
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 18116ec..1fd37c8 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -437,8 +437,6 @@ int bond_create(struct net *net, const char *name);
 int bond_create_sysfs(struct bond_net *net);
 void bond_destroy_sysfs(struct bond_net *net);
 void bond_prepare_sysfs_group(struct bonding *bond);
-int bond_create_slave_symlinks(struct net_device *master, struct net_device *slave);
-void bond_destroy_slave_symlinks(struct net_device *master, struct net_device *slave);
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev);
 int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 void bond_mii_monitor(struct work_struct *);
diff --git a/net/core/dev.c b/net/core/dev.c
index de443ee..25ab6fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4584,6 +4584,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 					void *private, bool master)
 {
 	struct netdev_adjacent *adj;
+	char linkname[IFNAMSIZ+7];
 	int ret;
 
 	adj = __netdev_find_adj(dev, adj_dev, dev_list);
@@ -4606,12 +4607,26 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 	pr_debug("dev_hold for %s, because of link added from %s to %s\n",
 		 adj_dev->name, dev->name, adj_dev->name);
 
+	if (dev_list == &dev->adj_list.lower) {
+		sprintf(linkname, "lower_%s", adj_dev->name);
+		ret = sysfs_create_link(&(dev->dev.kobj),
+					&(adj_dev->dev.kobj), linkname);
+		if (ret)
+			goto free_adj;
+	} else if (dev_list == &dev->adj_list.upper) {
+		sprintf(linkname, "upper_%s", adj_dev->name);
+		ret = sysfs_create_link(&(dev->dev.kobj),
+					&(adj_dev->dev.kobj), linkname);
+		if (ret)
+			goto free_adj;
+	}
+
 	/* Ensure that master link is always the first item in list. */
 	if (master) {
 		ret = sysfs_create_link(&(dev->dev.kobj),
 					&(adj_dev->dev.kobj), "master");
 		if (ret)
-			goto free_adj;
+			goto remove_symlinks;
 
 		list_add_rcu(&adj->list, dev_list);
 	} else {
@@ -4620,6 +4635,15 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 
 	return 0;
 
+remove_symlinks:
+	if (dev_list == &dev->adj_list.lower) {
+		sprintf(linkname, "lower_%s", adj_dev->name);
+		sysfs_remove_link(&(dev->dev.kobj), linkname);
+	} else if (dev_list == &dev->adj_list.upper) {
+		sprintf(linkname, "upper_%s", adj_dev->name);
+		sysfs_remove_link(&(dev->dev.kobj), linkname);
+	}
+
 free_adj:
 	kfree(adj);
 
@@ -4631,6 +4655,7 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
 				  struct list_head *dev_list)
 {
 	struct netdev_adjacent *adj;
+	char linkname[IFNAMSIZ+7];
 
 	adj = __netdev_find_adj(dev, adj_dev, dev_list);
 
@@ -4650,6 +4675,14 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
 	if (adj->master)
 		sysfs_remove_link(&(dev->dev.kobj), "master");
 
+	if (dev_list == &dev->adj_list.lower) {
+		sprintf(linkname, "lower_%s", adj_dev->name);
+		sysfs_remove_link(&(dev->dev.kobj), linkname);
+	} else if (dev_list == &dev->adj_list.upper) {
+		sprintf(linkname, "upper_%s", adj_dev->name);
+		sysfs_remove_link(&(dev->dev.kobj), linkname);
+	}
+
 	list_del_rcu(&adj->list);
 	pr_debug("dev_put for %s, because link removed from %s to %s\n",
 		 adj_dev->name, dev->name, adj_dev->name);
-- 
1.8.4

^ permalink raw reply related

* [PATCH v4 net-next 26/27] net: expose the master link to sysfs, and remove it from bond
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
  To: netdev
  Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek,
	David S. Miller, Eric Dumazet, Alexander Duyck
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>

Currently, we can have only one master upper neighbour, so it would be
useful to create a symlink to it in the sysfs device directory, the way
that bonding now does it, for every device. Lower devices from
bridge/team/etc will automagically get it, so we could rely on it.

Also, remove the same functionality from bonding.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v3  -> v4:
    Convert the logic to the new functions, which don't have the bool upper
    argument.
    
    v2  -> v3:
    No change.
    
    v1  -> v2:
    No changes.
    
    RFC -> v1:
    No changes.

 drivers/net/bonding/bond_sysfs.c | 20 +++-----------------
 net/core/dev.c                   | 19 +++++++++++++++++--
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 04d95d6..1c57246 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -172,24 +172,11 @@ int bond_create_slave_symlinks(struct net_device *master,
 			       struct net_device *slave)
 {
 	char linkname[IFNAMSIZ+7];
-	int ret = 0;
 
-	/* first, create a link from the slave back to the master */
-	ret = sysfs_create_link(&(slave->dev.kobj), &(master->dev.kobj),
-				"master");
-	if (ret)
-		return ret;
-	/* next, create a link from the master to the slave */
+	/* create a link from the master to the slave */
 	sprintf(linkname, "slave_%s", slave->name);
-	ret = sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
-				linkname);
-
-	/* free the master link created earlier in case of error */
-	if (ret)
-		sysfs_remove_link(&(slave->dev.kobj), "master");
-
-	return ret;
-
+	return sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
+				 linkname);
 }
 
 void bond_destroy_slave_symlinks(struct net_device *master,
@@ -197,7 +184,6 @@ void bond_destroy_slave_symlinks(struct net_device *master,
 {
 	char linkname[IFNAMSIZ+7];
 
-	sysfs_remove_link(&(slave->dev.kobj), "master");
 	sprintf(linkname, "slave_%s", slave->name);
 	sysfs_remove_link(&(master->dev.kobj), linkname);
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index acc1181..de443ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4584,6 +4584,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 					void *private, bool master)
 {
 	struct netdev_adjacent *adj;
+	int ret;
 
 	adj = __netdev_find_adj(dev, adj_dev, dev_list);
 
@@ -4606,12 +4607,23 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 		 adj_dev->name, dev->name, adj_dev->name);
 
 	/* Ensure that master link is always the first item in list. */
-	if (master)
+	if (master) {
+		ret = sysfs_create_link(&(dev->dev.kobj),
+					&(adj_dev->dev.kobj), "master");
+		if (ret)
+			goto free_adj;
+
 		list_add_rcu(&adj->list, dev_list);
-	else
+	} else {
 		list_add_tail_rcu(&adj->list, dev_list);
+	}
 
 	return 0;
+
+free_adj:
+	kfree(adj);
+
+	return ret;
 }
 
 void __netdev_adjacent_dev_remove(struct net_device *dev,
@@ -4635,6 +4647,9 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
 		return;
 	}
 
+	if (adj->master)
+		sysfs_remove_link(&(dev->dev.kobj), "master");
+
 	list_del_rcu(&adj->list);
 	pr_debug("dev_put for %s, because link removed from %s to %s\n",
 		 adj_dev->name, dev->name, adj_dev->name);
-- 
1.8.4

^ permalink raw reply related

* [PATCH v4 net-next 25/27] vlan: unlink the upper neighbour before unregistering
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
  To: netdev; +Cc: jiri, Veaceslav Falico, Patrick McHardy, David S. Miller
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>

On netdev unregister we're removing also all of its sysfs-associated stuff,
including the sysfs symlinks that are controlled by netdev neighbour code.
Also, it's a subtle race condition - cause we can still access it after
unregistering.

Move the unlinking right before the unregistering to fix both.

CC: Patrick McHardy <kaber@trash.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v3  -> v4:
    No change.

    v2  -> v3:
    No change.

    v1  -> v2:
    No changes.

    RFC -> v2:
    New patch.

 net/8021q/vlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 69b4a35..b3d17d1 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -98,14 +98,14 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 		vlan_gvrp_request_leave(dev);

 	vlan_group_set_device(grp, vlan->vlan_proto, vlan_id, NULL);
+
+	netdev_upper_dev_unlink(real_dev, dev);
 	/* Because unregister_netdevice_queue() makes sure at least one rcu
 	 * grace period is respected before device freeing,
 	 * we dont need to call synchronize_net() here.
 	 */
 	unregister_netdevice_queue(dev, head);

-	netdev_upper_dev_unlink(real_dev, dev);
-
 	if (grp->nr_vlan_devs == 0) {
 		vlan_mvrp_uninit_applicant(real_dev);
 		vlan_gvrp_uninit_applicant(real_dev);
-- 
1.8.4

^ permalink raw reply related

* [PATCH v4 net-next 23/27] bonding: remove slave lists
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
  To: netdev; +Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>

And all the initialization.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v3  -> v4:
    No change.
    
    v2  -> v3:
    No change.
    
    RFC -> v2:
    New patch.

 drivers/net/bonding/bond_main.c | 4 ----
 drivers/net/bonding/bonding.h   | 2 --
 2 files changed, 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6aa345a..d494045 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -972,7 +972,6 @@ void bond_select_active_slave(struct bonding *bond)
  */
 static void bond_attach_slave(struct bonding *bond, struct slave *new_slave)
 {
-	list_add_tail_rcu(&new_slave->list, &bond->slave_list);
 	bond->slave_cnt++;
 }
 
@@ -988,7 +987,6 @@ static void bond_attach_slave(struct bonding *bond, struct slave *new_slave)
  */
 static void bond_detach_slave(struct bonding *bond, struct slave *slave)
 {
-	list_del_rcu(&slave->list);
 	bond->slave_cnt--;
 }
 
@@ -1374,7 +1372,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		res = -ENOMEM;
 		goto err_undo_flags;
 	}
-	INIT_LIST_HEAD(&new_slave->list);
 	/*
 	 * Set the new_slave's queue_id to be zero.  Queue ID mapping
 	 * is set via sysfs or module option if desired.
@@ -4022,7 +4019,6 @@ static void bond_setup(struct net_device *bond_dev)
 	/* initialize rwlocks */
 	rwlock_init(&bond->lock);
 	rwlock_init(&bond->curr_slave_lock);
-	INIT_LIST_HEAD(&bond->slave_list);
 	bond->params = bonding_defaults;
 
 	/* Initialize pointers */
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 1b68cec..18116ec 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -165,7 +165,6 @@ struct bond_parm_tbl {
 
 struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
-	struct list_head list;
 	struct bonding *bond; /* our master */
 	int    delay;
 	unsigned long jiffies;
@@ -205,7 +204,6 @@ struct slave {
  */
 struct bonding {
 	struct   net_device *dev; /* first - useful for panic debug */
-	struct   list_head slave_list;
 	struct   slave *curr_active_slave;
 	struct   slave *current_arp_slave;
 	struct   slave *primary_slave;
-- 
1.8.4

^ permalink raw reply related

* [PATCH v4 net-next 24/27] vlan: link the upper neighbour only after registering
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
  To: netdev; +Cc: jiri, Veaceslav Falico, Patrick McHardy, David S. Miller
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>

Otherwise users might access it without being fully registered, as per
sysfs - it only inits in register_netdevice(), so is unusable till it is
called.

CC: Patrick McHardy <kaber@trash.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v3  -> v4:
    No change.
    
    v2  -> v3:
    No change.
    
    v1  -> v2:
    No changes.
    
    RFC -> v1:
    No changes.

 net/8021q/vlan.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 61fc573..69b4a35 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -169,13 +169,13 @@ int register_vlan_dev(struct net_device *dev)
 	if (err < 0)
 		goto out_uninit_mvrp;
 
-	err = netdev_upper_dev_link(real_dev, dev);
-	if (err)
-		goto out_uninit_mvrp;
-
 	err = register_netdevice(dev);
 	if (err < 0)
-		goto out_upper_dev_unlink;
+		goto out_uninit_mvrp;
+
+	err = netdev_upper_dev_link(real_dev, dev);
+	if (err)
+		goto out_unregister_netdev;
 
 	/* Account for reference in struct vlan_dev_priv */
 	dev_hold(real_dev);
@@ -191,8 +191,8 @@ int register_vlan_dev(struct net_device *dev)
 
 	return 0;
 
-out_upper_dev_unlink:
-	netdev_upper_dev_unlink(real_dev, dev);
+out_unregister_netdev:
+	unregister_netdevice(dev);
 out_uninit_mvrp:
 	if (grp->nr_vlan_devs == 0)
 		vlan_mvrp_uninit_applicant(real_dev);
-- 
1.8.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox