* Re: [PATCH v3 -next 2/2] tcp: syncookies: reduce mss table to four values
From: David Miller @ 2013-09-24 14:40 UTC (permalink / raw)
To: fw; +Cc: netdev
In-Reply-To: <1379709176-1625-2-git-send-email-fw@strlen.de>
From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 Sep 2013 22:32:56 +0200
> Halve mss table size to make blind cookie guessing more difficult.
> This is sad since the tables were already small, but there
> is little alternative except perhaps adding more precise mss information
> in the tcp timestamp. Timestamps are unfortunately not ubiquitous.
>
> Guessing all possible cookie values still has 8-in 2**32 chance.
>
> Reported-by: Jakob Lell <jakob@jakoblell.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Applied.
Thanks for following up on all of my feedback.
^ permalink raw reply
* Re: [PATCH v3 -next 1/2] tcp: syncookies: reduce cookie lifetime to 128 seconds
From: David Miller @ 2013-09-24 14:40 UTC (permalink / raw)
To: fw; +Cc: netdev
In-Reply-To: <1379709176-1625-1-git-send-email-fw@strlen.de>
From: Florian Westphal <fw@strlen.de>
Date: Fri, 20 Sep 2013 22:32:55 +0200
> We currently accept cookies that were created less than 4 minutes ago
> (ie, cookies with counter delta 0-3). Combined with the 8 mss table
> values, this yields 32 possible values (out of 2**32) that will be valid.
>
> Reducing the lifetime to < 2 minutes halves the guessing chance while
> still providing a large enough period.
>
> While at it, get rid of jiffies value -- they overflow too quickly on
> 32 bit platforms.
>
> getnstimeofday is used to create a counter that increments every 64s.
> perf shows getnstimeofday cost is negible compared to sha_transform;
> normal tcp initial sequence number generation uses getnstimeofday, too.
>
> Reported-by: Jakob Lell <jakob@jakoblell.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Applied.
^ permalink raw reply
* Re: [net-next PATCH 0/4] cpsw: support for control module register
From: David Miller @ 2013-09-24 14:34 UTC (permalink / raw)
To: mugunthanvnm
Cc: netdev, zonque, bcousson, tony, devicetree, linux-omap,
linux-arm-kernel
In-Reply-To: <1379704841-32693-1-git-send-email-mugunthanvnm@ti.com>
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Sat, 21 Sep 2013 00:50:37 +0530
> This patch series adds the support for configuring GMII_SEL register
> of control module to select the phy mode type and also to configure
> the clock source for RMII phy mode whether to use internal clock or
> the external clock from the phy itself.
>
> Till now CPSW works as this configuration is done in U-Boot and carried
> over to the kernel. But during suspend/resume Control module tends to
> lose its configured value for GMII_SEL register in AM33xx PG1.0, so
> if CPSW is used in RMII or RGMII mode, on resume cpsw is not working
> as GMII_SEL register lost its configuration values.
>
> The initial version of the patch is done by Daniel Mack but as per
> Tony's comment he wants it as a seperate driver as it is done in USB
> control module. I have created a seperate driver for the same.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: Marc Kleine-Budde @ 2013-09-24 14:22 UTC (permalink / raw)
To: David Miller; +Cc: joe, netdev, linux-kernel, wg, linux-can
In-Reply-To: <20130924.100209.1219862004963547693.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]
On 09/24/2013 04:02 PM, David Miller wrote:
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Date: Tue, 24 Sep 2013 09:37:12 +0200
>
>> On 09/24/2013 12:11 AM, Joe Perches wrote:
>>> There are a mix of function prototypes with and without extern
>>> in the kernel sources. Standardize on not using extern for
>>> function prototypes.
>>>
>>> Function prototypes don't need to be written with extern.
>>> extern is assumed by the compiler. Its use is as unnecessary as
>>> using auto to declare automatic/local variables in a block.
>>>
>>> Signed-off-by: Joe Perches <joe@perches.com>
>>
>> Thx, added to linux-can-next. The patch will be included in the next
>> pull request to David.
>
> Marc, I'm trying to just quickly apply these all into my tree directly.
Fine with me.
Tnx, Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]
^ permalink raw reply
* Re: [PATCH] skge: fix invalid value passed to pci_unmap_sigle
From: David Miller @ 2013-09-24 14:18 UTC (permalink / raw)
To: mpatocka; +Cc: netdev, romieu, i.gnatenko.brain, stephen
In-Reply-To: <alpine.LRH.2.02.1309201352010.1763@file01.intranet.prod.int.rdu2.redhat.com>
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Fri, 20 Sep 2013 13:53:22 -0400 (EDT)
> In my patch c194992cbe71c20bb3623a566af8d11b0bfaa721 I didn't fix the skge
Always refer to commits, not just by SHA ID, but also with the commit
header line text in parenthesis and double quotes, for this you'd say:
c194992cbe71c20bb3623a566af8d11b0bfaa721 ("skge: fix broken driver")
Using just the SHA ID is completely ambiguous, because the SHA ID will
be entirely different if this commit is added to a different tree, such
as -stable.
> bug correctly. The value of the new mapping (not old) was passed to
> pci_unmap_single.
>
> If we enable CONFIG_DMA_API_DEBUG, it results in this warning:
> WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:986 check_sync+0x4c4/0x580()
> skge 0000:02:07.0: DMA-API: device driver tries to sync DMA memory it has
> not allocated [device address=0x000000023a0096c0] [size=1536 bytes]
>
> This patch makes the skge driver pass the correct value to
> pci_unmap_single and fixes the warning. It copies the old descriptor to
> on-stack variable "ee" and unmaps it if mapping of the new descriptor
> succeeded.
>
> This patch should be backported to 3.11-stable.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Reported-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH ] net: raw: do not report ICMP redirects to user space
From: David Miller @ 2013-09-24 14:17 UTC (permalink / raw)
To: duanj.fnst; +Cc: netdev
In-Reply-To: <523C21A5.4050504@cn.fujitsu.com>
From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 20 Sep 2013 18:21:25 +0800
> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>
> Redirect isn't an error condition, it should leave
> the error handler without touching the socket.
>
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Applied.
^ permalink raw reply
* Re: [PATCH ] net: udp: do not report ICMP redirects to user space
From: David Miller @ 2013-09-24 14:17 UTC (permalink / raw)
To: duanj.fnst; +Cc: netdev
In-Reply-To: <523C216C.90707@cn.fujitsu.com>
From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Fri, 20 Sep 2013 18:20:28 +0800
> From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
>
> Redirect isn't an error condition, it should leave
> the error handler without touching the socket.
>
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Applied.
^ permalink raw reply
* Re: [net 5/6] i40e: better return values
From: David Miller @ 2013-09-24 14:12 UTC (permalink / raw)
To: joe; +Cc: jeffrey.t.kirsher, jesse.brandeburg, netdev, gospo, sassmann
In-Reply-To: <1380022486.3575.74.camel@joe-AO722>
From: Joe Perches <joe@perches.com>
Date: Tue, 24 Sep 2013 04:34:46 -0700
> On Tue, 2013-09-24 at 02:45 -0700, Jeff Kirsher wrote:
>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> []
>> @@ -3339,9 +3345,7 @@ static u8 i40e_dcb_get_num_tc(struct i40e_dcbx_config *dcbcfg)
>> /* Traffic class index starts from zero so
>> * increment to return the actual count
>> */
>> - num_tc++;
>> -
>> - return num_tc;
>> + return num_tc++;
>
> Ick. post_increment problem.
>
> return ++num_tc;
>
> There's nothing wrong with the original code
> unless this is a bugfix which should be documented
> better than "better return values".
Agreed, this style of coding is asking for a bug.
If you want to return "num_tc PLUS ONE" just say that:
return num_tc + 1;
Why even use pre/post increment when the variable has no other
use than as a return value?
^ permalink raw reply
* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: David Miller @ 2013-09-24 14:11 UTC (permalink / raw)
To: joe; +Cc: netdev, linux-kernel, wg, mkl, linux-can
In-Reply-To: <5570169a078375fa8662adeb2a7f24c1ae718bfb.1379974101.git.joe@perches.com>
Series applied, thanks Joe.
^ permalink raw reply
* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: David Miller @ 2013-09-24 14:02 UTC (permalink / raw)
To: mkl; +Cc: joe, netdev, linux-kernel, wg, linux-can
In-Reply-To: <52414128.8030706@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Tue, 24 Sep 2013 09:37:12 +0200
> On 09/24/2013 12:11 AM, Joe Perches wrote:
>> There are a mix of function prototypes with and without extern
>> in the kernel sources. Standardize on not using extern for
>> function prototypes.
>>
>> Function prototypes don't need to be written with extern.
>> extern is assumed by the compiler. Its use is as unnecessary as
>> using auto to declare automatic/local variables in a block.
>>
>> Signed-off-by: Joe Perches <joe@perches.com>
>
> Thx, added to linux-can-next. The patch will be included in the next
> pull request to David.
Marc, I'm trying to just quickly apply these all into my tree directly.
Thanks.
^ permalink raw reply
* [PATCH net-next v3 2/2] ipv4: processing ancillary IP_TOS or IP_TTL
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
To: davem; +Cc: netdev
In-Reply-To: <cover.1379944641.git.ffusco@redhat.com>
If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().
The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.
Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
v1->v2
- reworked the entire patch
- modified the ttl field in the struct inet_cork from __s16 to __u8:
0 means that the TTL is not specified
- the tos field in the struct inet_cork is still __s16:
-1 means tha the tos is not set
- modified the priority field in the struct inet_cork from __u32 to
char.
- introduced the get_rttos and get_rtconn_flags functions
v2->v3
- no code changes, rebase to net-next
include/net/inet_sock.h | 3 +++
include/net/ip.h | 11 +++++++++++
include/net/route.h | 1 +
net/ipv4/icmp.c | 5 +++++
net/ipv4/ip_output.c | 13 ++++++++++---
net/ipv4/ping.c | 4 +++-
net/ipv4/raw.c | 4 +++-
net/ipv4/udp.c | 4 +++-
8 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 636d203..f314177 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -103,6 +103,9 @@ struct inet_cork {
int length; /* Total length of all frames */
struct dst_entry *dst;
u8 tx_flags;
+ __u8 ttl;
+ __s16 tos;
+ char priority;
};
struct inet_cork_full {
diff --git a/include/net/ip.h b/include/net/ip.h
index 0135f38..77b4f9b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -28,6 +28,7 @@
#include <linux/skbuff.h>
#include <net/inet_sock.h>
+#include <net/route.h>
#include <net/snmp.h>
#include <net/flow.h>
@@ -140,6 +141,16 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
return __ip_make_skb(sk, fl4, &sk->sk_write_queue, &inet_sk(sk)->cork.base);
}
+static inline __u8 get_rttos(struct ipcm_cookie* ipc, struct inet_sock *inet)
+{
+ return (ipc->tos != -1) ? RT_TOS(ipc->tos) : RT_TOS(inet->tos);
+}
+
+static inline __u8 get_rtconn_flags(struct ipcm_cookie* ipc, struct sock* sk)
+{
+ return (ipc->tos != -1) ? RT_CONN_FLAGS_TOS(sk, ipc->tos) : RT_CONN_FLAGS(sk);
+}
+
/* datagram.c */
int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
diff --git a/include/net/route.h b/include/net/route.h
index 6f572ca..0ad8e01 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -39,6 +39,7 @@
#define RTO_ONLINK 0x01
#define RT_CONN_FLAGS(sk) (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
+#define RT_CONN_FLAGS_TOS(sk,tos) (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
struct fib_nh;
struct fib_info;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5f7d11a..5c0e8bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -353,6 +353,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
saddr = fib_compute_spec_dst(skb);
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
+
if (icmp_param->replyopts.opt.opt.optlen) {
ipc.opt = &icmp_param->replyopts.opt;
if (ipc.opt->opt.srr)
@@ -608,6 +611,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
ipc.addr = iph->saddr;
ipc.opt = &icmp_param->replyopts.opt;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos,
type, code, icmp_param);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a04d872..7d8357b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1060,6 +1060,9 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
rt->dst.dev->mtu : dst_mtu(&rt->dst);
cork->dst = &rt->dst;
cork->length = 0;
+ cork->ttl = ipc->ttl;
+ cork->tos = ipc->tos;
+ cork->priority = ipc->priority;
cork->tx_flags = ipc->tx_flags;
return 0;
@@ -1311,7 +1314,9 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
if (cork->flags & IPCORK_OPT)
opt = cork->opt;
- if (rt->rt_type == RTN_MULTICAST)
+ if (cork->ttl != 0)
+ ttl = cork->ttl;
+ else if (rt->rt_type == RTN_MULTICAST)
ttl = inet->mc_ttl;
else
ttl = ip_select_ttl(inet, &rt->dst);
@@ -1319,7 +1324,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
iph = ip_hdr(skb);
iph->version = 4;
iph->ihl = 5;
- iph->tos = inet->tos;
+ iph->tos = (cork->tos != -1) ? cork->tos : inet->tos;
iph->frag_off = df;
iph->ttl = ttl;
iph->protocol = sk->sk_protocol;
@@ -1331,7 +1336,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
ip_options_build(skb, opt, cork->addr, rt, 0);
}
- skb->priority = sk->sk_priority;
+ skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority;
skb->mark = sk->sk_mark;
/*
* Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
@@ -1481,6 +1486,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
ipc.addr = daddr;
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
if (replyopts.opt.opt.optlen) {
ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index d7d9882..706d108e 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -713,6 +713,8 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.opt = NULL;
ipc.oif = sk->sk_bound_dev_if;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
sock_tx_timestamp(sk, &ipc.tx_flags);
@@ -744,7 +746,7 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
return -EINVAL;
faddr = ipc.opt->opt.faddr;
}
- tos = RT_TOS(inet->tos);
+ tos = get_rttos(&ipc, inet);
if (sock_flag(sk, SOCK_LOCALROUTE) ||
(msg->msg_flags & MSG_DONTROUTE) ||
(ipc.opt && ipc.opt->opt.is_strictroute)) {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index bfec521..a3fe534 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -517,6 +517,8 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->inet_saddr;
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
ipc.oif = sk->sk_bound_dev_if;
if (msg->msg_controllen) {
@@ -556,7 +558,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
daddr = ipc.opt->opt.faddr;
}
}
- tos = RT_CONN_FLAGS(sk);
+ tos = get_rtconn_flags(&ipc, sk);
if (msg->msg_flags & MSG_DONTROUTE)
tos |= RTO_ONLINK;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 74d2c95..22462d94 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -855,6 +855,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.opt = NULL;
ipc.tx_flags = 0;
+ ipc.ttl = 0;
+ ipc.tos = -1;
getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag;
@@ -938,7 +940,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
faddr = ipc.opt->opt.faddr;
connected = 0;
}
- tos = RT_TOS(inet->tos);
+ tos = get_rttos(&ipc, inet);
if (sock_flag(sk, SOCK_LOCALROUTE) ||
(msg->msg_flags & MSG_DONTROUTE) ||
(ipc.opt && ipc.opt->opt.is_strictroute)) {
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v3 1/2] ipv4: IP_TOS and IP_TTL can be specified as ancillary data
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
To: davem; +Cc: netdev
In-Reply-To: <cover.1379944641.git.ffusco@redhat.com>
This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:
- the TTL, expressed as __u8.
The allowed values are in the [1-255].
A value of 0 means that the TTL is not specified.
- the TOS, expressed as __s16.
The allowed values are in the range [0,255].
A value of -1 means that the TOS is not specified.
- the priority, expressed as a char and computed when
handling the ancillary data.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
---
v1->v2
- changed the icmp_cookie ttl field from __s16 to __u8.
A value of 0 means that the TTL has not been specified
- to tos field is still __s16. The user can specify
values in the range 0-255 included, therefore I use
a value of -1 as a flag saying that the value has
not been specified
- the priority it is now a char instead of a __u32,
which is the return type of rt_tos2priority
- improved commit message
v1->v2
- no code changes, rebase to net-next
include/net/ip.h | 3 +++
net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index c1f192b..0135f38 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -56,6 +56,9 @@ struct ipcm_cookie {
int oif;
struct ip_options_rcu *opt;
__u8 tx_flags;
+ __u8 ttl;
+ __s16 tos;
+ char priority;
};
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index d9c4f11..56e3445 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -189,7 +189,7 @@ EXPORT_SYMBOL(ip_cmsg_recv);
int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
{
- int err;
+ int err, val;
struct cmsghdr *cmsg;
for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
@@ -215,6 +215,24 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc)
ipc->addr = info->ipi_spec_dst.s_addr;
break;
}
+ case IP_TTL:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 1 || val > 255)
+ return -EINVAL;
+ ipc->ttl = val;
+ break;
+ case IP_TOS:
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
+ return -EINVAL;
+ val = *(int *)CMSG_DATA(cmsg);
+ if (val < 0 || val > 255)
+ return -EINVAL;
+ ipc->tos = val;
+ ipc->priority = rt_tos2priority(ipc->tos);
+ break;
+
default:
return -EINVAL;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v3 0/2] ipv4: per-datagram IP_TOS and IP_TTL via sendmsg()
From: Francesco Fusco @ 2013-09-24 13:43 UTC (permalink / raw)
To: davem; +Cc: netdev
There is no way to set the IP_TOS field on a per-packet basis in IPv4, while
IPv6 has such a mechanism. Therefore one has to fall back to the setsockopt()
in case of IPv4.
Using the existing per-socket option is not convenient particularly in the
situations where multiple threads have to use the same socket data requiring
per-thread TOS values. In fact this would involve calling setsockopt() before
sendmsg() every time.
Francesco Fusco (2):
ipv4: IP_TOS and IP_TTL can be specified as ancillary data
ipv4: processing ancillary IP_TOS or IP_TTL
include/net/inet_sock.h | 3 +++
include/net/ip.h | 14 ++++++++++++++
include/net/route.h | 1 +
net/ipv4/icmp.c | 5 +++++
net/ipv4/ip_output.c | 13 ++++++++++---
net/ipv4/ip_sockglue.c | 20 +++++++++++++++++++-
net/ipv4/ping.c | 4 +++-
net/ipv4/raw.c | 4 +++-
net/ipv4/udp.c | 4 +++-
9 files changed, 61 insertions(+), 7 deletions(-)
--
1.8.3.1
^ permalink raw reply
* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-09-24 13:35 UTC (permalink / raw)
To: Toshiaki Makita
Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
Patrick McHardy
In-Reply-To: <1380023107.3162.53.camel@ubuntu-vm-makita>
On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich wrote:
>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich wrote:
>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>> On Thu, 2013-09-12 at 16:00 -0400, David Miller wrote:
>>>>>> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>>>>>> Date: Tue, 10 Sep 2013 19:27:54 +0900
>>>>>>
>>>>>>> There seem to be some undesirable behaviors related with PVID.
>>>>>>> 1. It has no effect assigning PVID to a port. PVID cannot be applied
>>>>>>> to any frame regardless of whether we set it or not.
>>>>>>> 2. FDB entries learned via frames applied PVID are registered with
>>>>>>> VID 0 rather than VID value of PVID.
>>>>>>> 3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
>>>>>>> This leads interoperational problems such as sending frames with VID
>>>>>>> 4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
>>>>>>> 0 as they belong to VLAN 0, which is expected to be handled as they have
>>>>>>> no VID according to IEEE 802.1Q.
>>>>>>>
>>>>>>> Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
>>>>>>> is fixed, because we cannot activate PVID due to it.
>>>>>>
>>>>>> Please work out the issues in patch #2 with Vlad and resubmit this
>>>>>> series.
>>>>>>
>>>>>> Thank you.
>>>>>
>>>>> I'm hovering between whether we should fix the issue by changing vlan 0
>>>>> interface behavior in 8021q module or enabling a bridge port to sending
>>>>> priority-tagged frames, or another better way.
>>>>>
>>>>> If you could comment it, I'd appreciate it :)
>>>>>
>>>>>
>>>>> BTW, I think what is discussed in patch #2 is another problem about
>>>>> handling priority-tags, and it exists without this patch set applied.
>>>>> It looks like that we should prepare another patch set than this to fix
>>>>> that problem.
>>>>>
>>>>> Should I include patches that fix the priority-tags problem in this
>>>>> patch set and resubmit them all together?
>>>>>
>>>>
>>>> I am thinking that we might need to do it in bridge and it looks like
>>>> the simplest way to do it is to have default priority regeneration table
>>>> (table 6-5 from 802.1Q doc).
>>>>
>>>> That way I think we would conform to the spec.
>>>>
>>>> -vlad
>>>
>>> Unfortunately I don't think the default priority regeneration table
>>> resolves the problem because IEEE 802.1Q says that a VLAN-aware bridge
>>> can transmit untagged or VLAN-tagged frames only (the end of section 7.5
>>> and 8.1.7).
>>>
>>> No mechanism to send priority-tagged frames is found as far as I can see
>>> the standard. I think the regenerated priority is used for outgoing PCP
>>> field only if egress policy is not untagged (i.e. transmitting as
>>> VLAN-tagged), and unused if untagged (Section 6.9.2 3rd/4th Paragraph).
>>>
>>> If we want to transmit priority-tagged frames from a bridge port, I
>>> think we need to implement a new (optional) feature that is above the
>>> standard, as I stated previously.
>>>
>>> How do you feel about adding a per-port policy that enables a bridge to
>>> send priority-tagged frames instead of untagged frames when egress
>>> policy for the port is untagged?
>>> With this change, we can transmit frames for a given vlan as either all
>>> untagged, all priority-tagged or all VLAN-tagged.
>>
>> That would work. What I am thinking is that we do it by special casing
>> the vid 0 egress policy specification. Let it be untagged by default
>> and if it is tagged, then we preserve the priority field and forward
>> it on.
>>
>> This keeps the API stable and doesn't require user/admin from knowing
>> exactly what happens. Default operation conforms to the spec and allows
>> simple change to make it backward-compatible.
>>
>> What do you think. I've done a simple prototype of this an it seems to
>> work with the VMs I am testing with.
>
> Are you saying that
> - by default, set the 0th bit of untagged_bitmap; and
> - if we unset the 0th bit and set the "vid"th bit, we transmit frames
> classified as belonging to VLAN "vid" as priority-tagged?
>
> If so, though it's attractive to keep current API, I'm worried about if
> it could be a bit confusing and not intuitive for kernel/iproute2
> developers that VID 0 has a special meaning only in the egress policy.
> Wouldn't it be better to adding a new member to struct net_port_vlans
> instead of using VID 0 of untagged_bitmap?
>
> Or are you saying that we use a new flag in struct net_port_vlans but
> use the BRIDGE_VLAN_INFO_UNTAGGED bit with VID 0 in netlink to set the
> flag?
>
> Even in that case, I'm afraid that it might be confusing for developers
> for the same reason. We are going to prohibit to specify VID with 0 (and
> 4095) in adding/deleting a FDB entry or a vlan filtering entry, but it
> would allow us to use VID 0 only when a vlan filtering entry is
> configured.
> I am thinking a new nlattr is a straightforward approach to configure
> it.
By making this an explicit attribute it makes vid 0 a special case for
any automatic tool that would provision such filtering. Seeing vid 0
would mean that these tools would have to know that this would have to
be translated to a different attribute instead of setting the policy
values.
How it is implemented internally in the kernel isn't as big of an issue.
We can do it as a separate flag or as part of existing policy.
-vlad
>
> Thanks,
>
> Toshiaki Makita
>
>>
>> -vlad
>>
>>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>
>
^ permalink raw reply
* [PATCH] net: net_secret should not depend on TCP
From: Eric Dumazet @ 2013-09-24 13:19 UTC (permalink / raw)
To: Hannes Frederic Sowa; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <20130924054532.GA24446@order.stressinduktion.org>
From: Eric Dumazet <edumazet@google.com>
A host might need net_secret[] and never open a single socket.
Problem added in commit aebda156a570782
("net: defer net_secret[] initialization")
Based on prior patch from Hannes Frederic Sowa.
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/secure_seq.h | 1 -
net/core/secure_seq.c | 27 ++++++++++++++++++++++++---
net/ipv4/af_inet.c | 4 +---
3 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 6ca975b..c2e542b 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -3,7 +3,6 @@
#include <linux/types.h>
-extern void net_secret_init(void);
extern __u32 secure_ip_id(__be32 daddr);
extern __u32 secure_ipv6_id(const __be32 daddr[4]);
extern u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 6a2f13c..3f1ec15 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -10,11 +10,24 @@
#include <net/secure_seq.h>
-static u32 net_secret[MD5_MESSAGE_BYTES / 4] ____cacheline_aligned;
+#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
-void net_secret_init(void)
+static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned;
+
+static void net_secret_init(void)
{
- get_random_bytes(net_secret, sizeof(net_secret));
+ u32 tmp;
+ int i;
+
+ if (likely(net_secret[0]))
+ return;
+
+ for (i = NET_SECRET_SIZE; i > 0;) {
+ do {
+ get_random_bytes(&tmp, sizeof(tmp));
+ } while (!tmp);
+ cmpxchg(&net_secret[--i], 0, tmp);
+ }
}
#ifdef CONFIG_INET
@@ -42,6 +55,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
u32 hash[MD5_DIGEST_WORDS];
u32 i;
+ net_secret_init();
memcpy(hash, saddr, 16);
for (i = 0; i < 4; i++)
secret[i] = net_secret[i] + (__force u32)daddr[i];
@@ -63,6 +77,7 @@ u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
u32 hash[MD5_DIGEST_WORDS];
u32 i;
+ net_secret_init();
memcpy(hash, saddr, 16);
for (i = 0; i < 4; i++)
secret[i] = net_secret[i] + (__force u32) daddr[i];
@@ -82,6 +97,7 @@ __u32 secure_ip_id(__be32 daddr)
{
u32 hash[MD5_DIGEST_WORDS];
+ net_secret_init();
hash[0] = (__force __u32) daddr;
hash[1] = net_secret[13];
hash[2] = net_secret[14];
@@ -96,6 +112,7 @@ __u32 secure_ipv6_id(const __be32 daddr[4])
{
__u32 hash[4];
+ net_secret_init();
memcpy(hash, daddr, 16);
md5_transform(hash, net_secret);
@@ -107,6 +124,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
{
u32 hash[MD5_DIGEST_WORDS];
+ net_secret_init();
hash[0] = (__force u32)saddr;
hash[1] = (__force u32)daddr;
hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
@@ -121,6 +139,7 @@ u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport)
{
u32 hash[MD5_DIGEST_WORDS];
+ net_secret_init();
hash[0] = (__force u32)saddr;
hash[1] = (__force u32)daddr;
hash[2] = (__force u32)dport ^ net_secret[14];
@@ -140,6 +159,7 @@ u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
u32 hash[MD5_DIGEST_WORDS];
u64 seq;
+ net_secret_init();
hash[0] = (__force u32)saddr;
hash[1] = (__force u32)daddr;
hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
@@ -164,6 +184,7 @@ u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
u64 seq;
u32 i;
+ net_secret_init();
memcpy(hash, saddr, 16);
for (i = 0; i < 4; i++)
secret[i] = net_secret[i] + daddr[i];
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..cfeb85c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -263,10 +263,8 @@ void build_ehash_secret(void)
get_random_bytes(&rnd, sizeof(rnd));
} while (rnd == 0);
- if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) {
+ if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0)
get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
- net_secret_init();
- }
}
EXPORT_SYMBOL(build_ehash_secret);
^ permalink raw reply related
* Re: [PATCH] ptp: add range check on n_samples
From: Richard Cochran @ 2013-09-24 12:50 UTC (permalink / raw)
To: Dong Zhu; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20130924070557.GA28795@zhudong.nay.redhat.com>
On Tue, Sep 24, 2013 at 03:05:57PM +0800, Dong Zhu wrote:
> From d4eb97e8d5def76d46167c91059147e2c7d33433 Mon Sep 17 00:00:00 2001
>
> When using PTP_SYS_OFFSET ioctl to measure the time offset between the
> PHC and system clock, we need to specify the number of measurements, the
> valid value of n_samples is between 1 to 25. If n_samples <= 0 or > 25
> it makes no sense, so this patch intends to add a range check.
The field, n_samples, is unsigned, so the check is not needed.
Thanks,
Richard
> Signed-off-by: Dong Zhu <bluezhudong@gmail.com>
> ---
> drivers/ptp/ptp_chardev.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
> index 34a0c60..4e85b23 100644
> --- a/drivers/ptp/ptp_chardev.c
> +++ b/drivers/ptp/ptp_chardev.c
> @@ -104,7 +104,8 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
> err = -EFAULT;
> break;
> }
> - if (sysoff->n_samples > PTP_MAX_SAMPLES) {
> + if (sysoff->n_samples <= 0 ||
> + sysoff->n_samples > PTP_MAX_SAMPLES) {
> err = -EINVAL;
> break;
> }
> --
> 1.7.11.7
>
> --
> Best Regards,
> Dong Zhu
^ permalink raw reply
* Re: [pchecks v1 2/4] Use raw cpu ops for calls that would trigger with checks
From: Eric Dumazet @ 2013-09-24 12:45 UTC (permalink / raw)
To: Ingo Molnar
Cc: Christoph Lameter, Tejun Heo, akpm, Steven Rostedt, linux-kernel,
Peter Zijlstra, netdev
In-Reply-To: <20130924073250.GD28538@gmail.com>
On Tue, 2013-09-24 at 09:32 +0200, Ingo Molnar wrote:
> (netdev Cc:-ed)
>
> * Christoph Lameter <cl@linux.com> wrote:
>
> > These location triggered during testing with KVM.
> >
> > These are fetches without preemption off where we judged that
> > to be more performance efficient or where other means of
> > providing synchronization (BH handling) are available.
>
> > Index: linux/include/net/snmp.h
> > ===================================================================
> > --- linux.orig/include/net/snmp.h 2013-09-12 13:26:29.216103951 -0500
> > +++ linux/include/net/snmp.h 2013-09-12 13:26:29.208104037 -0500
> > @@ -126,7 +126,7 @@ struct linux_xfrm_mib {
> > extern __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
> >
> > #define SNMP_INC_STATS_BH(mib, field) \
> > - __this_cpu_inc(mib[0]->mibs[field])
> > + raw_cpu_inc(mib[0]->mibs[field])
> >
> > #define SNMP_INC_STATS_USER(mib, field) \
> > this_cpu_inc(mib[0]->mibs[field])
> > @@ -141,7 +141,7 @@ struct linux_xfrm_mib {
> > this_cpu_dec(mib[0]->mibs[field])
> >
> > #define SNMP_ADD_STATS_BH(mib, field, addend) \
> > - __this_cpu_add(mib[0]->mibs[field], addend)
> > + raw_cpu_add(mib[0]->mibs[field], addend)
>
> Are the networking folks fine with allowing unafe operations of SNMP stats
> in preemptible sections, or should the kernel produce an optional warning
> message if CONFIG_PREEMPT_DEBUG=y and these ops are used in preemptible
> (non-bh, non-irq-handler, non-irqs-off, etc.) sections?
>
> RAW_SNMP_*_STATS() ops could be used to annotate those places where that
> kind of usage is safe.
I would rather not use RAW_ prefix in the macro, but add debugging
check to make sure we use _BH() variant in the right context.
BUG_ON(!in_softirq())
^ permalink raw reply
* SMSC 9303 support
From: Gary Thomas @ 2013-09-24 12:21 UTC (permalink / raw)
To: netdev
I need to support the SMSC9303 in an embedded system. I'm not
finding any [explicit] support for this device in the latest
mainline kernel. Did I miss something?
To be clear, the SMSC9303 is a 3-port managed ethernet switch
capable of supporting 802.1D/802.1Q directly. This switch is
driven by a single MAC via MII/RMII and exposes the other two
ports via physical PHYs. What I need it to do is behave like
two external, separate devices. I was thinking that what I need
to do is treat these as VLAN devices since the switch can manage
the routing.
Does this seem like a reasonable approach?
How do I "hook up" my normal ethernet driver to it? To the hardware
it just looks like any other MII/RMII PHY. The device is managed
separately via I2C. I can have that set up separately if necessary.
Thanks for any pointers/ideas
--
------------------------------------------------------------
Gary Thomas | Consulting for the
MLB Associates | Embedded world
------------------------------------------------------------
^ permalink raw reply
* RE: [PATCH 1/2] net: Toeplitz library functions
From: Eric Dumazet @ 2013-09-24 12:24 UTC (permalink / raw)
To: David Laight; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B7355@saturn3.aculab.com>
On Tue, 2013-09-24 at 09:32 +0100, David Laight wrote:
> > +static inline unsigned int
> > +toeplitz_hash(const unsigned char *bytes,
> > + struct toeplitz *toeplitz, int n)
> > +{
> > + int i;
> > + unsigned int result = 0;
> > +
> > + for (i = 0; i < n; i++)
> > + result ^= toeplitz->key_cache[i][bytes[i]];
> > +
> > + return result;
> > +};
>
> That is a horrid hash function to be calculating in software.
>
> The code looks very much like a simple 32bit CRC.
> It isn't entirely clears exactly where the 'key' gets included,
> but I suspect it is just xored with the data bytes.
>
> Using in it hardware is probably fine - the hardware can do
> it cheaply (in dedicated logic) as the frame arrives.
> The CRC polynomial probably collapses to a few XOR operations
> when done byte by byte (the hdlc crc16 collapses to 3 levels
> of xor).
>
> IIRC jhash() works on 32bit quantities - so has far fewer
> maths operations and well as not having all the random data
> accesses (cache misses and displacing other parts of the
> working set from the cache).
Anyway, I really hope Toeplitz hash is not restricted to 0-255 range
Tom implementation is not Toeplitz, but a degenerated hash.
^ permalink raw reply
* [PATCH v4 net-next 14/27] bonding: rework bond_ab_arp_probe() to use bond_for_each_slave()
From: Veaceslav Falico @ 2013-09-24 11:46 UTC (permalink / raw)
To: netdev; +Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>
Currently it uses the hard-to-rcuify bond_for_each_slave_from(), and also
it doesn't check every slave for disrepencies between the actual
IS_UP(slave) and the slave->link == BOND_LINK_UP, but only till we find the
next suitable slave.
Fix this by using bond_for_each_slave() and storing the first good slave in
*before till we find the current_arp_slave, after that we store the first good
slave in new_slave. If new_slave is empty - use the slave stored in before,
and if it's also empty - then we didn't find any suitable slave.
Also, in the meanwhile, check for each slave status.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v3 -> v4:
No change.
v2 -> v3:
No change.
v1 -> v2:
No changes.
RFC -> v1:
New patch.
drivers/net/bonding/bond_main.c | 38 ++++++++++++++++++++++++--------------
1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6abbfac..3c96b1b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2761,8 +2761,9 @@ do_failover:
*/
static void bond_ab_arp_probe(struct bonding *bond)
{
- struct slave *slave, *next_slave;
- int i;
+ struct slave *slave, *before = NULL, *new_slave = NULL;
+ struct list_head *iter;
+ bool found = false;
read_lock(&bond->curr_slave_lock);
@@ -2792,18 +2793,12 @@ static void bond_ab_arp_probe(struct bonding *bond)
bond_set_slave_inactive_flags(bond->current_arp_slave);
- /* search for next candidate */
- next_slave = bond_next_slave(bond, bond->current_arp_slave);
- bond_for_each_slave_from(bond, slave, i, next_slave) {
- if (IS_UP(slave->dev)) {
- slave->link = BOND_LINK_BACK;
- bond_set_slave_active_flags(slave);
- bond_arp_send_all(bond, slave);
- slave->jiffies = jiffies;
- bond->current_arp_slave = slave;
- break;
- }
+ bond_for_each_slave(bond, slave, iter) {
+ if (!found && !before && IS_UP(slave->dev))
+ before = slave;
+ if (found && !new_slave && IS_UP(slave->dev))
+ new_slave = slave;
/* if the link state is up at this point, we
* mark it down - this can happen if we have
* simultaneous link failures and
@@ -2811,7 +2806,7 @@ static void bond_ab_arp_probe(struct bonding *bond)
* one the current slave so it is still marked
* up when it is actually down
*/
- if (slave->link == BOND_LINK_UP) {
+ if (!IS_UP(slave->dev) && slave->link == BOND_LINK_UP) {
slave->link = BOND_LINK_DOWN;
if (slave->link_failure_count < UINT_MAX)
slave->link_failure_count++;
@@ -2821,7 +2816,22 @@ static void bond_ab_arp_probe(struct bonding *bond)
pr_info("%s: backup interface %s is now down.\n",
bond->dev->name, slave->dev->name);
}
+ if (slave == bond->current_arp_slave)
+ found = true;
}
+
+ if (!new_slave && before)
+ new_slave = before;
+
+ if (!new_slave)
+ return;
+
+ new_slave->link = BOND_LINK_BACK;
+ bond_set_slave_active_flags(new_slave);
+ bond_arp_send_all(bond, new_slave);
+ new_slave->jiffies = jiffies;
+ bond->current_arp_slave = new_slave;
+
}
void bond_activebackup_arp_mon(struct work_struct *work)
--
1.8.4
^ permalink raw reply related
* [PATCH v4 net-next 27/27] net: create sysfs symlinks for neighbour devices
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
To: netdev
Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Alexander Duyck
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>
Also, remove the same functionality from bonding - it will be already done
for any device that links to its lower/upper neighbour.
The links will be created for dev's kobject, and will look like
lower_eth0 for lower device eth0 and upper_bridge0 for upper device
bridge0.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v3 -> v4:
Convert the logic to the new functions, which don't have the bool upper
argument.
v2 -> v3:
No change.
v1 -> v2:
No changes.
RFC -> v1:
Rename lower devices from slave_eth0 to lower_eth0, so that it meets
the lower/upper naming. I've searched for any active users of bonding's
slave_eth0, and seems that nobody's actively using it or relying on it.
drivers/net/bonding/bond_main.c | 11 +----------
drivers/net/bonding/bond_sysfs.c | 21 ---------------------
drivers/net/bonding/bonding.h | 2 --
net/core/dev.c | 35 ++++++++++++++++++++++++++++++++++-
4 files changed, 35 insertions(+), 34 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d494045..d5c3153 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1612,15 +1612,11 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
read_unlock(&bond->lock);
- res = bond_create_slave_symlinks(bond_dev, slave_dev);
- if (res)
- goto err_detach;
-
res = netdev_rx_handler_register(slave_dev, bond_handle_frame,
new_slave);
if (res) {
pr_debug("Error %d calling netdev_rx_handler_register\n", res);
- goto err_dest_symlinks;
+ goto err_detach;
}
res = bond_master_upper_dev_link(bond_dev, slave_dev, new_slave);
@@ -1642,9 +1638,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
err_unregister:
netdev_rx_handler_unregister(slave_dev);
-err_dest_symlinks:
- bond_destroy_slave_symlinks(bond_dev, slave_dev);
-
err_detach:
if (!USES_PRIMARY(bond->params.mode))
bond_hw_addr_flush(bond_dev, slave_dev);
@@ -1842,8 +1835,6 @@ static int __bond_release_one(struct net_device *bond_dev,
bond_dev->name, slave_dev->name, bond_dev->name);
/* must do this from outside any spinlocks */
- bond_destroy_slave_symlinks(bond_dev, slave_dev);
-
vlan_vids_del_by_dev(slave_dev, bond_dev);
/* If the mode USES_PRIMARY, then this cases was handled above by
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 1c57246..e06c644 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -168,27 +168,6 @@ static const struct class_attribute class_attr_bonding_masters = {
.namespace = bonding_namespace,
};
-int bond_create_slave_symlinks(struct net_device *master,
- struct net_device *slave)
-{
- char linkname[IFNAMSIZ+7];
-
- /* create a link from the master to the slave */
- sprintf(linkname, "slave_%s", slave->name);
- return sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
- linkname);
-}
-
-void bond_destroy_slave_symlinks(struct net_device *master,
- struct net_device *slave)
-{
- char linkname[IFNAMSIZ+7];
-
- sprintf(linkname, "slave_%s", slave->name);
- sysfs_remove_link(&(master->dev.kobj), linkname);
-}
-
-
/*
* Show the slaves in the current bond.
*/
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 18116ec..1fd37c8 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -437,8 +437,6 @@ int bond_create(struct net *net, const char *name);
int bond_create_sysfs(struct bond_net *net);
void bond_destroy_sysfs(struct bond_net *net);
void bond_prepare_sysfs_group(struct bonding *bond);
-int bond_create_slave_symlinks(struct net_device *master, struct net_device *slave);
-void bond_destroy_slave_symlinks(struct net_device *master, struct net_device *slave);
int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev);
int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
void bond_mii_monitor(struct work_struct *);
diff --git a/net/core/dev.c b/net/core/dev.c
index de443ee..25ab6fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4584,6 +4584,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
void *private, bool master)
{
struct netdev_adjacent *adj;
+ char linkname[IFNAMSIZ+7];
int ret;
adj = __netdev_find_adj(dev, adj_dev, dev_list);
@@ -4606,12 +4607,26 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
pr_debug("dev_hold for %s, because of link added from %s to %s\n",
adj_dev->name, dev->name, adj_dev->name);
+ if (dev_list == &dev->adj_list.lower) {
+ sprintf(linkname, "lower_%s", adj_dev->name);
+ ret = sysfs_create_link(&(dev->dev.kobj),
+ &(adj_dev->dev.kobj), linkname);
+ if (ret)
+ goto free_adj;
+ } else if (dev_list == &dev->adj_list.upper) {
+ sprintf(linkname, "upper_%s", adj_dev->name);
+ ret = sysfs_create_link(&(dev->dev.kobj),
+ &(adj_dev->dev.kobj), linkname);
+ if (ret)
+ goto free_adj;
+ }
+
/* Ensure that master link is always the first item in list. */
if (master) {
ret = sysfs_create_link(&(dev->dev.kobj),
&(adj_dev->dev.kobj), "master");
if (ret)
- goto free_adj;
+ goto remove_symlinks;
list_add_rcu(&adj->list, dev_list);
} else {
@@ -4620,6 +4635,15 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
return 0;
+remove_symlinks:
+ if (dev_list == &dev->adj_list.lower) {
+ sprintf(linkname, "lower_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ } else if (dev_list == &dev->adj_list.upper) {
+ sprintf(linkname, "upper_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ }
+
free_adj:
kfree(adj);
@@ -4631,6 +4655,7 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
struct list_head *dev_list)
{
struct netdev_adjacent *adj;
+ char linkname[IFNAMSIZ+7];
adj = __netdev_find_adj(dev, adj_dev, dev_list);
@@ -4650,6 +4675,14 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
if (adj->master)
sysfs_remove_link(&(dev->dev.kobj), "master");
+ if (dev_list == &dev->adj_list.lower) {
+ sprintf(linkname, "lower_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ } else if (dev_list == &dev->adj_list.upper) {
+ sprintf(linkname, "upper_%s", adj_dev->name);
+ sysfs_remove_link(&(dev->dev.kobj), linkname);
+ }
+
list_del_rcu(&adj->list);
pr_debug("dev_put for %s, because link removed from %s to %s\n",
adj_dev->name, dev->name, adj_dev->name);
--
1.8.4
^ permalink raw reply related
* [PATCH v4 net-next 26/27] net: expose the master link to sysfs, and remove it from bond
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
To: netdev
Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek,
David S. Miller, Eric Dumazet, Alexander Duyck
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>
Currently, we can have only one master upper neighbour, so it would be
useful to create a symlink to it in the sysfs device directory, the way
that bonding now does it, for every device. Lower devices from
bridge/team/etc will automagically get it, so we could rely on it.
Also, remove the same functionality from bonding.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v3 -> v4:
Convert the logic to the new functions, which don't have the bool upper
argument.
v2 -> v3:
No change.
v1 -> v2:
No changes.
RFC -> v1:
No changes.
drivers/net/bonding/bond_sysfs.c | 20 +++-----------------
net/core/dev.c | 19 +++++++++++++++++--
2 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 04d95d6..1c57246 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -172,24 +172,11 @@ int bond_create_slave_symlinks(struct net_device *master,
struct net_device *slave)
{
char linkname[IFNAMSIZ+7];
- int ret = 0;
- /* first, create a link from the slave back to the master */
- ret = sysfs_create_link(&(slave->dev.kobj), &(master->dev.kobj),
- "master");
- if (ret)
- return ret;
- /* next, create a link from the master to the slave */
+ /* create a link from the master to the slave */
sprintf(linkname, "slave_%s", slave->name);
- ret = sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
- linkname);
-
- /* free the master link created earlier in case of error */
- if (ret)
- sysfs_remove_link(&(slave->dev.kobj), "master");
-
- return ret;
-
+ return sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
+ linkname);
}
void bond_destroy_slave_symlinks(struct net_device *master,
@@ -197,7 +184,6 @@ void bond_destroy_slave_symlinks(struct net_device *master,
{
char linkname[IFNAMSIZ+7];
- sysfs_remove_link(&(slave->dev.kobj), "master");
sprintf(linkname, "slave_%s", slave->name);
sysfs_remove_link(&(master->dev.kobj), linkname);
}
diff --git a/net/core/dev.c b/net/core/dev.c
index acc1181..de443ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4584,6 +4584,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
void *private, bool master)
{
struct netdev_adjacent *adj;
+ int ret;
adj = __netdev_find_adj(dev, adj_dev, dev_list);
@@ -4606,12 +4607,23 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
adj_dev->name, dev->name, adj_dev->name);
/* Ensure that master link is always the first item in list. */
- if (master)
+ if (master) {
+ ret = sysfs_create_link(&(dev->dev.kobj),
+ &(adj_dev->dev.kobj), "master");
+ if (ret)
+ goto free_adj;
+
list_add_rcu(&adj->list, dev_list);
- else
+ } else {
list_add_tail_rcu(&adj->list, dev_list);
+ }
return 0;
+
+free_adj:
+ kfree(adj);
+
+ return ret;
}
void __netdev_adjacent_dev_remove(struct net_device *dev,
@@ -4635,6 +4647,9 @@ void __netdev_adjacent_dev_remove(struct net_device *dev,
return;
}
+ if (adj->master)
+ sysfs_remove_link(&(dev->dev.kobj), "master");
+
list_del_rcu(&adj->list);
pr_debug("dev_put for %s, because link removed from %s to %s\n",
adj_dev->name, dev->name, adj_dev->name);
--
1.8.4
^ permalink raw reply related
* [PATCH v4 net-next 25/27] vlan: unlink the upper neighbour before unregistering
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
To: netdev; +Cc: jiri, Veaceslav Falico, Patrick McHardy, David S. Miller
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>
On netdev unregister we're removing also all of its sysfs-associated stuff,
including the sysfs symlinks that are controlled by netdev neighbour code.
Also, it's a subtle race condition - cause we can still access it after
unregistering.
Move the unlinking right before the unregistering to fix both.
CC: Patrick McHardy <kaber@trash.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v3 -> v4:
No change.
v2 -> v3:
No change.
v1 -> v2:
No changes.
RFC -> v2:
New patch.
net/8021q/vlan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 69b4a35..b3d17d1 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -98,14 +98,14 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
vlan_gvrp_request_leave(dev);
vlan_group_set_device(grp, vlan->vlan_proto, vlan_id, NULL);
+
+ netdev_upper_dev_unlink(real_dev, dev);
/* Because unregister_netdevice_queue() makes sure at least one rcu
* grace period is respected before device freeing,
* we dont need to call synchronize_net() here.
*/
unregister_netdevice_queue(dev, head);
- netdev_upper_dev_unlink(real_dev, dev);
-
if (grp->nr_vlan_devs == 0) {
vlan_mvrp_uninit_applicant(real_dev);
vlan_gvrp_uninit_applicant(real_dev);
--
1.8.4
^ permalink raw reply related
* [PATCH v4 net-next 23/27] bonding: remove slave lists
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
To: netdev; +Cc: jiri, Veaceslav Falico, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>
And all the initialization.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v3 -> v4:
No change.
v2 -> v3:
No change.
RFC -> v2:
New patch.
drivers/net/bonding/bond_main.c | 4 ----
drivers/net/bonding/bonding.h | 2 --
2 files changed, 6 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6aa345a..d494045 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -972,7 +972,6 @@ void bond_select_active_slave(struct bonding *bond)
*/
static void bond_attach_slave(struct bonding *bond, struct slave *new_slave)
{
- list_add_tail_rcu(&new_slave->list, &bond->slave_list);
bond->slave_cnt++;
}
@@ -988,7 +987,6 @@ static void bond_attach_slave(struct bonding *bond, struct slave *new_slave)
*/
static void bond_detach_slave(struct bonding *bond, struct slave *slave)
{
- list_del_rcu(&slave->list);
bond->slave_cnt--;
}
@@ -1374,7 +1372,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
res = -ENOMEM;
goto err_undo_flags;
}
- INIT_LIST_HEAD(&new_slave->list);
/*
* Set the new_slave's queue_id to be zero. Queue ID mapping
* is set via sysfs or module option if desired.
@@ -4022,7 +4019,6 @@ static void bond_setup(struct net_device *bond_dev)
/* initialize rwlocks */
rwlock_init(&bond->lock);
rwlock_init(&bond->curr_slave_lock);
- INIT_LIST_HEAD(&bond->slave_list);
bond->params = bonding_defaults;
/* Initialize pointers */
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 1b68cec..18116ec 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -165,7 +165,6 @@ struct bond_parm_tbl {
struct slave {
struct net_device *dev; /* first - useful for panic debug */
- struct list_head list;
struct bonding *bond; /* our master */
int delay;
unsigned long jiffies;
@@ -205,7 +204,6 @@ struct slave {
*/
struct bonding {
struct net_device *dev; /* first - useful for panic debug */
- struct list_head slave_list;
struct slave *curr_active_slave;
struct slave *current_arp_slave;
struct slave *primary_slave;
--
1.8.4
^ permalink raw reply related
* [PATCH v4 net-next 24/27] vlan: link the upper neighbour only after registering
From: Veaceslav Falico @ 2013-09-24 11:47 UTC (permalink / raw)
To: netdev; +Cc: jiri, Veaceslav Falico, Patrick McHardy, David S. Miller
In-Reply-To: <1380023227-9576-1-git-send-email-vfalico@redhat.com>
Otherwise users might access it without being fully registered, as per
sysfs - it only inits in register_netdevice(), so is unusable till it is
called.
CC: Patrick McHardy <kaber@trash.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v3 -> v4:
No change.
v2 -> v3:
No change.
v1 -> v2:
No changes.
RFC -> v1:
No changes.
net/8021q/vlan.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 61fc573..69b4a35 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -169,13 +169,13 @@ int register_vlan_dev(struct net_device *dev)
if (err < 0)
goto out_uninit_mvrp;
- err = netdev_upper_dev_link(real_dev, dev);
- if (err)
- goto out_uninit_mvrp;
-
err = register_netdevice(dev);
if (err < 0)
- goto out_upper_dev_unlink;
+ goto out_uninit_mvrp;
+
+ err = netdev_upper_dev_link(real_dev, dev);
+ if (err)
+ goto out_unregister_netdev;
/* Account for reference in struct vlan_dev_priv */
dev_hold(real_dev);
@@ -191,8 +191,8 @@ int register_vlan_dev(struct net_device *dev)
return 0;
-out_upper_dev_unlink:
- netdev_upper_dev_unlink(real_dev, dev);
+out_unregister_netdev:
+ unregister_netdevice(dev);
out_uninit_mvrp:
if (grp->nr_vlan_devs == 0)
vlan_mvrp_uninit_applicant(real_dev);
--
1.8.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox