Netdev List
* [PATCH net-next 0/5] tipc: some small patches
From: Jon Maloy @ 2013-09-24  9:27 UTC (permalink / raw)
  To: davem
  Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
	tipc-discussion, Jon Maloy

Some small and relatively straightforward patches that fix a number of
minor issues. The patches are functionally unrelated to each other.

Ying Xue (5):
  tipc: silence sparse warnings
  tipc: make bearer and media naming consistent
  tipc: avoid unnecessary lookup for tipc bearer instance
  tipc: correct return value of recv_msg routine
  tipc: correct return value of link_cmd_set_value routine

 net/tipc/bearer.c    |   18 ++++---------
 net/tipc/bearer.h    |   10 ++++----
 net/tipc/eth_media.c |   68 +++++++++++++++++++++++++-------------------------
 net/tipc/ib_media.c  |   58 +++++++++++++++++++++---------------------
 net/tipc/link.c      |   33 ++++++++++++++++--------
 net/tipc/msg.c       |    4 +--
 net/tipc/socket.c    |    6 ++---
 7 files changed, 100 insertions(+), 97 deletions(-)

-- 
1.7.9.5


* [PATCH net-next 1/5] tipc: silence sparse warnings
From: Jon Maloy @ 2013-09-24  9:27 UTC (permalink / raw)
  To: davem
  Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
	tipc-discussion, Andreas Bofjäll, Jon Maloy
In-Reply-To: <1380014868-2797-1-git-send-email-jon.maloy@ericsson.com>

From: Ying Xue <ying.xue@windriver.com>

Eliminate the following sparse warnings:

net/tipc/link.c:1210:37: warning: cast removes address space of expression
net/tipc/link.c:1218:59: warning: incorrect type in argument 2 (different address spaces)
net/tipc/link.c:1218:59:    expected void const [noderef] <asn:1>*from
net/tipc/link.c:1218:59:    got unsigned char const [usertype] *[assigned] sect_crs
net/tipc/msg.c:96:61: warning: incorrect type in argument 3 (different address spaces)
net/tipc/msg.c:96:61:    expected void const *from
net/tipc/msg.c:96:61:    got void [noderef] <asn:1>*const iov_base
net/tipc/socket.c:341:49: warning: Using plain integer as NULL pointer
net/tipc/socket.c:1371:36: warning: Using plain integer as NULL pointer
net/tipc/socket.c:1694:57: warning: Using plain integer as NULL pointer

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/link.c   |    5 +++--
 net/tipc/msg.c    |    4 ++--
 net/tipc/socket.c |    6 +++---
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 0cc3d90..40521ae 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1165,7 +1165,7 @@ static int link_send_sections_long(struct tipc_port *sender,
 	struct tipc_msg fragm_hdr;
 	struct sk_buff *buf, *buf_chain, *prev;
 	u32 fragm_crs, fragm_rest, hsz, sect_rest;
-	const unchar *sect_crs;
+	const unchar __user *sect_crs;
 	int curr_sect;
 	u32 fragm_no;
 	int res = 0;
@@ -1207,7 +1207,8 @@ again:
 
 		if (!sect_rest) {
 			sect_rest = msg_sect[++curr_sect].iov_len;
-			sect_crs = (const unchar *)msg_sect[curr_sect].iov_base;
+			sect_crs =
+			  (const unchar __user *)msg_sect[curr_sect].iov_base;
 		}
 
 		if (sect_rest < fragm_rest)
diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index ced60e2..37cfb57 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -93,8 +93,8 @@ int tipc_msg_build(struct tipc_msg *hdr, struct iovec const *msg_sect,
 	skb_copy_to_linear_data(*buf, hdr, hsz);
 	for (res = 1, cnt = 0; res && (cnt < num_sect); cnt++) {
 		skb_copy_to_linear_data_offset(*buf, pos,
-					       msg_sect[cnt].iov_base,
-					       msg_sect[cnt].iov_len);
+			(const void __force *)msg_sect[cnt].iov_base,
+			msg_sect[cnt].iov_len);
 		pos += msg_sect[cnt].iov_len;
 	}
 	if (likely(res))
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 6cc7ddd..0ff921d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -338,7 +338,7 @@ static int release(struct socket *sock)
 		buf = __skb_dequeue(&sk->sk_receive_queue);
 		if (buf == NULL)
 			break;
-		if (TIPC_SKB_CB(buf)->handle != 0)
+		if (TIPC_SKB_CB(buf)->handle != NULL)
 			kfree_skb(buf);
 		else {
 			if ((sock->state == SS_CONNECTING) ||
@@ -1368,7 +1368,7 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 		return TIPC_ERR_OVERLOAD;
 
 	/* Enqueue message */
-	TIPC_SKB_CB(buf)->handle = 0;
+	TIPC_SKB_CB(buf)->handle = NULL;
 	__skb_queue_tail(&sk->sk_receive_queue, buf);
 	skb_set_owner_r(buf, sk);
 
@@ -1691,7 +1691,7 @@ restart:
 		/* Disconnect and send a 'FIN+' or 'FIN-' message to peer */
 		buf = __skb_dequeue(&sk->sk_receive_queue);
 		if (buf) {
-			if (TIPC_SKB_CB(buf)->handle != 0) {
+			if (TIPC_SKB_CB(buf)->handle != NULL) {
 				kfree_skb(buf);
 				goto restart;
 			}
-- 
1.7.9.5


* IP_FREEBIND with IPv6
From: David Madore @ 2013-09-24  9:24 UTC (permalink / raw)
  To: linux-netdev mailing-list

Dear list,

I have two questions regarding the IP_FREEBIND option in IPv6.

Consider the following script, which sets IP_FREEBIND and then tries
to bind to some arbitrary non-local IPv6 address:

### cut after ###
#! /usr/bin/env python
import socket
if not hasattr(socket, 'IP_FREEBIND'):
    socket.IP_FREEBIND = 15
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.SOL_IP, socket.IP_FREEBIND, 1)
s.bind(("fd42:dead:beef::1", 0))
print("success")
### cut before ###

* This fails (with EADDRNOTAVAIL) under the long-term 3.2 kernels (I
  tried with 3.2.50 and also the current Debian stable kernel,
  3.2.0-4-amd64).  On the other hand, it succeeds with a more recent
  kernel (I tried 3.10.10).  Is this to be considered a bug in the 3.2
  kernel?  I attempted to find which commit fixed this behavior,
  without success: can someone point me to the appropriate patch?
  Could this conceivably be proposed for inclusion in 3.2?

* If one sets net.ipv4.ip_nonlocal_bind to 1 then (according to ip(7))
  the script should work even without the s.setsockopt(socket.SOL_IP,
  socket.IP_FREEBIND, 1) line.  I have failed to make this work on any
  version of the kernel.  Again, is this a bug or a misunderstanding
  on my part of what net.ipv4.ip_nonlocal_bind should do?  (I am aware
  it says "ipv4" in the sysctl name, but since there is no
  corresponding net.ipv6.ip_nonlocal_bind, I don't see what else one
  is supposed to use.)
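
For anyone retesting this on a current system, the script above can be
sketched for Python 3 as follows. This is only a minimal sketch: the
helper name try_freebind is made up for illustration, and the fallback
values 15 (IP_FREEBIND) and 0 (SOL_IP) are assumed to match the Linux
headers. It returns the symbolic errno name instead of raising, so runs
with and without the IP_FREEBIND line are easy to compare.

```python
import errno
import socket

# Fallbacks assumed from the Linux headers; older Python builds may not
# export these constants.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)
SOL_IP = getattr(socket, "SOL_IP", 0)


def try_freebind(addr="fd42:dead:beef::1", freebind=True):
    """Return "success", or the errno name of the first call that fails."""
    try:
        s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError as e:
        # e.g. EAFNOSUPPORT when IPv6 is not available at all
        return errno.errorcode.get(e.errno, str(e.errno))
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if freebind:
            s.setsockopt(SOL_IP, IP_FREEBIND, 1)
        s.bind((addr, 0))
        return "success"
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()


if __name__ == "__main__":
    print("with IP_FREEBIND:   ", try_freebind(freebind=True))
    print("without IP_FREEBIND:", try_freebind(freebind=False))
```

Running it once with net.ipv4.ip_nonlocal_bind=0 and once with it set
to 1 covers both questions in a single pass.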

Best regards,

-- 
     David A. Madore
   ( http://www.madore.org/~david/ )


* Re: [PATCH] ipvs: improved SH fallback strategy
From: Alexander Frolkin @ 2013-09-24  9:32 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Julian Anastasov, lvs-devel, Wensong Zhang, Simon Horman, netdev,
	linux-kernel
In-Reply-To: <524099BA.5020303@cogentembedded.com>

Improve the SH fallback realserver selection strategy.
 
With sh and sh-fallback, if a realserver is down, this attempts to
distribute the traffic that would have gone to that server evenly
among the remaining servers.
 
Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
--
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 3588fae..0db7d01 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -120,22 +120,33 @@ static inline struct ip_vs_dest *
 ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s,
 		      const union nf_inet_addr *addr, __be16 port)
 {
-	unsigned int offset;
-	unsigned int hash;
+	unsigned int offset, roffset;
+	unsigned int hash, ihash;
 	struct ip_vs_dest *dest;
 
-	for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
-		hash = ip_vs_sh_hashkey(svc->af, addr, port, offset);
-		dest = rcu_dereference(s->buckets[hash].dest);
-		if (!dest)
-			break;
-		if (is_unavailable(dest))
-			IP_VS_DBG_BUF(6, "SH: selected unavailable server "
-				      "%s:%d (offset %d)",
+	ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0);
+	dest = rcu_dereference(s->buckets[ihash].dest);
+	if (!dest)
+		return NULL;
+	if (is_unavailable(dest)) {
+		IP_VS_DBG_BUF(6, "SH: selected unavailable server "
+		      "%s:%d, reselecting",
+		      IP_VS_DBG_ADDR(svc->af, &dest->addr),
+		      ntohs(dest->port));
+		for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) {
+			roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE;
+			hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset);
+			dest = rcu_dereference(s->buckets[hash].dest);
+			if (is_unavailable(dest))
+				IP_VS_DBG_BUF(6, "SH: selected unavailable "
+				      "server %s:%d (offset %d), reselecting",
 				      IP_VS_DBG_ADDR(svc->af, &dest->addr),
-				      ntohs(dest->port), offset);
-		else
-			return dest;
+				      ntohs(dest->port), roffset);
+			else
+				return dest;
+		}
+	} else {
+		return dest;
 	}
 
 	return NULL;


* Re: [PATCH] stable_kernel_rules.txt: Exclude networking from stable rules
From: Christoph Hellwig @ 2013-09-24  8:48 UTC (permalink / raw)
  To: Joe Perches
  Cc: stephen, linux-doc, Greg Kroah-Hartman, LKML, xfs,
	Christoph Hellwig, Mikulas Patocka, Rob Landley, netdev,
	David Miller
In-Reply-To: <1379968445.3575.60.camel@joe-AO722>

On Mon, Sep 23, 2013 at 01:34:05PM -0700, Joe Perches wrote:
> Maybe adding a mechanism to MAINTAINERS would be better.
> Maybe a default B: (backport?) of stable@vger.kernel.org
> with a per-subsystem override?

Sounds fine to me.



* RE: [PATCH 1/2] net: Toeplitz library functions
From: David Laight @ 2013-09-24  8:32 UTC (permalink / raw)
  To: Tom Herbert, davem; +Cc: netdev, jesse.brandeburg
In-Reply-To: <alpine.DEB.2.02.1309231535030.23896@tomh.mtv.corp.google.com>

> +static inline unsigned int
> +toeplitz_hash(const unsigned char *bytes,
> +	      struct toeplitz *toeplitz, int n)
> +{
> +	int i;
> +	unsigned int result = 0;
> +
> +	for (i = 0; i < n; i++)
> +		result ^= toeplitz->key_cache[i][bytes[i]];
> +
> +        return result;
> +};

That is a horrid hash function to be calculating in software.

The code looks very much like a simple 32bit CRC.
It isn't entirely clear exactly where the 'key' gets included,
but I suspect it is just xored with the data bytes.

Using it in hardware is probably fine - the hardware can do
it cheaply (in dedicated logic) as the frame arrives.
The CRC polynomial probably collapses to a few XOR operations
when done byte by byte (the hdlc crc16 collapses to 3 levels
of xor).

IIRC jhash() works on 32bit quantities - so has far fewer
maths operations as well as not having all the random data
accesses (cache misses and displacing other parts of the
working set from the cache).

I also thought the hash was arranged so that tx and rx packets
for a single connection hash to the same value?

	David
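
On the question of where the key enters: in the Toeplitz hash as it is
commonly described for RSS, every set input bit selects a 32-bit window
of the key starting at that bit's offset, and the selected windows are
XORed together. The sketch below assumes that definition; how the
patch fills key_cache is not shown in the quoted hunk, so
build_key_cache is a hypothetical construction. It compares a bitwise
reference with a per-byte table lookup shaped like the quoted loop.

```python
def toeplitz_bitwise(key: bytes, data: bytes) -> int:
    """Reference version: XOR the 32-bit key window for each set input bit."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least len(data) + 4 bytes")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):          # input bits are taken MSB first
                offset = i * 8 + bit
                window = (key_int >> (key_bits - 32 - offset)) & 0xFFFFFFFF
                result ^= window
    return result


def build_key_cache(key: bytes, n: int):
    """Hypothetical table: cache[i][v] is the hash of byte value v at position i."""
    if len(key) < n + 4:
        raise ValueError("key too short for n input bytes")
    cache = []
    for i in range(n):
        row = [toeplitz_bitwise(key, bytes(i) + bytes([v])) for v in range(256)]
        cache.append(row)
    return cache


def toeplitz_table(cache, data: bytes) -> int:
    """Same shape as the quoted loop: one table lookup and one XOR per byte."""
    result = 0
    for i, byte in enumerate(data):
        result ^= cache[i][byte]
    return result


if __name__ == "__main__":
    key = bytes(range(1, 41))                 # arbitrary 40-byte demo key
    data = bytes([66, 9, 149, 187, 161, 142, 100, 80, 0x0A, 0xEA, 0x06, 0xE6])
    cache = build_key_cache(key, len(data))
    print(hex(toeplitz_table(cache, data)), hex(toeplitz_bitwise(key, data)))
```

Because the hash is linear over XOR, the table version costs one
256-entry row of u32 per input byte, which is where the scattered data
accesses mentioned above come from.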


* [PATCH] powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
From: Aida Mynzhasova @ 2013-09-24  7:39 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: devicetree, netdev

Currently the IEEE 1588 timer reference clock source is determined by a
hard-coded value in the gianfar_ptp driver. This patch allows the PTP
clock source to be selected through a device tree node property.

For instance:

	fsl,cksel = <0>;

selects the external (TSEC_TMR_CLK input) high-precision timer
reference clock.

Other acceptable values:

	<1> : eTSEC system clock
	<2> : eTSEC1 transmit clock
	<3> : RTC clock input

When this property is absent, the eTSEC system clock serves as the
IEEE 1588 timer reference clock.

Signed-off-by: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>
---
 Documentation/devicetree/bindings/net/fsl-tsec-phy.txt | 2 ++
 drivers/net/ethernet/freescale/gianfar_ptp.c           | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
index 2c6be03..2f889f1 100644
--- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
+++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
@@ -86,6 +86,7 @@ General Properties:
 
 Clock Properties:
 
+  - fsl,cksel        Timer reference clock source.
   - fsl,tclk-period  Timer reference clock period in nanoseconds.
   - fsl,tmr-prsc     Prescaler, divides the output clock.
   - fsl,tmr-add      Frequency compensation value.
@@ -121,6 +122,7 @@ Example:
 		reg = <0x24E00 0xB0>;
 		interrupts = <12 0x8 13 0x8>;
 		interrupt-parent = < &ipic >;
+		fsl,cksel       = <1>;
 		fsl,tclk-period = <10>;
 		fsl,tmr-prsc    = <100>;
 		fsl,tmr-add     = <0x999999A4>;
diff --git a/drivers/net/ethernet/freescale/gianfar_ptp.c b/drivers/net/ethernet/freescale/gianfar_ptp.c
index 098f133..e006a09 100644
--- a/drivers/net/ethernet/freescale/gianfar_ptp.c
+++ b/drivers/net/ethernet/freescale/gianfar_ptp.c
@@ -452,7 +452,9 @@ static int gianfar_ptp_probe(struct platform_device *dev)
 	err = -ENODEV;
 
 	etsects->caps = ptp_gianfar_caps;
-	etsects->cksel = DEFAULT_CKSEL;
+
+	if (get_of_u32(node, "fsl,cksel", &etsects->cksel))
+		etsects->cksel = DEFAULT_CKSEL;
 
 	if (get_of_u32(node, "fsl,tclk-period", &etsects->tclk_period) ||
 	    get_of_u32(node, "fsl,tmr-prsc", &etsects->tmr_prsc) ||
-- 
1.8.1.2
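
The selection rule described in the commit message can be modelled in
a few lines. This is a user-space sketch only: the dict stands in for
the device tree node, resolve_cksel is a made-up helper name, and
DEFAULT_CKSEL = 1 is assumed from the statement that the eTSEC system
clock is used when the property is absent.

```python
# Clock sources as listed in the commit message above.
CKSEL_SOURCES = {
    0: "external TSEC_TMR_CLK input",
    1: "eTSEC system clock",
    2: "eTSEC1 transmit clock",
    3: "RTC clock input",
}

DEFAULT_CKSEL = 1  # assumption: matches "eTSEC system clock" as the default


def resolve_cksel(node_props):
    """Use fsl,cksel when the node provides it, else fall back to the default."""
    cksel = node_props.get("fsl,cksel", DEFAULT_CKSEL)
    # The patch itself does not range-check the value; neither does this sketch.
    return cksel, CKSEL_SOURCES.get(cksel, "not listed")


if __name__ == "__main__":
    print(resolve_cksel({}))                   # property absent
    print(resolve_cksel({"fsl,cksel": 0}))     # external clock selected
```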


* Re: [PATCH 01/10] can: Remove extern from function prototypes
From: Marc Kleine-Budde @ 2013-09-24  7:37 UTC (permalink / raw)
  To: Joe Perches
  Cc: netdev, David S. Miller, linux-kernel, Wolfgang Grandegger,
	linux-can
In-Reply-To: <5570169a078375fa8662adeb2a7f24c1ae718bfb.1379974101.git.joe@perches.com>


On 09/24/2013 12:11 AM, Joe Perches wrote:
> There is a mix of function prototypes with and without extern
> in the kernel sources.  Standardize on not using extern for
> function prototypes.
> 
> Function prototypes don't need to be written with extern.
> extern is assumed by the compiler.  Its use is as unnecessary as
> using auto to declare automatic/local variables in a block.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Thx, added to linux-can-next. The patch will be included in the next
pull request to David.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: [pchecks v1 2/4] Use raw cpu ops for calls that would trigger with checks
From: Ingo Molnar @ 2013-09-24  7:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, akpm, Steven Rostedt, linux-kernel, Peter Zijlstra,
	netdev
In-Reply-To: <000001414c3d064a-ebe0610b-6951-4a74-bd33-8480e3e1e364-000000@email.amazonses.com>


(netdev Cc:-ed)

* Christoph Lameter <cl@linux.com> wrote:

> These locations triggered during testing with KVM.
> 
> These are fetches without preemption off where we judged that
> to be more performance efficient or where other means of
> providing synchronization (BH handling) are available.

> Index: linux/include/net/snmp.h
> ===================================================================
> --- linux.orig/include/net/snmp.h	2013-09-12 13:26:29.216103951 -0500
> +++ linux/include/net/snmp.h	2013-09-12 13:26:29.208104037 -0500
> @@ -126,7 +126,7 @@ struct linux_xfrm_mib {
>  	extern __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
>  
>  #define SNMP_INC_STATS_BH(mib, field)	\
> -			__this_cpu_inc(mib[0]->mibs[field])
> +			raw_cpu_inc(mib[0]->mibs[field])
>  
>  #define SNMP_INC_STATS_USER(mib, field)	\
>  			this_cpu_inc(mib[0]->mibs[field])
> @@ -141,7 +141,7 @@ struct linux_xfrm_mib {
>  			this_cpu_dec(mib[0]->mibs[field])
>  
>  #define SNMP_ADD_STATS_BH(mib, field, addend)	\
> -			__this_cpu_add(mib[0]->mibs[field], addend)
> +			raw_cpu_add(mib[0]->mibs[field], addend)

Are the networking folks fine with allowing unsafe operations on SNMP stats
in preemptible sections, or should the kernel produce an optional warning 
message if CONFIG_PREEMPT_DEBUG=y and these ops are used in preemptible 
(non-bh, non-irq-handler, non-irqs-off, etc.) sections?

RAW_SNMP_*_STATS() ops could be used to annotate those places where that 
kind of usage is safe.

Thanks,

	Ingo


* [PATCH] ptp: add range check on n_samples
From: Dong Zhu @ 2013-09-24  7:05 UTC (permalink / raw)
  To: Richard Cochran, David Miller; +Cc: netdev, linux-kernel


When using the PTP_SYS_OFFSET ioctl to measure the time offset between the
PHC and the system clock, the caller must specify the number of measurements.
Valid values of n_samples range from 1 to 25; anything <= 0 or > 25 makes
no sense, so this patch adds a range check.

Signed-off-by: Dong Zhu <bluezhudong@gmail.com>
---
 drivers/ptp/ptp_chardev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 34a0c60..4e85b23 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -104,7 +104,8 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 			err = -EFAULT;
 			break;
 		}
-		if (sysoff->n_samples > PTP_MAX_SAMPLES) {
+		if (sysoff->n_samples <= 0 ||
+		    sysoff->n_samples > PTP_MAX_SAMPLES) {
 			err = -EINVAL;
 			break;
 		}
-- 
1.7.11.7

-- 
Best Regards,
Dong Zhu
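
The intended behaviour of the check can be stated as a small
user-space model. This is a sketch only: check_n_samples is a made-up
name, and PTP_MAX_SAMPLES = 25 is taken from the range given in the
commit message.

```python
import errno

PTP_MAX_SAMPLES = 25  # upper bound stated in the commit message


def check_n_samples(n_samples):
    """Return 0 for a usable sample count, -EINVAL otherwise."""
    if n_samples <= 0 or n_samples > PTP_MAX_SAMPLES:
        return -errno.EINVAL
    return 0


if __name__ == "__main__":
    for n in (0, 1, 25, 26):
        print(n, check_n_samples(n))
```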


* Re: [PATCH net] vxlan: Use RCU apis to access sk_user_data.
From: Eric Dumazet @ 2013-09-24  5:51 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev, Jesse Gross
In-Reply-To: <1380000501-13969-1-git-send-email-pshelar@nicira.com>

On Mon, 2013-09-23 at 22:28 -0700, Pravin B Shelar wrote:
> Using the RCU API makes the vxlan code easier to understand.  It also
> fixes a bug caused by a missing ACCESS_ONCE() on the sk_user_data
> dereference.  In rare cases, without ACCESS_ONCE() the compiler might
> omit the local variable vs and treat it as an alias for
> sk->sk_user_data, resulting in multiple sk_user_data dereferences in
> the RCU read context, between which the value could change.
> 
> CC: Jesse Gross <jesse@nicira.com>
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
> ---
>  drivers/net/vxlan.c |    9 +++------
>  include/net/sock.h  |    2 ++
>  2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index d1292fe..3519a71 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -952,8 +952,7 @@ void vxlan_sock_release(struct vxlan_sock *vs)
>  
>  	spin_lock(&vn->sock_lock);
>  	hlist_del_rcu(&vs->hlist);
> -	smp_wmb();
> -	vs->sock->sk->sk_user_data = NULL;
> +	rcu_assign_pointer(sk_user_data_rcu(vs->sock->sk), NULL);

RCU_INIT_POINTER(sk_user_data_rcu(vs->sock->sk), NULL)


* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24  5:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <1380001118.3165.41.camel@edumazet-glaptop>

On Mon, Sep 23, 2013 at 10:38:38PM -0700, Eric Dumazet wrote:
> On Tue, 2013-09-24 at 05:35 +0200, Hannes Frederic Sowa wrote:
> 
> > > build_ehash_secret builds up the data which seeds fragmentation ids, ephemeral
> > > port randomization etc. Could we drop the check of sock->type? I guess the
> > > idea was that in-kernel sockets of type raw/udp do not seed the keys when no
> > > entropy is available?
> > 
> > > Would this be better (I checked that inet_ehash_secret, ipv6_hash_secret
> > > and net_secret actually get initialized)?
> > 
> 
> inet_ehash_secret is used only to make jhash() for tcp ehash, not for
> fragmentation ids or other uses (port randomization).
> 
> 
> > [PATCH] inet: initialize hash secret values on first non-kernel socket creation
> > 
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> 
> Why ? This looks buggy to me.

It does initialize the rest of the key values (net_secret_init), too:

void build_ehash_secret(void)
{
	u32 rnd;

	do {
		get_random_bytes(&rnd, sizeof(rnd));
	} while (rnd == 0);

	if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) {
		get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret));
		net_secret_init();
	}
}

Maybe I overlooked something?

Thanks,

  Hannes
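
The "first caller wins" behaviour of the function quoted above can be
modelled in user space as follows. This is only a sketch: the lock
emulates the atomicity of cmpxchg(), the state dict stands in for the
kernel globals, and the net_secret_ready flag is a placeholder for
whatever net_secret_init() does.

```python
import os
import threading

_cmpxchg_lock = threading.Lock()
state = {"inet_ehash_secret": 0, "ipv6_hash_secret": 0, "net_secret_ready": False}


def cmpxchg(name, old, new):
    """Emulated compare-and-exchange: store new if the value equals old; return the previous value."""
    with _cmpxchg_lock:
        prev = state[name]
        if prev == old:
            state[name] = new
        return prev


def build_ehash_secret():
    """Return True only for the caller that actually seeded the secrets."""
    rnd = 0
    while rnd == 0:                       # zero means "not yet initialized"
        rnd = int.from_bytes(os.urandom(4), "little")
    if cmpxchg("inet_ehash_secret", 0, rnd) == 0:
        state["ipv6_hash_secret"] = int.from_bytes(os.urandom(4), "little")
        state["net_secret_ready"] = True  # placeholder for net_secret_init()
        return True
    return False


if __name__ == "__main__":
    print(build_ehash_secret(), build_ehash_secret())
```

In this model every secret is seeded exactly once, at the moment of the
first call, which is why the timing of that first call relative to
available entropy matters.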


* Re: [PATCH 1/2] net: Toeplitz library functions
From: Eric Dumazet @ 2013-09-24  5:38 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <20130924033505.GB22393@order.stressinduktion.org>

On Tue, 2013-09-24 at 05:35 +0200, Hannes Frederic Sowa wrote:

> > build_ehash_secret builds up the data which seeds fragmentation ids, ephemeral
> > port randomization etc. Could we drop the check of sock->type? I guess the
> > idea was that in-kernel sockets of type raw/udp do not seed the keys when no
> > entropy is available?
> 
> Would this be better (I checked that inet_ehash_secret, ipv6_hash_secret
> and net_secret actually get initialized)?
> 

inet_ehash_secret is used only to make jhash() for tcp ehash, not for
fragmentation ids or other uses (port randomization).


> [PATCH] inet: initialize hash secret values on first non-kernel socket creation
> 
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---

Why ? This looks buggy to me.


* Re: iSCSI support in Linux
From: Rayagond K @ 2013-09-24  5:30 UTC (permalink / raw)
  To: Jayamohan.K; +Cc: netdev
In-Reply-To: <CAEc=gqbARiCaaZbvBAAKzEfkVnZL3QN4pDb+5Ljgn5EB8G3ZyA@mail.gmail.com>

Hi Jay,

Thanks a lot for your response.

There seems to be a generic iSCSI driver (drivers/scsi/iscsi_tcp.c).
So why do we have a separate drivers/scsi/be2iscsi? Is it specific
to Emulex storage devices? This point is not clear to me.

I am a network driver person and am not familiar with SCSI/iSCSI, so any
help would be appreciated.

Thanks
Rayagond

On Mon, Sep 23, 2013 at 10:05 PM, Jayamohan.K <jayamohank@gmail.com> wrote:
> Yes, there is support for offloading in Linux (open-iscsi) and it is under
> scsi directory.
>
> One example would be the Emulex driver at  drivers/scsi/be2iscsi
> Disclaimer: I am the Maintainer for it
>
> Thanks
> Jay
>
>
> On Mon, Sep 23, 2013 at 3:54 AM, Rayagond K <rayagond@vayavyalabs.com>
> wrote:
>>
>> Hi All,
>>
>> I am checking iSCSI support in Linux, during the search over internet
>> I got to know that iSCSI standard is implemented in Linux with kernel
>> version 2.6.20 and later. But I didn't understand one thing clearly
>> that is there any NIC offloading features related iSCSI ? if so, is
>> there any support in Linux  for such offloading features ? Any example
>> NIC driver in LXR with iSCSI implementation ?
>>
>>
>> Thanks
>> Rayagond.


* [PATCH net] vxlan: Use RCU apis to access sk_user_data.
From: Pravin B Shelar @ 2013-09-24  5:28 UTC (permalink / raw)
  To: netdev; +Cc: Pravin B Shelar, Jesse Gross

Using the RCU API makes the vxlan code easier to understand.  It also
fixes a bug caused by a missing ACCESS_ONCE() on the sk_user_data
dereference.  In rare cases, without ACCESS_ONCE() the compiler might
omit the local variable vs and treat it as an alias for
sk->sk_user_data, resulting in multiple sk_user_data dereferences in
the RCU read context, between which the value could change.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/vxlan.c |    9 +++------
 include/net/sock.h  |    2 ++
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d1292fe..3519a71 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -952,8 +952,7 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 
 	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vs->hlist);
-	smp_wmb();
-	vs->sock->sk->sk_user_data = NULL;
+	rcu_assign_pointer(sk_user_data_rcu(vs->sock->sk), NULL);
 	vxlan_notify_del_rx_port(sk);
 	spin_unlock(&vn->sock_lock);
 
@@ -1048,8 +1047,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 
 	port = inet_sk(sk)->inet_sport;
 
-	smp_read_barrier_depends();
-	vs = (struct vxlan_sock *)sk->sk_user_data;
+	vs = rcu_dereference(sk_user_data_rcu(sk));
 	if (!vs)
 		goto drop;
 
@@ -2302,8 +2300,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	atomic_set(&vs->refcnt, 1);
 	vs->rcv = rcv;
 	vs->data = data;
-	smp_wmb();
-	vs->sock->sk->sk_user_data = vs;
+	rcu_assign_pointer(sk_user_data_rcu(vs->sock->sk), vs);
 
 	spin_lock(&vn->sock_lock);
 	hlist_add_head_rcu(&vs->hlist, vs_head(net, port));
diff --git a/include/net/sock.h b/include/net/sock.h
index 6ba2e7b..ffd1356 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -409,6 +409,8 @@ struct sock {
 	void                    (*sk_destruct)(struct sock *sk);
 };
 
+#define sk_user_data_rcu(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
+
 /*
  * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK
  * or not whether his port will be reused by someone else. SK_FORCE_REUSE
-- 
1.7.1


* Re: [PATCH 08/12] bluetooth: Remove extern from function prototypes
From: Marcel Holtmann @ 2013-09-24  4:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: netdev, David S. Miller, linux-kernel, Gustavo Padovan,
	Johan Hedberg, linux-bluetooth
In-Reply-To: <e9994a2b839277a9237364b59e655060f1fcb957.1379961014.git.joe@perches.com>

Hi Joe,

> There is a mix of function prototypes with and without extern
> in the kernel sources.  Standardize on not using extern for
> function prototypes.
> 
> Function prototypes don't need to be written with extern.
> extern is assumed by the compiler.  Its use is as unnecessary as
> using auto to declare automatic/local variables in a block.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
> include/net/bluetooth/bluetooth.h | 16 ++++++++--------
> include/net/bluetooth/hci_core.h  | 23 +++++++++++------------
> include/net/bluetooth/rfcomm.h    |  4 ++--
> 3 files changed, 21 insertions(+), 22 deletions(-)

Acked-by: Marcel Holtmann <marcel@holtmann.org>

Regards

Marcel


* [PATCH] net: Delay default_device_exit_batch until no devices are unregistering v2
From: Eric W. Biederman @ 2013-09-24  4:19 UTC (permalink / raw)
  To: David Miller; +Cc: fruggeri, edumazet, jiri, alexander.h.duyck, amwang, netdev
In-Reply-To: <20130917.235247.344101545141336143.davem@davemloft.net>


There is currently no serialization between network namespaces exiting
and network devices exiting, as the final part of netdev_run_todo does not
happen under the rtnl_lock.  This is compounded by the fact that the
only list of devices unregistering in netdev_run_todo is local to the
netdev_run_todo.

This lack of serialization in extreme cases results in network devices
unregistering in netdev_run_todo after the loopback device of their
network namespace has been freed (making dst_ifdown unsafe), and after
their network namespace has exited (making the NETDEV_UNREGISTER,
and NETDEV_UNREGISTER_FINAL callbacks unsafe).

Add the missing serialization by keeping a per network namespace count
of how many network devices are unregistering and a wait queue that is
woken up whenever the count is decreased.  The count and wait queue
allow default_device_exit_batch to wait until all of the unregistration
activity for a network namespace has finished before proceeding to
unregister the loopback device and then allowing the network namespace
to exit.

Only a single global wait queue is used because there is a single global
lock and a single waiter; per network namespace wait queues would be a
waste of resources.

The per network namespace count of unregistering devices gives a
progress guarantee because the number of network devices unregistering
in an exiting network namespace must ultimately drop to zero (assuming
network device unregistration completes).

The basic logic remains the same as in v1.  This patch is now half
comment and half rtnl_lock_unregistering, an expanded version of
wait_event that performs no extra work in the common case where no
network devices are unregistering when we get to default_device_exit_batch.

Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    1 +
 net/core/dev.c              |   49 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 84e37b1..be46311 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -74,6 +74,7 @@ struct net {
 	struct hlist_head	*dev_index_head;
 	unsigned int		dev_base_seq;	/* protected by rtnl_mutex */
 	int			ifindex;
+	unsigned int		dev_unreg_count;
 
 	/* core fib_rules */
 	struct list_head	rules_ops;
diff --git a/net/core/dev.c b/net/core/dev.c
index 5d702fe..7fa75e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5002,10 +5002,12 @@ static int dev_new_index(struct net *net)
 
 /* Delayed registration/unregisteration */
 static LIST_HEAD(net_todo_list);
+static DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq);
 
 static void net_set_todo(struct net_device *dev)
 {
 	list_add_tail(&dev->todo_list, &net_todo_list);
+	dev_net(dev)->dev_unreg_count++;
 }
 
 static void rollback_registered_many(struct list_head *head)
@@ -5673,6 +5675,12 @@ void netdev_run_todo(void)
 		if (dev->destructor)
 			dev->destructor(dev);
 
+		/* Report a network device has been unregistered */
+		rtnl_lock();
+		dev_net(dev)->dev_unreg_count--;
+		__rtnl_unlock();
+		wake_up(&netdev_unregistering_wq);
+
 		/* Free network device */
 		kobject_put(&dev->dev.kobj);
 	}
@@ -6358,6 +6366,34 @@ static void __net_exit default_device_exit(struct net *net)
 	rtnl_unlock();
 }
 
+static void __net_exit rtnl_lock_unregistering(struct list_head *net_list)
+{
+	/* Return with the rtnl_lock held when there are no network
+	 * devices unregistering in any network namespace in net_list.
+	 */
+	struct net *net;
+	bool unregistering;
+	DEFINE_WAIT(wait);
+
+	for (;;) {
+		prepare_to_wait(&netdev_unregistering_wq, &wait,
+				TASK_UNINTERRUPTIBLE);
+		unregistering = false;
+		rtnl_lock();
+		list_for_each_entry(net, net_list, exit_list) {
+			if (net->dev_unreg_count > 0) {
+				unregistering = true;
+				break;
+			}
+		}
+		if (!unregistering)
+			break;
+		__rtnl_unlock();
+		schedule();
+	}
+	finish_wait(&netdev_unregistering_wq, &wait);
+}
+
 static void __net_exit default_device_exit_batch(struct list_head *net_list)
 {
 	/* At exit all network devices most be removed from a network
@@ -6369,7 +6405,18 @@ static void __net_exit default_device_exit_batch(struct list_head *net_list)
 	struct net *net;
 	LIST_HEAD(dev_kill_list);
 
-	rtnl_lock();
+	/* To prevent network device cleanup code from dereferencing
+	 * loopback devices or network devices that have been freed
+	 * wait here for all pending unregistrations to complete,
+	 * before unregistring the loopback device and allowing the
+	 * network namespace be freed.
+	 *
+	 * The netdev todo list containing all network devices
+	 * unregistrations that happen in default_device_exit_batch
+	 * will run in the rtnl_unlock() at the end of
+	 * default_device_exit_batch.
+	 */
+	rtnl_lock_unregistering(net_list);
 	list_for_each_entry(net, net_list, exit_list) {
 		for_each_netdev_reverse(net, dev) {
 			if (dev->rtnl_link_ops)
-- 
1.7.5.4
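
The wait loop in rtnl_lock_unregistering() can be illustrated with a rough userspace analogue. This is only a sketch under stated assumptions: big_lock, unreg_done, unreg_count and the three functions are hypothetical names standing in for the rtnl mutex, netdev_unregistering_wq and net->dev_unreg_count, and a pthread condition variable replaces the prepare_to_wait()/schedule() pair. The mutex/condvar pairing gives the same "no lost wakeup" property that calling prepare_to_wait() before checking the counter gives in the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for the rtnl mutex, the per-namespace
 * dev_unreg_count and netdev_unregistering_wq; not names from the patch. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t unreg_done = PTHREAD_COND_INITIALIZER;
static int unreg_count;

/* An unregistration has been queued. */
static void unreg_begin(void)
{
	pthread_mutex_lock(&big_lock);
	unreg_count++;
	pthread_mutex_unlock(&big_lock);
}

/* Mirrors the netdev_run_todo() hunk: drop the count under the lock,
 * then wake anyone waiting for the count to reach zero. */
static void unreg_end(void)
{
	pthread_mutex_lock(&big_lock);
	assert(unreg_count > 0);
	unreg_count--;
	pthread_mutex_unlock(&big_lock);
	pthread_cond_broadcast(&unreg_done);
}

/* Mirrors rtnl_lock_unregistering(): returns with big_lock held once
 * no unregistration is pending. The caller must unlock afterwards. */
static void lock_when_idle(void)
{
	pthread_mutex_lock(&big_lock);
	while (unreg_count > 0)
		pthread_cond_wait(&unreg_done, &big_lock);
}
```

The sketch tracks a single counter, whereas the patch walks every namespace in net_list and waits until all of their counters are zero.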

^ permalink raw reply related

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24  3:35 UTC (permalink / raw)
  To: Eric Dumazet, Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <20130924023038.GA22393@order.stressinduktion.org>

On Tue, Sep 24, 2013 at 04:30:38AM +0200, Hannes Frederic Sowa wrote:
> On Mon, Sep 23, 2013 at 05:03:11PM -0700, Eric Dumazet wrote:
> > On Mon, 2013-09-23 at 15:41 -0700, Tom Herbert wrote:
> > 
> > > +#ifdef CONFIG_NET_TOEPLITZ
> > > +	toeplitz_net = toeplitz_alloc();
> > > +	if (!toeplitz_net)
> > > +		goto out;
> > > +
> > > +	toeplitz_init(toeplitz_net, NULL);
> > > +#endif
> > > +
> > 
> > Hmm
> > 
> > 1) Security alert here.
> > 
> > > Many devices (let's say Android phones) have no entropy at this point,
> > > all devices will have the same Toeplitz key.
> > 
> > Check build_ehash_secret() for a possible point for the feeding of the
> > key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )
> > 
> > If hardware toeplitz is ever used, we want to make sure every host uses
> > a private and hidden Toeplitz key.
> build_ehash_secret builds up the data which seeds fragmentation ids, ephemeral
> port randomization etc. Could we drop the check of sock->type? I guess the
> idea was that in-kernel sockets of type raw/udp do not seed the keys when no
> entropy is available?

Would this be better? (I checked that inet_ehash_secret, ipv6_hash_secret
and net_secret actually get initialized.)

[PATCH] inet: initialize hash secret values on first non-kernel socket creation

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/af_inet.c  | 5 ++---
 net/ipv6/af_inet6.c | 4 +---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..489834a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -286,9 +286,8 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (unlikely(!inet_ehash_secret))
-		if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
-			build_ehash_secret();
+	if (unlikely(!inet_ehash_secret && !kern))
+		build_ehash_secret();
 
 	sock->state = SS_UNCONNECTED;
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 7c96100..dbf8c35 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -110,9 +110,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (sock->type != SOCK_RAW &&
-	    sock->type != SOCK_DGRAM &&
-	    !inet_ehash_secret)
+	if (unlikely(!inet_ehash_secret && !kern))
 		build_ehash_secret();
 
 	/* Look for the requested type/protocol pair. */
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-09-24  2:49 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Ben Pfaff, Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <CAEP_g=8i9djcOWhNEKweg8Qx8LhQ4jD142af2PgfR59yo=SsAA@mail.gmail.com>

On Mon, Sep 23, 2013 at 06:38:23PM -0700, Jesse Gross wrote:
> On Mon, Sep 23, 2013 at 6:32 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Mon, Sep 23, 2013 at 02:17:50PM -0700, Jesse Gross wrote:
> >> On Sat, Sep 21, 2013 at 10:34 PM, Simon Horman <horms@verge.net.au> wrote:
> >> > On Thu, Sep 19, 2013 at 12:21:33PM -0500, Jesse Gross wrote:
> >> >> On Thu, Sep 19, 2013 at 10:57 AM, Simon Horman <horms@verge.net.au> wrote:
> >> >> > On Mon, Sep 16, 2013 at 03:38:21PM -0500, Jesse Gross wrote:
> >> >> >> On Mon, Sep 9, 2013 at 12:20 AM, Simon Horman <horms@verge.net.au> wrote:
> >> >> One other consideration in the OVS case - with recirculation we may
> >> >> hit this code multiple times and the difference in behavior could be
> >> >> surprising. However, on the other hand, we need to be careful because
> >> >> skb->cb is not guaranteed to be initialized to zero.
> >> >
> >> > Thanks, that is also not something that I had considered.
> >> >
> >> > I'm not sure, but I think that we can rely on skb->cb
> >> > not being clobbered between rounds of recirculation.
> >> > Or at the very least I think we could save and restore it
> >> > as necessary.
> >>
> >> Yes, it should be safe to assume this.
> >>
> >> > So I think if we could be careful to make sure that inner_protocol
> >> > is in a sane state the first time we see the skb but not
> >> > each time it is recirculated then I think things should work out.
> >> >
> >> > In my current implementation of recirculation the datapath
> >> > side is driven by ovs_dp_process_received_packet(). So by my reasoning
> >> > above I think it would make sense to reset the inner_protocol there
> >> > and in ovs_packet_cmd_execute() rather than in ovs_execute_actions()
> >> > which each of those functions call.
> >>
> >> I think that would work, however, I wonder if it's the right place in
> >> general, independent of this compatibility issue. I guess it still
> >> seems like the ideal thing to do is to move this as close to where it
> >> is necessary as possible, specifically in mpls_push(). Is there a
> >> reason to not put it there (again, other than the out-of-tree
> >> compatibility issues)?
> >
> > I agree that should work, out-of-tree compatibility issues aside.
> >
> > Perhaps a solution is to have a conditional set_inner_protocol call inside
> > push_mpls, where the condition is that inner_protocol is zero.
> > And a reset_inner_protocol call earlier on, a call that sets inner_protocol
> > to zero only if the compatibility code is in use and thus it resides in
> > struct ovs_gso_cb. This call could be removed once the compatibility
> > code is no longer needed, that is once kernels older than 3.11 are no
> > longer supported.
> 
> I agree that's probably the right solution.

Thanks, I will see about making it so.
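
A minimal sketch of the scheme agreed on above, assuming a hypothetical struct pkt in place of the sk_buff / struct ovs_gso_cb state and host-order ethertypes. The names push_mpls_sketch() and reset_inner_protocol() are illustrative only and are not taken from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet state under discussion. */
struct pkt {
	uint16_t protocol;        /* current outermost ethertype */
	uint16_t inner_protocol;  /* 0 means "not recorded yet" */
};

/* Called once when the datapath first sees the packet, not on every
 * round of recirculation. Only needed when the field lives in skb->cb
 * (the compat case), which is not guaranteed to be zeroed. */
static void reset_inner_protocol(struct pkt *p, bool compat_in_use)
{
	assert(p);
	if (compat_in_use)
		p->inner_protocol = 0;
}

/* Record the pre-MPLS ethertype only on the first push, so a later
 * push (for example after recirculation) does not overwrite it. */
static void push_mpls_sketch(struct pkt *p, uint16_t mpls_ethertype)
{
	assert(p);
	if (p->inner_protocol == 0)
		p->inner_protocol = p->protocol;
	p->protocol = mpls_ethertype;
}
```

With this shape the reset call is the only compatibility-specific piece, so it can be dropped on its own once the compat code goes away.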

^ permalink raw reply

* Re: [RESEND PATCH iproute2] vxlan: add ipv6 support
From: Cong Wang @ 2013-09-24  2:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David S. Miller
In-Reply-To: <20130923132049.7fa5673e@nehalam.linuxnetplumber.net>

On Mon, 2013-09-23 at 13:20 -0700, Stephen Hemminger wrote:
> On Sat, 21 Sep 2013 10:35:24 +0800
> Cong Wang <amwang@redhat.com> wrote:
> 
> > +			if (!inet_pton(AF_INET, *argv, &gaddr)) {
> > +				if (!inet_pton(AF_INET6, *argv, &gaddr6)) {
> > +					fprintf(stderr, "Invalid address \"%s\"\n", *argv);
> > +					return -1;
> > +				} else if (!IN6_IS_ADDR_MULTICAST(&gaddr6))
> > +					invarg("invald group address", *argv);
> > +			} else if (!IN_MULTICAST(ntohl(gaddr)))
> > +					invarg("invald group address", *argv);
> 
> This is ugly; can't it be done more generically by checking for ':' in the address?
> Or even use getaddrinfo.
> 
> Hate to have lots of special code to handle both address types.
> 

Alright, I will introduce a helper function for it.

Thanks!
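
A rough sketch of what such a helper could look like, following the ':' check suggested above. The name parse_group_addr() and its return convention are assumptions made for illustration; they are not from the posted patch.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: parse a multicast group address of either family.
 * Returns AF_INET or AF_INET6 on success, 0 if str is not a valid
 * address, and -1 if it is a valid address but not a multicast one. */
static int parse_group_addr(const char *str, struct in_addr *a4,
			    struct in6_addr *a6)
{
	assert(str && a4 && a6);

	/* A ':' can only appear in an IPv6 literal. */
	if (strchr(str, ':')) {
		if (inet_pton(AF_INET6, str, a6) != 1)
			return 0;
		return IN6_IS_ADDR_MULTICAST(a6) ? AF_INET6 : -1;
	}

	if (inet_pton(AF_INET, str, a4) != 1)
		return 0;
	/* IN_MULTICAST() expects the address in host byte order. */
	return IN_MULTICAST(ntohl(a4->s_addr)) ? AF_INET : -1;
}
```

The caller can then print one "invalid group address" message for both families instead of nesting the two inet_pton() branches.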

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24  2:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <1379980991.3165.37.camel@edumazet-glaptop>

On Mon, Sep 23, 2013 at 05:03:11PM -0700, Eric Dumazet wrote:
> On Mon, 2013-09-23 at 15:41 -0700, Tom Herbert wrote:
> 
> > +#ifdef CONFIG_NET_TOEPLITZ
> > +	toeplitz_net = toeplitz_alloc();
> > +	if (!toeplitz_net)
> > +		goto out;
> > +
> > +	toeplitz_init(toeplitz_net, NULL);
> > +#endif
> > +
> 
> Hmm
> 
> 1) Security alert here.
> 
> Many devices (let's say Android phones) have no entropy at this point,
> all devices will have the same Toeplitz key.
> 
> Check build_ehash_secret() for a possible point for the feeding of the
> key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )
> 
> If hardware toeplitz is ever used, we want to make sure every host uses
> a private and hidden Toeplitz key.

I just had a look at it myself and have one question:

ipv6/af_inet6.c:
	if (sock->type != SOCK_RAW &&
	    sock->type != SOCK_DGRAM &&
	    !inet_ehash_secret)
		build_ehash_secret();

ipv4/af_inet.c:
	if (unlikely(!inet_ehash_secret))
		if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
			build_ehash_secret();


Why do we care about the sock->type?

build_ehash_secret builds up the data which seeds fragmentation ids, ephemeral
port randomization etc. Could we drop the check of sock->type? I guess the
idea was that in-kernel sockets of type raw/udp do not seed the keys when no
entropy is available?

Thanks,

  Hannes

^ permalink raw reply

* linux-next: manual merge of the ipsec-next tree with the net-next tree
From: Stephen Rothwell @ 2013-09-24  2:16 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: linux-next, linux-kernel, Fan Du, Joe Perches, David Miller,
	netdev


Hi Steffen,

Today's linux-next merge of the ipsec-next tree got a conflict in
include/net/xfrm.h between commit d511337a1eda ("xfrm.h: Remove extern
from function prototypes") from the net-next tree and commit aba826958830
("{ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode
callback") from the ipsec-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc include/net/xfrm.h
index 7657461,c7afa6e..0000000
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@@ -1493,39 -1495,35 +1499,39 @@@ static inline int xfrm4_rcv_spi(struct 
  	return xfrm4_rcv_encap(skb, nexthdr, spi, 0);
  }
  
 -extern int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm4_output(struct sk_buff *skb);
 -extern int xfrm4_output_finish(struct sk_buff *skb);
 -extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 -extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 -extern int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler);
 -extern int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler);
 -extern int xfrm6_extract_header(struct sk_buff *skb);
 -extern int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 -extern int xfrm6_transport_finish(struct sk_buff *skb, int async);
 -extern int xfrm6_rcv(struct sk_buff *skb);
 -extern int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 -			    xfrm_address_t *saddr, u8 proto);
 -extern int xfrm6_tunnel_register(struct xfrm6_tunnel *handler, unsigned short family);
 -extern int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family);
 -extern __be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr);
 -extern __be32 xfrm6_tunnel_spi_lookup(struct net *net, const xfrm_address_t *saddr);
 -extern int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm6_output(struct sk_buff *skb);
 -extern int xfrm6_output_finish(struct sk_buff *skb);
 -extern int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
 -				 u8 **prevhdr);
 +int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm4_output(struct sk_buff *skb);
 +int xfrm4_output_finish(struct sk_buff *skb);
 +int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 +int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
- int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler);
- int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler);
++int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler);
++int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler);
 +void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 +int xfrm6_extract_header(struct sk_buff *skb);
 +int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 +int xfrm6_transport_finish(struct sk_buff *skb, int async);
 +int xfrm6_rcv(struct sk_buff *skb);
 +int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 +		     xfrm_address_t *saddr, u8 proto);
 +int xfrm6_tunnel_register(struct xfrm6_tunnel *handler, unsigned short family);
 +int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler,
 +			    unsigned short family);
 +__be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr);
 +__be32 xfrm6_tunnel_spi_lookup(struct net *net, const xfrm_address_t *saddr);
 +int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm6_output(struct sk_buff *skb);
 +int xfrm6_output_finish(struct sk_buff *skb);
 +int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
 +			  u8 **prevhdr);
 +void xfrm6_local_error(struct sk_buff *skb, u32 mtu);
  
  #ifdef CONFIG_XFRM
 -extern int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb);
 -extern int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen);
 +int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb);
 +int xfrm_user_policy(struct sock *sk, int optname,
 +		     u8 __user *optval, int optlen);
  #else
  static inline int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen)
  {


^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24  1:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev, jesse.brandeburg
In-Reply-To: <1379980991.3165.37.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 23 Sep 2013 17:03:11 -0700

> 1) Security alert here.
> 
> Many devices (let's say Android phones) have no entropy at this point,
> all devices will have the same Toeplitz key.
> 
> Check build_ehash_secret() for a possible point for the feeding of the
> key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )
> 
> If hardware toeplitz is ever used, we want to make sure every host uses
> a private and hidden Toeplitz key.
> 
> 2) Also it seems a given tuple would hash the same on different
> namespaces. Could be a problem if one particular TCP hash bucket is
> holding thousands of sockets.
> 
> 3) jhash() is fast, there are no possible cache line misses

4) Random input to the hash is now not used at all, instant exploit
   because now any attacker can open up connections over and over that
   will all hash to the same hash bucket making our lookups linear.
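
To make the concern concrete, here is a bitwise Toeplitz hash as it is commonly described for RSS. This is a sketch, not the code from the patch under review, and toeplitz_hash() with this signature is an assumption. For a fixed key the function is linear over XOR, so hash(a ^ b) == hash(a) ^ hash(b); anyone who knows or can predict the key can therefore construct inputs that land in a chosen bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For every set bit of the input, XOR in the 32-bit window of the key
 * that starts at that bit position. The key must be at least
 * len + 4 bytes long. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t window, hash = 0;
	size_t i;
	int bit;

	assert(key_len >= len + 4);

	/* The first window is the leading 32 bits of the key. */
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | key[3];

	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window one bit further along the key. */
			window = (window << 1) | ((key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}
```

Because the only secret input is the key, a key that is identical across hosts or derived before any entropy is available removes the protection entirely.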

^ permalink raw reply

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Jesse Gross @ 2013-09-24  1:38 UTC (permalink / raw)
  To: Simon Horman
  Cc: Ben Pfaff, Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <20130924013222.GL25601@verge.net.au>

On Mon, Sep 23, 2013 at 6:32 PM, Simon Horman <horms@verge.net.au> wrote:
> On Mon, Sep 23, 2013 at 02:17:50PM -0700, Jesse Gross wrote:
>> On Sat, Sep 21, 2013 at 10:34 PM, Simon Horman <horms@verge.net.au> wrote:
>> > On Thu, Sep 19, 2013 at 12:21:33PM -0500, Jesse Gross wrote:
>> >> On Thu, Sep 19, 2013 at 10:57 AM, Simon Horman <horms@verge.net.au> wrote:
>> >> > On Mon, Sep 16, 2013 at 03:38:21PM -0500, Jesse Gross wrote:
>> >> >> On Mon, Sep 9, 2013 at 12:20 AM, Simon Horman <horms@verge.net.au> wrote:
>> >> One other consideration in the OVS case - with recirculation we may
>> >> hit this code multiple times and the difference in behavior could be
>> >> surprising. However, on the other hand, we need to be careful because
>> >> skb->cb is not guaranteed to be initialized to zero.
>> >
>> > Thanks, that is also not something that I had considered.
>> >
>> > I'm not sure, but I think that we can rely on skb->cb
>> > not being clobbered between rounds of recirculation.
>> > Or at the very least I think we could save and restore it
>> > as necessary.
>>
>> Yes, it should be safe to assume this.
>>
>> > So I think if we could be careful to make sure that inner_protocol
>> > is in a sane state the first time we see the skb but not
>> > each time it is recirculated then I think things should work out.
>> >
>> > In my current implementation of recirculation the datapath
>> > side is driven by ovs_dp_process_received_packet(). So by my reasoning
>> > above I think it would make sense to reset the inner_protocol there
>> > and in ovs_packet_cmd_execute() rather than in ovs_execute_actions()
>> > which each of those functions call.
>>
>> I think that would work, however, I wonder if it's the right place in
>> general, independent of this compatibility issue. I guess it still
>> seems like the ideal thing to do is to move this as close to where it
>> is necessary as possible, specifically in mpls_push(). Is there a
>> reason to not put it there (again, other than the out-of-tree
>> compatibility issues)?
>
> I agree that should work, out-of-tree compatibility issues aside.
>
> Perhaps a solution is to have a conditional set_inner_protocol call inside
> push_mpls, where the condition is that inner_protocol is zero.
> And a reset_inner_protocol call earlier on, a call that sets inner_protocol
> to zero only if the compatibility code is in use and thus it resides in
> struct ovs_gso_cb. This call could be removed once the compatibility
> code is no longer needed, that is once kernels older than 3.11 are no
> longer supported.

I agree that's probably the right solution.

^ permalink raw reply

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-09-24  1:33 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <CAEP_g=_Di4yUzR0ka_Ma73DPKRoeCYW-ZZvQ=_n7OVDmpRKbTQ@mail.gmail.com>

On Mon, Sep 23, 2013 at 02:24:31PM -0700, Jesse Gross wrote:
> On Mon, Sep 23, 2013 at 12:47 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> > This patch does not work since vport-netdev does not include the compat
> > gso header. After including gso.h it gives me a compiler error.
> > Can you post a combined patch with fixes?
> 
> I think it's probably because of this:
> 
> +#if 1 // LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
> +#define dev_queue_xmit rpl_dev_queue_xmit
> +#endif
> 
> But otherwise, I agree the approach is much nicer than what is currently there.

Sorry for letting that slip through.

I'll post a more complete patch after doing some more testing.

^ permalink raw reply

