Netdev List


* [PATCH net-next 0/3] initialize fragment hash secrets with net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel

Hi!

This series switches the inet_frag.rnd hash initialization to
net_get_random_once.

Included patches:
 ipv4: initialize ip4_frags hash secret as late as possible
 ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
 inet: remove old fragmentation hash initializing

Diffstat:
 include/net/inet_frag.h                 |  4 ++++
 include/net/ipv6.h                      |  2 --
 net/ipv4/inet_fragment.c                |  3 ---
 net/ipv4/ip_fragment.c                  |  1 +
 net/ipv6/netfilter/nf_conntrack_reasm.c | 16 ++++++++++++++--
 net/ipv6/reassembly.c                   | 12 ++++++------
 6 files changed, 25 insertions(+), 13 deletions(-)

Greetings,

  Hannes


^ permalink raw reply

* [PATCH net-next 1/3] ipv4: initialize ip4_frags hash secret as late as possible
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Hannes Frederic Sowa, Eric Dumazet,
	David S. Miller
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

Defer the generation of the first hash secret for the ipv4 fragmentation
cache as late as possible.

ip4_frags.rnd is initially seeded by inet_frags_init and regularly
reseeded by inet_frag_secret_rebuild. If ipqhashfn is called first,
directly from ip_fragment.c, the secret is initialized right there.

If inet_frag_secret_rebuild is called first, it installs a new secret
with a manual call to get_random_bytes. That secret is overwritten as
soon as the first call to ipqhashfn happens. This is safe because we
won't race with anyone else while publishing the new secrets.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/ip_fragment.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index b66910a..2481993 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -106,6 +106,7 @@ struct ip4_create_arg {

 static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr, u8 prot)
 {
+	net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd));
 	return jhash_3words((__force u32)id << 16 | prot,
 			    (__force u32)saddr, (__force u32)daddr,
 			    ip4_frags.rnd) & (INETFRAGS_HASHSZ - 1);
-- 
1.8.3.1

^ permalink raw reply related
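
The "seed on first use" idea in the patch above can be sketched in
user space. This is only an illustration under stated assumptions:
get_random_once, frag_hash, frag_secret, seed_calls and HASH_BUCKETS
are hypothetical names, /dev/urandom stands in for get_random_bytes,
the mixing step is a toy replacement for jhash_3words, and the locking
and static-key patching of the real net_get_random_once are left out
(the sketch is single-threaded).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define HASH_BUCKETS 64u	/* hypothetical table size, power of two */

static uint32_t frag_secret;	/* stand-in for ip4_frags.rnd */
static bool frag_secret_done;	/* per-secret "already seeded" flag */
static unsigned int seed_calls;	/* counts how often seeding really ran */

/* Fill buf with random bytes on the first call only (not thread-safe). */
static void get_random_once(void *buf, size_t nbytes, bool *done)
{
	FILE *f;

	assert(nbytes > 0);
	if (*done)
		return;

	f = fopen("/dev/urandom", "rb");
	if (!f || fread(buf, nbytes, 1, f) != 1)
		abort();
	fclose(f);

	seed_calls++;
	*done = true;
}

/* The hash function seeds its own secret as late as possible. */
static unsigned int frag_hash(uint16_t id, uint32_t saddr, uint32_t daddr,
			      uint8_t prot)
{
	uint32_t h;

	get_random_once(&frag_secret, sizeof(frag_secret), &frag_secret_done);

	/* Toy mixing, not jhash: only meant to make the secret matter. */
	h = ((uint32_t)id << 16 | prot) ^ frag_secret;
	h = (h ^ saddr) * 0x9e3779b1u;
	h = (h ^ daddr) * 0x9e3779b1u;
	h ^= h >> 15;

	return h & (HASH_BUCKETS - 1);
}
```

Calling frag_hash twice with the same arguments gives the same bucket,
and the random source is read only once, no matter how many lookups
follow.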

* [PATCH net-next 2/3] ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Hannes Frederic Sowa, David S. Miller,
	Eric Dumazet, Pablo Neira Ayuso
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

Defer the fragmentation hash secret initialization for IPv6 like the
previous patch did for IPv4.

Because the netfilter logic reuses the hash secret we have to split it
first. Thus introduce a new nf_hash_frag function which takes care to
seed the hash secret.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/ipv6.h                      |  2 --
 net/ipv6/netfilter/nf_conntrack_reasm.c | 16 ++++++++++++++--
 net/ipv6/reassembly.c                   | 12 ++++++------
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index a35055f..dd96638 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -805,8 +805,6 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
 int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf);
 int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 		  struct group_filter __user *optval, int __user *optlen);
-unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-			     const struct in6_addr *daddr, u32 rnd);
 
 #ifdef CONFIG_PROC_FS
 int ac6_proc_init(struct net *net);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index dffdc1a..4a25826 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -144,12 +144,24 @@ static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h)
 	return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK);
 }
 
+static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr,
+				 const struct in6_addr *daddr)
+{
+	u32 c;
+
+	net_get_random_once(&nf_frags.rnd, sizeof(nf_frags.rnd));
+	c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
+			 (__force u32)id, nf_frags.rnd);
+	return c & (INETFRAGS_HASHSZ - 1);
+}
+
+
 static unsigned int nf_hashfn(struct inet_frag_queue *q)
 {
 	const struct frag_queue *nq;
 
 	nq = container_of(q, struct frag_queue, q);
-	return inet6_hash_frag(nq->id, &nq->saddr, &nq->daddr, nf_frags.rnd);
+	return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
 }
 
 static void nf_skb_free(struct sk_buff *skb)
@@ -185,7 +197,7 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 	arg.ecn = ecn;
 
 	read_lock_bh(&nf_frags.lock);
-	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
+	hash = nf_hash_frag(id, src, dst);
 
 	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
 	local_bh_enable();
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 1aeb473..cc85a9b 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -82,24 +82,24 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
  * callers should be careful not to use the hash value outside the ipfrag_lock
  * as doing so could race with ipfrag_hash_rnd being recalculated.
  */
-unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-			     const struct in6_addr *daddr, u32 rnd)
+static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
+				    const struct in6_addr *daddr)
 {
 	u32 c;
 
+	net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
 	c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
-			 (__force u32)id, rnd);
+			 (__force u32)id, ip6_frags.rnd);
 
 	return c & (INETFRAGS_HASHSZ - 1);
 }
-EXPORT_SYMBOL_GPL(inet6_hash_frag);
 
 static unsigned int ip6_hashfn(struct inet_frag_queue *q)
 {
 	struct frag_queue *fq;
 
 	fq = container_of(q, struct frag_queue, q);
-	return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr, ip6_frags.rnd);
+	return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr);
 }
 
 bool ip6_frag_match(struct inet_frag_queue *q, void *a)
@@ -193,7 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
 	arg.ecn = ecn;
 
 	read_lock(&ip6_frags.lock);
-	hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd);
+	hash = inet6_hash_frag(id, src, dst);
 
 	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
 	if (IS_ERR_OR_NULL(q)) {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 3/3] inet: remove old fragmentation hash initializing
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Hannes Frederic Sowa, David S. Miller,
	Eric Dumazet
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

All fragmentation hash secrets now get initialized by their
corresponding hash function with net_get_random_once. Thus we can
eliminate the initial seeding.

Also provide a comment that hash secret seeding happens at the first
call to the corresponding hashing function.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_frag.h  | 4 ++++
 net/ipv4/inet_fragment.c | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index bfcbc00..6f59de9 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -64,6 +64,10 @@ struct inet_frags {
 	rwlock_t		lock ____cacheline_aligned_in_smp;
 	int			secret_interval;
 	struct timer_list	secret_timer;
+
+	/* The first call to hashfn is responsible to initialize
+	 * rnd. This is best done with net_get_random_once.
+	 */
 	u32			rnd;
 	int			qsize;
 
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index c5313a9..bb075fc 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -93,9 +93,6 @@ void inet_frags_init(struct inet_frags *f)
 	}
 	rwlock_init(&f->lock);
 
-	f->rnd = (u32) ((totalram_pages ^ (totalram_pages >> 7)) ^
-				   (jiffies ^ (jiffies >> 6)));
-
 	setup_timer(&f->secret_timer, inet_frag_secret_rebuild,
 			(unsigned long)f);
 	f->secret_timer.expires = jiffies + f->secret_interval;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 0/3] netfilter fixes for net
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains three netfilter fixes for your net
tree:

* A couple of fixes to resolve info leaks to userspace due to an
  uninitialized memory area in ulogd, from Mathias Krause.

* Fix instruction ordering issues that may lead to the access of
  uninitialized data in x_tables. The problem involves the table update
  (producer) and the main packet matching (consumer) routines. Detected on
  SMP ARMv7, from Will Deacon.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

Thanks!

----------------------------------------------------------------

The following changes since commit c31eeaced22ce8bd61268a3c595d542bb38c0a4f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2013-10-01 12:58:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to b416c144f46af1a30ddfa4e4319a8f077381ad63:

  netfilter: x_tables: fix ordering of jumpstack allocation and table update (2013-10-22 10:11:29 +0200)

----------------------------------------------------------------
Mathias Krause (2):
      netfilter: ebt_ulog: fix info leaks
      netfilter: ipt_ULOG: fix info leaks

Will Deacon (1):
      netfilter: x_tables: fix ordering of jumpstack allocation and table update

 net/bridge/netfilter/ebt_ulog.c |    9 +++------
 net/ipv4/netfilter/arp_tables.c |    5 +++++
 net/ipv4/netfilter/ip_tables.c  |    5 +++++
 net/ipv4/netfilter/ipt_ULOG.c   |    7 +------
 net/ipv6/netfilter/ip6_tables.c |    5 +++++
 net/netfilter/x_tables.c        |    7 ++++++-
 6 files changed, 25 insertions(+), 13 deletions(-)

^ permalink raw reply

* [PATCH 1/3] netfilter: ebt_ulog: fix info leaks
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Mathias Krause <minipli@googlemail.com>

The ulog messages leak heap bytes by means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebt_ulog.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
index 5180938..7c470c3 100644
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -181,6 +181,7 @@ static void ebt_ulog_packet(struct net *net, unsigned int hooknr,
 	ub->qlen++;
 
 	pm = nlmsg_data(nlh);
+	memset(pm, 0, sizeof(*pm));
 
 	/* Fill in the ulog data */
 	pm->version = EBT_ULOG_VERSION;
@@ -193,8 +194,6 @@ static void ebt_ulog_packet(struct net *net, unsigned int hooknr,
 	pm->hook = hooknr;
 	if (uloginfo->prefix != NULL)
 		strcpy(pm->prefix, uloginfo->prefix);
-	else
-		*(pm->prefix) = '\0';
 
 	if (in) {
 		strcpy(pm->physindev, in->name);
@@ -204,16 +203,14 @@ static void ebt_ulog_packet(struct net *net, unsigned int hooknr,
 			strcpy(pm->indev, br_port_get_rcu(in)->br->dev->name);
 		else
 			strcpy(pm->indev, in->name);
-	} else
-		pm->indev[0] = pm->physindev[0] = '\0';
+	}
 
 	if (out) {
 		/* If out exists, then out is a bridge port */
 		strcpy(pm->physoutdev, out->name);
 		/* rcu_read_lock()ed by nf_hook_slow */
 		strcpy(pm->outdev, br_port_get_rcu(out)->br->dev->name);
-	} else
-		pm->outdev[0] = pm->physoutdev[0] = '\0';
+	}
 
 	if (skb_copy_bits(skb, -ETH_HLEN, pm->data, copy_len) < 0)
 		BUG();
-- 
1.7.10.4

^ permalink raw reply related
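
The fix above can be sketched outside the kernel. This is only an
illustration under stated assumptions: struct log_msg, fill_log_msg
and no_stale_bytes are hypothetical and only loosely shaped like the
ulog packet message. The buffer is poisoned with 0xAA to play the role
of stale heap contents. As in the kernel code, the sketch relies on
member stores leaving the zeroed padding bytes alone, which is what
compilers do in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message layout with padding and fixed-size strings. */
struct log_msg {
	char version;		/* usually followed by padding bytes */
	unsigned long mark;
	char prefix[16];
	char indev[16];
};

/* Zero the whole struct first, then fill in only what is known. */
static void fill_log_msg(struct log_msg *pm, const char *prefix,
			 const char *indev)
{
	assert(pm != NULL);
	memset(pm, 0, sizeof(*pm));

	pm->version = 1;
	pm->mark = 42;
	if (prefix)
		strncpy(pm->prefix, prefix, sizeof(pm->prefix) - 1);
	if (indev)
		strncpy(pm->indev, indev, sizeof(pm->indev) - 1);
	/* No else branches needed: unset strings are already empty. */
}

/* Return 1 if no byte of the message still carries the 0xAA poison. */
static int no_stale_bytes(const struct log_msg *pm)
{
	const unsigned char *p = (const unsigned char *)pm;
	size_t i;

	for (i = 0; i < sizeof(*pm); i++)
		if (p[i] == 0xAA)
			return 0;
	return 1;
}
```

Without the memset, the padding after version and the tail of each
string array would keep whatever the buffer held before, which is
exactly what would be copied to userspace.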

* [PATCH 2/3] netfilter: ipt_ULOG: fix info leaks
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Mathias Krause <minipli@googlemail.com>

The ulog messages leak heap bytes by means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/ipt_ULOG.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c
index cbc2215..9cb993c 100644
--- a/net/ipv4/netfilter/ipt_ULOG.c
+++ b/net/ipv4/netfilter/ipt_ULOG.c
@@ -220,6 +220,7 @@ static void ipt_ulog_packet(struct net *net,
 	ub->qlen++;
 
 	pm = nlmsg_data(nlh);
+	memset(pm, 0, sizeof(*pm));
 
 	/* We might not have a timestamp, get one */
 	if (skb->tstamp.tv64 == 0)
@@ -238,8 +239,6 @@ static void ipt_ulog_packet(struct net *net,
 	}
 	else if (loginfo->prefix[0] != '\0')
 		strncpy(pm->prefix, loginfo->prefix, sizeof(pm->prefix));
-	else
-		*(pm->prefix) = '\0';
 
 	if (in && in->hard_header_len > 0 &&
 	    skb->mac_header != skb->network_header &&
@@ -251,13 +250,9 @@ static void ipt_ulog_packet(struct net *net,
 
 	if (in)
 		strncpy(pm->indev_name, in->name, sizeof(pm->indev_name));
-	else
-		pm->indev_name[0] = '\0';
 
 	if (out)
 		strncpy(pm->outdev_name, out->name, sizeof(pm->outdev_name));
-	else
-		pm->outdev_name[0] = '\0';
 
 	/* copy_len <= skb->len, so can't fail. */
 	if (skb_copy_bits(skb, 0, pm->payload, copy_len) < 0)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Will Deacon <will.deacon@arm.com>

During kernel stability testing on an SMP ARMv7 system, Yalin Wang
reported the following panic from the netfilter code:

  1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
  [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
  [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
  [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
  [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
  [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
  [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
  [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
  [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
  [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
  [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
  [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
  [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
  [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
  Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
  ---[ end trace da227214a82491bd ]---
  Kernel panic - not syncing: Fatal exception in interrupt

This comes about because CPU1 is executing xt_replace_table in response
to a setsockopt syscall, resulting in:

	ret = xt_jumpstack_alloc(newinfo);
		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);

	[...]

	table->private = newinfo;
	newinfo->initial_entries = private->initial_entries;

Meanwhile, CPU0 is handling the network receive path and ends up in
ipt_do_table, resulting in:

	private = table->private;

	[...]

	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];

On weakly ordered memory architectures, the writes to table->private
and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
Furthermore, on architectures which don't respect ordering of address
dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

This patch adds an smp_wmb() before the assignment to table->private
(which is essentially publishing newinfo) to ensure that all writes to
newinfo will be observed before plugging it into the table structure.
A dependent-read barrier is also added on the consumer sides, to ensure
the same ordering requirements are also respected there.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/arp_tables.c |    5 +++++
 net/ipv4/netfilter/ip_tables.c  |    5 +++++
 net/ipv6/netfilter/ip6_tables.c |    5 +++++
 net/netfilter/x_tables.c        |    7 ++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 85a4f21..59da7cd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -271,6 +271,11 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
 	private = table->private;
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	table_base = private->entries[smp_processor_id()];
 
 	e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index d23118d..718dfbd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -327,6 +327,11 @@ ipt_do_table(struct sk_buff *skb,
 	addend = xt_write_recseq_begin();
 	private = table->private;
 	cpu        = smp_processor_id();
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	table_base = private->entries[cpu];
 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
 	stackptr   = per_cpu_ptr(private->stackptr, cpu);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 44400c2..710238f 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -349,6 +349,11 @@ ip6t_do_table(struct sk_buff *skb,
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
 	private = table->private;
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	cpu        = smp_processor_id();
 	table_base = private->entries[cpu];
 	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8b03028..227aa11 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -845,8 +845,13 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
-	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
+	/*
+	 * Ensure contents of newinfo are visible before assigning to
+	 * private.
+	 */
+	smp_wmb();
+	table->private = newinfo;
 
 	/*
 	 * Even though table entries have now been swapped, other CPU's
-- 
1.7.10.4

^ permalink raw reply related
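
The producer/consumer ordering described above has a rough user-space
analogue in C11 atomics. This is a sketch under stated assumptions, not
the kernel code: struct table_info, table_private, publish_table and
lookup_initial_entries are hypothetical names, a release store plays
the role of smp_wmb() followed by the pointer assignment, and an
acquire load is used on the reader side, which is stronger than the
dependent-read barrier the patch adds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-table private data. */
struct table_info {
	unsigned int initial_entries;
};

/* Stand-in for table->private; NULL until something is published. */
static _Atomic(struct table_info *) table_private;

/* Producer: finish every write to newinfo, then publish the pointer. */
static void publish_table(struct table_info *newinfo,
			  unsigned int initial_entries)
{
	assert(newinfo != NULL);
	newinfo->initial_entries = initial_entries;

	/* Release: the write above is visible before the new pointer is. */
	atomic_store_explicit(&table_private, newinfo, memory_order_release);
}

/* Consumer: fetch the pointer first, then read through it. */
static unsigned int lookup_initial_entries(void)
{
	struct table_info *private =
		atomic_load_explicit(&table_private, memory_order_acquire);

	return private ? private->initial_entries : 0;
}
```

The key point matches the patch: the pointer store comes last on the
writer side, and the reader never dereferences a pointer it loaded
without an ordering guarantee.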

* Re: -27% netperf TCP_STREAM regression by "tcp_memcontrol: Kill struct tcp_memcontrol"
From: Eric W. Biederman @ 2013-10-23  9:43 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20131023061019.GA15698@localhost>

Fengguang Wu <fengguang.wu@intel.com> writes:

> On Tue, Oct 22, 2013 at 09:38:10PM -0700, Eric W. Biederman wrote:
>> David Miller <davem@davemloft.net> writes:
>> 
>> > From: fengguang.wu@intel.com
>> > Date: Tue, 22 Oct 2013 22:41:29 +0100
>> >
>> >> We noticed big netperf throughput regressions
>> >> 
>> >>     a4fe34bf902b8f709c63      2e685cad57906e19add7  
>> >> ------------------------  ------------------------  
>> >>                   707.40       -40.7%       419.60  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
>> >>                  2775.60       -23.7%      2116.40  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
>> >>                  3483.00       -27.2%      2536.00  TOTAL netperf.Throughput_Mbps
>> >> 
>> >> and bisected it to
>> >> 
>> >> commit 2e685cad57906e19add7189b5ff49dfb6aaa21d3
>> >> Author: Eric W. Biederman <ebiederm@xmission.com>
>> >> Date:   Sat Oct 19 16:26:19 2013 -0700
>> >> 
>> >>     tcp_memcontrol: Kill struct tcp_memcontrol
>> >
>> > Eric please look into this, I'd rather have a fix to apply than revert your
>> > work.
>> 
>> Will do. I expect some ordering changed, and that changed the cache line
>> behavior.
>> 
>> If I can't find anything we can revert this one particular patch without
>> affecting anything else, but it would be nice to keep the data structure
>> smaller.
>> 
>> Fengguang, what would I need to do to reproduce this?
>
> Eric, attached is the kernel config.
>
> We used these commands in the test:
>
>         netserver
>         netperf -t TCP_STREAM -c -C -l 120      # repeat 64 times and get average
>
> btw, we've got more complete change set (attached) and also noticed
> performance increase in the TCP_SENDFILE case:
>
>     a4fe34bf902b8f709c63      2e685cad57906e19add7
> ------------------------  ------------------------
>                   707.40       -40.7%       419.60  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
>                  2572.20       -17.7%      2116.20  lkp-sb03/micro/netperf/120s-200%-TCP_MAERTS
>                  2775.60       -23.7%      2116.40  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
>                  1006.60       -54.4%       459.40  lkp-sbx04/micro/netperf/120s-200%-TCP_STREAM
>                  3278.60       -25.2%      2453.80  lkp-t410/micro/netperf/120s-200%-TCP_MAERTS
>                  1902.80       +21.7%      2315.00  lkp-t410/micro/netperf/120s-200%-TCP_SENDFILE
>                  3345.40       -26.7%      2451.00  lkp-t410/micro/netperf/120s-200%-TCP_STREAM
>                 15588.60       -20.9%     12331.40  TOTAL netperf.Throughput_Mbps

I have a second question.  Do you mount the cgroup filesystem?  Do you
set memory.kmem.tcp.limit_in_bytes?

If you aren't setting any memory cgroup limits or creating any groups
this change should not have had any effect whatsoever.  And you haven't
mentioned it so I don't expect you are enabling the memory cgroup limits
explicitly.

If you have enabled the memory cgroups, can you please describe your
configuration, as that may play a significant role.

Eric

^ permalink raw reply

* RE: [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
From: David Laight @ 2013-10-23  9:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-4-git-send-email-pablo@netfilter.org>

> Subject: [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
...
> Meanwhile, CPU0 is handling the network receive path and ends up in
> ipt_do_table, resulting in:
> 
> 	private = table->private;
> 
> 	[...]
> 
> 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
> 
> On weakly ordered memory architectures, the writes to table->private
> and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
> Furthermore, on architectures which don't respect ordering of address
> dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

Which reads might be out of order?
AFAICT they are strongly sequenced because the second depends on the
value read by the first.
So I don't see why the read barrier is needed.

I presume the above code is tied to a single cpu.

...
> 
> -	table->private = newinfo;
>  	newinfo->initial_entries = private->initial_entries;
> +	/*
> +	 * Ensure contents of newinfo are visible before assigning to
> +	 * private.
> +	 */
> +	smp_wmb();
> +	table->private = newinfo;

Those writes were in the wrong order on all systems.
Also gcc needs to be told not to reorder the writes even on non-smp
systems (if the code might be pre-empted).
So an asm volatile (:::"memory") is needed there even if no specific
synchronisation instruction is needed.

	David

^ permalink raw reply

* [PATCH 1/1] mac80211:Resetting connection monitor timers in transmit path
From: Dhahira Thesneem @ 2013-10-23 10:12 UTC (permalink / raw)
  To: Johannes Berg, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel
  Cc: Dhahira Thesneem

Reset connection monitor timers when we are able to successfully transmit data to an AP.

Signed-off-by: Dhahira Thesneem <dhahira.thesneem@mistralsolutions.com>
---
 net/mac80211/tx.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 3456c04..e7725cf 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1700,7 +1700,8 @@ netdev_tx_t ieee80211_monitor_start_xmit(struct sk_buff *skb,
 
 	ieee80211_xmit(sdata, skb, chan->band);
 	rcu_read_unlock();
-
+	/*To reset connection monitor timers, after successful transmission*/
+	ieee80211_sta_reset_conn_monitor(sdata);
 	return NETDEV_TX_OK;
 
 fail_rcu:
@@ -2139,7 +2140,8 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
 
 	ieee80211_xmit(sdata, skb, band);
 	rcu_read_unlock();

^ permalink raw reply related

* Re: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: Antonio Quartulli @ 2013-10-23 10:28 UTC (permalink / raw)
  To: David Laight; +Cc: David S. Miller, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B73A1@saturn3.aculab.com>

On Wed, Oct 23, 2013 at 09:33:49AM +0100, David Laight wrote:
> ...
> > > I can't remember which value you passed as 'offset' (and my mailer makes
> > > it hard to find), but to ease the code changes the offset of the udp data
> > > would make sense.
> > > In that case you still need to pass the source port.
> > 
> > I decided not to pass the source port because if the user is really interested
> > in it, it is still possible to get the udp_hdr from the skb and read its value.
> 
> It just seemed that there was no need to require that the hook re-parse
> the ip header just to find the source port.
> (ok it could assume that the udp header is just before the data)

Also David (M.) pointed out the same. I will keep the port as argument for
rx_hook.

>  
> > > If you do rx_hook(np, source_port, skb, offset) then if anyone manages to
> > > load an old module (or code that casts the assignment to rx_poll)
> > > at least it won't go 'bang'.
> > > Renaming the structure member will guarantee to generate compile errors.
> > 
> > so you suggest to rename rx_hook to something else to warn people about the
> > change?
> 
> Yes.
> 

Hmm, what about rx_skb_hook? This way we also make it easy to notice the
difference (both in arguments and behaviour).

> > If we go for the "no udp port" approach they will get an error any way because
> > of the mismatching arguments.
> 
> No - you only get a warning when you assign a function pointer of the wrong type.
> And that is true even if you just change the type of the pointer.
> However code might already have a cast on the function pointer (eg because the
> hook has 'unsigned char *') - so you won't even get a warning.
> You then get an OOPS when the hook tries to read the buffer.
> 
> It is a really bad interface...
> There isn't even a flags/options (etc) word that can be used
> to detect enhancements.
> 


Agreed, but I am not sure what I could do to fix that.

My idea is to use the following API:

rx_skb_hook(struct netpoll *np, int source, struct sk_buff *skb, int len);


Any suggestions or objections?


Regards,


-- 
Antonio Quartulli


^ permalink raw reply
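
The renaming argument in the thread above can be sketched in plain C.
Everything here is hypothetical and only mirrors the proposed
signature: struct netpoll_stub and struct sk_buff_stub stand in for
the kernel types, and record_hook and deliver are illustrative
helpers. Because the member is called rx_skb_hook, code that still
assigns .rx_hook no longer compiles, even if it hides the type
mismatch behind a cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. */
struct sk_buff_stub {
	const unsigned char *data;
	int len;
};

/* Hypothetical stand-in for struct netpoll. */
struct netpoll_stub {
	/* Renamed member: stale ".rx_hook = ..." users fail to build. */
	void (*rx_skb_hook)(struct netpoll_stub *np, int source,
			    struct sk_buff_stub *skb, int len);
	int last_source;
	int last_len;
};

/* Example hook: remember the source port and payload length. */
static void record_hook(struct netpoll_stub *np, int source,
			struct sk_buff_stub *skb, int len)
{
	(void)skb;
	np->last_source = source;
	np->last_len = len;
}

/* Hand a packet to the hook if one is installed; return 1 if called. */
static int deliver(struct netpoll_stub *np, int source,
		   struct sk_buff_stub *skb)
{
	assert(np != NULL && skb != NULL);
	if (!np->rx_skb_hook)
		return 0;
	np->rx_skb_hook(np, source, skb, skb->len);
	return 1;
}
```

A hook written against the old buffer-pointer interface cannot be
plugged in by accident, since there is no member left with the old
name to assign it to.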

* [PATCH] netfilter: ipset: remove duplicate define
From: Michael Opdenacker @ 2013-10-23 10:36 UTC (permalink / raw)
  To: pablo, kaber, kadlec, davem
  Cc: netfilter-devel, netfilter, coreteam, netdev, linux-kernel,
	Michael Opdenacker

This patch removes a duplicate define from
net/netfilter/ipset/ip_set_hash_gen.h

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
---
 net/netfilter/ipset/ip_set_hash_gen.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 707bc52..b37a65f 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -243,7 +243,6 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize)
 #define mtype_uadt		TOKEN(MTYPE, _uadt)
 #define mtype			MTYPE
 
-#define mtype_elem		TOKEN(MTYPE, _elem)
 #define mtype_add		TOKEN(MTYPE, _add)
 #define mtype_del		TOKEN(MTYPE, _del)
 #define mtype_test_cidrs	TOKEN(MTYPE, _test_cidrs)
-- 
1.8.1.2

^ permalink raw reply related
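
Why the duplicate define was harmless but still worth removing can be
shown with a small sketch. The CONCAT/TOKEN pair below is an assumed
two-level token-pasting helper written for this example, and hash_ip
is a hypothetical MTYPE value. C accepts a macro redefinition when the
replacement list is identical, so the second mtype_elem line changes
nothing and the compiler stays quiet.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed two-level helper so MTYPE is expanded before pasting. */
#define CONCAT(a, b)	a##b
#define TOKEN(a, b)	CONCAT(a, b)

#define MTYPE		hash_ip	/* hypothetical set type name */

#define mtype_elem	TOKEN(MTYPE, _elem)
#define mtype_add	TOKEN(MTYPE, _add)
/* Identical redefinition: legal C, no effect, hence safe to delete. */
#define mtype_elem	TOKEN(MTYPE, _elem)

/* Expands to: struct hash_ip_elem */
struct mtype_elem {
	unsigned int ip;
};

/* Expands to: static int hash_ip_add(struct hash_ip_elem *e, ...) */
static int mtype_add(struct mtype_elem *e, unsigned int ip)
{
	assert(e != NULL);
	e->ip = ip;
	return 0;
}
```

After preprocessing, the names hash_ip_elem and hash_ip_add exist
exactly once, with or without the repeated define.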

* Re: [PATCH 1/1] mac80211:Resetting connection monitor timers in transmit path
From: Krishna Chaitanya @ 2013-10-23 10:54 UTC (permalink / raw)
  To: Dhahira Thesneem
  Cc: Johannes Berg, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <1382523145-2302-1-git-send-email-dhahira.thesneem@mistralsolutions.com>

On Wed, Oct 23, 2013 at 3:42 PM, Dhahira Thesneem
<dhahira.thesneem@mistralsolutions.com> wrote:
>
> Reset connection monitor timers when we are able to successfully transmit data to an AP.
>
> Signed-off-by: Dhahira Thesneem <dhahira.thesneem@mistralsolutions.com>
> ---
>  net/mac80211/tx.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 3456c04..e7725cf 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1700,7 +1700,8 @@ netdev_tx_t ieee80211_monitor_start_xmit(struct sk_buff *skb,
>
>         ieee80211_xmit(sdata, skb, chan->band);
>         rcu_read_unlock();
> -
> +       /*To reset connection monitor timers, after successful transmission*/
> +       ieee80211_sta_reset_conn_monitor(sdata);
>         return NETDEV_TX_OK;
>
>  fail_rcu:
> @@ -2139,7 +2140,8 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
>
>         ieee80211_xmit(sdata, skb, band);
>         rcu_read_unlock();
> -
> +       /*To reset connection monitor timers, after successful transmission*/
> +       ieee80211_sta_reset_conn_monitor(sdata);
>         return NETDEV_TX_OK;
>
>   fail_rcu:
> --
Successful data transmission should be checked in the tx_status path, not
right after we transmit.
In fact, it's already taken care of in status.c through ieee80211_sta_tx_notify.

NACK.
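
To illustrate the objection, here is a minimal userspace sketch of the difference between "frame submitted" and "frame acknowledged". The struct and function names (conn_monitor, on_tx_submit, on_tx_status) are hypothetical and are not the mac80211 data structures or API; the only point is that the monitor state is refreshed from the status path, where the ACK result is known.

```c
#include <stdbool.h>

/* Hypothetical, simplified connection-monitor state; not mac80211 code. */
struct conn_monitor {
	unsigned long last_ack_time;	/* time of the last acknowledged frame */
	unsigned int unacked_frames;	/* frames submitted since the last ACK */
};

/* Frame handed to the hardware: nothing is known about delivery yet. */
static void on_tx_submit(struct conn_monitor *mon)
{
	mon->unacked_frames++;
}

/* TX status report: only an ACK shows that the peer is still reachable. */
static void on_tx_status(struct conn_monitor *mon, bool acked, unsigned long now)
{
	if (!acked)
		return;
	mon->last_ack_time = now;
	mon->unacked_frames = 0;
}
```

Resetting the state in on_tx_submit() instead would hide a dead link for as long as the station keeps sending.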

^ permalink raw reply

* [PATCH net-next 1/2] net: make net_get_random_once irqsave
From: Hannes Frederic Sowa @ 2013-10-23 11:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, edumazet

I initially built a non-irqsave version of net_get_random_once because I
would have liked the freedom to defer even the extraction process of
get_random_bytes until the nonblocking pool is fully seeded.

I don't think this is a good idea anymore and thus this patch makes
net_get_random_once irqsave. Now someone using net_get_random_once does
not need to care from where it is called.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/net.h | 1 -
 net/core/utils.c    | 7 ++++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index aca446b..b292a04 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -250,7 +250,6 @@ bool __net_get_random_once(void *buf, int nbytes, bool *done,
 #define ___NET_RANDOM_STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #endif /* HAVE_JUMP_LABEL */
 
-/* BE CAREFUL: this function is not interrupt safe */
 #define net_get_random_once(buf, nbytes)				\
 	({								\
 		bool ___ret = false;					\
diff --git a/net/core/utils.c b/net/core/utils.c
index bf09371..2f737bf 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -370,16 +370,17 @@ bool __net_get_random_once(void *buf, int nbytes, bool *done,
 			   struct static_key *done_key)
 {
 	static DEFINE_SPINLOCK(lock);
+	unsigned long flags;
 
-	spin_lock_bh(&lock);
+	spin_lock_irqsave(&lock, flags);
 	if (*done) {
-		spin_unlock_bh(&lock);
+		spin_unlock_irqrestore(&lock, flags);
 		return false;
 	}
 
 	get_random_bytes(buf, nbytes);
 	*done = true;
-	spin_unlock_bh(&lock);
+	spin_unlock_irqrestore(&lock, flags);
 
 	__net_random_once_disable_jump(done_key);
 
-- 
1.8.3.1
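
For readers unfamiliar with the pattern, the slow path above boils down to "take a lock, check a done flag, fill the buffer once". A minimal userspace sketch of that idea follows. It is an illustration under assumptions, not the kernel code: a pthread mutex stands in for the spinlock (so nothing here models irqsave), fill_random() is a hypothetical stand-in for get_random_bytes() and is not cryptographically secure, and the static-key fast path is omitted.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static pthread_mutex_t once_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for get_random_bytes(); NOT secure randomness. */
static void fill_random(void *buf, size_t nbytes)
{
	unsigned char *p = buf;

	for (size_t i = 0; i < nbytes; i++)
		p[i] = (unsigned char)(rand() & 0xff);
}

/* Returns true only for the one caller that actually initialized buf. */
static bool get_random_once(void *buf, size_t nbytes, bool *done)
{
	pthread_mutex_lock(&once_lock);
	if (*done) {
		pthread_mutex_unlock(&once_lock);
		return false;
	}

	fill_random(buf, nbytes);
	*done = true;
	pthread_mutex_unlock(&once_lock);

	return true;
}
```

Later callers see the done flag set and leave the buffer untouched, which is what makes the secret stable after first use.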

^ permalink raw reply related

* [PATCH net-next 2/2] net: initialize hashrnd in flow_dissector with net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-23 11:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, edumazet

We can also defer the initialization of hashrnd in flow_dissector
to its first use. Since net_get_random_once is irqsave now, we don't
have to audit the call paths to check whether one of these functions
gets called by an interrupt handler.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/core/flow_dissector.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index f8e25ac..5cac36e 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -184,6 +184,22 @@ ipv6:
 EXPORT_SYMBOL(skb_flow_dissect);
 
 static u32 hashrnd __read_mostly;
+static __always_inline void __flow_hash_secret_init(void)
+{
+	net_get_random_once(&hashrnd, sizeof(hashrnd));
+}
+
+static __always_inline u32 __flow_hash_3words(u32 a, u32 b, u32 c)
+{
+	__flow_hash_secret_init();
+	return jhash_3words(a, b, c, hashrnd);
+}
+
+static __always_inline u32 __flow_hash_1word(u32 a)
+{
+	__flow_hash_secret_init();
+	return jhash_1word(a, hashrnd);
+}
 
 /*
  * __skb_get_rxhash: calculate a flow hash based on src/dst addresses
@@ -210,9 +226,9 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		swap(keys.port16[0], keys.port16[1]);
 	}
 
-	hash = jhash_3words((__force u32)keys.dst,
-			    (__force u32)keys.src,
-			    (__force u32)keys.ports, hashrnd);
+	hash = __flow_hash_3words((__force u32)keys.dst,
+				  (__force u32)keys.src,
+				  (__force u32)keys.ports);
 	if (!hash)
 		hash = 1;
 
@@ -248,7 +264,7 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb,
 		hash = skb->sk->sk_hash;
 	else
 		hash = (__force u16) skb->protocol;
-	hash = jhash_1word(hash, hashrnd);
+	hash = __flow_hash_1word(hash);
 
 	return (u16) (((u64) hash * qcount) >> 32) + qoffset;
 }
@@ -340,7 +356,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 				else
 					hash = (__force u16) skb->protocol ^
 					    skb->rxhash;
-				hash = jhash_1word(hash, hashrnd);
+				hash = __flow_hash_1word(hash);
 				queue_index = map->queues[
 				    ((u64)hash * map->len) >> 32];
 			}
@@ -395,11 +411,3 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);
 }
-
-static int __init initialize_hashrnd(void)
-{
-	get_random_bytes(&hashrnd, sizeof(hashrnd));
-	return 0;
-}
-
-late_initcall_sync(initialize_hashrnd);
-- 
1.8.3.1
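
As a side note on the code touched above: __skb_tx_hash maps the 32-bit hash onto a queue index with a multiply and shift rather than a modulo. A small standalone sketch of that arithmetic, assuming userspace fixed-width types and a hypothetical helper name (hash_to_queue), might look like this:

```c
#include <stdint.h>

/*
 * Scale a 32-bit hash onto [qoffset, qoffset + qcount).
 * (hash * qcount) >> 32 is floor(hash / 2^32 * qcount), so the result of
 * the shift is always smaller than qcount. hash_to_queue is a hypothetical
 * name used only for this sketch.
 */
static uint16_t hash_to_queue(uint32_t hash, uint16_t qcount, uint16_t qoffset)
{
	return (uint16_t)((((uint64_t)hash * qcount) >> 32) + qoffset);
}
```

With eight queues, a hash of 0 lands on queue 0, 0x80000000 on queue 4, and 0xffffffff on queue 7, each shifted by qoffset.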

^ permalink raw reply related

* RE: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: David Laight @ 2013-10-23 11:18 UTC (permalink / raw)
  To: Antonio Quartulli; +Cc: David S. Miller, netdev
In-Reply-To: <20131023102848.GB1535@neomailbox.net>

> My idea is to use the following API:
> 
> rx_skb_hook(struct netpoll *np, int source, struct sk_buff *skb, int len);
> 
> Any suggestion or objection?

Don't you need to pass the offset of the udp data?

	David

^ permalink raw reply

* RE: [PATCH net v3] be2net: Warn users of possible broken functionality on BE2 cards with very old FW versions with latest driver
From: Somnath Kotur @ 2013-10-23 11:29 UTC (permalink / raw)
  To: Somnath Kotur, netdev@vger.kernel.org; +Cc: davem@davemloft.net
In-Reply-To: <c04a1288-7c23-499a-ae32-d4334fd39065@CMEXHTCAS1.ad.emulex.com>

Please ignore this; I sent this version out by mistake. I will resend the correct one shortly. Sorry for the inconvenience.

Thanks
Somnath

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Somnath Kotur
> Sent: Wednesday, October 23, 2013 1:49 PM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; Somnath Kotur
> Subject: [PATCH net v3] be2net: Warn users of possible broken functionality
> on BE2 cards with very old FW versions with latest driver
> 
> On very old FW versions < 4.0, the mailbox command to set interrupts on the
> card succeeds even though it is not supported and should have failed,
> leading to a scenario where interrupts do not work.
> Hence warn users to upgrade to a suitable FW version to avoid seeing broken
> functionality.
> 
> Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
> ---
> v3: Incorporated comments from Ben Hutchings and Joe Perches
> 
>  drivers/net/ethernet/emulex/benet/be.h      |   14 ++++++++++++++
>  drivers/net/ethernet/emulex/benet/be_main.c |    6 ++++++
>  2 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/emulex/benet/be.h
> b/drivers/net/ethernet/emulex/benet/be.h
> index db02023..6a57051 100644
> --- a/drivers/net/ethernet/emulex/benet/be.h
> +++ b/drivers/net/ethernet/emulex/benet/be.h
> @@ -696,6 +696,20 @@ static inline int qnq_async_evt_rcvd(struct
> be_adapter *adapter)
>  	return adapter->flags & BE_FLAGS_QNQ_ASYNC_EVT_RCVD;  }
> 
> +static inline u32 fw_major_num(char *fw_ver_str) {
> +	u32 fw_major;
> +	char *next, *cp;
> +	char tmp_fw_ver[FW_VER_LEN];
> +
> +	strncpy(tmp_fw_ver, fw_ver_str, strlen(fw_ver_str));
> +	next = tmp_fw_ver;
> +	cp = strsep(&next, ".");
> +	sscanf(cp, "%i", &fw_major);
> +
> +	return fw_major;
> +}
> +
>  extern void be_cq_notify(struct be_adapter *adapter, u16 qid, bool arm,
>  		u16 num_popped);
>  extern void be_link_status_update(struct be_adapter *adapter, u8
> link_status); diff --git a/drivers/net/ethernet/emulex/benet/be_main.c
> b/drivers/net/ethernet/emulex/benet/be_main.c
> index 2c38cc4..d8da961 100644
> --- a/drivers/net/ethernet/emulex/benet/be_main.c
> +++ b/drivers/net/ethernet/emulex/benet/be_main.c
> @@ -3247,6 +3247,12 @@ static int be_setup(struct be_adapter *adapter)
> 
>  	be_cmd_get_fw_ver(adapter, adapter->fw_ver, adapter-
> >fw_on_flash);
> 
> +	if (BE2_chip(adapter) && fw_major_num(adapter->fw_ver) < 4) {
> +		dev_err(dev, "Firmware on card is old(%s), IRQs may not
> work.",
> +			adapter->fw_ver);
> +		dev_err(dev, "Please upgrade firmware to version >=
> 4.0\n");
> +	}
> +
>  	if (adapter->vlans_added)
>  		be_vid_config(adapter);
> 
> --
> 1.6.0.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net v3] be2net: Warn users of possible broken functionality on BE2 cards with very old FW versions with latest driver
From: Somnath Kotur @ 2013-10-23 11:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, Somnath Kotur

On very old FW versions < 4.0, the mailbox command to set interrupts
on the card succeeds even though it is not supported and should have
failed, leading to a scenario where interrupts do not work.
Hence warn users to upgrade to a suitable FW version to avoid seeing
broken functionality.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
v3: Incorporated comments from Ben Hutchings, Joe Perches and Ivan Vercera

 drivers/net/ethernet/emulex/benet/be.h      |    9 +++++++++
 drivers/net/ethernet/emulex/benet/be_main.c |    6 ++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index db02023..da9c04b 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -696,6 +696,15 @@ static inline int qnq_async_evt_rcvd(struct be_adapter *adapter)
 	return adapter->flags & BE_FLAGS_QNQ_ASYNC_EVT_RCVD;
 }
 
+static inline u32 fw_major_num(const char *fw_ver)
+{
+	int fw_major = 0;
+
+	sscanf(fw_ver, "%d.", &fw_major);
+
+	return fw_major;
+}
+
 extern void be_cq_notify(struct be_adapter *adapter, u16 qid, bool arm,
 		u16 num_popped);
 extern void be_link_status_update(struct be_adapter *adapter, u8 link_status);
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2c38cc4..53ed58b 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3247,6 +3247,12 @@ static int be_setup(struct be_adapter *adapter)
 
 	be_cmd_get_fw_ver(adapter, adapter->fw_ver, adapter->fw_on_flash);
 
+	if (BE2_chip(adapter) && fw_major_num(adapter->fw_ver) < 4) {
+		dev_err(dev, "Firmware on card is old(%s), IRQs may not work",
+			adapter->fw_ver);
+		dev_err(dev, "Please upgrade firmware to version >= 4.0\n");
+	}
+
 	if (adapter->vlans_added)
 		be_vid_config(adapter);
 
-- 
1.6.0.2

^ permalink raw reply related

* Re: Big performance loss from 3.4.63 to 3.10.13 when routing ipv4
From: Wolfgang Walter @ 2013-10-23 11:33 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, hannes, netdev, klassert
In-Reply-To: <20131023081255.GB10148@secunet.com>

Am Mittwoch, 23. Oktober 2013, 10:12:55 schrieb Steffen Klassert:
> On Tue, Oct 22, 2013 at 03:46:38PM -0400, David Miller wrote:
> > From: Wolfgang Walter <linux@stwm.de>
> > Date: Tue, 22 Oct 2013 21:07:41 +0200
> > 
> > > Am Mittwoch, 2. Oktober 2013, 00:20:02 schrieb Hannes Frederic Sowa:
> > >> On Tue, Oct 01, 2013 at 06:39:32PM +0200, Wolfgang Walter wrote:
> > >> > All network traffic over the router become slow and sluggish. If one
> > >> > pings
> > >> > the router there is a packet loss. After about 2 minutes the traffic
> > >> > completely stalls for about 1 minute. Then it works again as in the
> > >> > beginning to then stall again. And so on.
> > >> 
> > >> Maybe dropwatch can give a first hint?
> > > 
> > > I finally found the problem:
> > > 
> > > In 3.10.x and 3.11.x the value of /proc/sys/net/ipv4/xfrm4_gc_thresh is
> > > 1024.
> > > 
> > > It is much higher in 3.4.x. If I increase this value in 3.10.x to the
> > > one I
> > > see on 3.4.x all works fine with 3.10.x
> > 
> > Steffen, here is yet another report about this issue.
> > 
> > I think we should resolve this soon, even bumping it to 2048 or 4096
> > and leaving it at that would be I think acceptable.
> 
> Yes, of course. Let's use 4096 as the default for ipv4 and ipv6.
> I'll take care of it next week.
> 

I don't know what this value actually means. But on 3.4.x it is much higher. 
On a machine with 512MB ram it is 32768, on a machine with 1GB ram it is 
262144 and with 16GB ram it is 4194304.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
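
For anyone who wants to compare the threshold across kernels programmatically, a minimal sketch for reading a single-integer sysctl file is shown below. The helper name read_sysctl_long is hypothetical; the path to pass in would be /proc/sys/net/ipv4/xfrm4_gc_thresh as mentioned above.

```c
#include <stdio.h>

/*
 * Read one integer from a sysctl-style file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_sysctl_long(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%ld", value) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

On the affected 3.10.x kernels this would be expected to report 1024, per the description in this thread.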

^ permalink raw reply

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Neil Horman @ 2013-10-23 11:34 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, Felix Fietkau,
	John Crispin, Jonas Gorski, Gary Thomas, Jamal Hadi Salim
In-Reply-To: <CAGVrzcaXFboRDn40+VciTes09w-jqTASNZz2GZNpTxbaV6D0Lw@mail.gmail.com>

On Tue, Oct 22, 2013 at 03:09:32PM -0700, Florian Fainelli wrote:
> 2013/10/22 Neil Horman <nhorman@tuxdriver.com>:
> > On Tue, Oct 22, 2013 at 12:59:12PM -0700, Florian Fainelli wrote:
> >> 2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
> >> > On 10/22/2013 11:23 AM, Florian Fainelli wrote:
> >> >>
> >> >> This patch adds an Ethernet Switch generic netlink configuration API
> >> >> which allows for doing the required configuration of managed Ethernet
> >> >> switches commonly found in Wireless/Cable/DSL routers in the market.
> >> >>
> >> >> Since this API is based on the Generic Netlink infrastructure it is very
> >> >> easy to extend a particular switch driver to support additional features
> >> >> and to adapt it to specific switches.
> >> >>
> >> >
> >> >> So far the API includes support for:
> >> >>
> >> >> - getting/setting a port VLAN id
> >> >> - getting/setting VLAN port membership
> >> >> - getting a port link status
> >> >> - getting a port statistics counters
> >> >> - resetting a switch device
> >> >> - applying a configuration to a switch device
> >> >>
> >> >
> >> > Did you consider exposing each physical switch port as a netdevice on
> >> > the host? I would assume your switch driver could do this.
> >> >
> >> > Then you can drop the port specific attributes (link status, stats, etc)
> >> > and use existing interfaces. The win being my tools work equally well on
> >> > your real switch as they do on my software switch. Also by exposing net
> >> > devices you provide a mechanism to send packets over the port and trap
> >> > control packets.
> >>
> >> Well this is exactly what DSA does and which I do not like because it
> >> is completely overkill for most switches out there which are using
> >> 802.1q tags and do not prepend/append proprietary tags for internal
> >> traffic classification.
> >>
> >> >
> >> > Next instead of creating a switch specific netlink API could you use
> >> > the existing FDB API? Again what I would like is for my existing
> >> > applications to run on the switch without having to rewrite them. For
> >> > example it would be great to have 'bridge fdb show dev myswitch' report
> >> > the correct tables for both the Sw bridge, a real switch bridge, and
> >> > for the embedded SR-IOV bridge case.
> >>
> >> Ok, I know nothing about the FDB API, but will take a look and see if
> >> that sounds suitable for the embedded use cases.
> >>
> > Further to Johns comments, why are you creating a new netlink protocol for this?
> > It seems that 90% of what you want to accomplish above is handled by rtnetlink.
> > As long as you write your driver properly, most of that should "just work".
> 
> This is not a new netlink protocol, but a generic netlink family. Why
That's hair splitting.  The point I'm making here is that you're creating a new
communication path from user space to the kernel to do something that we already
have a communication path to do.

> would I extend rtnetlink to cover the remaining 10% which are not
> going to be used on desktop and servers when a new generic netlink
> family is cheap and can be selectively disabled in the kernel?
90% of it is already done on servers and desktops using rtnetlink (that's my
point), and you can reasonably add the other 10% (I think), if you just expose
the switch ports as their own ethernet interfaces.  You say DSA is overkill, but
if you just add the other switch ports as their own ethernet interfaces, you
would get most of the above work for free, which seems to me like less overkill
than a new netlink family and userspace tools. 

Regards
Neil

^ permalink raw reply

* Re: [PATCH net] net: sctp: fix ASCONF to allow non SCTP_ADDR_SRC addresses in ipv6
From: Neil Horman @ 2013-10-23 11:36 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-sctp, Michio Honda
In-Reply-To: <1382459696-1732-1-git-send-email-dborkman@redhat.com>

On Tue, Oct 22, 2013 at 06:34:56PM +0200, Daniel Borkmann wrote:
> Commit 8a07eb0a50 ("sctp: Add ASCONF operation on the single-homed host")
> implemented possible use of IPv4 addresses with non SCTP_ADDR_SRC state
> as source address when sending ASCONF (ADD) packets, but IPv6 part for
> that was not implemented in 8a07eb0a50. Therefore, as this is not restricted
> to IPv4-only, fix this up to allow the same for IPv6 addresses in SCTP.
> 
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Michio Honda <micchie@sfc.wide.ad.jp>
> ---
>  net/sctp/ipv6.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index e7b2d4f..96a5591 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -279,7 +279,9 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
>  		sctp_v6_to_addr(&dst_saddr, &fl6->saddr, htons(bp->port));
>  		rcu_read_lock();
>  		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> -			if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC))
> +			if (!laddr->valid || laddr->state == SCTP_ADDR_DEL ||
> +			    (laddr->state != SCTP_ADDR_SRC &&
> +			     !asoc->src_out_of_asoc_ok))
>  				continue;
>  
>  			/* Do not compare against v4 addrs */
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>
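
The new skip condition in the hunk above is easier to check as a standalone predicate. The following is a sketch only: the enum and the function name skip_source_addr are hypothetical stand-ins, not the kernel's definitions, and the three inputs correspond to laddr->valid, laddr->state and asoc->src_out_of_asoc_ok.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's address states. */
enum addr_state { ADDR_NEW, ADDR_SRC, ADDR_DEL };

/* Returns true when the address must be skipped as a source candidate. */
static bool skip_source_addr(bool valid, enum addr_state state,
			     bool src_out_of_asoc_ok)
{
	/* Invalid or to-be-deleted addresses are never usable. */
	if (!valid || state == ADDR_DEL)
		return true;

	/* Non-SRC addresses are usable only when explicitly allowed. */
	return state != ADDR_SRC && !src_out_of_asoc_ok;
}
```

Compared with the old check, the only behavioural change is that a valid address that is neither in the SRC state nor marked for deletion is accepted when src_out_of_asoc_ok is set.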

^ permalink raw reply

* Re: -27% netperf TCP_STREAM regression by "tcp_memcontrol: Kill struct tcp_memcontrol"
From: Fengguang Wu @ 2013-10-23 11:46 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <87a9i0l3v1.fsf@xmission.com>

On Wed, Oct 23, 2013 at 02:43:14AM -0700, Eric W. Biederman wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
> 
> > On Tue, Oct 22, 2013 at 09:38:10PM -0700, Eric W. Biederman wrote:
> >> David Miller <davem@davemloft.net> writes:
> >> 
> >> > From: fengguang.wu@intel.com
> >> > Date: Tue, 22 Oct 2013 22:41:29 +0100
> >> >
> >> >> We noticed big netperf throughput regressions
> >> >> 
> >> >>     a4fe34bf902b8f709c63      2e685cad57906e19add7  
> >> >> ------------------------  ------------------------  
> >> >>                   707.40       -40.7%       419.60  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
> >> >>                  2775.60       -23.7%      2116.40  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
> >> >>                  3483.00       -27.2%      2536.00  TOTAL netperf.Throughput_Mbps
> >> >> 
> >> >> and bisected it to
> >> >> 
> >> >> commit 2e685cad57906e19add7189b5ff49dfb6aaa21d3
> >> >> Author: Eric W. Biederman <ebiederm@xmission.com>
> >> >> Date:   Sat Oct 19 16:26:19 2013 -0700
> >> >> 
> >> >>     tcp_memcontrol: Kill struct tcp_memcontrol
> >> >
> >> > Eric please look into this, I'd rather have a fix to apply than revert your
> >> > work.
> >> 
> >> Will do I expect some ordering changed, and that changed the cache line
> >> behavior.
> >> 
> >> If I can't find anything we can revert this one particular patch without
> >> affecting anything else, but it would be nice to keep the data structure
> >> smaller.
> >> 
> >> Fengguag what would I need to do to reproduce this?
> >
> > Eric, attached is the kernel config.
> >
> > We used these commands in the test:
> >
> >         netserver
> >         netperf -t TCP_STREAM -c -C -l 120      # repeat 64 times and get average

Sorry, it's not about repeating, but about running 64 netperf instances in
parallel. The number 64 is 2 times the number of logical CPUs.

> > btw, we've got more complete change set (attached) and also noticed
> > performance increase in the TCP_SENDFILE case:
> >
> >     a4fe34bf902b8f709c63      2e685cad57906e19add7
> > ------------------------  ------------------------
> >                   707.40       -40.7%       419.60  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
> >                  2572.20       -17.7%      2116.20  lkp-sb03/micro/netperf/120s-200%-TCP_MAERTS
> >                  2775.60       -23.7%      2116.40  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
> >                  1006.60       -54.4%       459.40  lkp-sbx04/micro/netperf/120s-200%-TCP_STREAM
> >                  3278.60       -25.2%      2453.80  lkp-t410/micro/netperf/120s-200%-TCP_MAERTS
> >                  1902.80       +21.7%      2315.00  lkp-t410/micro/netperf/120s-200%-TCP_SENDFILE
> >                  3345.40       -26.7%      2451.00  lkp-t410/micro/netperf/120s-200%-TCP_STREAM
> >                 15588.60       -20.9%     12331.40  TOTAL netperf.Throughput_Mbps
> 
> I have a second question.  Do you mount the cgroup filesystem?  Do you
> set memory.kmem.tcp.limit_in_bytes?

No, I didn't mount cgroup at all.

> If you aren't setting any memory cgroup limits or creating any groups
> this change should not have had any effect whatsoever.  And you haven't
> mentioned it so I don't expect you are enabling the memory cgroup limits
> explicitly.
> 
> If you have enabled the memory cgroups can you please describe your
> configuration as that may play a significant role.
> 
> Eric

^ permalink raw reply

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Jamal Hadi Salim @ 2013-10-23 11:47 UTC (permalink / raw)
  To: Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, Felix Fietkau,
	John Crispin, Jonas Gorski, Gary Thomas, Vlad Yasevich,
	Stephen Hemminger
In-Reply-To: <CAGVrzcaXFboRDn40+VciTes09w-jqTASNZz2GZNpTxbaV6D0Lw@mail.gmail.com>

On 10/22/13 18:09, Florian Fainelli wrote:
> 2013/10/22 Neil Horman <nhorman@tuxdriver.com>:
>> On Tue, Oct 22, 2013 at 12:59:12PM -0700, Florian Fainelli wrote:
>>> 2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
>>>> On 10/22/2013 11:23 AM, Florian Fainelli wrote:
>>>>>
>
>>>
>>> Ok, I know nothing about the FDB API, but will take a look and see if
>>> that sounds suitable for the embedded use cases.
>>>
>> Further to Johns comments, why are you creating a new netlink protocol for this?
>> It seems that 90% of what you want to accomplish above is handled by rtnetlink.
>> As long as you write your driver properly, most of that should "just work".
>
> This is not a new netlink protocol, but a generic netlink family. Why
> would I extend rtnetlink to cover the remaining 10% which are not
> going to be used on desktop and servers when a new generic netlink
> family is cheap and can be selectively disabled in the kernel?
>

Florian,

I think it would be fantastic if you adopted the FDB API. The comment
about using rtnetlink for configuration is valid. You can configure
hardware switches as John has shown. I realize you guys have invested
tons of time, that this stuff has been tested by tons of people, and that
this is a painful exercise to go through, but
having more than one approach for configuring/controlling kernel
switch interfaces is not ideal. If you use the rtnetlink API, then one
can configure the Linux bridge, embedded Intel switches, etc. with
iproute2, i.e. the switch becomes a bridge. I see a lot of commonality
between your model, based on what you described, and the current bridge.
Pull the latest iproute2 code and look at the "bridge" command.

Essentially, the current bridge could be described as an entity
that does L2 switching:
a) it has bridge ports, which are any netdevs on Linux
b) it has an FDB, which uses a MAC address and optionally a VLAN as the
lookup key. You can control learning and flooding.
c) it has vlan filtering capabilities, which you can turn on/off. The
vlan capability to select PVIDs is also built in.
d) it has multicast snooping

I think your model needs #a and #b; you can ignore the rest.
I am not quite sure how vlan port membership will apply; an fdb for
each entry will have a vlan. You could also create one bridge per vlan
(not the best approach) - ccing Vlad and Stephen.

cheers,
jamal
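
To make point (b) above concrete, here is a toy sketch of a forwarding database keyed on MAC address plus VLAN. It is an illustration under assumptions, not the Linux bridge implementation: the fixed-size linear table, the struct layout, and the names fdb_learn and fdb_lookup are all hypothetical, and VLAN 0 is used here to mean "no VLAN".

```c
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16		/* hypothetical table size */
#define FDB_NO_PORT (-1)	/* lookup miss; a real bridge would flood */

struct fdb_entry {
	uint8_t mac[6];
	uint16_t vlan;		/* 0 means "no VLAN" in this sketch */
	int port;
	int used;
};

struct fdb {
	struct fdb_entry entries[FDB_SIZE];
};

static struct fdb_entry *fdb_find(struct fdb *fdb, const uint8_t mac[6],
				  uint16_t vlan)
{
	for (int i = 0; i < FDB_SIZE; i++) {
		struct fdb_entry *e = &fdb->entries[i];

		if (e->used && e->vlan == vlan && memcmp(e->mac, mac, 6) == 0)
			return e;
	}
	return NULL;
}

/* Learn or move an address; returns 0 on success, -1 if the table is full. */
static int fdb_learn(struct fdb *fdb, const uint8_t mac[6], uint16_t vlan,
		     int port)
{
	struct fdb_entry *e = fdb_find(fdb, mac, vlan);

	if (e) {
		e->port = port;	/* station moved to another port */
		return 0;
	}
	for (int i = 0; i < FDB_SIZE; i++) {
		e = &fdb->entries[i];
		if (!e->used) {
			memcpy(e->mac, mac, 6);
			e->vlan = vlan;
			e->port = port;
			e->used = 1;
			return 0;
		}
	}
	return -1;
}

/* Returns the egress port for (mac, vlan), or FDB_NO_PORT on a miss. */
static int fdb_lookup(struct fdb *fdb, const uint8_t mac[6], uint16_t vlan)
{
	struct fdb_entry *e = fdb_find(fdb, mac, vlan);

	return e ? e->port : FDB_NO_PORT;
}
```

Because the VLAN is part of the key, the same MAC address can map to different ports in different VLANs, which is the property that matters for the port membership question raised above.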

^ permalink raw reply
