Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next 1/2] net: make net_get_random_once irqsave
From: Hannes Frederic Sowa @ 2013-10-23 11:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, edumazet

I initial build a non-irqsave version of net_get_random_once because I
would liked to have the freedom to defer even the extraction process of
get_random_bytes until the nonblocking pool is fully seeded.

I don't think this is a good idea anymore and thus this patch makes
net_get_random_once irqsave. Now someone using net_get_random_once does
not need to care from where it is called.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/net.h | 1 -
 net/core/utils.c    | 7 ++++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index aca446b..b292a04 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -250,7 +250,6 @@ bool __net_get_random_once(void *buf, int nbytes, bool *done,
 #define ___NET_RANDOM_STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #endif /* HAVE_JUMP_LABEL */
 
-/* BE CAREFUL: this function is not interrupt safe */
 #define net_get_random_once(buf, nbytes)				\
 	({								\
 		bool ___ret = false;					\
diff --git a/net/core/utils.c b/net/core/utils.c
index bf09371..2f737bf 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -370,16 +370,17 @@ bool __net_get_random_once(void *buf, int nbytes, bool *done,
 			   struct static_key *done_key)
 {
 	static DEFINE_SPINLOCK(lock);
+	unsigned long flags;
 
-	spin_lock_bh(&lock);
+	spin_lock_irqsave(&lock, flags);
 	if (*done) {
-		spin_unlock_bh(&lock);
+		spin_unlock_irqrestore(&lock, flags);
 		return false;
 	}
 
 	get_random_bytes(buf, nbytes);
 	*done = true;
-	spin_unlock_bh(&lock);
+	spin_unlock_irqrestore(&lock, flags);
 
 	__net_random_once_disable_jump(done_key);
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 1/1] mac80211:Resetting connection monitor timers in transmit path
From: Krishna Chaitanya @ 2013-10-23 10:54 UTC (permalink / raw)
  To: Dhahira Thesneem
  Cc: Johannes Berg, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <1382523145-2302-1-git-send-email-dhahira.thesneem@mistralsolutions.com>

On Wed, Oct 23, 2013 at 3:42 PM, Dhahira Thesneem
<dhahira.thesneem@mistralsolutions.com> wrote:
>
> Reset connection monitor timers when we are able to successfully transmit data to an AP.
>
> Signed-off-by: Dhahira Thesneem <dhahira.thesneem@mistralsolutions.com>
> ---
>  net/mac80211/tx.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 3456c04..e7725cf 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1700,7 +1700,8 @@ netdev_tx_t ieee80211_monitor_start_xmit(struct sk_buff *skb,
>
>         ieee80211_xmit(sdata, skb, chan->band);
>         rcu_read_unlock();
> -
> +       /*To reset connection monitor timers, after successful transmission*/
> +       ieee80211_sta_reset_conn_monitor(sdata);
>         return NETDEV_TX_OK;
>
>  fail_rcu:
> @@ -2139,7 +2140,8 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
>
>         ieee80211_xmit(sdata, skb, band);
>         rcu_read_unlock();
> -
> +       /*To reset connection monitor timers, after successful transmission*/
> +       ieee80211_sta_reset_conn_monitor(sdata);
>         return NETDEV_TX_OK;
>
>   fail_rcu:
> --
Successful data transmission should be checked in the tx_status not
after we transmit.
In fact its already taken care in status.c: through ieee80211_sta_tx_notify.

NACK.

^ permalink raw reply

* [PATCH] netfilter: ipset: remove duplicate define
From: Michael Opdenacker @ 2013-10-23 10:36 UTC (permalink / raw)
  To: pablo, kaber, kadlec, davem
  Cc: netfilter-devel, netfilter, coreteam, netdev, linux-kernel,
	Michael Opdenacker

This patch removes a duplicate define from
net/netfilter/ipset/ip_set_hash_gen.h

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
---
 net/netfilter/ipset/ip_set_hash_gen.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 707bc52..b37a65f 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -243,7 +243,6 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize)
 #define mtype_uadt		TOKEN(MTYPE, _uadt)
 #define mtype			MTYPE
 
-#define mtype_elem		TOKEN(MTYPE, _elem)
 #define mtype_add		TOKEN(MTYPE, _add)
 #define mtype_del		TOKEN(MTYPE, _del)
 #define mtype_test_cidrs	TOKEN(MTYPE, _test_cidrs)
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: Antonio Quartulli @ 2013-10-23 10:28 UTC (permalink / raw)
  To: David Laight; +Cc: David S. Miller, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B73A1@saturn3.aculab.com>

[-- Attachment #1: Type: text/plain, Size: 2175 bytes --]

On Wed, Oct 23, 2013 at 09:33:49AM +0100, David Laight wrote:
> ...
> > > I can't remember which value you passed as 'offset' (and my mailer makes
> > > it hard to find), but to ease the code changes the offset of the udp data
> > > would make sense.
> > > In that case you still need to pass the source port.
> > 
> > I decided not to pass the source port because if the user is really interested
> > in it, it is still possible to get the udp_hdr from the skb and read its value.
> 
> It just seemed that there was no need to require that the hook re-parse
> the ip header just to find the source port.
> (ok it could assume that the udp header is just before the data)

Also David (M.) pointed out the same. I will keep the port as argument for
rx_hook.

>  
> > > If you do rx_hook(np, source_port, skb, offset) then if anyone manages to
> > > load an old module (or code that casts the assignement to rx_poll)
> > > at least it won't go 'bang'.
> > > Renaming the structure member will guarantee to generate compile errors.
> > 
> > so you suggest to rename rx_hook to something else to warn people about the
> > change?
> 
> Yes.
> 

mh..what about rx_skb_hook ? this way we also make it easy to notice the
difference (both in arguments and behaviour).

> > If we go for the "no udp port" approach they will get an error any way because
> > of the mismatching arguments.
> 
> No - you only get a warning when you assign a function pointer of the wrong type.
> And that is true even if you just change the type of the pointer.
> However code might already have a cast on the function pointer (eg because the
> hook has 'unsigned char *') - so you won't even get a warning.
> You then get an OOPS when the hook tries to read the buffer.
> 
> It is a really bad interface...
> There isn't even a flags/options (etc) word that can be used
> to detect enhancements.
> 


agreed. But I am not sure about what I could do to fix that.

My idea is to use the following API:

rx_skb_hook(struct netpoll *np, int source, struct sk_buff *skb, int len);


Any suggestion or objection?


Regards,


-- 
Antonio Quartulli

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH 1/1] mac80211:Resetting connection monitor timers in transmit path
From: Dhahira Thesneem @ 2013-10-23 10:12 UTC (permalink / raw)
  To: Johannes Berg, John W. Linville, David S. Miller,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Dhahira Thesneem

Reset connection monitor timers when we are able to successfully transmit data to an AP.

Signed-off-by: Dhahira Thesneem <dhahira.thesneem-EvXpCiN+lbve9wHmmfpqLFaTQe2KTcn/@public.gmane.org>
---
 net/mac80211/tx.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 3456c04..e7725cf 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1700,7 +1700,8 @@ netdev_tx_t ieee80211_monitor_start_xmit(struct sk_buff *skb,
 
 	ieee80211_xmit(sdata, skb, chan->band);
 	rcu_read_unlock();
-
+	/*To reset connection monitor timers, after successful transmission*/
+	ieee80211_sta_reset_conn_monitor(sdata);
 	return NETDEV_TX_OK;
 
 fail_rcu:
@@ -2139,7 +2140,8 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
 
 	ieee80211_xmit(sdata, skb, band);
 	rcu_read_unlock();

^ permalink raw reply related

* RE: [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
From: David Laight @ 2013-10-23  9:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-4-git-send-email-pablo@netfilter.org>

> Subject: [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
...
> Meanwhile, CPU0 is handling the network receive path and ends up in
> ipt_do_table, resulting in:
> 
> 	private = table->private;
> 
> 	[...]
> 
> 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
> 
> On weakly ordered memory architectures, the writes to table->private
> and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
> Furthermore, on architectures which don't respect ordering of address
> dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

Which reads might be out of order?
AFAICT they are strongly sequenced because they second depends on the
value read by the first.
So I don't see why the read barrier is needed.

I presume the above code is tied to a single cpu.

...
> 
> -	table->private = newinfo;
>  	newinfo->initial_entries = private->initial_entries;
> +	/*
> +	 * Ensure contents of newinfo are visible before assigning to
> +	 * private.
> +	 */
> +	smp_wmb();
> +	table->private = newinfo;

Those writes were in the wrong order on all systems.
Also gcc needs to be told not to reorder the writes even on non-smp
systems (if the code might be pre-empted).
So an asm volatile (:::"memory") is needed there even if no specific
synchronisation instruction is needed.

	David

^ permalink raw reply

* Re: -27% netperf TCP_STREAM regression by "tcp_memcontrol: Kill struct tcp_memcontrol"
From: Eric W. Biederman @ 2013-10-23  9:43 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20131023061019.GA15698@localhost>

Fengguang Wu <fengguang.wu@intel.com> writes:

> On Tue, Oct 22, 2013 at 09:38:10PM -0700, Eric W. Biederman wrote:
>> David Miller <davem@davemloft.net> writes:
>> 
>> > From: fengguang.wu@intel.com
>> > Date: Tue, 22 Oct 2013 22:41:29 +0100
>> >
>> >> We noticed big netperf throughput regressions
>> >> 
>> >>     a4fe34bf902b8f709c63      2e685cad57906e19add7  
>> >> ------------------------  ------------------------  
>> >>                   707.40       -40.7%       419.60  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
>> >>                  2775.60       -23.7%      2116.40  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
>> >>                  3483.00       -27.2%      2536.00  TOTAL netperf.Throughput_Mbps
>> >> 
>> >> and bisected it to
>> >> 
>> >> commit 2e685cad57906e19add7189b5ff49dfb6aaa21d3
>> >> Author: Eric W. Biederman <ebiederm@xmission.com>
>> >> Date:   Sat Oct 19 16:26:19 2013 -0700
>> >> 
>> >>     tcp_memcontrol: Kill struct tcp_memcontrol
>> >
>> > Eric please look into this, I'd rather have a fix to apply than revert your
>> > work.
>> 
>> Will do I expect some ordering changed, and that changed the cache line
>> behavior.
>> 
>> If I can't find anything we can revert this one particular patch without
>> affecting anything else, but it would be nice to keep the data structure
>> smaller.
>> 
>> Fengguag what would I need to do to reproduce this?
>
> Eric, attached is the kernel config.
>
> We used these commands in the test:
>
>         netserver
>         netperf -t TCP_STREAM -c -C -l 120      # repeat 64 times and get average
>
> btw, we've got more complete change set (attached) and also noticed
> performance increase in the TCP_SENDFILE case:
>
>     a4fe34bf902b8f709c63      2e685cad57906e19add7
> ------------------------  ------------------------
>                   707.40       -40.7%       419.60  lkp-nex04/micro/netperf/120s-200%-TCP_STREAM
>                  2572.20       -17.7%      2116.20  lkp-sb03/micro/netperf/120s-200%-TCP_MAERTS
>                  2775.60       -23.7%      2116.40  lkp-sb03/micro/netperf/120s-200%-TCP_STREAM
>                  1006.60       -54.4%       459.40  lkp-sbx04/micro/netperf/120s-200%-TCP_STREAM
>                  3278.60       -25.2%      2453.80  lkp-t410/micro/netperf/120s-200%-TCP_MAERTS
>                  1902.80       +21.7%      2315.00  lkp-t410/micro/netperf/120s-200%-TCP_SENDFILE
>                  3345.40       -26.7%      2451.00  lkp-t410/micro/netperf/120s-200%-TCP_STREAM
>                 15588.60       -20.9%     12331.40  TOTAL netperf.Throughput_Mbps

I have a second question.  Do you mount the cgroup filesystem?  Do you
set memory.kmem.tcp.limit_in_bytes?

If you aren't setting any memory cgroup limits or creating any groups
this change should not have had any effect whatsoever.  And you haven't
mentioned it so I don't expect you are enabling the memory cgroup limits
explicitly.

If you have enabled the memory cgroups can you please describe your
configuration as that may play a significant role.

Eric

^ permalink raw reply

* Charity Donation
From: Gillian and Adrian Bayford @ 2013-10-23  5:20 UTC (permalink / raw)
  To: Recipients

My wife and i won £148.6 Million Pounds last year, and we have done lot of charity donation, so we decide to give 1.5 Million Pounds each to 5 lucky people, lucky for you, your email, was given to us by Google management as one of our lucky precipitants.

For verification process see below Please read the article - http://www.bbc.co.uk/news/uk-england-19254228

Send Name, Country, Age, Occupation and Phone Number for details

Congratulations & Happy Celebrations in Advance,

Gillian and Adrian Bayford
Email: gillian.adrianbayford01@rogers.com

^ permalink raw reply

* [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Will Deacon <will.deacon@arm.com>

During kernel stability testing on an SMP ARMv7 system, Yalin Wang
reported the following panic from the netfilter code:

  1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
  [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
  [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
  [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
  [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
  [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
  [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
  [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
  [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
  [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
  [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
  [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
  [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
  [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
  Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
  ---[ end trace da227214a82491bd ]---
  Kernel panic - not syncing: Fatal exception in interrupt

This comes about because CPU1 is executing xt_replace_table in response
to a setsockopt syscall, resulting in:

	ret = xt_jumpstack_alloc(newinfo);
		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);

	[...]

	table->private = newinfo;
	newinfo->initial_entries = private->initial_entries;

Meanwhile, CPU0 is handling the network receive path and ends up in
ipt_do_table, resulting in:

	private = table->private;

	[...]

	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];

On weakly ordered memory architectures, the writes to table->private
and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
Furthermore, on architectures which don't respect ordering of address
dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

This patch adds an smp_wmb() before the assignment to table->private
(which is essentially publishing newinfo) to ensure that all writes to
newinfo will be observed before plugging it into the table structure.
A dependent-read barrier is also added on the consumer sides, to ensure
the same ordering requirements are also respected there.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/arp_tables.c |    5 +++++
 net/ipv4/netfilter/ip_tables.c  |    5 +++++
 net/ipv6/netfilter/ip6_tables.c |    5 +++++
 net/netfilter/x_tables.c        |    7 ++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 85a4f21..59da7cd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -271,6 +271,11 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
 	private = table->private;
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	table_base = private->entries[smp_processor_id()];
 
 	e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index d23118d..718dfbd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -327,6 +327,11 @@ ipt_do_table(struct sk_buff *skb,
 	addend = xt_write_recseq_begin();
 	private = table->private;
 	cpu        = smp_processor_id();
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	table_base = private->entries[cpu];
 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
 	stackptr   = per_cpu_ptr(private->stackptr, cpu);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 44400c2..710238f 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -349,6 +349,11 @@ ip6t_do_table(struct sk_buff *skb,
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
 	private = table->private;
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	cpu        = smp_processor_id();
 	table_base = private->entries[cpu];
 	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8b03028..227aa11 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -845,8 +845,13 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
-	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
+	/*
+	 * Ensure contents of newinfo are visible before assigning to
+	 * private.
+	 */
+	smp_wmb();
+	table->private = newinfo;
 
 	/*
 	 * Even though table entries have now been swapped, other CPU's
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/3] netfilter: ipt_ULOG: fix info leaks
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Mathias Krause <minipli@googlemail.com>

The ulog messages leak heap bytes by the means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/ipt_ULOG.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c
index cbc2215..9cb993c 100644
--- a/net/ipv4/netfilter/ipt_ULOG.c
+++ b/net/ipv4/netfilter/ipt_ULOG.c
@@ -220,6 +220,7 @@ static void ipt_ulog_packet(struct net *net,
 	ub->qlen++;
 
 	pm = nlmsg_data(nlh);
+	memset(pm, 0, sizeof(*pm));
 
 	/* We might not have a timestamp, get one */
 	if (skb->tstamp.tv64 == 0)
@@ -238,8 +239,6 @@ static void ipt_ulog_packet(struct net *net,
 	}
 	else if (loginfo->prefix[0] != '\0')
 		strncpy(pm->prefix, loginfo->prefix, sizeof(pm->prefix));
-	else
-		*(pm->prefix) = '\0';
 
 	if (in && in->hard_header_len > 0 &&
 	    skb->mac_header != skb->network_header &&
@@ -251,13 +250,9 @@ static void ipt_ulog_packet(struct net *net,
 
 	if (in)
 		strncpy(pm->indev_name, in->name, sizeof(pm->indev_name));
-	else
-		pm->indev_name[0] = '\0';
 
 	if (out)
 		strncpy(pm->outdev_name, out->name, sizeof(pm->outdev_name));
-	else
-		pm->outdev_name[0] = '\0';
 
 	/* copy_len <= skb->len, so can't fail. */
 	if (skb_copy_bits(skb, 0, pm->payload, copy_len) < 0)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/3] netfilter: ebt_ulog: fix info leaks
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1382519724-3953-1-git-send-email-pablo@netfilter.org>

From: Mathias Krause <minipli@googlemail.com>

The ulog messages leak heap bytes by the means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebt_ulog.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
index 5180938..7c470c3 100644
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -181,6 +181,7 @@ static void ebt_ulog_packet(struct net *net, unsigned int hooknr,
 	ub->qlen++;
 
 	pm = nlmsg_data(nlh);
+	memset(pm, 0, sizeof(*pm));
 
 	/* Fill in the ulog data */
 	pm->version = EBT_ULOG_VERSION;
@@ -193,8 +194,6 @@ static void ebt_ulog_packet(struct net *net, unsigned int hooknr,
 	pm->hook = hooknr;
 	if (uloginfo->prefix != NULL)
 		strcpy(pm->prefix, uloginfo->prefix);
-	else
-		*(pm->prefix) = '\0';
 
 	if (in) {
 		strcpy(pm->physindev, in->name);
@@ -204,16 +203,14 @@ static void ebt_ulog_packet(struct net *net, unsigned int hooknr,
 			strcpy(pm->indev, br_port_get_rcu(in)->br->dev->name);
 		else
 			strcpy(pm->indev, in->name);
-	} else
-		pm->indev[0] = pm->physindev[0] = '\0';
+	}
 
 	if (out) {
 		/* If out exists, then out is a bridge port */
 		strcpy(pm->physoutdev, out->name);
 		/* rcu_read_lock()ed by nf_hook_slow */
 		strcpy(pm->outdev, br_port_get_rcu(out)->br->dev->name);
-	} else
-		pm->outdev[0] = pm->physoutdev[0] = '\0';
+	}
 
 	if (skb_copy_bits(skb, -ETH_HLEN, pm->data, copy_len) < 0)
 		BUG();
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 0/3] netfilter fixes for net
From: Pablo Neira Ayuso @ 2013-10-23  9:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains three netfilter fixes for your net
tree, they are:

* A couple of fixes to resolve info leak to userspace due to uninitialized
  memory area in ulogd, from Mathias Krause.

* Fix instruction ordering issues that may lead to the access of
  uninitialized data in x_tables. The problem involves the table update
 (producer) and the main packet matching (consumer) routines. Detected in
  SMP ARMv7, from Will Deacon.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

Thanks!

----------------------------------------------------------------

The following changes since commit c31eeaced22ce8bd61268a3c595d542bb38c0a4f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2013-10-01 12:58:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git master

for you to fetch changes up to b416c144f46af1a30ddfa4e4319a8f077381ad63:

  netfilter: x_tables: fix ordering of jumpstack allocation and table update (2013-10-22 10:11:29 +0200)

----------------------------------------------------------------
Mathias Krause (2):
      netfilter: ebt_ulog: fix info leaks
      netfilter: ipt_ULOG: fix info leaks

Will Deacon (1):
      netfilter: x_tables: fix ordering of jumpstack allocation and table update

 net/bridge/netfilter/ebt_ulog.c |    9 +++------
 net/ipv4/netfilter/arp_tables.c |    5 +++++
 net/ipv4/netfilter/ip_tables.c  |    5 +++++
 net/ipv4/netfilter/ipt_ULOG.c   |    7 +------
 net/ipv6/netfilter/ip6_tables.c |    5 +++++
 net/netfilter/x_tables.c        |    7 ++++++-
 6 files changed, 25 insertions(+), 13 deletions(-)

^ permalink raw reply

* [PATCH net-next 3/3] inet: remove old fragmentation hash initializing
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Hannes Frederic Sowa, David S. Miller,
	Eric Dumazet
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

All fragmentation hash secrets now get initialized by their
corresponding hash function with net_get_random_once. Thus we can
eliminate the initial seeding.

Also provide a comment that hash secret seeding happens at the first
call to the corresponding hashing function.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/inet_frag.h  | 4 ++++
 net/ipv4/inet_fragment.c | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index bfcbc00..6f59de9 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -64,6 +64,10 @@ struct inet_frags {
 	rwlock_t		lock ____cacheline_aligned_in_smp;
 	int			secret_interval;
 	struct timer_list	secret_timer;
+
+	/* The first call to hashfn is responsible to initialize
+	 * rnd. This is best done with net_get_random_once.
+	 */
 	u32			rnd;
 	int			qsize;
 
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index c5313a9..bb075fc 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -93,9 +93,6 @@ void inet_frags_init(struct inet_frags *f)
 	}
 	rwlock_init(&f->lock);
 
-	f->rnd = (u32) ((totalram_pages ^ (totalram_pages >> 7)) ^
-				   (jiffies ^ (jiffies >> 6)));
-
 	setup_timer(&f->secret_timer, inet_frag_secret_rebuild,
 			(unsigned long)f);
 	f->secret_timer.expires = jiffies + f->secret_interval;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 2/3] ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev
  Cc: netfilter-devel, Hannes Frederic Sowa, David S. Miller,
	Eric Dumazet, Pablo Neira Ayuso
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

Defer the fragmentation hash secret initialization for IPv6 like the
previous patch did for IPv4.

Because the netfilter logic reuses the hash secret we have to split it
first. Thus introduce a new nf_hash_frag function which takes care to
seed the hash secret.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/ipv6.h                      |  2 --
 net/ipv6/netfilter/nf_conntrack_reasm.c | 16 ++++++++++++++--
 net/ipv6/reassembly.c                   | 12 ++++++------
 3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index a35055f..dd96638 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -805,8 +805,6 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
 int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf);
 int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 		  struct group_filter __user *optval, int __user *optlen);
-unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-			     const struct in6_addr *daddr, u32 rnd);
 
 #ifdef CONFIG_PROC_FS
 int ac6_proc_init(struct net *net);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index dffdc1a..4a25826 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -144,12 +144,24 @@ static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h)
 	return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK);
 }
 
+static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr,
+				 const struct in6_addr *daddr)
+{
+	u32 c;
+
+	net_get_random_once(&nf_frags.rnd, sizeof(nf_frags.rnd));
+	c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
+			 (__force u32)id, nf_frags.rnd);
+	return c & (INETFRAGS_HASHSZ - 1);
+}
+
+
 static unsigned int nf_hashfn(struct inet_frag_queue *q)
 {
 	const struct frag_queue *nq;
 
 	nq = container_of(q, struct frag_queue, q);
-	return inet6_hash_frag(nq->id, &nq->saddr, &nq->daddr, nf_frags.rnd);
+	return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr);
 }
 
 static void nf_skb_free(struct sk_buff *skb)
@@ -185,7 +197,7 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 	arg.ecn = ecn;
 
 	read_lock_bh(&nf_frags.lock);
-	hash = inet6_hash_frag(id, src, dst, nf_frags.rnd);
+	hash = nf_hash_frag(id, src, dst);
 
 	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
 	local_bh_enable();
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 1aeb473..cc85a9b 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -82,24 +82,24 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
  * callers should be careful not to use the hash value outside the ipfrag_lock
  * as doing so could race with ipfrag_hash_rnd being recalculated.
  */
-unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
-			     const struct in6_addr *daddr, u32 rnd)
+static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr,
+				    const struct in6_addr *daddr)
 {
 	u32 c;
 
+	net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd));
 	c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr),
-			 (__force u32)id, rnd);
+			 (__force u32)id, ip6_frags.rnd);
 
 	return c & (INETFRAGS_HASHSZ - 1);
 }
-EXPORT_SYMBOL_GPL(inet6_hash_frag);
 
 static unsigned int ip6_hashfn(struct inet_frag_queue *q)
 {
 	struct frag_queue *fq;
 
 	fq = container_of(q, struct frag_queue, q);
-	return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr, ip6_frags.rnd);
+	return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr);
 }
 
 bool ip6_frag_match(struct inet_frag_queue *q, void *a)
@@ -193,7 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
 	arg.ecn = ecn;
 
 	read_lock(&ip6_frags.lock);
-	hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd);
+	hash = inet6_hash_frag(id, src, dst);
 
 	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
 	if (IS_ERR_OR_NULL(q)) {
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 1/3] ipv4: initialize ip4_frags hash secret as late as possible
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Hannes Frederic Sowa, Eric Dumazet,
	David S. Miller
In-Reply-To: <1382519217-750-1-git-send-email-hannes@stressinduktion.org>

Defer the generation of the first hash secret for the ipv4 fragmentation
cache as late as possible.

ip4_frags.rnd gets initial seeded by inet_frags_init and regulary
reseeded by inet_frag_secret_rebuild. Either we call ipqhashfn directly
from ip_fragment.c in which case we initialize the secret directly.

If we first get called by inet_frag_secret_rebuild we install a new secret
by a manual call to get_random_bytes. This secret will be overwritten
as soon as the first call to ipqhashfn happens. This is safe because we
won't race while publishing the new secrets with anyone else.

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/ip_fragment.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index b66910a..2481993 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -106,6 +106,7 @@ struct ip4_create_arg {

 static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr, u8 prot)
 {
+	net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd));
 	return jhash_3words((__force u32)id << 16 | prot,
 			    (__force u32)saddr, (__force u32)daddr,
 			    ip4_frags.rnd) & (INETFRAGS_HASHSZ - 1);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 0/3] initialize fragment hash secrets with net_get_random_once
From: Hannes Frederic Sowa @ 2013-10-23  9:06 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel

Hi!

This series switches the inet_frag.rnd hash initialization to
net_get_random_once.

Included patches:
 ipv4: initialize ip4_frags hash secret as late
 ipv6: split inet6_hash_frag for netfilter and
 inet: remove old fragmentation hash initializing

Diffstat:
 include/net/inet_frag.h                 |  4 ++++
 include/net/ipv6.h                      |  2 --
 net/ipv4/inet_fragment.c                |  3 ---
 net/ipv4/ip_fragment.c                  |  1 +
 net/ipv6/netfilter/nf_conntrack_reasm.c | 16 ++++++++++++++--
 net/ipv6/reassembly.c                   | 12 ++++++------
 6 files changed, 25 insertions(+), 13 deletions(-)

Greetings,

  Hannes


^ permalink raw reply

* Re: [PATCH net] net: sctp: fix ASCONF to allow non SCTP_ADDR_SRC addresses in ipv6
From: Vladislav Yasevich @ 2013-10-23  8:54 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, netdev@vger.kernel.org, linux-sctp@vger.kernel.org,
	Michio Honda
In-Reply-To: <1382459696-1732-1-git-send-email-dborkman@redhat.com>

On Tue, Oct 22, 2013 at 12:34 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>
> Commit 8a07eb0a50 ("sctp: Add ASCONF operation on the single-homed host")
> implemented possible use of IPv4 addresses with non SCTP_ADDR_SRC state
> as source address when sending ASCONF (ADD) packets, but IPv6 part for
> that was not implemented in 8a07eb0a50. Therefore, as this is not restricted
> to IPv4-only, fix this up to allow the same for IPv6 addresses in SCTP.
>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Michio Honda <micchie@sfc.wide.ad.jp>

[stupid gmail]

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

> ---
>  net/sctp/ipv6.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
> index e7b2d4f..96a5591 100644
> --- a/net/sctp/ipv6.c
> +++ b/net/sctp/ipv6.c
> @@ -279,7 +279,9 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
>                 sctp_v6_to_addr(&dst_saddr, &fl6->saddr, htons(bp->port));
>                 rcu_read_lock();
>                 list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> -                       if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC))
> +                       if (!laddr->valid || laddr->state == SCTP_ADDR_DEL ||
> +                           (laddr->state != SCTP_ADDR_SRC &&
> +                            !asoc->src_out_of_asoc_ok))
>                                 continue;
>
>                         /* Do not compare against v4 addrs */
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: David Laight @ 2013-10-23  8:33 UTC (permalink / raw)
  To: Antonio Quartulli; +Cc: David S. Miller, netdev
In-Reply-To: <20131022171314.GL1544@neomailbox.net>

...
> > I can't remember which value you passed as 'offset' (and my mailer makes
> > it hard to find), but to ease the code changes the offset of the udp data
> > would make sense.
> > In that case you still need to pass the source port.
> 
> I decided not to pass the source port because if the user is really interested
> in it, it is still possible to get the udp_hdr from the skb and read its value.

It just seemed that there was no need to require that the hook re-parse
the ip header just to find the source port.
(ok it could assume that the udp header is just before the data)

> > If you do rx_hook(np, source_port, skb, offset) then if anyone manages to
> > load an old module (or code that casts the assignement to rx_poll)
> > at least it won't go 'bang'.
> > Renaming the structure member will guarantee to generate compile errors.
> 
> so you suggest to rename rx_hook to something else to warn people about the
> change?

Yes.

> If we go for the "no udp port" approach they will get an error any way because
> of the mismatching arguments.

No - you only get a warning when you assign a function pointer of the wrong type.
And that is true even if you just change the type of the pointer.
However code might already have a cast on the function pointer (eg because the
hook has 'unsigned char *') - so you won't even get a warning.
You then get an OOPS when the hook tries to read the buffer.

It is a really bad interface...
There isn't even a flags/options (etc) word that can be used
to detect enhancements.

	David

^ permalink raw reply

* [PATCH net v3] be2net: Warn users of possible broken functionality on BE2 cards with very old FW versions with latest driver
From: Somnath Kotur @ 2013-10-23  8:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, Somnath Kotur

On very old FW versions < 4.0, the mailbox command to set interrupts
on the card succeeds even though it is not supported and should have
failed, leading to a scenario where interrupts do not work.
Hence warn users to upgrade to a suitable FW version to avoid seeing
broken functionality.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
v3: Incorporated comments from Ben Hutchings and Joe Perches

 drivers/net/ethernet/emulex/benet/be.h      |   14 ++++++++++++++
 drivers/net/ethernet/emulex/benet/be_main.c |    6 ++++++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index db02023..6a57051 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -696,6 +696,20 @@ static inline int qnq_async_evt_rcvd(struct be_adapter *adapter)
 	return adapter->flags & BE_FLAGS_QNQ_ASYNC_EVT_RCVD;
 }
 
+static inline u32 fw_major_num(char *fw_ver_str)
+{
+	u32 fw_major;
+	char *next, *cp;
+	char tmp_fw_ver[FW_VER_LEN];
+
+	strncpy(tmp_fw_ver, fw_ver_str, strlen(fw_ver_str));
+	next = tmp_fw_ver;
+	cp = strsep(&next, ".");
+	sscanf(cp, "%i", &fw_major);
+
+	return fw_major;
+}
+
 extern void be_cq_notify(struct be_adapter *adapter, u16 qid, bool arm,
 		u16 num_popped);
 extern void be_link_status_update(struct be_adapter *adapter, u8 link_status);
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2c38cc4..d8da961 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3247,6 +3247,12 @@ static int be_setup(struct be_adapter *adapter)
 
 	be_cmd_get_fw_ver(adapter, adapter->fw_ver, adapter->fw_on_flash);
 
+	if (BE2_chip(adapter) && fw_major_num(adapter->fw_ver) < 4) {
+		dev_err(dev, "Firmware on card is old(%s), IRQs may not work.",
+			adapter->fw_ver);
+		dev_err(dev, "Please upgrade firmware to version >= 4.0\n");
+	}
+
 	if (adapter->vlans_added)
 		be_vid_config(adapter);
 
-- 
1.6.0.2

^ permalink raw reply related

* Re: Big performance loss from 3.4.63 to 3.10.13 when routing ipv4
From: Steffen Klassert @ 2013-10-23  8:12 UTC (permalink / raw)
  To: David Miller; +Cc: linux, hannes, netdev, klassert
In-Reply-To: <20131022.154638.1989281525248160764.davem@davemloft.net>

On Tue, Oct 22, 2013 at 03:46:38PM -0400, David Miller wrote:
> From: Wolfgang Walter <linux@stwm.de>
> Date: Tue, 22 Oct 2013 21:07:41 +0200
> 
> > Am Mittwoch, 2. Oktober 2013, 00:20:02 schrieb Hannes Frederic Sowa:
> >> On Tue, Oct 01, 2013 at 06:39:32PM +0200, Wolfgang Walter wrote:
> >> > All network traffic over the router become slow and sluggish. If one pings
> >> > the router there is a packet loss. After about 2 minutes the traffic
> >> > completely stalls for about 1 minute. Then it works again as in the
> >> > beginning to then stall again. And so on.
> >> 
> >> Maybe dropwatch can give a first hint?
> >> 
> > 
> > I finally found the problem:
> > 
> > In 3.10.x and 3.11.x the value of /proc/sys/net/ipv4/xfrm4_gc_thresh is 1024.
> > 
> > It is much higher in 3.4.x. If I increase this value in 3.10.x to the one I 
> > see on 3.4.x all works fine with 3.10.x
> 
> Steffen, here is yet another report about this issue.
> 
> I think we should resolve this soon, even bumping it to 2048 or 4096
> and leaving it at that would be I think acceptable.
> 

Yes, of course. Let's use 4096 as the default for ipv4 and ipv6.
I'll take care of it next week.

Thanks!

^ permalink raw reply

* Re: [PATCH net-next 0/2] Removal of struct esp_data
From: Steffen Klassert @ 2013-10-23  8:07 UTC (permalink / raw)
  To: David Miller; +Cc: mathias.krause, netdev, herbert
In-Reply-To: <20131022.140510.1879404122981800534.davem@davemloft.net>

On Tue, Oct 22, 2013 at 02:05:10PM -0400, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Tue, 22 Oct 2013 15:08:22 +0200
> 
> > On Fri, Oct 18, 2013 at 01:55:36PM -0400, David Miller wrote:
> >> From: Mathias Krause <mathias.krause@secunet.com>
> >> Date: Fri, 18 Oct 2013 12:09:03 +0200
> >> 
> >> > This series removes one level of indirection when accessing the aead
> >> > crypto algorithm in ESP transforms by simply removing struct esp_data.
> >> > This results in smaller code and less memory usage per xfrm state.
> >> > 
> >> > Please apply!
> >> 
> >> No objections from me, I'll let Steffen pick this up.
> > 
> > I'm a bit hesitating with removing the padlen field. We resisted
> > several attempts to remove it in the past. It is currenly unused,
> > but it provides the infrastructure for ESP padding as defined
> > in RFC 4303. However, RFC 4303 recommends the use of TFC padding
> > instead to conceal the actual length of the packet. So I'm not
> > sure what's the actual usecase for ESP padding. I'll reconsider
> > this next week when I'm back at office.
> 
> Steffen, is it really the case that we cannot add it back later if we
> really need to?
> 

Well, I thought to either take this as a reminder to implement the
missing stuff or to take the removing patches if this is really obsolete.
I'll do one of both once I'm back from conference week in Edinburgh.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-23  8:02 UTC (permalink / raw)
  To: Wei Liu, David Laight
  Cc: Jan Beulich, annie li, david.vrabel, ian.campbell, xen-devel,
	netdev
In-Reply-To: <20131018112417.GA20185@zion.uk.xensource.com>


On 2013-10-18 19:24, Wei Liu wrote:
> On Fri, Oct 18, 2013 at 09:40:33AM +0100, David Laight wrote:
>>>> My understanding is this patch does not simply double the span, it is
>>>> just stricter than the original one. Please check my previous comments,
>>>> I paste it here.
>>> No, the code (on a 32-bit arch) just _can't_ handle jiffies differences
>>> beyond 2^32, no matter how cleverly you use the respective macros.
>>> All arithmetic there is done modulo 2^32.
>> I haven't followed this discussion very closely but it might be possible
>> to arrange that the 'incorrect lack of credit' only occurs for a few
>> seconds every time 'jiffies' wraps - instead of half of the time.
>> Then you'd have to be extremely unlucky to hit the timing window.
>>
> As I understand it, this is the idea of this patch -- to narrow down the
> timing window.
Jan,  do you agree the idea or have better suggestion to me.

Thanks,
Jason
>
> Wei.
>
>> 	David
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: BUG: scheduling while atomic dev_set_promiscuity->__dev_notify_flags
From: Nicolas Dichtel @ 2013-10-23  7:48 UTC (permalink / raw)
  To: Alexei Starovoitov, Cong Wang; +Cc: netdev
In-Reply-To: <CAADnVQLY7-crQLO1Tc4MBLX8R66wAZztw-arnfO3NWO9sLW++Q@mail.gmail.com>

Le 23/10/2013 07:34, Alexei Starovoitov a écrit :
> On Tue, Oct 22, 2013 at 8:53 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Tue, 22 Oct 2013 at 01:04 GMT, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>
>>> packet_notifier() does rcu_read_lock() before calling into packet_dev_mc() .
>>>
>>> Not sure how to fix it cleanly, other than disabling a notify here.
>>> Any suggestion?
>>>
>>
>> Passing a gfp flag to rtmsg_ifinfo() seems a right fix for me, but I don't
>> know if there is other better way to fix it.
>
> Indeed. rtnl_notify() already accepts gfp_t.
>
> The following diff fixes it for me:
> ---
>   include/linux/rtnetlink.h |    3 ++-
>   net/core/dev.c            |    2 +-
>   net/core/rtnetlink.c      |   12 +++++++++---
>   3 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index f28544b..0180523 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -15,7 +15,8 @@ extern int rtnetlink_put_metrics(struct sk_buff
> *skb, u32 *metrics);
>   extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
>                     u32 id, long expires, u32 error);
>
> -extern void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change);
> +void __rtmsg_ifinfo(int type, struct net_device *dev, unsigned
> change, gfp_t flags);
Just nitpiking: putting the type and the name on the same line is better 
'unsigned change'.

> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change);
>
>   /* RTNL is used as a global lock for all changes to network configuration  */
>   extern void rtnl_lock(void);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0918aad..59b90fe 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5257,7 +5257,7 @@ void __dev_notify_flags(struct net_device *dev,
> unsigned int old_flags,
>       unsigned int changes = dev->flags ^ old_flags;
>
>       if (gchanges)
> -        rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges);
> +        __rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC);
>
>       if (changes & IFF_UP) {
>           if (dev->flags & IFF_UP)
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 4aedf03..5931af9 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1984,14 +1984,15 @@ static int rtnl_dump_all(struct sk_buff *skb,
> struct netlink_callback *cb)
>       return skb->len;
>   }
>
> -void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
> +void __rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> +            gfp_t flags)
'gfp_t flags' should be aligned with 'int type'

>   {
>       struct net *net = dev_net(dev);
>       struct sk_buff *skb;
>       int err = -ENOBUFS;
>       size_t if_info_size;
>
> -    skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), GFP_KERNEL);
> +    skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), flags);
>       if (skb == NULL)
>           goto errout;
>
> @@ -2002,12 +2003,17 @@ void rtmsg_ifinfo(int type, struct net_device
> *dev, unsigned int change)
>           kfree_skb(skb);
>           goto errout;
>       }
> -    rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
> +    rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags);
>       return;
>   errout:
>       if (err < 0)
>           rtnl_set_sk_err(net, RTNLGRP_LINK, err);
>   }
> +
> +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change)
> +{
> +    __rtmsg_ifinfo(type, dev, change, GFP_KERNEL);
> +}
>   EXPORT_SYMBOL(rtmsg_ifinfo);
>
>   static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
> --
>
> Nicolas, I've sent you my .config.
I got some pb with my testbed, hence I still didn't reproduce the bug.
I will make more test today, but I think this is the right patch.

> Any better ideas?
No and the patch is right for me (__dev_set_promiscuity() call
audit_log(..., GFP_ATOMIC) already).

^ permalink raw reply

* [PATCH 7/7] net: via-rhine: remove unnecessary pci_set_drvdata()
From: Jingoo Han @ 2013-10-23  7:09 UTC (permalink / raw)
  To: 'David S. Miller'
  Cc: netdev, 'Jingoo Han', 'Roger Luethi'
In-Reply-To: <000001cecfbe$44ce9c10$ce6bd430$%han@samsung.com>

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
---
 drivers/net/ethernet/via/via-rhine.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index bdf697b..4a7293e 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -2292,7 +2292,6 @@ static void rhine_remove_one(struct pci_dev *pdev)

 	free_netdev(dev);
 	pci_disable_device(pdev);
-	pci_set_drvdata(pdev, NULL);
 }

 static void rhine_shutdown (struct pci_dev *pdev)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 6/7] net: tc35815: remove unnecessary pci_set_drvdata()
From: Jingoo Han @ 2013-10-23  7:09 UTC (permalink / raw)
  To: 'David S. Miller'; +Cc: netdev, 'Jingoo Han'
In-Reply-To: <000001cecfbe$44ce9c10$ce6bd430$%han@samsung.com>

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
---
 drivers/net/ethernet/toshiba/tc35815.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/toshiba/tc35815.c b/drivers/net/ethernet/toshiba/tc35815.c
index a971b9c..1322546 100644
--- a/drivers/net/ethernet/toshiba/tc35815.c
+++ b/drivers/net/ethernet/toshiba/tc35815.c
@@ -887,7 +887,6 @@ static void tc35815_remove_one(struct pci_dev *pdev)
 	mdiobus_free(lp->mii_bus);
 	unregister_netdev(dev);
 	free_netdev(dev);
-	pci_set_drvdata(pdev, NULL);
 }
 
 static int
-- 
1.7.10.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox