Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
From: Neil Horman @ 2012-12-25  0:21 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-kernel, akpm, Jesse Gross, Venkat Venkatsubra,
	Vlad Yasevich, Sridhar Samudrala, Steffen Klassert, Herbert Xu,
	David S. Miller, linux-sctp, dev, netdev
In-Reply-To: <1356315256-6572-29-git-send-email-akinobu.mita@gmail.com>

On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote:
> Use more preferable function name which implies using a pseudo-random
> number generator.
> 
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> Cc: Jesse Gross <jesse@nicira.com>
> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Sridhar Samudrala <sri@us.ibm.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-sctp@vger.kernel.org
> Cc: dev@openvswitch.org
> Cc: netdev@vger.kernel.org
> ---
>  include/net/red.h         | 2 +-
>  net/802/garp.c            | 2 +-
>  net/openvswitch/actions.c | 2 +-
>  net/rds/bind.c            | 2 +-
>  net/sctp/socket.c         | 2 +-
>  net/xfrm/xfrm_state.c     | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
> 
I'm largely indifferent to this patch, but I kind of feel like its just churn.
Whats the real advantage in making this change?  I grant that it clearly
indicates the type of random number generator we're using at a given call site,
But for those using net_random, you probably don't care too much about
the source of your random bits.  If you did really want true random vs.
pseudo-random data, you need to explicitly use the right call.  You're previous
patch series did good cleanup on differentiating the different random calls, but
this just seems like its removing what is otherwise useful indirection.
Neil

> diff --git a/include/net/red.h b/include/net/red.h
> index ef46058..168bb2f 100644
> --- a/include/net/red.h
> +++ b/include/net/red.h
> @@ -303,7 +303,7 @@ static inline unsigned long red_calc_qavg(const struct red_parms *p,
>  
>  static inline u32 red_random(const struct red_parms *p)
>  {
> -	return reciprocal_divide(net_random(), p->max_P_reciprocal);
> +	return reciprocal_divide(prandom_u32(), p->max_P_reciprocal);
>  }
>  
>  static inline int red_mark_probability(const struct red_parms *p,
> diff --git a/net/802/garp.c b/net/802/garp.c
> index 8456f5d..cf7410d 100644
> --- a/net/802/garp.c
> +++ b/net/802/garp.c
> @@ -397,7 +397,7 @@ static void garp_join_timer_arm(struct garp_applicant *app)
>  {
>  	unsigned long delay;
>  
> -	delay = (u64)msecs_to_jiffies(garp_join_time) * net_random() >> 32;
> +	delay = (u64)msecs_to_jiffies(garp_join_time) * prandom_u32() >> 32;
>  	mod_timer(&app->join_timer, jiffies + delay);
>  }
>  
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ac2defe..257bc36 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -404,7 +404,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
>  		 a = nla_next(a, &rem)) {
>  		switch (nla_type(a)) {
>  		case OVS_SAMPLE_ATTR_PROBABILITY:
> -			if (net_random() >= nla_get_u32(a))
> +			if (prandom_u32() >= nla_get_u32(a))
>  				return 0;
>  			break;
>  
> diff --git a/net/rds/bind.c b/net/rds/bind.c
> index 637bde5..7f95f4b 100644
> --- a/net/rds/bind.c
> +++ b/net/rds/bind.c
> @@ -118,7 +118,7 @@ static int rds_add_bound(struct rds_sock *rs, __be32 addr, __be16 *port)
>  		rover = be16_to_cpu(*port);
>  		last = rover;
>  	} else {
> -		rover = max_t(u16, net_random(), 2);
> +		rover = max_t(u16, prandom_u32(), 2);
>  		last = rover - 1;
>  	}
>  
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 9e65758..95860aa 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -5899,7 +5899,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  
>  		inet_get_local_port_range(&low, &high);
>  		remaining = (high - low) + 1;
> -		rover = net_random() % remaining + low;
> +		rover = prandom_u32() % remaining + low;
>  
>  		do {
>  			rover++;
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 3459692..35ddaab 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1546,7 +1546,7 @@ int xfrm_alloc_spi(struct xfrm_state *x, u32 low, u32 high)
>  	} else {
>  		u32 spi = 0;
>  		for (h=0; h<high-low+1; h++) {
> -			spi = low + net_random()%(high-low+1);
> +			spi = low + prandom_u32() % (high - low + 1);
>  			x0 = xfrm_state_lookup(net, mark, &x->id.daddr, htonl(spi), x->id.proto, x->props.family);
>  			if (x0 == NULL) {
>  				x->id.spi = htonl(spi);
> -- 
> 1.7.11.7
> 
> 

^ permalink raw reply

* Re: [Patch v2] arp: fix a regression in arp_solicit()
From: David Miller @ 2012-12-25  2:43 UTC (permalink / raw)
  To: erdnetdev; +Cc: xiyou.wangcong, netdev, sedat.dilek, edumazet, ja
In-Reply-To: <1356367274.20133.9176.camel@edumazet-glaptop>

From: Eric Dumazet <erdnetdev@gmail.com>
Date: Mon, 24 Dec 2012 08:41:14 -0800

> On Mon, 2012-12-24 at 09:23 +0800, Cong Wang wrote:
>> From: Cong Wang <xiyou.wangcong@gmail.com>
>> 
>> Sedat reported the following commit caused a regression:
>> 
>> commit 9650388b5c56578fdccc79c57a8c82fb92b8e7f1
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Fri Dec 21 07:32:10 2012 +0000
>> 
>>     ipv4: arp: fix a lockdep splat in arp_solicit
>> 
>> This is due to the 6th parameter of arp_send() needs to be NULL
>> for the broadcast case, the above commit changed it to an all-zero
>> array by mistake.
>> 
>> Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
>> Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
>> Cc: Sedat Dilek <sedat.dilek@gmail.com>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Julian Anastasov <ja@ssi.bg>
>> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> 
>> ---
> 
> Oops, thanks for fixing this.
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied, thanks everyone.

^ permalink raw reply

* [PATCH 2/2] ipv6/ip6_gre: set transport header correctly
From: Isaku Yamahata @ 2012-12-25  2:51 UTC (permalink / raw)
  To: netdev; +Cc: yamahata
In-Reply-To: <a6eb43766a3959d111cde14b237d435de0b19e15.1356403805.git.yamahata@valinux.co.jp>

ip6gre_xmit2() incorrectly sets transport header to inner payload
instead of GRE header. It seems copy-and-pasted from ipip.c.
Set transport header to gre header.
(In ipip case the transport header is the inner ip header, so that's
correct.)

Found by inspection. In practice the incorrect transport header
doesn't matter because the skb usually is sent to another net_device
or socket, so the transport header isn't referenced.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 net/ipv6/ip6_gre.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 867466c..c727e47 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -758,8 +758,6 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
 		skb_dst_set_noref(skb, dst);
 	}
 
-	skb->transport_header = skb->network_header;
-
 	proto = NEXTHDR_GRE;
 	if (encap_limit >= 0) {
 		init_tel_txopt(&opt, encap_limit);
@@ -768,6 +766,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
 
 	skb_push(skb, gre_hlen);
 	skb_reset_network_header(skb);
+	skb_set_transport_header(skb, sizeof(*ipv6h));
 
 	/*
 	 *	Push down and install the IP header.
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/2] ipv4/ip_gre: set transport header correctly to gre header
From: Isaku Yamahata @ 2012-12-25  2:51 UTC (permalink / raw)
  To: netdev; +Cc: yamahata

ipgre_tunnel_xmit() incorrectly sets transport header to inner payload
instead of GRE header. It seems copy-and-pasted from ipip.c.
So set transport header to gre header.
(In ipip case the transport header is the inner ip header, so that's
correct.)

Found by inspection. In practice the incorrect transport header
doesn't matter because the skb usually is sent to another net_device
or socket, so the transport header isn't referenced.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
---
 net/ipv4/ip_gre.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 58cb627..303012a 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -912,9 +912,9 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 		/* Warning : tiph value might point to freed memory */
 	}

-	skb_reset_transport_header(skb);
 	skb_push(skb, gre_hlen);
 	skb_reset_network_header(skb);
+	skb_set_transport_header(skb, sizeof(*iph));
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 	IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
 			      IPSKB_REROUTED);
-- 
1.7.10.4

^ permalink raw reply related

* Re: kernel panic when running /etc/init.d/iptables restart
From: Gao feng @ 2012-12-25  5:36 UTC (permalink / raw)
  To: canqun zhang
  Cc: Patrick McHardy, netfilter-devel, netfilter, linux-kernel,
	netdev@vger.kernel.org
In-Reply-To: <CAFFEFTXxsPRf1Rmxyih0wL4O6q8G0OFmz0iRLX0tOYedYfajHw@mail.gmail.com>

cc netdev
Hi canqun:

On 2012/12/24 13:51, canqun zhang wrote:
> Hi Patrick，
> If i start  one lxc container instance, and then in the system there
> will be two net namespaces,one is init_net namespace, the other is
> created by lxc.If running "/etc/init.d/iptables restart",the system
> will be panic. I find iptables restarting will clean init_net
> namespace firstly,then clean the net namespace created by lxc,buf
> related functions about cleaning up init_net namespace will destroy
> global variables such as nf_ct_destroy,ip_ct_attach，etc.So,funtions
> cleaning up  the other net namespace will be panic.
> 

I'm afraid that the system will not panic.
When do rmmod nf_conntrack_ipv[4,6],we already call nf_ct_iterate_cleanup
to destroy the nf_conns which belongs to l[3,4]proto  protocols,At this
time the nf_ct_destroy still points to destroy_conntrack because the module
nf_conntrack is hold by l3 and l4proto.
You can check the function nf_conntrack_l[3,4]proto_unregister.

Can you make it a little clear?
The reproduction and oops dump stack is useful.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: kernel panic when running /etc/init.d/iptables restart
From: canqun zhang @ 2012-12-25  7:25 UTC (permalink / raw)
  To: Gao feng
  Cc: Patrick McHardy, netfilter-devel, netfilter, linux-kernel,
	netdev@vger.kernel.org
In-Reply-To: <50D93B43.8060303@cn.fujitsu.com>

Hi Gao feng
The stack information is as follows. The kenel will panic because the
nf_ct_destroy is NULL.

Reproduction:
(1) starting a lxc container
(2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
MASQUERADE (run it on host machine)
(3) /etc/ini.d/iptables save (run it on host machine)
(4)/etc/init.d/iptables restart (run it on host machine)

Stack:
Pid: 0, comm: swapper Not tainted 2.6.32-279.14.1.rc3.el6.x86_64 #1
IBM System x3650 M4 -[7915IA4]-/00J6528
RIP: 0010:[<ffffffff81466949>]  [<ffffffff81466949>]
nf_conntrack_destroy+0x19/0x30
RSP: 0018:ffff880028303ab0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff881051b237c0 RCX: 00000000000000d4
RDX: 0000000000000500 RSI: 0000000000000000 RDI: ffff881056bc0528
RBP: ffff880028303ab0 R08: ffff8810514fc020 R09: ffff8810574b6110
R10: 0000000000000000 R11: 0000000000000004 R12: ffffffff814445fe
R13: ffff88105327fba8 R14: ffff882059ed6e00 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f3484002098 CR3: 0000002056792000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8810590b2000, task ffff88205954c040)
Stack:
 ffff880028303ad0 ffffffff8142febd ffff880028303b30 ffff881051b237c0
<d> ffff880028303af0 ffffffff8142fc36 0000000200000004 ffff881051b237c0
<d> ffff880028303b20 ffffffff8142fd82 ffff880028303b20 ffff882059ed6d80
Call Trace:
 <IRQ>
 [<ffffffff8142febd>] skb_release_head_state+0xed/0x110
 [<ffffffff8142fc36>] __kfree_skb+0x16/0xa0
 [<ffffffff8142fd82>] kfree_skb+0x42/0x90
 [<ffffffff814445fe>] __neigh_event_send+0x11e/0x1e0
 [<ffffffff814447f3>] neigh_resolve_output+0x133/0x370
 [<ffffffff81054d55>] ? select_idle_sibling+0x95/0x150
 [<ffffffff814774b7>] ip_finish_output+0x237/0x310
 [<ffffffff81477648>] ip_output+0xb8/0xc0
 [<ffffffff81476945>] ip_local_out+0x25/0x30
 [<ffffffff81476e20>] ip_queue_xmit+0x190/0x420
 [<ffffffff8106012c>] ? try_to_wake_up+0x24c/0x3e0
 [<ffffffff8148bbae>] tcp_transmit_skb+0x3fe/0x7b0
 [<ffffffff8148cfda>] tcp_retransmit_skb+0x1ba/0x5f0
 [<ffffffff81053463>] ? __wake_up+0x53/0x70
 [<ffffffff8148fd00>] ? tcp_write_timer+0x0/0x200
 [<ffffffff8148f85f>] tcp_retransmit_timer+0x1df/0x680
 [<ffffffff8148fe98>] tcp_write_timer+0x198/0x200
 [<ffffffff8107e907>] run_timer_softirq+0x197/0x340
 [<ffffffff810a2350>] ? tick_sched_timer+0x0/0xc0
 [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81073f31>] __do_softirq+0xc1/0x1e0
 [<ffffffff81096d30>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 [<ffffffff8100de85>] do_softirq+0x65/0xa0
 [<ffffffff81073d15>] irq_exit+0x85/0x90
 [<ffffffff81506050>] smp_apic_timer_interrupt+0x70/0x9b
 [<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
 <EOI>
 [<ffffffff812cd9ae>] ? intel_idle+0xde/0x170
 [<ffffffff812cd991>] ? intel_idle+0xc1/0x170
 [<ffffffff8109922d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff81407827>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] cpu_idle+0xb6/0x110
 [<ffffffff814f714f>] start_secondary+0x22a/0x26d
Code: 02 ff d0 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
e5 0f 1f 44 00 00 48 8b 05 90 7c ba 00 48 85 c0 74 04 ff d0 c9 c3 <0f>
0b 0f 1f 44 00 00 eb f9 66 66 66 66 66
2e 0f 1f 84 00 00 00

2012/12/25 Gao feng <gaofeng@cn.fujitsu.com>:
> cc netdev
> Hi canqun:
>
> On 2012/12/24 13:51, canqun zhang wrote:
>> Hi Patrick，
>> If i start  one lxc container instance, and then in the system there
>> will be two net namespaces,one is init_net namespace, the other is
>> created by lxc.If running "/etc/init.d/iptables restart",the system
>> will be panic. I find iptables restarting will clean init_net
>> namespace firstly,then clean the net namespace created by lxc,buf
>> related functions about cleaning up init_net namespace will destroy
>> global variables such as nf_ct_destroy,ip_ct_attach，etc.So,funtions
>> cleaning up  the other net namespace will be panic.
>>
>
> I'm afraid that the system will not panic.
> When do rmmod nf_conntrack_ipv[4,6],we already call nf_ct_iterate_cleanup
> to destroy the nf_conns which belongs to l[3,4]proto  protocols,At this
> time the nf_ct_destroy still points to destroy_conntrack because the module
> nf_conntrack is hold by l3 and l4proto.
> You can check the function nf_conntrack_l[3,4]proto_unregister.
>
> Can you make it a little clear?
> The reproduction and oops dump stack is useful.
>
> Thanks!

^ permalink raw reply

* Re: [PATCH 1/7] batman-adv: Move soft-interface initialization to ndo_init
From: Marek Lindner @ 2012-12-25  8:19 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Sven Eckelmann, David Miller,
	Pau Koning
In-Reply-To: <CANiGF9_4QP0ts2kV9Bruk_8L7h1870LMcrRhtsYu35q=COF0hQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tuesday, December 25, 2012 07:18:37 Pau Koning wrote:
> >> What decision? I only see something about Patch 6 and Patch 7: Antonio
> >> Quartulli doesn't want rtnetlink support in batman-adv. I don't see
> >> any other statement about the rest of the patchset.
> > 
> > Quoting Antonio:
> > 
> > 
> > Anyway, I discussed about this together with the others and it seems
> > that a proper solution now is to wait before merging this patchset and
> > fix the current sysfs/rtnl_lock problem first. What do you think?
> > 
> > Adding a new API without fixing the current one doesn't sound like a good
> > move.
> > <<<
> > 
> > What isn't clear ?
> 
> AFAIK patch 6 and 7 are the "new" rtnetlink API. The rest can be
> categorized as fixes for bugs and for the wrong way to implementation
> things.

After looking at the patches again, I'd say only patch1 & patch2 could be 
categorized as "fixes" while the remaining patches are preparation patches for 
the netlink stuff which is added in the last step. Whether patch1 & patch2 
actually fix problems batman-adv already has today or whether they are 
necessary once you use netlink is unclear to me. If they are unrelated to this 
patchset they'd be sent separately, wouldn't they ?

Cheers,
Marek

^ permalink raw reply

* Re: [PATCH] net, batman: don't crash on zero length strings in routing_algo
From: Marek Lindner @ 2012-12-25  8:30 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
  Cc: Pau Koning, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX, Sasha Levin,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <201212250336.36874.lindner_marek-LWAfsSFWpa4@public.gmane.org>

On Tuesday, December 25, 2012 03:36:36 Marek Lindner wrote:
> > >> > empty, this assumption is wrong:
> > >> Why isn't this patch part of Linux 3.7? It seems to be a bugfix and it
> > >> was sent early enough?
> > > 
> > > The patch received a reply mere 2 hours after it was sent. Again,
> > > please read all mails before making noise.
> > 
> > Ok, leaving this problem unsolved is the correct way to handle it?
> > Nobody is allowed to say anything?
> 
> No, of course you are allowed "to say" something. I was simply pointing you
> to the explanation why the patch wasn't merged yet.

Let me add here: Feel free to do the required cleanup work and re-submit the 
patch. Nobody stops you from doing that as well.  :-)

Cheers,
Marek

^ permalink raw reply

* Re: kernel panic when running /etc/init.d/iptables restart
From: Gao feng @ 2012-12-25  8:38 UTC (permalink / raw)
  To: canqun zhang
  Cc: Patrick McHardy, netfilter-devel, netfilter, linux-kernel,
	netdev@vger.kernel.org
In-Reply-To: <CAFFEFTULzSysz+JCBC==VtKD8u1KO6EZnSdQ5XWM7um-zs9_pg@mail.gmail.com>

On 2012/12/25 15:25, canqun zhang wrote:
> Hi Gao feng
> The stack information is as follows. The kenel will panic because the
> nf_ct_destroy is NULL.
> 
> Reproduction:
> (1) starting a lxc container
> (2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
> MASQUERADE (run it on host machine)
> (3) /etc/ini.d/iptables save (run it on host machine)
> (4)/etc/init.d/iptables restart (run it on host machine)
> 

Thanks!
It seems that nf_conntrack_l[3,4]proto_unregister doesn't make sure
nf_conns of the proto being destroyed.

If I'm right, there is another problem even your fix this panic problem.
the l3,14proto will be unregistered before all of it's nf_conns being destroyed.
So even nf_ct_destroy is not NULL,in destroy_conntrack we are not able to
find the right l4proto,the l4proto->destroy will be incorrect.resources will
not be released correctly.

So I think the root problem is we do register/unregister, set/unset both on the
first net (init_net), Maybe it's better to do register set on the first net, and
do unregister unset on the last net.

^ permalink raw reply

* Re: kernel panic when running /etc/init.d/iptables restart
From: Gao feng @ 2012-12-25  8:50 UTC (permalink / raw)
  To: canqun zhang
  Cc: Patrick McHardy, netfilter-devel, netfilter, linux-kernel,
	netdev@vger.kernel.org
In-Reply-To: <CAFFEFTULzSysz+JCBC==VtKD8u1KO6EZnSdQ5XWM7um-zs9_pg@mail.gmail.com>

On 2012/12/25 15:25, canqun zhang wrote:
> Hi Gao feng
> The stack information is as follows. The kenel will panic because the
> nf_ct_destroy is NULL.

Thanks!
It seems that nf_conntrack_l[3,4]proto_unregister doesn't make sure
nf_conns of the proto being destroyed.

If I'm right, there is another problem even your fix this panic problem.
the l3,14proto will be unregistered before all of it's nf_conns being destroyed.
So even nf_ct_destroy is not NULL,in destroy_conntrack we are not able to
find the right l4proto,the l4proto->destroy will be incorrect.resources will
not be released correctly.

So I think the root problem is we do register/unregister, set/unset both on the
first net (init_net), Maybe it's better to do register set on the first net, and
do unregister unset on the last net.

^ permalink raw reply

* Why tcp_sacktag_walk specially process next_dup?
From: Yi Li @ 2012-12-25 10:35 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: netdev

Hi Ilpo,
I am a kernel newbie, maybe this question is simple.
If you have some free time, could you help me ?

I am reading your commit 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=68f8353b480e5f2e136c38a511abdbb88eaa8ce2, 
through this code path:

tcp_sacktag_write_queue() {
     if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
             after(end_seq, cache->start_seq)) {

             /* Head todo? */
             if (before(start_seq, cache->start_seq)) {
                         skb = tcp_sacktag_skip(skb, sk, &state,
                                start_seq);
                 skb = tcp_sacktag_walk(skb, sk, next_dup,
                                &state,
                                start_seq,
                                cache->start_seq,
                                dup_sack);
             }

}

and when we come to tcp_sacktag_walk(), comparing the current processing 
sack block
with cache, we have:  start_seq < cache->start_seq, and we now need to 
process the
bytes between (start_seq, cache->start_seq) in tcp_write_queue.

But in tcp_sacktag_walk(), why we first check the seqence space in 
next_dup ?
I know this is about D-SACK, and I have read the rfc2883, but I am still 
confused.
I have some questions:
1. Why we introduce a next_dup variable in SACK processing, is it better 
for performance optimization?
      As there is dup_sack variable, will this pre-processing of sack 
block be mixed with dup_sack ?
2. What does this test statement means in tcp_sacktag_walk:
      if ((next_dup != NULL) &&
             before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) { 
---------------------> A
             in_sack = tcp_match_skb_to_sack(sk, skb,
                             next_dup->start_seq,
                             next_dup->end_seq);
             if (in_sack > 0)
                 dup_sack = true;
         }
as far as i know, if tcp_skb_pcout(skb)>1, this condition maybe exist:
        skb->seq   < current_sack_block.start_seq < 
current_sack_block.end_seq < next_dup->start_seq < next_dup->end_seq.
So, I do not understand what the code A really does.

^ permalink raw reply

* Re: kernel panic when running /etc/init.d/iptables restart
From: canqun zhang @ 2012-12-25 10:45 UTC (permalink / raw)
  To: Gao feng
  Cc: Patrick McHardy, netfilter-devel, netfilter, linux-kernel,
	netdev@vger.kernel.org
In-Reply-To: <50D965F9.7090007@cn.fujitsu.com>

Thanks for your suggestion,i will modify this patch and take tests.

2012/12/25 Gao feng <gaofeng@cn.fujitsu.com>:
> On 2012/12/25 15:25, canqun zhang wrote:
>> Hi Gao feng
>> The stack information is as follows. The kenel will panic because the
>> nf_ct_destroy is NULL.
>>
>> Reproduction:
>> (1) starting a lxc container
>> (2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
>> MASQUERADE (run it on host machine)
>> (3) /etc/ini.d/iptables save (run it on host machine)
>> (4)/etc/init.d/iptables restart (run it on host machine)
>>
>
> Thanks!
> It seems that nf_conntrack_l[3,4]proto_unregister doesn't make sure
> nf_conns of the proto being destroyed.
>
> If I'm right, there is another problem even your fix this panic problem.
> the l3,14proto will be unregistered before all of it's nf_conns being destroyed.
> So even nf_ct_destroy is not NULL,in destroy_conntrack we are not able to
> find the right l4proto,the l4proto->destroy will be incorrect.resources will
> not be released correctly.
>
> So I think the root problem is we do register/unregister, set/unset both on the
> first net (init_net), Maybe it's better to do register set on the first net, and
> do unregister unset on the last net.

^ permalink raw reply

* Re: [PATCH 19/29] batman-adv: fix random jitter calculation
From: Antonio Quartulli @ 2012-12-25 11:26 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Marek Lindner,
	David S. Miller
In-Reply-To: <1356315256-6572-20-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]

On Mon, Dec 24, 2012 at 11:14:06AM +0900, Akinobu Mita wrote:
> batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
> in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
> 
>         msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
>         msecs += (random32() % 2 * BATADV_JITTER);
> 
> But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
> because '%' and '*' have same precedence and associativity is
> left-to-right.
> 
> This adds the parentheses at the appropriate position so that it matches
> original intension.
> 
> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---

Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>

But I would suggest to apply this change to net, since it is a fix.

Cheers,


-- 
Antonio Quartulli

..each of us alone is worth nothing..
Ernesto "Che" Guevara

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 20/29] batman-adv: rename random32() to prandom_u32()
From: Antonio Quartulli @ 2012-12-25 11:30 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Marek Lindner,
	David S. Miller
In-Reply-To: <1356315256-6572-21-git-send-email-akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

On Mon, Dec 24, 2012 at 11:14:07AM +0900, Akinobu Mita wrote:
> Use more preferable function name which implies using a pseudo-random
> number generator.
> 
> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>

-- 
Antonio Quartulli

..each of us alone is worth nothing..
Ernesto "Che" Guevara

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
From: Akinobu Mita @ 2012-12-25 11:47 UTC (permalink / raw)
  To: Neil Horman
  Cc: linux-kernel, akpm, Jesse Gross, Venkat Venkatsubra,
	Vlad Yasevich, Sridhar Samudrala, Steffen Klassert, Herbert Xu,
	David S. Miller, linux-sctp, dev, netdev
In-Reply-To: <20121225002149.GA5235@neilslaptop.think-freely.org>

2012/12/25 Neil Horman <nhorman@tuxdriver.com>:
> On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote:
>> Use more preferable function name which implies using a pseudo-random
>> number generator.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
>> Cc: Jesse Gross <jesse@nicira.com>
>> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>> Cc: Sridhar Samudrala <sri@us.ibm.com>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Cc: Herbert Xu <herbert@gondor.apana.org.au>
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: linux-sctp@vger.kernel.org
>> Cc: dev@openvswitch.org
>> Cc: netdev@vger.kernel.org
>> ---
>>  include/net/red.h         | 2 +-
>>  net/802/garp.c            | 2 +-
>>  net/openvswitch/actions.c | 2 +-
>>  net/rds/bind.c            | 2 +-
>>  net/sctp/socket.c         | 2 +-
>>  net/xfrm/xfrm_state.c     | 2 +-
>>  6 files changed, 6 insertions(+), 6 deletions(-)
>>
> I'm largely indifferent to this patch, but I kind of feel like its just churn.
> Whats the real advantage in making this change?  I grant that it clearly
> indicates the type of random number generator we're using at a given call site,
> But for those using net_random, you probably don't care too much about
> the source of your random bits.  If you did really want true random vs.
> pseudo-random data, you need to explicitly use the right call.  You're previous
> patch series did good cleanup on differentiating the different random calls, but
> this just seems like its removing what is otherwise useful indirection.

I overlooked the importance of  net_random() indirection.
Thanks for the feedback. I'll leave all net_random() callers as-is in
the next version.

^ permalink raw reply

* Re: [PATCH 1/2] IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len
From: Bart Van Assche @ 2012-12-25 12:10 UTC (permalink / raw)
  To: Mike Marciniszyn
  Cc: venkat.x.venkatsubra-QHcLZuEGTsvQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121221180149.23716.54707.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>

On 12/21/12 19:01, Mike Marciniszyn wrote:
> 0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
> added uses of sg_dma_len() and sg_dma_address(). This makes
> RDS DOA with the qib driver.
> 
> IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
> respectively since some HCAs overload ib_sg_dma* operations.
> 
> Signed-off-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>   net/rds/ib_recv.c |    9 ++++++---
>   1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
> index 8c5bc85..8eb9501 100644
> --- a/net/rds/ib_recv.c
> +++ b/net/rds/ib_recv.c
> @@ -339,8 +339,8 @@ static int rds_ib_recv_refill_one(struct rds_connection *conn,
>   	sge->length = sizeof(struct rds_header);
>   
>   	sge = &recv->r_sge[1];
> -	sge->addr = sg_dma_address(&recv->r_frag->f_sg);
> -	sge->length = sg_dma_len(&recv->r_frag->f_sg);
> +	sge->addr = ib_sg_dma_address(ic->i_cm_id->device, &recv->r_frag->f_sg);
> +	sge->length = ib_sg_dma_len(ic->i_cm_id->device, &recv->r_frag->f_sg);
>   
>   	ret = 0;
>   out:
> @@ -381,7 +381,10 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill)
>   		ret = ib_post_recv(ic->i_cm_id->qp, &recv->r_wr, &failed_wr);
>   		rdsdebug("recv %p ibinc %p page %p addr %lu ret %d\n", recv,
>   			 recv->r_ibinc, sg_page(&recv->r_frag->f_sg),
> -			 (long) sg_dma_address(&recv->r_frag->f_sg), ret);
> +			 (long) ib_sg_dma_address(
> +				ic->i_cm_id->device,
> +				&recv->r_frag->f_sg),
> +			ret);
>   		if (ret) {
>   			rds_ib_conn_error(conn, "recv post on "
>   			       "%pI4 returned %d, disconnecting and "

The above patch will slow down RDS for anyone who is not using QLogic
HCA's, isn' it ? How about replacing the above patch with the
(untested) patch below ?

diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c
index 8784486..ec9adf8 100644
--- a/drivers/infiniband/hw/ehca/ehca_mrmw.c
+++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c
@@ -2588,16 +2588,6 @@ static void ehca_dma_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
 	/* This is only a stub; nothing to be done here */
 }
 
-static u64 ehca_dma_address(struct ib_device *dev, struct scatterlist *sg)
-{
-	return sg->dma_address;
-}
-
-static unsigned int ehca_dma_len(struct ib_device *dev, struct scatterlist *sg)
-{
-	return sg->length;
-}
-
 static void ehca_dma_sync_single_for_cpu(struct ib_device *dev, u64 addr,
 					 size_t size,
 					 enum dma_data_direction dir)
@@ -2650,8 +2640,6 @@ struct ib_dma_mapping_ops ehca_dma_mapping_ops = {
 	.unmap_page             = ehca_dma_unmap_page,
 	.map_sg                 = ehca_dma_map_sg,
 	.unmap_sg               = ehca_dma_unmap_sg,
-	.dma_address            = ehca_dma_address,
-	.dma_len                = ehca_dma_len,
 	.sync_single_for_cpu    = ehca_dma_sync_single_for_cpu,
 	.sync_single_for_device = ehca_dma_sync_single_for_device,
 	.alloc_coherent         = ehca_dma_alloc_coherent,
diff --git a/drivers/infiniband/hw/qib/qib_dma.c b/drivers/infiniband/hw/qib/qib_dma.c
index 2920bb3..3620c6c 100644
--- a/drivers/infiniband/hw/qib/qib_dma.c
+++ b/drivers/infiniband/hw/qib/qib_dma.c
@@ -104,7 +104,12 @@ static int qib_map_sg(struct ib_device *dev, struct scatterlist *sgl,
 	for_each_sg(sgl, sg, nents, i) {
 		addr = (u64) page_address(sg_page(sg));
 		/* TODO: handle highmem pages */
-		if (!addr) {
+		if (addr) {
+			sg->dma_address = addr + sg->offset;
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+			sg->dma_length = sg->length;
+#endif
+		} else {
 			ret = 0;
 			break;
 		}
@@ -119,21 +124,6 @@ static void qib_unmap_sg(struct ib_device *dev,
 	BUG_ON(!valid_dma_direction(direction));
 }
 
-static u64 qib_sg_dma_address(struct ib_device *dev, struct scatterlist *sg)
-{
-	u64 addr = (u64) page_address(sg_page(sg));
-
-	if (addr)
-		addr += sg->offset;
-	return addr;
-}
-
-static unsigned int qib_sg_dma_len(struct ib_device *dev,
-				   struct scatterlist *sg)
-{
-	return sg->length;
-}
-
 static void qib_sync_single_for_cpu(struct ib_device *dev, u64 addr,
 				    size_t size, enum dma_data_direction dir)
 {
@@ -173,8 +163,6 @@ struct ib_dma_mapping_ops qib_dma_mapping_ops = {
 	.unmap_page = qib_dma_unmap_page,
 	.map_sg = qib_map_sg,
 	.unmap_sg = qib_unmap_sg,
-	.dma_address = qib_sg_dma_address,
-	.dma_len = qib_sg_dma_len,
 	.sync_single_for_cpu = qib_sync_single_for_cpu,
 	.sync_single_for_device = qib_sync_single_for_device,
 	.alloc_coherent = qib_dma_alloc_coherent,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 46bc045..54a0689 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1049,10 +1049,6 @@ struct ib_dma_mapping_ops {
 	void		(*unmap_sg)(struct ib_device *dev,
 				    struct scatterlist *sg, int nents,
 				    enum dma_data_direction direction);
-	u64		(*dma_address)(struct ib_device *dev,
-				       struct scatterlist *sg);
-	unsigned int	(*dma_len)(struct ib_device *dev,
-				   struct scatterlist *sg);
 	void		(*sync_single_for_cpu)(struct ib_device *dev,
 					       u64 dma_handle,
 					       size_t size,
@@ -1863,12 +1859,13 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
  * ib_sg_dma_address - Return the DMA address from a scatter/gather entry
  * @dev: The device for which the DMA addresses were created
  * @sg: The scatter/gather entry
+ *
+ * Note: this function is obsolete. To do: change all occurrences of
+ * ib_sg_dma_address() into sg_dma_address().
  */
 static inline u64 ib_sg_dma_address(struct ib_device *dev,
 				    struct scatterlist *sg)
 {
-	if (dev->dma_ops)
-		return dev->dma_ops->dma_address(dev, sg);
 	return sg_dma_address(sg);
 }
 
@@ -1876,12 +1873,13 @@ static inline u64 ib_sg_dma_address(struct ib_device *dev,
  * ib_sg_dma_len - Return the DMA length from a scatter/gather entry
  * @dev: The device for which the DMA addresses were created
  * @sg: The scatter/gather entry
+ *
+ * Note: this function is obsolete. To do: change all occurrences of
+ * ib_sg_dma_len() into sg_dma_len().
  */
 static inline unsigned int ib_sg_dma_len(struct ib_device *dev,
 					 struct scatterlist *sg)
 {
-	if (dev->dma_ops)
-		return dev->dma_ops->dma_len(dev, sg);
 	return sg_dma_len(sg);
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [ANNOUNCE] iptables 1.4.17 release
From: Pablo Neira Ayuso @ 2012-12-25 13:09 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, netfilter, netfilter-announce, lwn

[-- Attachment #1: Type: text/plain, Size: 429 bytes --]

Hi!

The Netfilter project proudly presents:

        iptables 1.4.17

More relevantly, this release includes the IPv6 NAT extensions from
Patrick McHardy, the ignore day transition from Florian Westphal and
a couple of fixes.

See ChangeLog that comes attached to this email for more details.

You can download it from:

http://www.netfilter.org/projects/iptables/downloads.html
ftp://ftp.netfilter.org/pub/iptables/

Have fun!

[-- Attachment #2: changes-iptables-1.4.17.txt --]
[-- Type: text/plain, Size: 744 bytes --]

Florian Westphal (1):
      libxt_time: add support to ignore day transition

Jozsef Kadlecsik (1):
      Manpage update: matches are evaluated in the order they are specified.

Pablo Neira Ayuso (2):
      Merge branch 'next' branch that contains new features scheduled for     Linux kernel 3.7
      bump version to 1.4.17

Patrick McHardy (7):
      Convert the NAT targets to use the kernel supplied nf_nat.h header
      extensions: add IPv6 MASQUERADE extension
      extensions: add IPv6 SNAT extension
      extensions: add IPv6 DNAT target
      extensions: add IPv6 REDIRECT extension
      extensions: add IPv6 NETMAP extension
      extensions: add NPT extension

Tom Eastep (1):
      extensions: libxt_statistic: Fix save output


^ permalink raw reply

* [PATCH] ndisc: Ensure to reserve header space for encapsulation.
From: YOSHIFUJI Hideaki @ 2012-12-25 14:39 UTC (permalink / raw)
  To: netdev, davem; +Cc: yoshfuji

We allocate sk_buff of MAX_HEADER + LL_RESERVED_SPACE(dev) + packet
length + dev->needed_tailroom, but reserved LL_RESERVED_SPACE(dev)
only.  This means that space for encapsulation was placed at the end
of buffer.  This does not make sense.

Reserve the space correctly, like this:

head                       data                   tail       end
+--------------------------------------------------------------+
+                |           |                      |          |
+--------------------------------------------------------------+
|<--MAX_HEADER-->|<-hlen---->|<---ipv6 packet------>|<--tlen-->|
                   =LL_
                    RESERVED_
                    SPACE(dev)

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
 net/ipv6/ndisc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 6574175..5f78ac2 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -404,7 +404,7 @@ static struct sk_buff *ndisc_build_skb(struct net_device *dev,
 		return NULL;
 	}
 
-	skb_reserve(skb, hlen);
+	skb_reserve(skb, MAX_HEADER + hlen);
 	ip6_nd_hdr(sk, skb, dev, saddr, daddr, IPPROTO_ICMPV6, len);
 
 	skb->transport_header = skb->tail;
@@ -1449,7 +1449,7 @@ void ndisc_send_redirect(struct sk_buff *skb, const struct in6_addr *target)
 		goto release;
 	}
 
-	skb_reserve(buff, hlen);
+	skb_reserve(buff, MAX_HEADER + hlen);
 	ip6_nd_hdr(sk, buff, dev, &saddr_buf, &ipv6_hdr(skb)->saddr,
 		   IPPROTO_ICMPV6, len);
 
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH] ipv6 mcast: Fix incorrect use of pskb_may_pull().
From: YOSHIFUJI Hideaki @ 2012-12-25 14:41 UTC (permalink / raw)
  To: netdev, davem; +Cc: yoshfuji

pskb_may_pull(skb, len) ensures that data between
skb->data and skb->data + len - 1 can be accessed as a
linear buffer.  When pskb_may_pull() is being used multiple
times for the same buffer without skb_pull(), the length
is not accumulated internally, and caller must do.

For example, assuming that we have done:
  pskb_may_pull(skb, sizeof(struct icmp6hdr));
Afterwards, we have to do:
  pskb_may_pull(skb, sizeof(struct mldv2_query))
instead of:
  pskb_may_pull(skb, sizeof(struct mldv2_query) -
                sizeof(struct icmp6hdr))

This incorrect use may cause bad thing if someone sends a message
with extension headers so that the message is fragmented just
after the icmpv6 header.

Of course we can try pulling step by step but there is almost
no benefit to doing so;  MLD is our final protocol and we are
going to access almost the whole message as a linear buffer
anyway.

So, let's linearlize the buffer just after the trivial
sanity checks on IPv6 and ICMPv6 header in
igmp6_event_{query,report}().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
---
 net/ipv6/mcast.c |   22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 28dfa5f..6b42b563 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1124,9 +1124,6 @@ int igmp6_event_query(struct sk_buff *skb)
 	int mark = 0;
 	int len;
 
-	if (!pskb_may_pull(skb, sizeof(struct in6_addr)))
-		return -EINVAL;
-
 	/* compute payload length excluding extension headers */
 	len = ntohs(ipv6_hdr(skb)->payload_len) + sizeof(struct ipv6hdr);
 	len -= skb_network_header_len(skb);
@@ -1136,10 +1133,12 @@ int igmp6_event_query(struct sk_buff *skb)
 		return -EINVAL;
 
 	idev = __in6_dev_get(skb->dev);
-
 	if (idev == NULL)
 		return 0;
 
+	if (!pskb_may_pull(skb, skb->len))
+		return -EINVAL;
+
 	mld = (struct mld_msg *)icmp6_hdr(skb);
 	group = &mld->mld_mca;
 	group_type = ipv6_addr_type(group);
@@ -1165,11 +1164,6 @@ int igmp6_event_query(struct sk_buff *skb)
 		/* clear deleted report items */
 		mld_clear_delrec(idev);
 	} else if (len >= 28) {
-		int srcs_offset = sizeof(struct mld2_query) -
-				  sizeof(struct icmp6hdr);
-		if (!pskb_may_pull(skb, srcs_offset))
-			return -EINVAL;
-
 		mlh2 = (struct mld2_query *)skb_transport_header(skb);
 		max_delay = (MLDV2_MRC(ntohs(mlh2->mld2q_mrc))*HZ)/1000;
 		if (!max_delay)
@@ -1186,10 +1180,6 @@ int igmp6_event_query(struct sk_buff *skb)
 		}
 		/* mark sources to include, if group & source-specific */
 		if (mlh2->mld2q_nsrcs != 0) {
-			if (!pskb_may_pull(skb, srcs_offset +
-			    ntohs(mlh2->mld2q_nsrcs) * sizeof(struct in6_addr)))
-				return -EINVAL;
-
 			mlh2 = (struct mld2_query *)skb_transport_header(skb);
 			mark = 1;
 		}
@@ -1248,9 +1238,6 @@ int igmp6_event_report(struct sk_buff *skb)
 	    skb->pkt_type != PACKET_BROADCAST)
 		return 0;
 
-	if (!pskb_may_pull(skb, sizeof(*mld) - sizeof(struct icmp6hdr)))
-		return -EINVAL;
-
 	mld = (struct mld_msg *)icmp6_hdr(skb);
 
 	/* Drop reports with not link local source */
@@ -1263,6 +1250,9 @@ int igmp6_event_report(struct sk_buff *skb)
 	if (idev == NULL)
 		return -ENODEV;
 
+	if (!pskb_may_pull(skb, sizeof(*mld)))
+		return -EINVAL;
+
 	/*
 	 *	Cancel the timer for this group
 	 */
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] ipv6 mcast: Fix incorrect use of pskb_may_pull().
From: Eric Dumazet @ 2012-12-25 17:27 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki; +Cc: netdev, davem
In-Reply-To: <50D9BB19.2080801@linux-ipv6.org>

On Tue, 2012-12-25 at 23:41 +0900, YOSHIFUJI Hideaki wrote:

> +	if (!pskb_may_pull(skb, skb->len))
> +		return -EINVAL;
> +

skb_linearize(skb) might be more explicit then.

^ permalink raw reply

* Re: Why tcp_sacktag_walk specially process next_dup?
From: Ilpo Järvinen @ 2012-12-25 20:57 UTC (permalink / raw)
  To: Yi Li; +Cc: Netdev
In-Reply-To: <50D9817D.2010206@gmail.com>

On Tue, 25 Dec 2012, Yi Li wrote:

> Hi Ilpo,
> I am a kernel newbie, maybe this question is simple.
> If you have some free time, could you help me ?
>
> I am reading your commit
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=68f8353b480e5f2e136c38a511abdbb88eaa8ce2,
> through this code path:
> 
> tcp_sacktag_write_queue() {
>     if (tcp_sack_cache_ok(tp, cache) && !dup_sack &&
>             after(end_seq, cache->start_seq)) {
> 
>             /* Head todo? */
>             if (before(start_seq, cache->start_seq)) {
>                         skb = tcp_sacktag_skip(skb, sk, &state,
>                                start_seq);
>                 skb = tcp_sacktag_walk(skb, sk, next_dup,
>                                &state,
>                                start_seq,
>                                cache->start_seq,
>                                dup_sack);
>             }
> 
> }
> 
> and when we come to tcp_sacktag_walk(), comparing the current processing sack
> block
> with cache, we have:  start_seq < cache->start_seq, and we now need to
> process the
> bytes between (start_seq, cache->start_seq) in tcp_write_queue.
> 
> But in tcp_sacktag_walk(), why we first check the seqence space in next_dup ?
> I know this is about D-SACK, and I have read the rfc2883, but I am still
> confused.
> I have some questions:
> 1. Why we introduce a next_dup variable in SACK processing, is it better for
> performance optimization?

IIRC, it is optimization, probably to avoid need to do non-in-order ops 
(it's few years since I wrote that already :)). But I'll take a look later 
again.

>      As there is dup_sack variable, will this pre-processing of sack block be
> mixed with dup_sack ?

IIRC, dup_sack tells that we're processing DSACK currently.

> 2. What does this test statement means in tcp_sacktag_walk:
>      if ((next_dup != NULL) &&
>             before(TCP_SKB_CB(skb)->seq, next_dup->end_seq)) {
> ---------------------> A
>             in_sack = tcp_match_skb_to_sack(sk, skb,
>                             next_dup->start_seq,
>                             next_dup->end_seq);
>             if (in_sack > 0)
>                 dup_sack = true;
>         }
> as far as i know, if tcp_skb_pcout(skb)>1, this condition maybe exist:
>        skb->seq   < 
> current_sack_block.start_seq < current_sack_block.end_seq <
> next_dup->start_seq < next_dup->end_seq.
>
> So, I do not understand what the code A really does.

Then tcp_match_skb_to_sack won't match this skb but just splits first if 
necessary. It only does work if there's something to do already there 
(but I cannot fully check the code easily now so you might have some 
point I cannot follow based on this email alone).

BTW, these two conditions will always hold nowadays due to SACK block 
validation:

> current_sack_block.start_seq < current_sack_block.end_seq
> next_dup->start_seq < next_dup->end_seq.


-- 
 i.

^ permalink raw reply

* Re: [PATCH 19/29] batman-adv: fix random jitter calculation
From: Akinobu Mita @ 2012-12-25 21:35 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Simon Wunderlich,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Marek Lindner,
	David S. Miller
In-Reply-To: <20121225112657.GC2604-E/2OGukznS5g9hUCZPvPmw@public.gmane.org>

2012/12/25 Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>:
> On Mon, Dec 24, 2012 at 11:14:06AM +0900, Akinobu Mita wrote:
>> batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
>> in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
>>
>>         msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
>>         msecs += (random32() % 2 * BATADV_JITTER);
>>
>> But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
>> because '%' and '*' have same precedence and associativity is
>> left-to-right.
>>
>> This adds the parentheses at the appropriate position so that it matches
>> original intension.
>>
>> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
>> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
>> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
>> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> ---
>
> Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>
> But I would suggest to apply this change to net, since it is a fix.

I agree.
David, please consider to apply this patch for 3.8-rc*.

^ permalink raw reply

* Re: [PATCH 19/29] batman-adv: fix random jitter calculation
From: David Miller @ 2012-12-25 21:37 UTC (permalink / raw)
  To: akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, lindner_marek-LWAfsSFWpa4
In-Reply-To: <CAC5umyh1d9XxH2We4e2rjV-OYhbP2ObGC_iwrDeff2hphJ7TzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wed, 26 Dec 2012 06:35:37 +0900

> 2012/12/25 Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>:
>> On Mon, Dec 24, 2012 at 11:14:06AM +0900, Akinobu Mita wrote:
>>> batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
>>> in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
>>>
>>>         msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
>>>         msecs += (random32() % 2 * BATADV_JITTER);
>>>
>>> But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
>>> because '%' and '*' have same precedence and associativity is
>>> left-to-right.
>>>
>>> This adds the parentheses at the appropriate position so that it matches
>>> original intension.
>>>
>>> Signed-off-by: Akinobu Mita <akinobu.mita-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>> Cc: Marek Lindner <lindner_marek-LWAfsSFWpa4@public.gmane.org>
>>> Cc: Simon Wunderlich <siwu-MaAgPAbsBIVS8oHt8HbXEIQuADTiUCJX@public.gmane.org>
>>> Cc: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>>> Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
>>> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
>>> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> ---
>>
>> Acked-by: Antonio Quartulli <ordex-GaUfNO9RBHfsrOwW+9ziJQ@public.gmane.org>
>>
>> But I would suggest to apply this change to net, since it is a fix.
> 
> I agree.
> David, please consider to apply this patch for 3.8-rc*.
> 

All patches I should consider seriously should be properly
reposted for review and inclusion.

^ permalink raw reply

* Re: [PATCH 28/29] net/: rename net_random() to prandom_u32()
From: Neil Horman @ 2012-12-26  0:42 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: linux-kernel, akpm, Jesse Gross, Venkat Venkatsubra,
	Vlad Yasevich, Sridhar Samudrala, Steffen Klassert, Herbert Xu,
	David S. Miller, linux-sctp, dev, netdev
In-Reply-To: <CAC5umyhhTJWfsFtZ_5Qm_HJHy2+sx7iMgcV=T=MMKLdi50-HQw@mail.gmail.com>

On Tue, Dec 25, 2012 at 08:47:26PM +0900, Akinobu Mita wrote:
> 2012/12/25 Neil Horman <nhorman@tuxdriver.com>:
> > On Mon, Dec 24, 2012 at 11:14:15AM +0900, Akinobu Mita wrote:
> >> Use more preferable function name which implies using a pseudo-random
> >> number generator.
> >>
> >> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> >> Cc: Jesse Gross <jesse@nicira.com>
> >> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> >> Cc: Vlad Yasevich <vyasevich@gmail.com>
> >> Cc: Sridhar Samudrala <sri@us.ibm.com>
> >> Cc: Neil Horman <nhorman@tuxdriver.com>
> >> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> >> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> >> Cc: "David S. Miller" <davem@davemloft.net>
> >> Cc: linux-sctp@vger.kernel.org
> >> Cc: dev@openvswitch.org
> >> Cc: netdev@vger.kernel.org
> >> ---
> >>  include/net/red.h         | 2 +-
> >>  net/802/garp.c            | 2 +-
> >>  net/openvswitch/actions.c | 2 +-
> >>  net/rds/bind.c            | 2 +-
> >>  net/sctp/socket.c         | 2 +-
> >>  net/xfrm/xfrm_state.c     | 2 +-
> >>  6 files changed, 6 insertions(+), 6 deletions(-)
> >>
> > I'm largely indifferent to this patch, but I kind of feel like its just churn.
> > Whats the real advantage in making this change?  I grant that it clearly
> > indicates the type of random number generator we're using at a given call site,
> > But for those using net_random, you probably don't care too much about
> > the source of your random bits.  If you did really want true random vs.
> > pseudo-random data, you need to explicitly use the right call.  You're previous
> > patch series did good cleanup on differentiating the different random calls, but
> > this just seems like its removing what is otherwise useful indirection.
> 
> I overlooked the importance of  net_random() indirection.
> Thanks for the feedback. I'll leave all net_random() callers as-is in
> the next version.
Well, I guess I should qualify my opinion.  I find it useful personally (the
generation of nonces in many cases can be left to most any pseudo random
generator that the system deems is a 'good enough' balance between a fast
generator that doesn't block on low entropy and a reasonably secure one that
doesn't allow for easy prediction.  As those needs and factors change, its nice
to have a set point to change them at.  If you (or anyone else has a differing
opinion, I'm happy to listen to it.


Regards
Neil

> 

^ permalink raw reply

* Re: [PATCH 4/4 v2] net/smsc911x: Provide common clock functionality
From: Linus Walleij @ 2012-12-26  0:51 UTC (permalink / raw)
  To: Lee Jones
  Cc: Russell King - ARM Linux, Steve Glendinning, Robert Marklund,
	linus.walleij, arnd, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <20121221114105.GP2691@gmail.com>

On Fri, Dec 21, 2012 at 12:41 PM, Lee Jones <lee.jones@linaro.org> wrote:

> +       if (IS_ERR(pdata->clk)) {
> +               ret = clk_prepare_enable(pdata->clk);
> +               if (ret < 0)
> +                       netdev_err(ndev, "failed to enable clock %d\n", ret);
> +       }

I think you got all of these backwards now, shouldn't it be if
(!IS_ERR(pdata->clk)) { } ...?

It's late here but enlighten me if I don't get it.

> +       if (IS_ERR(pdata->clk))
> +               clk_disable_unprepare(pdata->clk);

Dito.

> +       /* Request clock */
> +       pdata->clk = clk_get(&pdev->dev, NULL);
> +       if (IS_ERR(pdata->clk))
> +               netdev_warn(ndev, "couldn't get clock %li\n", PTR_ERR(pdata->clk));

This one seems correct though.

> +       /* Free clock */
> +       if (IS_ERR(pdata->clk)) {
> +               clk_put(pdata->clk);
> +               pdata->clk = NULL;
> +       }

Should be !IS_ERR()

Yours,
Linus Walleij

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox