Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: Paul Gortmaker @ 2012-05-25 19:05 UTC (permalink / raw)
  To: David Miller
  Cc: jon.maloy, netdev, tipc-discussion, ying.xue, erik.hugne,
	allan.stephens, maloy
In-Reply-To: <20120524.161231.1058511318935925082.davem@davemloft.net>

[Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label] On 24/05/2012 (Thu 16:12) David Miller wrote:

> From: Paul Gortmaker <paul.gortmaker@windriver.com>
> Date: Thu, 24 May 2012 15:58:16 -0400
> 
> > But for new TIPC development features, future direction, and things like
> > that -- making the right call requires intimate understanding of TIPC
> > and its users, which is something that a maintainer should have but
> > something I know I don't have.  (A man has to know his limitations.)
> > 
> > In this context, I'm not talking about these three trivial patches; but
> > more complicated stuff that I imagine will be floated in the future.
> > 
> > To that end, I can still review and call out issues in a crap patch when
> > I see them.  But I'd like to see new stuff sent to netdev, so that folks
> > smarter than me have a chance to catch when a patch appears generally OK
> > but is architecturally the wrong direction etc.
> 
> For maintainership, taste is more important than deep knowledge of the
> specific technology.  Worst case you ask the submitter to explain the
> background of their change more thoroughly and that information is an
> absolutely requirement in the commit message and code comments
> anyways.

OK, what I'm hearing is that you'd prefer I continue to collect up TIPC
patches and issue pull requests for a while longer.  I can do that.  Any
specifics of how you'd like things done? -- e.g. if the reviews of new
TIPC development patches takes place here on netdev before I stage them,
will that create extra work for you dealing with them in patchworks?

Paul.

^ permalink raw reply

* Re: [PATCH] gianfar:don't add FCB length to hard_header_len
From: Joe Perches @ 2012-05-25 19:51 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: Jan Ceuleers, David Miller, b06378, netdev, linuxppc-dev
In-Reply-To: <20120525155820.GA25102@windriver.com>

On Fri, 2012-05-25 at 11:58 -0400, Paul Gortmaker wrote:
> But you really shouldn't need the hardware to validate this kind of
> patch anyways -- aside from your code flow change in the irq routine of
> gianfar_ptp, you should have been simply able to check for object file
> equivalence before and after your change.

No cross compiler either, and I'm lazy 'bout that...

cheers, Joe

^ permalink raw reply

* Re: [PATCH] gianfar:don't add FCB length to hard_header_len
From: Paul Gortmaker @ 2012-05-25 20:04 UTC (permalink / raw)
  To: Joe Perches; +Cc: Jan Ceuleers, David Miller, b06378, netdev, linuxppc-dev
In-Reply-To: <1337975465.30100.4.camel@joe2Laptop>

On 12-05-25 03:51 PM, Joe Perches wrote:
> On Fri, 2012-05-25 at 11:58 -0400, Paul Gortmaker wrote:
>> But you really shouldn't need the hardware to validate this kind of
>> patch anyways -- aside from your code flow change in the irq routine of
>> gianfar_ptp, you should have been simply able to check for object file
>> equivalence before and after your change.
> 
> No cross compiler either, and I'm lazy 'bout that...

Can't get much easier than using one of these:

http://www.kernel.org/pub/tools/crosstool/

Just untar, export PATH ARCH CROSS_COMPILE and go.

Can't get much lazier than that. Great to have around.

Paul.

> 
> cheers, Joe
> 

^ permalink raw reply

* [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Arun Sharma @ 2012-05-25 20:15 UTC (permalink / raw)
  To: netdev; +Cc: Arun Sharma, linux-kernel, David Miller

The algorithm is based on ipv4 and alloc_large_system_hash().

The following data is from a x86_64 box I tested:

128MB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
16384
22444

512MB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
65536
99856

1GB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
524288
203068

2GB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
1048576
524288

4GB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
2097152
524288

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
---
 net/ipv6/route.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 49d6ce1..c89ebbb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2827,6 +2827,16 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 }
 #endif
 
+static __initdata unsigned long ip6_rt_entries;
+static int __init set_rt_entries(char *str)
+{
+	if (!str)
+		return 0;
+	ip6_rt_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("ip6_rt_entries=", set_rt_entries);
+
 static int __net_init ip6_route_net_init(struct net *net)
 {
 	int ret = -ENOMEM;
@@ -2872,8 +2882,17 @@ static int __net_init ip6_route_net_init(struct net *net)
 			 ip6_template_metrics, true);
 #endif
 
+	/* Compute a reasonable default based on what we do for ipv4
+	 * total size = 1/16th of total RAM
+	 * No more than 512k entries unless overridden on kernel cmdline */
+        if (ip6_rt_entries == 0) {
+		ip6_rt_entries = (totalram_pages << PAGE_SHIFT) >> 4;
+		ip6_rt_entries /= sizeof(struct rt6_info);
+		ip6_rt_entries = min(512 * 1024UL, ip6_rt_entries);
+        }
+
 	net->ipv6.sysctl.flush_delay = 0;
-	net->ipv6.sysctl.ip6_rt_max_size = 4096;
+	net->ipv6.sysctl.ip6_rt_max_size = ip6_rt_entries;
 	net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
 	net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
 	net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
-- 
1.7.8.4

^ permalink raw reply related

* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: David Miller @ 2012-05-25 20:24 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: jon.maloy, netdev, tipc-discussion, allan.stephens, maloy
In-Reply-To: <20120525190506.GB25102@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Fri, 25 May 2012 15:05:06 -0400

> OK, what I'm hearing is that you'd prefer I continue to collect up TIPC
> patches and issue pull requests for a while longer.  I can do that.  Any
> specifics of how you'd like things done? -- e.g. if the reviews of new
> TIPC development patches takes place here on netdev before I stage them,
> will that create extra work for you dealing with them in patchworks?

When you want the TIPC patches reviewed here you can simply post the
set, and since you're informing me how this will work I'll know that
you'll send me a pull request later, and therefore I can mark the
patches as "Awaiting Upstream" or similar in patchwork.

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

^ permalink raw reply

* WARNING: at net/ipv4/tcp.c:1610 tcp_recvmsg+0xb1b/0xc70()
From: Jack Stone @ 2012-05-25 20:25 UTC (permalink / raw)
  To: davem, netdev, Linux Kernel

Hi All,

The following warning keeps hitting me. I couldn't get the first one - it had already left dmesg hence the W taint.
The C taint is from r8712u from staging.

I've seen it with 3.4.0-076444-g07acfc2 (recent Linus tree) and 3.4.0-rc3-00089-gc6f5c93.

I am going to attempt to bisect it now.

[ 3896.037489] ------------[ cut here ]------------
[ 3896.037490] WARNING: at net/ipv4/tcp.c:1610 tcp_recvmsg+0xb1b/0xc70()
[ 3896.037491] Hardware name: System Product Name
[ 3896.037491] recvmsg bug 2: copied 3F1199D7 seq 3F1199D7 rcvnxt 3F119A71 fl 0
[ 3896.037511] Modules linked in: fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge rfcomm lockd 8021q garp stp llc bnep nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables vhost_net snd_hda_codec_hdmi macvtap macvlan tun snd_hda_codec_realtek virtio_net btusb bluetooth coretemp kvm_intel kvm snd_hda_intel r8712u(C) snd_hda_codec snd_hwdep e1000e joydev snd_seq snd_seq_device snd_pcm snd_timer snd sunrpc eeepc_wmi asus_wmi hid_logitech_dj sparse_keymap mxm_wmi soundcore iTCO_wdt rfkill snd_page_alloc wmi i2c_i801 pcspkr iTCO_vendor_support serio_raw binfmt_misc uinput microcode crc32c_intel ghash_clmulni_intel firewire_ohci fi
 rewire_core crc_itu_t [last unloaded: scsi_wait_scan]
[ 3896.037512] Pid: 3926, comm: spotify Tainted: G        WC   3.4.0-07644-g07acfc2 #2
[ 3896.037513] Call Trace:
[ 3896.037514]  [<ffffffff8106010f>] warn_slowpath_common+0x7f/0xc0
[ 3896.037515]  [<ffffffff81060206>] warn_slowpath_fmt+0x46/0x50
[ 3896.037517]  [<ffffffff8163f4c5>] ? tcp_recvmsg+0x35/0xc70
[ 3896.037518]  [<ffffffff812c130f>] ? avc_has_perm_flags+0xef/0x230
[ 3896.037519]  [<ffffffff812c125c>] ? avc_has_perm_flags+0x3c/0x230
[ 3896.037520]  [<ffffffff8163ffab>] tcp_recvmsg+0xb1b/0xc70
[ 3896.037522]  [<ffffffff8166a8c0>] ? inet_sendmsg+0x230/0x230
[ 3896.037523]  [<ffffffff8166a9f7>] inet_recvmsg+0x137/0x250
[ 3896.037525]  [<ffffffff815d7f58>] ? sock_update_classid+0x128/0x310
[ 3896.037526]  [<ffffffff815cfe40>] do_sock_read+0xf0/0x110
[ 3896.037527]  [<ffffffff815d0b8c>] sock_aio_read.part.5+0x4c/0x70
[ 3896.037528]  [<ffffffff812c130f>] ? avc_has_perm_flags+0xef/0x230
[ 3896.037530]  [<ffffffff815d0bb0>] ? sock_aio_read.part.5+0x70/0x70
[ 3896.037531]  [<ffffffff815d0bdd>] sock_aio_read+0x2d/0x40
[ 3896.037532]  [<ffffffff811bc2b3>] do_sync_readv_writev+0xd3/0x110
[ 3896.037534]  [<ffffffff812beca6>] ? security_file_permission+0x96/0xb0
[ 3896.037535]  [<ffffffff811bb9a1>] ? rw_verify_area+0x61/0x100
[ 3896.037537]  [<ffffffff811bc584>] do_readv_writev+0xd4/0x1d0
[ 3896.037538]  [<ffffffff811bdad8>] ? fget_light+0x48/0x4f0
[ 3896.037540]  [<ffffffff811bdad8>] ? fget_light+0x48/0x4f0
[ 3896.037541]  [<ffffffff811bc71c>] vfs_readv+0x3c/0x50
[ 3896.037543]  [<ffffffff811bc77d>] sys_readv+0x4d/0xc0
[ 3896.037544]  [<ffffffff8174c829>] system_call_fastpath+0x16/0x1b
[ 3896.037545] ---[ end trace 762b4689c56af7ab ]---

The relevant code from tcp.c is:

		/* Next get a buffer. */

                skb_queue_walk(&sk->sk_receive_queue, skb) {
                        /* Now that we have two receive queues this
                         * shouldn't happen.
                         */
                        if (WARN(before(*seq, TCP_SKB_CB(skb)->seq),
                                 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
                                 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
                                 flags))
                                break;

                        offset = *seq - TCP_SKB_CB(skb)->seq;
                        if (tcp_hdr(skb)->syn)
                                offset--;
                        if (offset < skb->len)
                                goto found_ok_skb;
                        if (tcp_hdr(skb)->fin)
                                goto found_fin_ok;
This warn here ----->        WARN(!(flags & MSG_PEEK),
                             "recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n",
                             *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt, flags);
                }

Thanks,

Jack

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: David Miller @ 2012-05-25 20:26 UTC (permalink / raw)
  To: asharma; +Cc: netdev, linux-kernel
In-Reply-To: <1337976934-18065-1-git-send-email-asharma@fb.com>

From: Arun Sharma <asharma@fb.com>
Date: Fri, 25 May 2012 13:15:34 -0700

> +	/* Compute a reasonable default based on what we do for ipv4
> +	 * total size = 1/16th of total RAM
> +	 * No more than 512k entries unless overridden on kernel cmdline */

Please format this comment correctly:

	/* Compute a reasonable default based on what we do for ipv4
	 * total size = 1/16th of total RAM
	 * No more than 512k entries unless overridden on kernel cmdline
	 */

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1610 tcp_recvmsg+0xb1b/0xc70()
From: Eric Dumazet @ 2012-05-25 20:45 UTC (permalink / raw)
  To: Jack Stone; +Cc: davem, netdev, Linux Kernel
In-Reply-To: <4FBFEACC.8040601@fastmail.fm>

On Fri, 2012-05-25 at 21:25 +0100, Jack Stone wrote:
> Hi All,
> 
> The following warning keeps hitting me. I couldn't get the first one - it had already left dmesg hence the W taint.
> The C taint is from r8712u from staging.
> 
> I've seen it with 3.4.0-076444-g07acfc2 (recent Linus tree) and 3.4.0-rc3-00089-gc6f5c93.
> 
> I am going to attempt to bisect it now.

No need, update your tree.

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Eric Dumazet @ 2012-05-25 20:47 UTC (permalink / raw)
  To: Arun Sharma; +Cc: netdev, linux-kernel, David Miller
In-Reply-To: <1337976934-18065-1-git-send-email-asharma@fb.com>

On Fri, 2012-05-25 at 13:15 -0700, Arun Sharma wrote:
> The algorithm is based on ipv4 and alloc_large_system_hash().
> 

Why is it needed at all ?

IPv4 has a route cache with potentially millions of entries, not IPv6.

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1610 tcp_recvmsg+0xb1b/0xc70()
From: Eric Dumazet @ 2012-05-25 20:55 UTC (permalink / raw)
  To: Jack Stone; +Cc: davem, netdev, Linux Kernel
In-Reply-To: <1337978725.10135.0.camel@edumazet-glaptop>

On Fri, 2012-05-25 at 22:45 +0200, Eric Dumazet wrote:
> On Fri, 2012-05-25 at 21:25 +0100, Jack Stone wrote:
> > Hi All,
> > 
> > The following warning keeps hitting me. I couldn't get the first one - it had already left dmesg hence the W taint.
> > The C taint is from r8712u from staging.
> > 
> > I've seen it with 3.4.0-076444-g07acfc2 (recent Linus tree) and 3.4.0-rc3-00089-gc6f5c93.
> > 
> > I am going to attempt to bisect it now.
> 
> No need, update your tree.
> 
> 


http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1ca7ee30630e1022dbcf1b51be20580815ffab73

^ permalink raw reply

* Re: WARNING: at net/ipv4/tcp.c:1610 tcp_recvmsg+0xb1b/0xc70()
From: Jack Stone @ 2012-05-25 21:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Linux Kernel
In-Reply-To: <1337979331.10135.2.camel@edumazet-glaptop>

On 05/25/2012 09:55 PM, Eric Dumazet wrote:
> On Fri, 2012-05-25 at 22:45 +0200, Eric Dumazet wrote:
>> No need, update your tree.
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1ca7ee30630e1022dbcf1b51be20580815ffab73
> 

Thank you. Rebuilding now.

Jack

^ permalink raw reply

* Inadvertently sending a Christmas Tree TCP packet
From: Earl Chew @ 2012-05-25 22:15 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: netdev
In-Reply-To: <4FBFCFCA.2090501@ixiacom.com>

I had previously observed the following behaviour captured from WireShark:

16220	111.075627	10.64.33.43	10.128.163.100	TCP	59253 > exec [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=2
16222	0.203210	10.128.163.100	10.64.33.43	TCP	exec > 59253 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1250 WS=7
16223	0.000032	10.64.33.43	10.128.163.100	TCP	59253 > exec [ACK] Seq=1 Ack=1 Win=65532 Len=0
... snip ...
16237	0.000319	10.128.163.100	10.64.33.43	TCP	exec > 59253 [FIN, PSH, ACK, URG] Seq=31 Ack=30 Win=5888 Urg=1 Len=1
16240	1.114085	10.128.163.100	10.64.33.43	TCP	[TCP Retransmission] exec > 59253 [FIN, PSH, ACK, URG] Seq=31 Ack=30 Win=5888 Urg=1 Len=1


These packets were sent from an application running on Linux 2.6.18. 

The receiver has become confused, and the so the Linux sender retransmits at packet 16240,
and continues retransmitting. In this case, the application code at the receiver is blocked
indefinitely trying to read a socket that seemingly has (URG) data and yet at the same time
doesn't have any more data (FIN).

Looking at the 2.6.18 source code for tcp_output.c, I see code at tcp_send_fin()
that is attaching FIN to the packet.

The code in 3.4 seems fairly much the same:


/* Send a fin.  The caller locks the socket for us.  This cannot be
 * allowed to fail queueing a FIN frame under any circumstances.
 */
void tcp_send_fin(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *skb = tcp_write_queue_tail(sk);
	int mss_now;

	/* Optimization, tack on the FIN if we have a queue of
	 * unsent frames.  But be careful about outgoing SACKS
	 * and IP options.
	 */
	mss_now = tcp_current_mss(sk);

	if (tcp_send_head(sk) != NULL) {
		TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_FIN;
		TCP_SKB_CB(skb)->end_seq++;
		tp->write_seq++;
	} else {



The comment block says to be careful about IP options, but
the code doesn't appear to worry too much.

Is something like:

	if (tcp_send_head(sk) != NULL &&
		TCP_SKB_CB(skb)->tcp_flags == 0)

more appropriate ?


Earl

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Arun Sharma @ 2012-05-25 22:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, David Miller
In-Reply-To: <1337978820.10135.1.camel@edumazet-glaptop>

On 5/25/12 1:47 PM, Eric Dumazet wrote:
> On Fri, 2012-05-25 at 13:15 -0700, Arun Sharma wrote:
>> The algorithm is based on ipv4 and alloc_large_system_hash().
>>
>
> Why is it needed at all ?
>
> IPv4 has a route cache with potentially millions of entries, not IPv6.

With the default size of 4096 for the ipv6 routing table, entries can 
get garbage collected and hosts could lose their default route and 
therefore lose connectivity.

We actually saw it happen.

  -Arun

^ permalink raw reply

* [PATCH] net: compute a more reasonable default ip6_rt_max_size (v2)
From: Arun Sharma @ 2012-05-25 22:26 UTC (permalink / raw)
  To: netdev; +Cc: Arun Sharma, linux-kernel, David Miller

The algorithm is based on ipv4 and alloc_large_system_hash().

The following data is from a x86_64 box I tested:

128MB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
16384
22444

512MB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
65536
99856

1GB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
524288
203068

2GB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
1048576
524288

4GB
$ cat /proc/sys/net/ipv{4,6}/route/max_size
2097152
524288

Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
---
 net/ipv6/route.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 49d6ce1..bf85926 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2827,6 +2827,16 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 }
 #endif
 
+static __initdata unsigned long ip6_rt_entries;
+static int __init set_rt_entries(char *str)
+{
+	if (!str)
+		return 0;
+	ip6_rt_entries = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("ip6_rt_entries=", set_rt_entries);
+
 static int __net_init ip6_route_net_init(struct net *net)
 {
 	int ret = -ENOMEM;
@@ -2872,8 +2882,17 @@ static int __net_init ip6_route_net_init(struct net *net)
 			 ip6_template_metrics, true);
 #endif
 
+	/* Compute a reasonable default based on what we do for ipv4
+	 * total size = 1/16th of total RAM
+	 * No more than 512k entries unless overridden on kernel cmdline */
+	if (ip6_rt_entries == 0) {
+		ip6_rt_entries = (totalram_pages << PAGE_SHIFT) >> 4;
+		ip6_rt_entries /= sizeof(struct rt6_info);
+		ip6_rt_entries = min(512 * 1024UL, ip6_rt_entries);
+	}
+
 	net->ipv6.sysctl.flush_delay = 0;
-	net->ipv6.sysctl.ip6_rt_max_size = 4096;
+	net->ipv6.sysctl.ip6_rt_max_size = ip6_rt_entries;
 	net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
 	net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
 	net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;
-- 
1.7.8.4

^ permalink raw reply related

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: David Miller @ 2012-05-25 22:51 UTC (permalink / raw)
  To: asharma; +Cc: eric.dumazet, netdev, linux-kernel
In-Reply-To: <4FC0063E.8080209@fb.com>

From: Arun Sharma <asharma@fb.com>
Date: Fri, 25 May 2012 15:22:54 -0700

> On 5/25/12 1:47 PM, Eric Dumazet wrote:
>> On Fri, 2012-05-25 at 13:15 -0700, Arun Sharma wrote:
>>> The algorithm is based on ipv4 and alloc_large_system_hash().
>>>
>>
>> Why is it needed at all ?
>>
>> IPv4 has a route cache with potentially millions of entries, not IPv6.
> 
> With the default size of 4096 for the ipv6 routing table, entries can
> get garbage collected and hosts could lose their default route and
> therefore lose connectivity.
> 
> We actually saw it happen.

Under no circumstances should administrator configured ipv6 routes be
garbage collected, that is a bug.

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Arun Sharma @ 2012-05-26  0:08 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, linux-kernel
In-Reply-To: <20120525.185131.2017517041016424794.davem@davemloft.net>

On 5/25/12 3:51 PM, David Miller wrote:
> From: Arun Sharma<asharma@fb.com>
> Date: Fri, 25 May 2012 15:22:54 -0700
>
>> On 5/25/12 1:47 PM, Eric Dumazet wrote:
>>> On Fri, 2012-05-25 at 13:15 -0700, Arun Sharma wrote:
>>>> The algorithm is based on ipv4 and alloc_large_system_hash().
>>>>
>>>
>>> Why is it needed at all ?
>>>
>>> IPv4 has a route cache with potentially millions of entries, not IPv6.
>>
>> With the default size of 4096 for the ipv6 routing table, entries can
>> get garbage collected and hosts could lose their default route and
>> therefore lose connectivity.
>>
>> We actually saw it happen.
>
> Under no circumstances should administrator configured ipv6 routes be
> garbage collected, that is a bug.

These were not admin configured routes. They were discovered via ipv6 
neighbor discovery.

  -Arun

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: David Miller @ 2012-05-26  0:11 UTC (permalink / raw)
  To: asharma; +Cc: eric.dumazet, netdev, linux-kernel
In-Reply-To: <4FC01F1B.1080009@fb.com>

From: Arun Sharma <asharma@fb.com>
Date: Fri, 25 May 2012 17:08:59 -0700

> On 5/25/12 3:51 PM, David Miller wrote:
>> From: Arun Sharma<asharma@fb.com>
>> Date: Fri, 25 May 2012 15:22:54 -0700
>>
>>> On 5/25/12 1:47 PM, Eric Dumazet wrote:
>>>> On Fri, 2012-05-25 at 13:15 -0700, Arun Sharma wrote:
>>>>> The algorithm is based on ipv4 and alloc_large_system_hash().
>>>>>
>>>>
>>>> Why is it needed at all ?
>>>>
>>>> IPv4 has a route cache with potentially millions of entries, not IPv6.
>>>
>>> With the default size of 4096 for the ipv6 routing table, entries can
>>> get garbage collected and hosts could lose their default route and
>>> therefore lose connectivity.
>>>
>>> We actually saw it happen.
>>
>> Under no circumstances should administrator configured ipv6 routes be
>> garbage collected, that is a bug.
> 
> These were not admin configured routes. They were discovered via ipv6
> neighbor discovery.

Then such default routes should either be:

1) Passed over by GC

2) Trigger neighbour discovery when GC'd

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Arun Sharma @ 2012-05-26  0:44 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, linux-kernel
In-Reply-To: <20120525.201150.1782581593120395710.davem@davemloft.net>

On 5/25/12 5:11 PM, David Miller wrote:

>> These were not admin configured routes. They were discovered via ipv6
>> neighbor discovery.
>
> Then such default routes should either be:
>
> 1) Passed over by GC
>
> 2) Trigger neighbour discovery when GC'd

It's possible that there is a bug somewhere - we didn't get a chance to 
dig deeper. What we observed is that as we got close to the 4096 limit, 
some hosts were becoming unreachable. A modest increase in the routing 
table size made things better.

  -Arun

^ permalink raw reply

* (unknown), 
From: robothroli company @ 2012-05-25 13:52 UTC (permalink / raw)



 i am robothroli, Purchase manager from roli Merchant Ltd. We are
Import/export Company based in taiwan. We are interested in purchasing
your product and I would like to make an inquiry. Please inform me on:

Sample availability and price
Minimum order quantity
FOB Prices

Sincerely
Purchase Manager
robothroli

^ permalink raw reply

* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Gao feng @ 2012-05-26  2:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
	Gao feng
In-Reply-To: <4FBF207D.80809@cn.fujitsu.com>

于 2012年05月25日 14:02, Gao feng 写道:
> 于 2012年05月25日 10:54, Pablo Neira Ayuso 写道:
>> On Fri, May 25, 2012 at 09:05:34AM +0800, Gao feng wrote:
>>> 于 2012年05月24日 22:38, Pablo Neira Ayuso 写道:
>>>> On Thu, May 24, 2012 at 06:54:42PM +0800, Gao feng wrote:
>>>> [...]
>>>>>>>> I don't see why we need this new field.
>>>>>>>>
>>>>>>>> It seems to be set to 1 in each structure that has set:
>>>>>>>>
>>>>>>>> .ctl_compat_table
>>>>>>>>
>>>>>>>> to non-NULL. So, it's redundant.
>>>>>>>>
>>>>>>>> Moreover, you already know from the protocol tracker itself if you
>>>>>>>> have to allocate the compat ctl table or not.
>>>>>>>>
>>>>>>>> In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
>>>>>>>> Then, you pass that compat value to generic_init_net via ->inet_net
>>>>>>>> again, but this information (that determines if the compat has to be
>>>>>>>> done or not) is already in the scope of the protocol tracker.
>>>>>>>>
>>>>>>>
>>>>>>> because some protocols such l4proto_tcp6 and l4proto_tcp use the same init_net
>>>>>>> function. the l4proto_tcp6 doesn't need compat sysctl, so we should use this new
>>>>>>> field to identify if we should kmemdup compat_sysctl_table.
>>>>>>
>>>>>> Then, could you use two init_net functions? one for TCP for IPv4 and another
>>>>>> for TCP for IPv6?
>>>>>
>>>>> Of cause, if you prefer to impletment it in this way.
>>>>
>>>> If this removes the .compat field that you added, then use two
>>>> init_net functions, yes.
>>>
>>> Sorry I miss something.
>>>
>>> nf_ct_l4proto_unregister_sysctl also uses .compat to identify if we
>>> can unregister the compat sysctl.
>>>
>>> if we register l4proto_tcp and l4proto_tcp6 both. without .compat,
>>> when unregister l4proto_tcp6, the compat sysctl will be unregister too.
>>>
>>> So maybe we have to use .compat.
>>
>> Could you resolve this by checking pn->ctl_compat_header != NULL ?
> 
> pn->ctl_table_header and ctl_compat_header is shared by l4proto_tcp and l4proto_tcp6.
> if we both register l4proto_tcp and l4proto_tcp6, when unregister l4proto_tcp6
> pn->ctl_compat_header must not be NULL.
> 

Maybe we can resolve this by  nf_conntrack_l4proto.l3proto == AF_INET &&  pn->ctl_compat_header != NULL
Because compat sysctl is registered by AF_INET's proto only.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 04/17] netfilter: add namespace support for l4proto_generic
From: Gao feng @ 2012-05-26  2:36 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <20120524144038.GB15898@1984>

于 2012年05月24日 22:40, Pablo Neira Ayuso 写道:
> On Thu, May 24, 2012 at 07:07:36PM +0800, Gao feng wrote:
>> 于 2012年05月24日 17:52, Pablo Neira Ayuso 写道:
>>> On Thu, May 24, 2012 at 09:13:36AM +0800, Gao feng wrote:
>>>> 于 2012年05月23日 18:32, Pablo Neira Ayuso 写道:
>>>>> On Mon, May 14, 2012 at 04:52:14PM +0800, Gao feng wrote:
>>>>>> implement and export nf_conntrack_proto_generic_[init,fini],
>>>>>> nf_conntrack_[init,cleanup]_net call them to register or unregister
>>>>>> the sysctl of generic proto.
>>>>>>
>>>>>> implement generic_net_init,it's used to initial the pernet
>>>>>> data for generic proto.
>>>>>>
>>>>>> and use nf_generic_net.timeout to replace nf_ct_generic_timeout in
>>>>>> get_timeouts function.
>>>>>>
>>>>>> Acked-by: Eric W. Biederman <ebiederm@xmission.com>
>>>>>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>>>>>> ---
>>>>>>  include/net/netfilter/nf_conntrack_l4proto.h |    2 +
>>>>>>  include/net/netns/conntrack.h                |    6 +++
>>>>>>  net/netfilter/nf_conntrack_core.c            |    8 +++-
>>>>>>  net/netfilter/nf_conntrack_proto.c           |   21 +++++-----
>>>>>>  net/netfilter/nf_conntrack_proto_generic.c   |   55 ++++++++++++++++++++++++-
>>>>>>  5 files changed, 76 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h
>>>>>> index a93dcd5..0d329b9 100644
>>>>>> --- a/include/net/netfilter/nf_conntrack_l4proto.h
>>>>>> +++ b/include/net/netfilter/nf_conntrack_l4proto.h
>>>>>> @@ -118,6 +118,8 @@ struct nf_conntrack_l4proto {
>>>>>>  
>>>>>>  /* Existing built-in generic protocol */
>>>>>>  extern struct nf_conntrack_l4proto nf_conntrack_l4proto_generic;
>>>>>> +extern int nf_conntrack_proto_generic_init(struct net *net);
>>>>>> +extern void nf_conntrack_proto_generic_fini(struct net *net);
>>>>>>  
>>>>>>  #define MAX_NF_CT_PROTO 256
>>>>>>  
>>>>>> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
>>>>>> index 94992e9..3381b80 100644
>>>>>> --- a/include/net/netns/conntrack.h
>>>>>> +++ b/include/net/netns/conntrack.h
>>>>>> @@ -20,7 +20,13 @@ struct nf_proto_net {
>>>>>>  	unsigned int		users;
>>>>>>  };
>>>>>>  
>>>>>> +struct nf_generic_net {
>>>>>> +	struct nf_proto_net pn;
>>>>>> +	unsigned int timeout;
>>>>>> +};
>>>>>> +
>>>>>>  struct nf_ip_net {
>>>>>> +	struct nf_generic_net   generic;
>>>>>>  #if defined(CONFIG_SYSCTL) && defined(CONFIG_NF_CONNTRACK_PROC_COMPAT)
>>>>>>  	struct ctl_table_header *ctl_table_header;
>>>>>>  	struct ctl_table	*ctl_table;
>>>>>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>>>>>> index 32c5909..fd33e91 100644
>>>>>> --- a/net/netfilter/nf_conntrack_core.c
>>>>>> +++ b/net/netfilter/nf_conntrack_core.c
>>>>>> @@ -1353,6 +1353,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
>>>>>>  	}
>>>>>>  
>>>>>>  	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
>>>>>> +	nf_conntrack_proto_generic_fini(net);
>>>>>>  	nf_conntrack_helper_fini(net);
>>>>>>  	nf_conntrack_timeout_fini(net);
>>>>>>  	nf_conntrack_ecache_fini(net);
>>>>>> @@ -1586,9 +1587,12 @@ static int nf_conntrack_init_net(struct net *net)
>>>>>>  	ret = nf_conntrack_helper_init(net);
>>>>>>  	if (ret < 0)
>>>>>>  		goto err_helper;
>>>>>> -
>>>>>> +	ret = nf_conntrack_proto_generic_init(net);
>>>>>> +	if (ret < 0)
>>>>>> +		goto err_generic;
>>>>>>  	return 0;
>>>>>> -
>>>>>> +err_generic:
>>>>>> +	nf_conntrack_helper_fini(net);
>>>>>>  err_helper:
>>>>>>  	nf_conntrack_timeout_fini(net);
>>>>>>  err_timeout:
>>>>>> diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
>>>>>> index 7ee6653..9b4bf6d 100644
>>>>>> --- a/net/netfilter/nf_conntrack_proto.c
>>>>>> +++ b/net/netfilter/nf_conntrack_proto.c
>>>>>> @@ -287,10 +287,16 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l3proto_unregister);
>>>>>>  static struct nf_proto_net *nf_ct_l4proto_net(struct net *net,
>>>>>>  					      struct nf_conntrack_l4proto *l4proto)
>>>>>>  {
>>>>>> -	if (l4proto->net_id)
>>>>>> -		return net_generic(net, *l4proto->net_id);
>>>>>> -	else
>>>>>> -		return NULL;
>>>>>> +	switch (l4proto->l4proto) {
>>>>>> +	case 255: /* l4proto_generic */
>>>>>> +		return (struct nf_proto_net *)&net->ct.proto.generic;
>>>>>> +	default:
>>>>>> +		if (l4proto->net_id)
>>>>>> +			return net_generic(net, *l4proto->net_id);
>>>>>> +		else
>>>>>> +			return NULL;
>>>>>> +	}
>>>>>> +	return NULL;
>>>>>>  }
>>>>>>  
>>>>>>  int nf_ct_l4proto_register_sysctl(struct net *net,
>>>>>> @@ -457,11 +463,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_l4proto_unregister);
>>>>>>  int nf_conntrack_proto_init(void)
>>>>>>  {
>>>>>>  	unsigned int i;
>>>>>> -	int err;
>>>>>> -
>>>>>> -	err = nf_ct_l4proto_register_sysctl(&init_net, &nf_conntrack_l4proto_generic);
>>>>>> -	if (err < 0)
>>>>>> -		return err;
>>>>>
>>>>> I like that all protocols sysctl are registered by
>>>>> nf_conntrack_proto_init. Can you keep using that?
>>>>
>>>> you mean per-net's generic_proto sysctl are registered by
>>>> nf_conntrack_proto_init?
>>>>
>>>> such as
>>>>
>>>> int nf_conntrack_proto_init(struct net *net)
>>>> {
>>>> 	...
>>>> 	err = nf_ct_l4proto_register_sysctl(net, &nf_conntrack_l4proto_generic);
>>>
>>> Yes, all protocol trackers included in nf_conntrack_proto_init:
>>>
>>>         err = nf_conntrack_proto_generic_init(net);
>>>         ...
>>>         err = nf_conntrack_proto_tcp_init(net);
>>>         ...
>>>
>>> and so on.
>>
>> sounds good,but the l4protos except l4proto_generic are enabled by
>> insmod modules(such as nf_conntrack_ipv4,nf_conntrack_proto_udplite).
>>
>> So I think it makes no sense to init all protocol here, unless we decide
>> to put those protos into module nf_conntrack.
> 
> Sorry, I meant to say all protocols that are built-in.
> 
> So, just put there those that are built-in, like TCP, UDP and generic

AFAIK l4proto_generic is registered when install module nf_conntrack,
BUT l4proto_tcp,l4proto_udp,l4proto_icmp are registered when install module nf_conntrack_ipv4.

So we can only register generic proto here.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Eric Dumazet @ 2012-05-26  3:39 UTC (permalink / raw)
  To: Arun Sharma; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <4FC02777.5070003@fb.com>

On Fri, 2012-05-25 at 17:44 -0700, Arun Sharma wrote:
> On 5/25/12 5:11 PM, David Miller wrote:
> 
> >> These were not admin configured routes. They were discovered via ipv6
> >> neighbor discovery.
> >
> > Then such default routes should either be:
> >
> > 1) Passed over by GC
> >
> > 2) Trigger neighbour discovery when GC'd
> 
> It's possible that there is a bug somewhere - we didn't get a chance to 
> dig deeper. What we observed is that as we got close to the 4096 limit, 
> some hosts were becoming unreachable. A modest increase in the routing 
> table size made things better.
> 
>   -Arun

But your patch is not a  "modest increase", so whats the deal ?

A modest increase would be 8192 instead of 4096, regardless of RAM size.

^ permalink raw reply

* Re: [PATCH] net: compute a more reasonable default ip6_rt_max_size
From: Eric Dumazet @ 2012-05-26  4:17 UTC (permalink / raw)
  To: Arun Sharma; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <1338003580.10135.6.camel@edumazet-glaptop>

On Sat, 2012-05-26 at 05:39 +0200, Eric Dumazet wrote:

> But your patch is not a  "modest increase", so whats the deal ?
> 
> A modest increase would be 8192 instead of 4096, regardless of RAM size.
> 
> 

More over, a boot parameter to tweak it is absolutely not needed

sysctl -w net.ipv6.route.max_size=16384

or

echo 16384 >/proc/sys/net/ipv6/route/max_size

IPv4 has to allocate a hash table at boot time, and this hash table is
not resized. Thus some really special purpose machines need a boot
param.

^ permalink raw reply

* Re: [PATCH] STA2X11 CAN: CAN driver for the STA2X11 board
From: Federico Vaga @ 2012-05-26  8:36 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: Marc Kleine-Budde, linux-can, netdev, linux-kernel,
	Giancarlo Asnaghi, Alan Cox
In-Reply-To: <4FB5E581.1090209@grandegger.com>

> Thanks for your contribution. At a first glance, this driver looks
> similar to the pch_can and the c_can driver. It seems that a C_CAN based
> controller is used on that board as well. If that's true, it should be
> handled by the C_CAN driver. To get ride of the obsolete pch_can driver,
> I sent some time ago the patch "[RFC/PATCH] c_can: add driver for the
> PCH CAN controller":
>
>  http://marc.info/?t=132991563600003&r=1&w=4
>
> I could serve as base of a generic c_can_pci driver.

You are right, this is a C-CAN but it isn't specified on the board manual.
I compared the board manual with the Bosch C-CAN manual, and it is the same.
I'm working on a PCI module for this board, when it is ready we can think for
a generic c_can_pci

Thank you

-- 
Federico Vaga

^ permalink raw reply

* Re: [PATCH] ipv6: fix incorrect ipsec transport mode fragment
From: Gao feng @ 2012-05-26  9:00 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, davem, lw
In-Reply-To: <20120514130528.GA24733@secunet.com>

Hi Steffen

于 2012年05月14日 21:05, Steffen Klassert 写道:
> On Mon, May 14, 2012 at 11:21:00AM +0800, Gao feng wrote:
>> Since commit 299b0767(ipv6: Fix IPsec slowpath fragmentation problem)
>> the fragment of ipsec transport mode packets is incorrect.
>> because tunnel mode needs IPsec headers and trailer for all fragments,
>> while on transport mode it is sufficient to add the headers to the
>> first fragment and the trailer to the last.
> 
> I mentioned this in an other thread some time ago,
> this is due to commit ad0081e43a
> "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed"
> changed tunnel mode to do fragmentation before the transformation
> while transport mode still does fragmentation after transformation.
> Now, tunnel mode needs IPsec headers and trailer for all fragments,
> while on transport mode it is sufficient to add the headers to the
> first fragment and the trailer to the last.
> 
>>
>> so modify mtu and maxfraglen base on ipsec mode and if fragment is first
>> or last.
> 
> There might be other opinions, but I don't like to see this IPsec mode
> dependent stuff hacked into the generic ipv6 output path.
> 
> Basically we have two cases. One where we have to add rt->dst.header_len
> to the first fragment and rt->dst.trailer_len to the last fragment,
> and the other where we have to add both to all fragments. So perhaps we
> could isolate this code and create two functions, one for each case.
> 
> 

I thought this problem carefully,I think the important and troubled thing is
how to deal with transport mode.

we have to use different mtu and maxfraglen for checking if the prev_skb has
extra data or has free room. so mtu_prev and maxfraglen_prev have to be used.

And I also think it's not very well to create two function for the two cases,
it will create a lot of redundant codes.

I will add a dst_entry flag DST_XFRM_TUNNEL to avoid ipsec mode dependent stuff
hacked into the generic ipv6_output path, and send patch v2.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox