Netdev List
* Re: [PATCH v3] AX25: kill user triggable printks
From: David Miller @ 2008-01-10 11:58 UTC (permalink / raw)
  To: max; +Cc: netdev
In-Reply-To: <1199874070-9524-1-git-send-email-max@stro.at>

From: maximilian attems <max@stro.at>
Date: Wed,  9 Jan 2008 11:21:10 +0100

> sfuzz can easily trigger any of those.
> 
> move the printk message to the corresponding comment:
> makes the intention of the code clear and easy
> to pick up on a scheduled removal.
> as a bonus simplify the braces placement.
> 
> Signed-off-by: maximilian attems <max@stro.at>

Applied, thanks.


* Re: [patch net-2.6.25 00/10][NETNS][IPV6] make sysctl per namespace - V3
From: Daniel Lezcano @ 2008-01-10 11:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, benjamin.thery
In-Reply-To: <20080110.031544.259471083.davem@davemloft.net>

David Miller wrote:
> From: Daniel Lezcano <dlezcano@fr.ibm.com>
> Date: Wed, 09 Jan 2008 17:45:33 +0100
> 
>> The following patchset makes the ipv6 sysctl handle multiple
>> network namespaces. Each instance of a network namespace has its own
>> set of sysctl values, which means the behavior of the ipv6 stack can
>> differ depending on the sysctl values set up in the different
>> network namespaces.
> 
> I applied all of this to net-2.6.25 but what a rough half hour
> it was :-/
> 
> Starting at patch #5 there were tons of "space before tab" errors.
> And as I fixed them up, this made subsequent patches need rediffing
> since the contextual lines in patches after #5 needed the whitespace
> fixed up as well.
> 
> I didn't push this back to you because this was already the 3rd round,
> but please show me some love and check this stuff out before
> submission.  GIT gives you effective ways to verify the whitespace
> without even applying the patch.
> 
> ~davem/bin/pcheck:
> 
> #!/bin/sh
> set -x
> git apply --check --whitespace=error-all $1

Sorry, I will check that in the future :|
Many thanks for taking the time to fix that.

   -- Daniel.


* Re: Linux IPv6 DAD not full conform to RFC 4862 ?
From: Neil Horman @ 2008-01-10 12:25 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: YOSHIFUJI Hideaki / 吉藤英明, kkeil,
	netdev
In-Reply-To: <47853825.2030002@hp.com>

On Wed, Jan 09, 2008 at 04:09:57PM -0500, Vlad Yasevich wrote:
> Neil Horman wrote:
>> On Thu, Jan 10, 2008 at 01:38:57AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
>>> In article <20080109153656.GA16962@pingi.kke.suse.de> (at Wed, 9 Jan 2008 16:36:56 +0100), Karsten Keil <kkeil@suse.de> says:
>>>
>>>> So I think we should disable the interface now, if DAD fails on a
>>>> hardware based LLA.
>>> I don't want to do this, at least, unconditionally.
>>>
>>> Options (not exclusive):
>>>
>>> - we could have "dad_reaction" interface variable and
>>>  > 1: disable interface
>>>  = 1: disable IPv6
>>>  < 0: ignore (as we do now)
>>>
>> I like the flexibility of this solution, but given that the only part of the RFC
>> that we're missing on at the moment is that we SHOULD disable the interface on
>> DAD failure for a link-local address, I would think this scheme would be good:
>>
>>   < 0 : ignore, and del address from interface (current behavior)
>>   = 0 : disable interface for dad failure for a link-local address
>>   > 0 : disable interface for dad failure for any address
>>
>> Regards
>> Neil
>>  
>
> Just a friendly reminder that such a scheme should only be
> applied to autoconfigured addresses.  A manually configured
> duplicated address should not bring down the whole interface.
>

I agree, but I think that case would be covered by the default option above
(sysctl < 0).

Neil

> -vlad
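
For reference, the three-way scheme proposed above can be written down as a small decision function. This is only a user-space sketch of the mapping as described in this thread; the names dad_action, dad_failure_action and the dad_reaction parameter are hypothetical and are not existing kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical action names; these are not kernel identifiers. */
enum dad_action {
    DAD_DELETE_ADDRESS,     /* current behavior: drop the duplicate address */
    DAD_DISABLE_INTERFACE   /* take the whole interface down */
};

/*
 * Map the proposed sysctl value to an action on DAD failure:
 *   < 0 : ignore, delete the address only (current behavior)
 *   = 0 : disable the interface if the failed address is link-local
 *   > 0 : disable the interface for a failure on any address
 */
enum dad_action dad_failure_action(int dad_reaction, bool link_local)
{
    if (dad_reaction < 0)
        return DAD_DELETE_ADDRESS;
    if (dad_reaction == 0)
        return link_local ? DAD_DISABLE_INTERFACE : DAD_DELETE_ADDRESS;
    return DAD_DISABLE_INTERFACE;
}
```

With a negative default, a duplicate never takes the interface down, which is the case referred to above for manually configured addresses.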


* EQL / doubts
From: Jeba Anandhan @ 2008-01-10 13:01 UTC (permalink / raw)
  To: netdev

Hi All,
I have a few questions about the EQL driver.

*) Why is tx_queue_len set to 5? For example, if we bond 3 lines
and each has a tx_queue_len of 1000, will the tx_queue_len of the
bonding line (eql) be the sum of the three, i.e. 3000?

*) Why is list_add used instead of list_add_tail? A queue
implementation would need list_add_tail. Why is the slave queue
implemented like a stack?

 File:  linux/drivers/net/eql.c
 Function: __eql_insert_slave(slave_queue_t *queue, slave_t *slave)

Code: 

/* queue->lock must be held */

static int __eql_insert_slave(slave_queue_t *queue, slave_t *slave)
{
      if (!eql_is_full(queue)) {
           slave_t *duplicate_slave = NULL;
           duplicate_slave = __eql_find_slave_dev(queue, slave->dev);
           if (duplicate_slave != 0)
                 eql_kill_one_slave(queue, duplicate_slave);

           list_add(&slave->list, &queue->all_slaves);
           /* Why has list_add been used instead of list_add_tail?
            * I assume queue->all_slaves is a queue implementation. */
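
To make the list_add versus list_add_tail question concrete, here is a minimal user-space sketch of the two insertion positions. It is not the kernel's list.h; list_node, list_push_front, list_push_back, struct slave and collect_ids are stand-in names invented for illustration. The first helper inserts right after the head, as list_add does, and the second right before the head, as list_add_tail does.

```c
#include <stddef.h>

/* Minimal user-space stand-in for a circular doubly linked list. */
struct list_node {
    struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry between two neighbours that are already adjacent. */
static void list_link(struct list_node *entry,
                      struct list_node *prev, struct list_node *next)
{
    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
}

/* Same placement as list_add(): right after the head (stack-like). */
void list_push_front(struct list_node *entry, struct list_node *head)
{
    list_link(entry, head, head->next);
}

/* Same placement as list_add_tail(): right before the head (queue-like). */
void list_push_back(struct list_node *entry, struct list_node *head)
{
    list_link(entry, head->prev, head);
}

/* Hypothetical element type, loosely modelled on a slave entry. */
struct slave {
    int id;
    struct list_node list;
};

/* Copy the ids in traversal order (starting at head->next) into out[]. */
size_t collect_ids(const struct list_node *head, int *out, size_t max)
{
    size_t n = 0;
    const struct list_node *pos;

    for (pos = head->next; pos != head && n < max; pos = pos->next) {
        const struct slave *s = (const struct slave *)
            ((const char *)pos - offsetof(struct slave, list));
        out[n++] = s->id;
    }
    return n;
}
```

Inserting slaves 1, 2, 3 with list_push_front makes a traversal visit them newest first, while list_push_back keeps insertion order. Whether the order matters for EQL depends on how the driver walks all_slaves, which the excerpt above does not show.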

           


*) Is it possible to improve the load-balancing performance on a
multiprocessor machine? For example, if a server has two processors and
N network interfaces, is it possible to assign one processor to the tx
and rx handling of N/2 interfaces and the other processor to the tx/rx
handling of the remaining N/2?


Thanks
Jeba




* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: Ilpo Järvinen @ 2008-01-10 13:53 UTC (permalink / raw)
  To: andi
  Cc: David Miller, Herbert Xu, Netdev, Arnaldo Carvalho de Melo,
	paul.moore, latten
In-Reply-To: <Pine.LNX.4.64.0801081228010.12911@kivilampi-30.cs.helsinki.fi>


On Tue, 8 Jan 2008, Ilpo Järvinen wrote:

> On Mon, 7 Jan 2008, David Miller wrote:
> 
> > From: Andi Kleen <andi@firstfloor.org>
> > Date: Tue, 8 Jan 2008 06:00:07 +0100
> > 
> > > On Mon, Jan 07, 2008 at 07:37:00PM -0800, David Miller wrote:
> > > > The vast majority of them are one, two, and three liners.
> > > 
> > > % awk '  { line++ } ; /^{/ { total++; start = line } ; /^}/ { len=line-start-3; if (len > 4) l++; if (len >= 10) k++; } ; END { print total, l, l/total, k, k/total }' < include/net/tcp.h
> > > 68 28 0.411765 20 0.294118
> > > 
> > > 41% are over 4 lines, 29% are >= 10 lines.
> > 
> > Take out the comments and whitespace lines, your script is
> > too simplistic.

In addition it triggered spuriously per struct/enum end brace :-) and
was using the last known function starting brace in there so no wonder
the numbers were that high... Counting with the corrected lines 
(len=line-start-1) & spurious matches removed:

74 19 0.256757 7 0.0945946


Here are (finally) the measured bytes (a couple of the functions are
missing because I had a couple of bugs in the regexps, and the #if trickery
at the inline resulted in failed compiles):

 12 funcs, 242+, 1697-, diff: -1455 	 tcp_set_state 
 13 funcs, 92+, 632-, diff: -540 	 tcp_is_cwnd_limited 
 12 funcs, 2836+, 3225-, diff: -389 	 tcp_current_ssthresh 
 5 funcs, 261+, 556-, diff: -295 	 tcp_prequeue 
 7 funcs, 2777+, 3049-, diff: -272 	 tcp_clear_retrans_hints_partial 
 11 funcs, 64+, 275-, diff: -211 	 tcp_win_from_space 
 6 funcs, 128+, 320-, diff: -192 	 tcp_prequeue_init 
 12 funcs, 45+, 209-, diff: -164 	 tcp_set_ca_state 
 7 funcs, 106+, 237-, diff: -131 	 tcp_fast_path_check 
 5 funcs, 167+, 291-, diff: -124 	 tcp_write_queue_purge 
 6 funcs, 43+, 160-, diff: -117 	 tcp_push_pending_frames 
 9 funcs, 55+, 159-, diff: -104 	 tcp_v4_check 
 6 funcs, 4+, 97-, diff: -93 	 tcp_packets_in_flight 
 7 funcs, 58+, 150-, diff: -92 	 tcp_fast_path_on 
 4 funcs, 4+, 91-, diff: -87 	 tcp_clear_options 
 6 funcs, 141+, 217-, diff: -76 	 tcp_openreq_init 
 8 funcs, 38+, 111-, diff: -73 	 tcp_unlink_write_queue 
 7 funcs, 32+, 103-, diff: -71 	 tcp_checksum_complete 
 7 funcs, 35+, 101-, diff: -66 	 __tcp_fast_path_on 
 5 funcs, 4+, 66-, diff: -62 	 tcp_receive_window 
 6 funcs, 67+, 128-, diff: -61 	 tcp_add_write_queue_tail 
 7 funcs, 30+, 86-, diff: -56 	 tcp_ca_event 
 6 funcs, 73+, 106-, diff: -33 	 tcp_paws_check 
 4 funcs, 4+, 36-, diff: -32 	 tcp_highest_sack_seq 
 6 funcs, 46+, 78-, diff: -32 	 tcp_fin_time 
 3 funcs, 4+, 35-, diff: -31 	 tcp_clear_all_retrans_hints 
 7 funcs, 30+, 51-, diff: -21 	 __tcp_add_write_queue_tail 
 3 funcs, 4+, 14-, diff: -10 	 tcp_enable_fack 
 4 funcs, 4+, 14-, diff: -10 	 keepalive_time_when 
 8 funcs, 66+, 73-, diff: -7 	 tcp_full_space 
 3 funcs, 4+, 5-, diff: -1 	 tcp_wnd_end 
 4 funcs, 97+, 97-, diff: +0 	 tcp_mib_init 
 3 funcs, 4+, 3-, diff: +1 	 tcp_skb_is_last 
 2 funcs, 4+, 2-, diff: +2 	 keepalive_intvl_when 
 2 funcs, 4+, 2-, diff: +2 	 tcp_is_fack 
 2 funcs, 4+, 2-, diff: +2 	 tcp_skb_mss 
 2 funcs, 4+, 2-, diff: +2 	 tcp_write_queue_empty 
 2 funcs, 4+, 2-, diff: +2 	 tcp_advance_highest_sack 
 2 funcs, 4+, 2-, diff: +2 	 tcp_advance_send_head 
 2 funcs, 4+, 2-, diff: +2 	 tcp_check_send_head 
 2 funcs, 4+, 2-, diff: +2 	 tcp_highest_sack_reset 
 2 funcs, 4+, 2-, diff: +2 	 tcp_init_send_head 
 2 funcs, 4+, 2-, diff: +2 	 tcp_sack_reset 
 6 funcs, 47+, 44-, diff: +3 	 tcp_space 
 5 funcs, 55+, 50-, diff: +5 	 tcp_too_many_orphans 
 3 funcs, 8+, 2-, diff: +6 	 tcp_minshall_update 
 3 funcs, 8+, 2-, diff: +6 	 tcp_update_wl 
 8 funcs, 25+, 14-, diff: +11 	 between 
 3 funcs, 14+, 2-, diff: +12 	 tcp_put_md5sig_pool 
 3 funcs, 14+, 2-, diff: +12 	 tcp_clear_xmit_timers 
 5 funcs, 30+, 17-, diff: +13 	 tcp_dec_pcount_approx_int 
 6 funcs, 33+, 20-, diff: +13 	 tcp_insert_write_queue_after 
 3 funcs, 17+, 2-, diff: +15 	 __tcp_checksum_complete 
 5 funcs, 17+, 2-, diff: +15 	 tcp_init_wl 
 4 funcs, 57+, 41-, diff: +16 	 tcp_dec_quickack_mode 
 4 funcs, 40+, 22-, diff: +18 	 __tcp_add_write_queue_head 
 5 funcs, 36+, 16-, diff: +20 	 tcp_highest_sack_combine 
 4 funcs, 40+, 18-, diff: +22 	 tcp_dec_pcount_approx 
 6 funcs, 29+, 5-, diff: +24 	 tcp_is_sack 
 4 funcs, 28+, 2-, diff: +26 	 tcp_is_reno 
 5 funcs, 50+, 24-, diff: +26 	 tcp_insert_write_queue_before 
 4 funcs, 83+, 56-, diff: +27 	 tcp_check_probe_timer 
 8 funcs, 69+, 14-, diff: +55 	 tcp_left_out 
 11 funcs, 2995+, 2893-, diff: +102 	 tcp_skb_pcount 
 30 funcs, 930+, 2-, diff: +928 	 before 

-- 
 i.


* [PATCH net-2.6.25 0/6][NETNS]: Make ipv6_devconf (all and default) live in net namespaces
From: Pavel Emelyanov @ 2008-01-10 13:55 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Daniel Lezcano, Benjamin Thery

The ipv6_devconf_(all) and ipv6_devconf_dflt are currently
global, but should be per-namespace.

This set moves them onto the struct net. Or, more precisely,
onto the struct netns_ipv6, which has already been added.

Unfortunately, much of the ipv6 code cannot yet provide a
correct struct net to get the ipv6_devconf from (e.g. the routing
code), so that part of the job is to be done after the appropriate
parts are virtualized.

However, after this set a user can play with the ipv6_devconf
inside a namespace without affecting the others.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>


* [PATCH net-2.6.25 1/6][NETNS]: Clean out the ipv6-related sysctls creation/destruction
From: Pavel Emelyanov @ 2008-01-10 13:58 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Daniel Lezcano, Benjamin Thery
In-Reply-To: <478623C0.7030008@openvz.org>

The addrconf sysctls and neigh sysctls are always registered and
unregistered in pairs, so they can be joined into
one (well, two) functions that accept the struct inet6_dev
and do the whole job.

This also gets rid of unneeded ifdefs inside the code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv6/addrconf.c |   63 +++++++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6a48bb8..27b35dd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -102,7 +102,15 @@
 
 #ifdef CONFIG_SYSCTL
 static void addrconf_sysctl_register(struct inet6_dev *idev);
-static void addrconf_sysctl_unregister(struct ipv6_devconf *p);
+static void addrconf_sysctl_unregister(struct inet6_dev *idev);
+#else
+static inline void addrconf_sysctl_register(struct inet6_dev *idev)
+{
+}
+
+static inline void addrconf_sysctl_unregister(struct inet6_dev *idev)
+{
+}
 #endif
 
 #ifdef CONFIG_IPV6_PRIVACY
@@ -392,13 +400,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 
 	ipv6_mc_init_dev(ndev);
 	ndev->tstamp = jiffies;
-#ifdef CONFIG_SYSCTL
-	neigh_sysctl_register(dev, ndev->nd_parms, NET_IPV6,
-			      NET_IPV6_NEIGH, "ipv6",
-			      &ndisc_ifinfo_sysctl_change,
-			      NULL);
 	addrconf_sysctl_register(ndev);
-#endif
 	/* protected by rtnl_lock */
 	rcu_assign_pointer(dev->ip6_ptr, ndev);
 
@@ -2391,15 +2393,8 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 	case NETDEV_CHANGENAME:
 		if (idev) {
 			snmp6_unregister_dev(idev);
-#ifdef CONFIG_SYSCTL
-			addrconf_sysctl_unregister(&idev->cnf);
-			neigh_sysctl_unregister(idev->nd_parms);
-			neigh_sysctl_register(dev, idev->nd_parms,
-					      NET_IPV6, NET_IPV6_NEIGH, "ipv6",
-					      &ndisc_ifinfo_sysctl_change,
-					      NULL);
+			addrconf_sysctl_unregister(idev);
 			addrconf_sysctl_register(idev);
-#endif
 			err = snmp6_register_dev(idev);
 			if (err)
 				return notifier_from_errno(err);
@@ -2523,10 +2518,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	/* Shot the device (if unregistered) */
 
 	if (how == 1) {
-#ifdef CONFIG_SYSCTL
-		addrconf_sysctl_unregister(&idev->cnf);
-		neigh_sysctl_unregister(idev->nd_parms);
-#endif
+		addrconf_sysctl_unregister(idev);
 		neigh_parms_release(&nd_tbl, idev->nd_parms);
 		neigh_ifdown(&nd_tbl, dev);
 		in6_dev_put(idev);
@@ -4106,21 +4098,34 @@ out:
 	return;
 }
 
+static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
+{
+	struct addrconf_sysctl_table *t;
+
+	if (p->sysctl == NULL)
+		return;
+
+	t = p->sysctl;
+	p->sysctl = NULL;
+	unregister_sysctl_table(t->sysctl_header);
+	kfree(t->dev_name);
+	kfree(t);
+}
+
 static void addrconf_sysctl_register(struct inet6_dev *idev)
 {
+	neigh_sysctl_register(idev->dev, idev->nd_parms, NET_IPV6,
+			      NET_IPV6_NEIGH, "ipv6",
+			      &ndisc_ifinfo_sysctl_change,
+			      NULL);
 	__addrconf_sysctl_register(idev->dev->name, idev->dev->ifindex,
 			idev, &idev->cnf);
 }
 
-static void addrconf_sysctl_unregister(struct ipv6_devconf *p)
+static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 {
-	if (p->sysctl) {
-		struct addrconf_sysctl_table *t = p->sysctl;
-		p->sysctl = NULL;
-		unregister_sysctl_table(t->sysctl_header);
-		kfree(t->dev_name);
-		kfree(t);
-	}
+	__addrconf_sysctl_unregister(&idev->cnf);
+	neigh_sysctl_unregister(idev->nd_parms);
 }
 
 
@@ -4232,8 +4237,8 @@ void addrconf_cleanup(void)
 	unregister_netdevice_notifier(&ipv6_dev_notf);
 
 #ifdef CONFIG_SYSCTL
-	addrconf_sysctl_unregister(&ipv6_devconf_dflt);
-	addrconf_sysctl_unregister(&ipv6_devconf);
+	__addrconf_sysctl_unregister(&ipv6_devconf_dflt);
+	__addrconf_sysctl_unregister(&ipv6_devconf);
 #endif
 
 	rtnl_lock();
-- 
1.5.3.4



* [PATCH net-2.6.25 2/6][NETNS]: Make the __addrconf_sysctl_register return an error
From: Pavel Emelyanov @ 2008-01-10 14:01 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Daniel Lezcano, Benjamin Thery
In-Reply-To: <478623C0.7030008@openvz.org>

This error code will be needed to abort the namespace
creation if needed.

Probably, this is to be checked when a new device is
created (currently it is ignored).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv6/addrconf.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 27b35dd..18d4334 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4044,7 +4044,7 @@ static struct addrconf_sysctl_table
 	},
 };
 
-static void __addrconf_sysctl_register(char *dev_name, int ctl_name,
+static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
 		struct inet6_dev *idev, struct ipv6_devconf *p)
 {
 	int i;
@@ -4088,14 +4088,14 @@ static void __addrconf_sysctl_register(char *dev_name, int ctl_name,
 		goto free_procname;
 
 	p->sysctl = t;
-	return;
+	return 0;
 
 free_procname:
 	kfree(t->dev_name);
 free:
 	kfree(t);
 out:
-	return;
+	return -ENOBUFS;
 }
 
 static void __addrconf_sysctl_unregister(struct ipv6_devconf *p)
-- 
1.5.3.4




* [PATCH net-2.6.25 3/6][NETNS]: Make the ctl-tables per-namespace
From: Pavel Emelyanov @ 2008-01-10 14:03 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Daniel Lezcano, Benjamin Thery
In-Reply-To: <478623C0.7030008@openvz.org>

This includes passing the net to __addrconf_sysctl_register
and saving it in ctl_table->extra2 to be used in the
handlers that need it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv6/addrconf.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 18d4334..bde50c6 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -456,13 +456,13 @@ static void dev_forward_change(struct inet6_dev *idev)
 }
 
 
-static void addrconf_forward_change(void)
+static void addrconf_forward_change(struct net *net)
 {
 	struct net_device *dev;
 	struct inet6_dev *idev;
 
 	read_lock(&dev_base_lock);
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev(net, dev) {
 		rcu_read_lock();
 		idev = __in6_dev_get(dev);
 		if (idev) {
@@ -478,12 +478,15 @@ static void addrconf_forward_change(void)
 
 static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 {
+	struct net *net;
+
+	net = (struct net *)table->extra2;
 	if (p == &ipv6_devconf_dflt.forwarding)
 		return;
 
 	if (p == &ipv6_devconf.forwarding) {
 		ipv6_devconf_dflt.forwarding = ipv6_devconf.forwarding;
-		addrconf_forward_change();
+		addrconf_forward_change(net);
 	} else if ((!*p) ^ (!old))
 		dev_forward_change((struct inet6_dev *)table->extra1);
 
@@ -4044,8 +4047,8 @@ static struct addrconf_sysctl_table
 	},
 };
 
-static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
-		struct inet6_dev *idev, struct ipv6_devconf *p)
+static int __addrconf_sysctl_register(struct net *net, char *dev_name,
+		int ctl_name, struct inet6_dev *idev, struct ipv6_devconf *p)
 {
 	int i;
 	struct addrconf_sysctl_table *t;
@@ -4068,6 +4071,7 @@ static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
 	for (i=0; t->addrconf_vars[i].data; i++) {
 		t->addrconf_vars[i].data += (char*)p - (char*)&ipv6_devconf;
 		t->addrconf_vars[i].extra1 = idev; /* embedded; no ref */
+		t->addrconf_vars[i].extra2 = net;
 	}
 
 	/*
@@ -4082,7 +4086,7 @@ static int __addrconf_sysctl_register(char *dev_name, int ctl_name,
 	addrconf_ctl_path[ADDRCONF_CTL_PATH_DEV].procname = t->dev_name;
 	addrconf_ctl_path[ADDRCONF_CTL_PATH_DEV].ctl_name = ctl_name;
 
-	t->sysctl_header = register_sysctl_paths(addrconf_ctl_path,
+	t->sysctl_header = register_net_sysctl_table(net, addrconf_ctl_path,
 			t->addrconf_vars);
 	if (t->sysctl_header == NULL)
 		goto free_procname;
@@ -4118,8 +4122,8 @@ static void addrconf_sysctl_register(struct inet6_dev *idev)
 			      NET_IPV6_NEIGH, "ipv6",
 			      &ndisc_ifinfo_sysctl_change,
 			      NULL);
-	__addrconf_sysctl_register(idev->dev->name, idev->dev->ifindex,
-			idev, &idev->cnf);
+	__addrconf_sysctl_register(idev->dev->nd_net, idev->dev->name,
+			idev->dev->ifindex, idev, &idev->cnf);
 }
 
 static void addrconf_sysctl_unregister(struct inet6_dev *idev)
@@ -4215,9 +4219,9 @@ int __init addrconf_init(void)
 	ipv6_addr_label_rtnl_register();
 
 #ifdef CONFIG_SYSCTL
-	__addrconf_sysctl_register("all", NET_PROTO_CONF_ALL,
+	__addrconf_sysctl_register(&init_net, "all", NET_PROTO_CONF_ALL,
 			NULL, &ipv6_devconf);
-	__addrconf_sysctl_register("default", NET_PROTO_CONF_DEFAULT,
+	__addrconf_sysctl_register(&init_net, "default", NET_PROTO_CONF_DEFAULT,
 			NULL, &ipv6_devconf_dflt);
 #endif
 
-- 
1.5.3.4



* [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache
From: Eric Dumazet @ 2008-01-10 14:06 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Paul E. McKenney, dipankar, netdev
In-Reply-To: <20080109113727.50eae500.dada1@cosmosbay.com>

Hi David

Here is the DECNET part, shadowing commit 0bcceadceb0907094ba4e40bf9a7cd9b080f13fb ([IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache )

Thank you


[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache

In dn_rt_cache_get_next(), there is no need to guard seq->private with
rcu_dereference(), since seq is private to the thread running this function.
Reading seq->private once (as guaranteed by rcu_dereference()) or several
times, if the compiler really is dumb enough, won't change the result.

But we miss the real spots where rcu_dereference() is needed, both in
dn_rt_cache_get_first() and dn_rt_cache_get_next().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 66663e5..0e10ff2 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1665,12 +1665,12 @@ static struct dn_route *dn_rt_cache_get_first(struct seq_file *seq)
 			break;
 		rcu_read_unlock_bh();
 	}
-	return rt;
+	return rcu_dereference(rt);
 }
 
 static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_route *rt)
 {
-	struct dn_rt_cache_iter_state *s = rcu_dereference(seq->private);
+	struct dn_rt_cache_iter_state *s = seq->private;
 
 	rt = rt->u.dst.dn_next;
 	while(!rt) {
@@ -1680,7 +1680,7 @@ static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_rou
 		rcu_read_lock_bh();
 		rt = dn_rt_hash_table[s->bucket].chain;
 	}
-	return rt;
+	return rcu_dereference(rt);
 }
 
 static void *dn_rt_cache_seq_start(struct seq_file *seq, loff_t *pos)


* SMP code / network stack
From: Jeba Anandhan @ 2008-01-10 14:05 UTC (permalink / raw)
  To: netdev; +Cc: matthew.hattersley

Hi All,

If a server has multiple processors and N ethernet cards, is
it possible to handle transmission on each processor separately? In
other words, can each processor be responsible for the tx of a few
ethernet cards?

Example: a server has 4 processors and 8 ethernet cards. Is it possible
for each processor to transmit using only 2 of the ethernet cards, so
that at any instant data is being sent out from all 8 ethernet cards?


Thanks
Jeba


* [PATCH net-2.6.25 4/6][NETNS]: Create ipv6 devconf-s for namespaces
From: Pavel Emelyanov @ 2008-01-10 14:06 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Daniel Lezcano, Benjamin Thery
In-Reply-To: <478623C0.7030008@openvz.org>

This is the core. Declare and register the pernet subsys for
addrconf. The init callback will create the devconf-s.

The init_net will reuse the existing statically declared confs,
so that accessing them from inside the ipv6 code will still
work.

The register_pernet_subsys() is moved above the ipv6_add_dev()
call for loopback, because this function will need the
net->devconf_dflt pointer to be already set.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 include/net/netns/ipv6.h |    2 +
 net/ipv6/addrconf.c      |   82 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 10733a6..06b4dc0 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -28,5 +28,7 @@ struct netns_sysctl_ipv6 {
 
 struct netns_ipv6 {
 	struct netns_sysctl_ipv6 sysctl;
+	struct ipv6_devconf	*devconf_all;
+	struct ipv6_devconf	*devconf_dflt;
 };
 #endif
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index bde50c6..3ad081e 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4135,6 +4135,70 @@ static void addrconf_sysctl_unregister(struct inet6_dev *idev)
 
 #endif
 
+static int addrconf_init_net(struct net *net)
+{
+	int err;
+	struct ipv6_devconf *all, *dflt;
+
+	err = -ENOMEM;
+	all = &ipv6_devconf;
+	dflt = &ipv6_devconf_dflt;
+
+	if (net != &init_net) {
+		all = kmemdup(all, sizeof(ipv6_devconf), GFP_KERNEL);
+		if (all == NULL)
+			goto err_alloc_all;
+
+		dflt = kmemdup(dflt, sizeof(ipv6_devconf_dflt), GFP_KERNEL);
+		if (dflt == NULL)
+			goto err_alloc_dflt;
+	}
+
+	net->ipv6.devconf_all = all;
+	net->ipv6.devconf_dflt = dflt;
+
+#ifdef CONFIG_SYSCTL
+	err = __addrconf_sysctl_register(net, "all", NET_PROTO_CONF_ALL,
+			NULL, all);
+	if (err < 0)
+		goto err_reg_all;
+
+	err = __addrconf_sysctl_register(net, "default", NET_PROTO_CONF_DEFAULT,
+			NULL, dflt);
+	if (err < 0)
+		goto err_reg_dflt;
+#endif
+	return 0;
+
+#ifdef CONFIG_SYSCTL
+err_reg_dflt:
+	__addrconf_sysctl_unregister(all);
+err_reg_all:
+	kfree(dflt);
+#endif
+err_alloc_dflt:
+	kfree(all);
+err_alloc_all:
+	return err;
+}
+
+static void addrconf_exit_net(struct net *net)
+{
+#ifdef CONFIG_SYSCTL
+	__addrconf_sysctl_unregister(net->ipv6.devconf_dflt);
+	__addrconf_sysctl_unregister(net->ipv6.devconf_all);
+#endif
+	if (net != &init_net) {
+		kfree(net->ipv6.devconf_dflt);
+		kfree(net->ipv6.devconf_all);
+	}
+}
+
+static struct pernet_operations addrconf_ops = {
+	.init = addrconf_init_net,
+	.exit = addrconf_exit_net,
+};
+
 /*
  *      Device notifier
  */
@@ -4167,6 +4231,8 @@ int __init addrconf_init(void)
 		return err;
 	}
 
+	register_pernet_subsys(&addrconf_ops);
+
 	/* The addrconf netdev notifier requires that loopback_dev
 	 * has it's ipv6 private information allocated and setup
 	 * before it can bring up and give link-local addresses
@@ -4190,7 +4256,7 @@ int __init addrconf_init(void)
 		err = -ENOMEM;
 	rtnl_unlock();
 	if (err)
-		return err;
+		goto errlo;
 
 	ip6_null_entry.u.dst.dev = init_net.loopback_dev;
 	ip6_null_entry.rt6i_idev = in6_dev_get(init_net.loopback_dev);
@@ -4218,16 +4284,11 @@ int __init addrconf_init(void)
 
 	ipv6_addr_label_rtnl_register();
 
-#ifdef CONFIG_SYSCTL
-	__addrconf_sysctl_register(&init_net, "all", NET_PROTO_CONF_ALL,
-			NULL, &ipv6_devconf);
-	__addrconf_sysctl_register(&init_net, "default", NET_PROTO_CONF_DEFAULT,
-			NULL, &ipv6_devconf_dflt);
-#endif
-
 	return 0;
 errout:
 	unregister_netdevice_notifier(&ipv6_dev_notf);
+errlo:
+	unregister_pernet_subsys(&addrconf_ops);
 
 	return err;
 }
@@ -4240,10 +4301,7 @@ void addrconf_cleanup(void)
 
 	unregister_netdevice_notifier(&ipv6_dev_notf);
 
-#ifdef CONFIG_SYSCTL
-	__addrconf_sysctl_unregister(&ipv6_devconf_dflt);
-	__addrconf_sysctl_unregister(&ipv6_devconf);
-#endif
+	unregister_pernet_subsys(&addrconf_ops);
 
 	rtnl_lock();
 
-- 
1.5.3.4
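
The goto-based unwinding used in addrconf_init_net() above can be illustrated outside the kernel. The following is only a user-space sketch of the pattern: struct conf, struct ns, ns_init, fake_register and fake_unregister are hypothetical stand-ins, malloc/free replace kmemdup/kfree, and unlike the patch this version always allocates fresh copies instead of reusing statically declared ones for the initial namespace.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct conf {
    int forwarding;
};

struct ns {
    struct conf *all;
    struct conf *dflt;
};

/* Hypothetical stand-in for sysctl registration; fails on request. */
static int fake_register(struct conf *c, int should_fail)
{
    (void)c;
    return should_fail ? -ENOBUFS : 0;
}

static void fake_unregister(struct conf *c)
{
    (void)c;
}

/* Duplicate a template, roughly what kmemdup() does in the patch. */
static struct conf *conf_dup(const struct conf *tmpl)
{
    struct conf *c = malloc(sizeof(*c));

    if (c != NULL)
        memcpy(c, tmpl, sizeof(*c));
    return c;
}

/*
 * Set up both configs, undoing completed steps in reverse order on failure.
 * fail_all and fail_dflt inject registration failures for demonstration.
 */
int ns_init(struct ns *ns, const struct conf *tmpl, int fail_all, int fail_dflt)
{
    int err = -ENOMEM;
    struct conf *all, *dflt;

    all = conf_dup(tmpl);
    if (all == NULL)
        goto err_alloc_all;

    dflt = conf_dup(tmpl);
    if (dflt == NULL)
        goto err_alloc_dflt;

    err = fake_register(all, fail_all);
    if (err < 0)
        goto err_reg_all;

    err = fake_register(dflt, fail_dflt);
    if (err < 0)
        goto err_reg_dflt;

    ns->all = all;
    ns->dflt = dflt;
    return 0;

err_reg_dflt:
    fake_unregister(all);
err_reg_all:
    free(dflt);
err_alloc_dflt:
    free(all);
err_alloc_all:
    return err;
}
```

Each label undoes exactly the steps that had completed before the jump, so a failure at any point leaves nothing allocated or registered.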



* [PATCH net-2.6.25 5/6][NETNS]: Use the per-net ipv6_devconf_dflt
From: Pavel Emelyanov @ 2008-01-10 14:08 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: David Miller, Linux Netdev List, devel, Daniel Lezcano,
	Benjamin Thery
In-Reply-To: <478623C0.7030008@openvz.org>

All its users are in net/ipv6/addrconf.c's sysctl handlers.
Since they already have the struct net to get from, the
per-net ipv6_devconf_dflt can already be used.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv6/addrconf.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3ad081e..9b96de3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -334,7 +334,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 
 	rwlock_init(&ndev->lock);
 	ndev->dev = dev;
-	memcpy(&ndev->cnf, &ipv6_devconf_dflt, sizeof(ndev->cnf));
+	memcpy(&ndev->cnf, dev->nd_net->ipv6.devconf_dflt, sizeof(ndev->cnf));
 	ndev->cnf.mtu6 = dev->mtu;
 	ndev->cnf.sysctl = NULL;
 	ndev->nd_parms = neigh_parms_alloc(dev, &nd_tbl);
@@ -481,11 +481,11 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 	struct net *net;
 
 	net = (struct net *)table->extra2;
-	if (p == &ipv6_devconf_dflt.forwarding)
+	if (p == &net->ipv6.devconf_dflt->forwarding)
 		return;
 
 	if (p == &ipv6_devconf.forwarding) {
-		ipv6_devconf_dflt.forwarding = ipv6_devconf.forwarding;
+		net->ipv6.devconf_dflt->forwarding = ipv6_devconf.forwarding;
 		addrconf_forward_change(net);
 	} else if ((!*p) ^ (!old))
 		dev_forward_change((struct inet6_dev *)table->extra1);
-- 
1.5.3.4



* [PATCH net-2.6.25 6/6][NETNS]: Use the per-net ipv6_devconf(_all) in sysctl handlers
From: Pavel Emelyanov @ 2008-01-10 14:10 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Daniel Lezcano, Benjamin Thery
In-Reply-To: <478623C0.7030008@openvz.org>

Actually the net->ipv6.devconf_all can be used in a few places,
but to keep the /proc/sys/net/ipv6/conf/ sysctls working consistently
in the namespace we should use the per-net devconf_all in the
sysctl "forwarding" handler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv6/addrconf.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 9b96de3..cd90f9a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -456,7 +456,7 @@ static void dev_forward_change(struct inet6_dev *idev)
 }
 
 
-static void addrconf_forward_change(struct net *net)
+static void addrconf_forward_change(struct net *net, __s32 newf)
 {
 	struct net_device *dev;
 	struct inet6_dev *idev;
@@ -466,8 +466,8 @@ static void addrconf_forward_change(struct net *net)
 		rcu_read_lock();
 		idev = __in6_dev_get(dev);
 		if (idev) {
-			int changed = (!idev->cnf.forwarding) ^ (!ipv6_devconf.forwarding);
-			idev->cnf.forwarding = ipv6_devconf.forwarding;
+			int changed = (!idev->cnf.forwarding) ^ (!newf);
+			idev->cnf.forwarding = newf;
 			if (changed)
 				dev_forward_change(idev);
 		}
@@ -484,9 +484,10 @@ static void addrconf_fixup_forwarding(struct ctl_table *table, int *p, int old)
 	if (p == &net->ipv6.devconf_dflt->forwarding)
 		return;
 
-	if (p == &ipv6_devconf.forwarding) {
-		net->ipv6.devconf_dflt->forwarding = ipv6_devconf.forwarding;
-		addrconf_forward_change(net);
+	if (p == &net->ipv6.devconf_all->forwarding) {
+		__s32 newf = net->ipv6.devconf_all->forwarding;
+		net->ipv6.devconf_dflt->forwarding = newf;
+		addrconf_forward_change(net, newf);
 	} else if ((!*p) ^ (!old))
 		dev_forward_change((struct inet6_dev *)table->extra1);
 
-- 
1.5.3.4
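
The (!a) ^ (!b) test that the forwarding code above relies on can be tried in isolation. This is a user-space sketch only: struct dev_conf, forwarding_toggled and apply_forwarding are hypothetical names, and the array stands in for the per-device loop of addrconf_forward_change() with the new value passed in as a parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device config holding only the field of interest. */
struct dev_conf {
    int forwarding;
};

/* True when the two values differ in truthiness (zero vs non-zero). */
bool forwarding_toggled(int old_val, int new_val)
{
    return (!old_val) ^ (!new_val);
}

/*
 * Store newf in every device and return how many devices actually
 * switched between "off" (0) and "on" (non-zero).
 */
size_t apply_forwarding(struct dev_conf *devs, size_t n, int newf)
{
    size_t changed = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (forwarding_toggled(devs[i].forwarding, newf))
            changed++;
        devs[i].forwarding = newf;
    }
    return changed;
}
```

Going from 1 to 2 does not count as a change here, since both values mean "forwarding on"; only a transition to or from zero does.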




* ipip tunnel code (IPV4)
From: Andy Johnson @ 2008-01-10 14:34 UTC (permalink / raw)
  To: netdev

Hello,

I am trying to learn the IPV4 ipip tunnel code  (net/ipv4/ipip.c)
and I have two little questions about
semantics of variables:

ipip_fb_tunnel_init - what does "fb" stand for ?

In tunnels_wc   : what does "wc" stand for ?

Regards,
Andy


* TCP/IP stack / SMP kernel
From: Jeba Anandhan @ 2008-01-10 14:41 UTC (permalink / raw)
  To: netdev

Hi All,
I am just wondering how the TCP/IP stack runs in an SMP kernel in a
multiprocessor environment. Will the TCP/IP stack run on one processor,
or is it shared among the different processors?

thanks
Jeba


* Re: SMP code / network stack
From: Eric Dumazet @ 2008-01-10 14:45 UTC (permalink / raw)
  To: Jeba Anandhan; +Cc: netdev, matthew.hattersley
In-Reply-To: <1199973946.29856.27.camel@vglwks010.vgl2.office.vaioni.com>

On Thu, 10 Jan 2008 14:05:46 +0000
Jeba Anandhan <jeba.anandhan@vaioni.com> wrote:

> Hi All,
> 
> If a server has multiple processors and N ethernet cards, is
> it possible to handle transmission on each processor separately? In
> other words, can each processor be responsible for the tx of a few
> ethernet cards?
> 
> Example: a server has 4 processors and 8 ethernet cards. Is it possible
> for each processor to transmit using only 2 of the ethernet cards, so
> that at any instant data is being sent out from all 8 ethernet cards?

Hi Jeba

Modern ethernet cards have a big TX queue, so that even one CPU is enough
to keep several cards busy in parallel.

You can check /proc/interrupts and change /proc/irq/*/smp_affinity to direct IRQs to
particular CPUs, but transmit is usually triggered by processes that might run on different
CPUs.

If all ethernet cards are on the same IRQ, then you might have a problem...

Example on a dual processor :
# cat /proc/interrupts 
           CPU0       CPU1       
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
121:  163095662  166443627   IO-APIC-level  libata
NMI:          0          0 
LOC:   85887285   85887193 
ERR:          0
MIS:          0

You can see eth0 is on IRQ 97
Then :
# cat /proc/irq/97/smp_affinity 
00000001
# echo 2 >/proc/irq/97/smp_affinity
# grep 97 /proc/interrupts
 97: 1830035216       2259   IO-APIC-level  ehci_hcd, eth0
# sleep 10
# grep 97 /proc/interrupts
 97: 1830035216       5482   IO-APIC-level  ehci_hcd, eth0

You can see only CPU1 is now handling IRQ 97 (but CPU0 can still hand some transmit work to eth0)
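
The value written to smp_affinity is a hexadecimal bitmask of the CPUs allowed
to service the IRQ (bit 0 is CPU0, bit 1 is CPU1), which is why "echo 2" above
selects CPU1 only. A minimal sketch of building such a mask; the helper name
cpu_mask is made up for illustration and is not part of any tool:

```shell
# cpu_mask: print the hex smp_affinity mask for a list of CPU numbers.
# (illustrative helper, not a standard command)
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
    done
    printf '%x\n' "$mask"
}

cpu_mask 1      # CPU1 only      -> 2
cpu_mask 0 1    # CPU0 and CPU1  -> 3

# As root, the result would then be written to the IRQ of interest, e.g.:
#   echo "$(cpu_mask 1)" > /proc/irq/97/smp_affinity
```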

You might want to check /proc/net/softnet_stat too.
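
Each line of /proc/net/softnet_stat describes one CPU, and the columns are
hexadecimal counters. The first three are packets processed, packets dropped
and time_squeeze. A small sketch for decoding one line; the function name and
the sample line are made up for illustration:

```shell
# decode_softnet: decode the first three hex columns of one softnet_stat line.
# $1 is the whole line; the unquoted expansion splits it into columns.
decode_softnet() {
    set -- $1
    echo "processed=$((0x$1)) dropped=$((0x$2)) time_squeeze=$((0x$3))"
}

# Made-up sample line, not real measurement data:
decode_softnet "0000a3f1 00000000 00000012 00000000"

# On a live system, one line per CPU:
#   while read -r line; do decode_softnet "$line"; done < /proc/net/softnet_stat
```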

If your server is doing something very special (network traffic, no disk accesses or number crunching),
you might need to bind application processes to CPUs, not only network IRQs.

process A, using nic eth0 & eth1, bound to CPU 0 (process and IRQs)
process B, using nic eth2 & eth3, bound to CPU 1
process C, using nic eth4 & eth5, bound to CPU 2
process D, using nic eth6 & eth7, bound to CPU 3
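
A rough sketch of what one row of that layout could look like in practice,
assuming taskset (from util-linux) is available. The function only prints the
commands instead of running them, and the PID and IRQ numbers are placeholders
that would have to be read from ps and /proc/interrupts on the real machine:

```shell
# pin_plan: print (do not run) the commands that pin one process and the
# IRQs of its NICs to a single CPU.
# usage: pin_plan CPU PID IRQ...
pin_plan() {
    cpu=$1; pid=$2; shift 2
    mask=$(printf '%x' $(( 1 << cpu )))     # smp_affinity wants a hex bitmask
    echo "taskset -pc $cpu $pid"            # bind the process to the CPU
    for irq in "$@"; do
        echo "echo $mask > /proc/irq/$irq/smp_affinity"
    done
}

# Hypothetical "process B" with PID 4242 whose two NICs use IRQs 98 and 99:
pin_plan 1 4242 98 99
```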


Also, take a look at "ethtool -c ethX" command
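
For example (eth0 and the value 100 are only placeholders, and whether a given
parameter can be changed depends on the driver and the hardware):

```shell
# Show the current interrupt coalescing settings of the interface.
ethtool -c eth0

# Ask the NIC to wait up to 100 microseconds after a packet arrives before
# raising an RX interrupt. Needs root; not every driver supports it.
ethtool -C eth0 rx-usecs 100
```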

^ permalink raw reply

* Re: [PATCH 0/3] bonding: 3 fixes for 2.6.24
From: Andy Gospodarek @ 2008-01-10 14:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Jay Vosburgh, Andy Gospodarek, Krzysztof Oledzki, netdev,
	Jeff Garzik, David Miller
In-Reply-To: <20080110005809.GA3851@gondor.apana.org.au>

On Thu, Jan 10, 2008 at 11:58:09AM +1100, Herbert Xu wrote:
> On Wed, Jan 09, 2008 at 03:19:10PM -0800, Jay Vosburgh wrote:
> >
> > >No that's not the point.  The point is to move the majority of the code
> > >into process context so that you can take the RTNL.  Once you have taken
> > >the RTNL you can disable BH all you want and I don't care one bit.
> > 
> > 	I'm not sure how we could move more code into a process context;
> > much of the bonding driver is at the mercy of its callers, as in this
> > case.  The monitoring stuff and enslave / deslave is all in a process
> > context now (workqueue).  The transmit processing functions, for
> > example, can't be assumed to be in any particular context as they're
> > called by dev_queue_xmit.
> 
> No I'm not calling for you to move any more code into process context.
> I was replying to the comment that changing the read_lock calls in
> process context to read_lock_bh somehow undoes the benefit of moving
> softirq code into process context.  It does not since the point of the
> move is to be able to take the RTNL, which you can still do as long as
> you do it before you disable BH.
> 

That wasn't the only purpose, Herbert.  Making sure that calls to
dev_set_mac_address were called from process context was important at
the time of the coding as well since at least the tg3 driver took locks
that could not be taken reliably in soft-irq context.  Michael Chan
fixed this here:

commit 986e0aeb9ae09127b401c3baa66f15b7a31f354c
Author: Michael Chan <mchan@broadcom.com>
Date:   Sat May 5 12:10:20 2007 -0700

    [TG3]: Remove reset during MAC address changes.

so it wasn't as much of an issue after that, but moving as much of the
code to process context was important for that as well (hence the move
to not continue to try to not use bh-locks everywhere).



^ permalink raw reply

* [PATCH net-2.6.25][NEIGH]: Add a comment describing what a NUD stands for.
From: Pavel Emelyanov @ 2008-01-10 15:05 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Alexey Kuznetsov, devel

When I studied the neighbor code, I puzzled for quite a long time
over what NUD could mean.

Finally I asked Alexey, and he said that it stands for something
like "neighbor unreachability detection".

Is it worth adding a comment to help future developers
understand what's going on?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 09f9fc6..bc34144 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -26,6 +26,10 @@
 #include <linux/sysctl.h>
 #include <net/rtnetlink.h>
 
+/*
+ * NUD stands for "neighbor unreachability detection"
+ */
+
 #define NUD_IN_TIMER	(NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
 #define NUD_VALID	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
 #define NUD_CONNECTED	(NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)

^ permalink raw reply related

* Re: SMP code / network stack
From: Jeba Anandhan @ 2008-01-10 15:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, matthew.hattersley
In-Reply-To: <20080110154548.4b78ec7c.dada1@cosmosbay.com>

Hi Eric,
Thanks for the reply. I have one more question. For example, suppose we have 2
processors and 4 ethernet cards, and only CPU0 does all the work for the 4 cards.
If we set the affinity of each ethernet card to a CPU number, will it be
efficient?

Will this be the default behavior?

# cat /proc/interrupts 
           CPU0       CPU1       
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
 97: 3830012232        847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231        847   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213        847   IO-APIC-level  ehci_hcd, eth3
#sleep 10

# cat /proc/interrupts 
           CPU0       CPU1       
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 2031409801        847   IO-APIC-level  ehci_hcd, eth0
 97: 4813981390        847   IO-APIC-level  ehci_hcd, eth1
 97: 7123982139        847   IO-APIC-level  ehci_hcd, eth2
 97: 8030193010        847   IO-APIC-level  ehci_hcd, eth3


Instead of the above, if we set the affinity for eth2 and
eth3,
the output will be:

# cat /proc/interrupts 
           CPU0       CPU1       
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 1830022231        847   IO-APIC-level  ehci_hcd, eth0
 97: 3830012232        847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231        923   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213       1230   IO-APIC-level  ehci_hcd, eth3
#sleep 10

# cat /proc/interrupts 
           CPU0       CPU1       
  0:   11472559   74291833    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 81:          0          0   IO-APIC-level  ohci_hcd
 97: 2300022231        847   IO-APIC-level  ehci_hcd, eth0
 97: 4010212232        847   IO-APIC-level  ehci_hcd, eth1
 97: 5830052231       1847   IO-APIC-level  ehci_hcd, eth2
 97: 6830032213       2337   IO-APIC-level  ehci_hcd, eth3

In this case, will the performance improve?

Thanks
Jeba
On Thu, 2008-01-10 at 15:45 +0100, Eric Dumazet wrote:

^ permalink raw reply

* Re: [PATCH take2] Re: Nested VLAN causes recursive locking error
From: Patrick McHardy @ 2008-01-10 15:31 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Benny Amorsen, Chuck Ebbert, netdev
In-Reply-To: <20080102234107.GA6902@ami.dom.local>

Jarek Poplawski wrote:
> As a matter of fact I started to doubt it's a real problem: 2 vlan
> headers in the row - is it working?

Yes, apparently some people are using this.

> Anyway, as Patrick pointed, the previous patch was a bit buggy, and
> deeper nesting needs a little more (if it's can work too...). So,
> here is something minimal.
> 
> Patrick, if you think about something else, then of course don't care
> about this patch.

No, this seems fine, thanks. Even better would be a way to get
the last lockdep subclass through lockdep somehow, but I couldn't
find a clean way for this. So I've applied your patch and also
fixed macvlan.



^ permalink raw reply

* [VLAN]: nested VLAN: fix lockdep's recursive locking warning
From: Patrick McHardy @ 2008-01-10 15:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: 01.diff --]
[-- Type: text/x-patch, Size: 1540 bytes --]

[VLAN]: nested VLAN: fix lockdep's recursive locking warning

Allow vlans nesting other vlans without lockdep's warnings (max. 2 levels
i.e. parent + child). Thanks to Patrick McHardy for pointing a bug in the
first version of this patch.

Reported-by: Benny Amorsen

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 4d14fded63dcaf9d5dcf78e2a8ea3f5de2c29eb9
tree 2f0792e8240151b1e5437b05130d1f569175f572
parent e2474f60798c97f5c05d29a906045dd1f416ba7f
author Jarek Poplawski <jarkao2@gmail.com> Thu, 10 Jan 2008 16:25:00 +0100
committer Patrick McHardy <kaber@trash.net> Thu, 10 Jan 2008 16:25:00 +0100

 net/8021q/vlan.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 4add9bd..032bf44 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -323,6 +323,7 @@ static const struct header_ops vlan_header_ops = {
 static int vlan_dev_init(struct net_device *dev)
 {
 	struct net_device *real_dev = VLAN_DEV_INFO(dev)->real_dev;
+	int subclass = 0;
 
 	/* IFF_BROADCAST|IFF_MULTICAST; ??? */
 	dev->flags  = real_dev->flags & ~IFF_UP;
@@ -349,7 +350,11 @@ static int vlan_dev_init(struct net_device *dev)
 		dev->hard_start_xmit = vlan_dev_hard_start_xmit;
 	}
 
-	lockdep_set_class(&dev->_xmit_lock, &vlan_netdev_xmit_lock_key);
+	if (real_dev->priv_flags & IFF_802_1Q_VLAN)
+		subclass = 1;
+
+	lockdep_set_class_and_subclass(&dev->_xmit_lock,
+				&vlan_netdev_xmit_lock_key, subclass);
 	return 0;
 }
 

^ permalink raw reply related

* [MACVLAN]: Prevent nesting macvlan devices
From: Patrick McHardy @ 2008-01-10 15:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: 02.diff --]
[-- Type: text/x-patch, Size: 1217 bytes --]

[MACVLAN]: Prevent nesting macvlan devices

Don't allow to nest macvlan devices since it will cause lockdep warnings and
isn't really useful for anything.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 80a76fbde679793a17482a3dd842386801fca66b
tree 07f67e78ac0ae505a5de81e7e770a1b7d597f120
parent 4d14fded63dcaf9d5dcf78e2a8ea3f5de2c29eb9
author Patrick McHardy <kaber@trash.net> Thu, 10 Jan 2008 16:25:01 +0100
committer Patrick McHardy <kaber@trash.net> Thu, 10 Jan 2008 16:25:01 +0100

 drivers/net/macvlan.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 2e4bcd5..e8dc2f4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -384,6 +384,13 @@ static int macvlan_newlink(struct net_device *dev,
 	if (lowerdev == NULL)
 		return -ENODEV;
 
+	/* Don't allow macvlans on top of other macvlans - its not really
+	 * wrong, but lockdep can't handle it and its not useful for anything
+	 * you couldn't do directly on top of the real device.
+	 */
+	if (lowerdev->rtnl_link_ops == dev->rtnl_link_ops)
+		return -ENODEV;
+
 	if (!tb[IFLA_MTU])
 		dev->mtu = lowerdev->mtu;
 	else if (dev->mtu > lowerdev->mtu)

^ permalink raw reply related

* Re: No idea about shaping trough many pc
From: Lennart Sorensen @ 2008-01-10 15:38 UTC (permalink / raw)
  To: Badalian Vyacheslav; +Cc: netdev
In-Reply-To: <4785E01B.3080900@bigtelecom.ru>

On Thu, Jan 10, 2008 at 12:06:35PM +0300, Badalian Vyacheslav wrote:
> Hello all.
> I have been trying for more than 2 months to resolve a problem with my
> shaping.  Maybe you can help me?
> 
> Scheme:
>                  +---------------+
>           +----- | Shaping PC 1  | -----+
>          /       +---------------+       \
> +-------+        +---------------+        +-------+
> | Cisco |------- | Shaping PC N  | -------| CISCO |
> +-------+        +---------------+        +-------+
>          \       +---------------+       /
>           +----- | Shaping PC 20 | -----+
>                  +---------------+
> 
> Network: over 10k users. Total bandwidth to the Internet is more than 1 Gbps.
> All computers run BGP with multipath turned on.
> The Cisco can't do load sharing per packet (that would resolve all my
> problems =((( ), only by DST IP, SRC IP, or additionally Layer 4.
> OK. Each user must get a speed of 1 Mbps.
> Let's look at the variants:
> 1. Create a rule per user = (1 Mbps / N computers). If the user uses N
> connections all is great, but if he uses 1 connection his speed is
> 1 Mbps / N, which does not look good. All would be great if the Cisco
> could do PER PACKET load sharing =(
> 2. Create a rule per user = 1 Mbps. If the user uses 1 connection all is
> great, but if he uses N connections his speed is much more than the
> intended limit =(
> 
> Why do I use 20 PCs? Because 1 PC normally forwards 100-150 Mbps, and at
> that point it has 100% CPU usage in software interrupts...

I have managed forwarding of 600Mbps using about 15% CPU load on a
500MHz Geode LX, using 4 100Mbit pcnet32 interfaces and a small tweak to
how the NAPI is implemented on it.  Adding traffic shapping and such to
the processing would certainly increase the CPU load, but hopefully not
by much.  The reason I didn't get more than 600Mbps was that the PCI bus
is now full.

> Any idea how to resolve this problem?
> 
> In my dreams (feature request to netdev ;) ):
> Take one PC and call it MASTER TC.  All 20 PCs synchronize their statistics
> with the MASTER and share common rules and statistics. Then I could use
> variant 2 and be happy... but is that not realistic? =(
> Maybe there are other variants?

Well, not sure about synchronizing and all that.  I still think if I can
manage 600Mbps forwarding rate using a slow poke Geode then a modern CPU
like a Q6600 with a number of PCIe gig ports should be able to do quite
a lot.

The tweak I did was to add a timer to the driver that I can activate
whenever I finish emptying the receive queue.  When the timer expires it
adds the port back to the NAPI queue, and when it is called again the
poll will either process whatever packets arrived during the delay, or
it will actually unmask the IRQ and go back to IRQ mode.  The delay I
use is 1 jiffy, and I run with HZ=1000 and set the queues to 256 packets,
since 1ms at 100Mbps can provide at most about 200 packets (64-byte worst
case).  I simply check whenever I empty the queue how many packets I
just processed.  If greater than 0, I enable the timer to expire on the
next jiffy and leave the port masked after removing port from napi
polling, and if it was 0 then I must have been called again after the
timer expired and still had no packets to process in which case I unmask
the IRQ and don't enable the timer.  I had to change the HZ to 1000
since at 250 or 100 I wouldn't be able to handle the worst case number
of packets (the pcnet32 has a maximum of 512 packets in a queue).
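
The queue sizing above can be checked with a little arithmetic. This is only a
back-of-the-envelope sketch (the function name is made up); like the estimate
above, it ignores the preamble and the inter-frame gap, so it slightly
overstates the real worst case:

```shell
# pkts_per_jiffy: upper bound on minimum-size packets that can arrive on one
# port during one jiffy.
# usage: pkts_per_jiffy RATE_BITS_PER_SEC HZ [FRAME_BYTES]
pkts_per_jiffy() {
    rate_bps=$1; hz=$2; frame_bytes=${3:-64}
    echo $(( rate_bps / (frame_bytes * 8) / hz ))
}

pkts_per_jiffy 100000000 1000   # 195:  fits in a 256-packet queue
pkts_per_jiffy 100000000 250    # 781:  more than the 512-packet maximum
pkts_per_jiffy 100000000 100    # 1953: far more than the 512-packet maximum
```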

With NAPI the normal behaviour is that whenever you empty the receive
queue, you reenable IRQs, but it doesn't take that fast a CPU to
actually empty the queue all the time and then you end up with the
overhead for masking IRQs every time you receive packets, process them,
and then the overhead of unmasking the IRQ, only to get an IRQ for the next
packet within a fraction of a millisecond.  With the delay until
the next jiffy for unmasking the IRQ you end up causing a potential lag
on processing packets of up to 1ms, although on average less than that,
but the IRQ load drops dramatically and the overhead of managing the IRQ
masking and the IRQ handler goes away.  In the case of this system the
CPU load dropped from 90% at 500Mbps to 15% at 600Mbps, and the
interrupt rate dropped from one IRQ every couple of packets, to one IRQ
at the start of each burst of packets.
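
The decision taken each time the receive queue has been emptied can be modelled
in a few lines. This is a toy model of the logic described above, not the
actual pcnet32 driver code, and the function name is made up:

```shell
# after_poll: what to do once the receive queue has been emptied.
# $1 = number of packets handled in this poll pass.
after_poll() {
    if [ "$1" -gt 0 ]; then
        # Traffic is still flowing: stay in polled mode and look again
        # on the next jiffy instead of unmasking the IRQ right away.
        echo "keep IRQ masked, arm 1-jiffy timer"
    else
        # The timer fired and nothing had arrived: go back to IRQ mode.
        echo "unmask IRQ, timer stays off"
    fi
}

# A burst that dies down over three consecutive passes:
for n in 120 37 0; do
    after_poll "$n"
done
```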

I believe some GB ethernet ports and most 10Gig ports have the ability
to do delayed IRQ where they wait for a certain number of packets before
generating an IRQ, which is pretty much what I tried to emulate with my
tweak and it sure works amazingly well.

--
Len Sorensen

^ permalink raw reply

* Re: [Bugme-new]  [Bug 9719] New: when a system is configured as a bridge, and at the same time configured to have multipath weighted route, with one leg goes thru NAT and another without NAT, the nat path will intermittently get packets leaking out using internal IP without being SNAT-ted
From: Patrick McHardy @ 2008-01-10 15:41 UTC (permalink / raw)
  To: mingching.tiew; +Cc: Andrew Morton, bugme-daemon, netdev
In-Reply-To: <20080109152813.83fb8168.akpm@linux-foundation.org>

Andrew Morton wrote:
>> Distribution: iptables 1.4.0 was used with kernel 2.6.23 and iptables 1.3.8
>> with 2.6.22.15
>> Hardware Environment: 3 interfaces, 2 interfaces bridged to form br0, and
>> another connects to internet using pppoe.
>> Software Environment: bridge, multipath routing
>> Problem Description: when a system is configured as a bridge with IP assigned
>> to br0 interface, and at the same time it is configured to have multipath
>> weighted default route, and one of the default route is NAT-ed and another of
>> the default route is not NAT-ed, then the NAT-ed interface will occasionally
>> get packets leaking out to it with private IPs.


That is most likely because the route changes over time (when the cache
is flushed) and the NAT mappings for the connection have been set up on
a different interface. The way to properly do this is to add routing
rules based on fwmark and use CONNMARK to bind a connection to one of
the interfaces after the initial multipath routing decision.
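
A rough sketch of such a setup for forwarded traffic. All names here are
placeholders chosen for illustration: br0 as the inside interface, ppp0 and
eth2 as the two uplinks, 192.0.2.1 as the second gateway, and the mark and
table numbers. None of them come from the bug report.

```shell
# Remember which uplink a new connection was routed to by the initial
# multipath decision.
iptables -t mangle -A POSTROUTING -o ppp0 -m conntrack --ctstate NEW \
         -j CONNMARK --set-mark 1
iptables -t mangle -A POSTROUTING -o eth2 -m conntrack --ctstate NEW \
         -j CONNMARK --set-mark 2

# For later packets of the connection arriving from the inside, copy the
# connection mark back onto the packet before the routing decision.
iptables -t mangle -A PREROUTING -i br0 -j CONNMARK --restore-mark

# Route marked packets through a fixed uplink; unmarked (new) connections
# still use the multipath default route in the main table.
ip rule add fwmark 1 table 101
ip rule add fwmark 2 table 102
ip route add default dev ppp0 table 101
ip route add default via 192.0.2.1 dev eth2 table 102
```

Restoring the mark only for packets coming in on the inside interface keeps
reply packets from the outside on the normal routing tables.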

^ permalink raw reply

