Netdev List
 help / color / mirror / Atom feed
* Re: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-10-23 14:31 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <4AE1BD3D.3020007@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]

On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote:

> >> --- a/drivers/isdn/i4l/isdn_ppp.c
> >> +++ b/drivers/isdn/i4l/isdn_ppp.c
> >> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
> >>  #endif /* CONFIG_IPPP_FILTER */
> >>  	skb->dev = dev;
> >>  	skb_reset_mac_header(skb);
> >> -	netif_rx(skb);
> >> +	if (in_interrupt())
> >> +		netif_rx(skb);
> >> +	else
> >> +		netif_rx_ni(skb);
> > 
> > So you've verified that the entire i4l stack can cope with being called
> > twice on the same CPU, from different contexts?
> 
> What makes you think so?

I thought I'd explained this in my other email. *sigh*

You're squelching a warning, which comes from the fact that you're
calling something that calls into netif_rx() with softirqs enabled. That
would seem to imply that potentially a softirq could at the same time
call into that code too.

Basically, what happens now is this:

disable softirqs
call into i4l/ppp
i4l/ppp code
call netif_rx()
more code
enable softirqs


You're changing it to

call into i4l/ppp
i4l/ppp code
call netif_rx_ni()
more code

so the entire chunks "i4l/ppp code" and "more code" are now no longer
protected against being interrupted by a softirq that runs the same
code, maybe for a different device, on the same CPU.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Tilman Schmidt @ 2009-10-23 14:39 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch,
	Oliver Hartkopp
In-Reply-To: <1256200021.12174.11.camel@johannes.local>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote:
> 
>> I'm not sure I can understand your question. This patch is mainly to
>> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
>> context handling; IMHO there're better tools for this (lockdep,
>> WARN_ON's).
> 
> I'm saying that it seems to me, as indicated by the API (and without
> proof otherwise that's how it is) the networking layer needs to have
> packets handed to it with softirqs disabled.

Strange. Then what are the two separate functions netif_rx() and
netif_rx_ni() for?

> This really should be obvious. You're fixing the warning at the source
> of the warning, rather than the source of the problem.

Good idea. So please do tell us where the source of the problem is.

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4cALQ3+did9BuFsRAnW8AKCP4ey+gT2RZBYpzx91PaXd11A/PwCgh35g
fhEbJs++1BRIQ3encV8fIm4=
=SSaA
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Johannes Berg @ 2009-10-23 14:46 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch,
	Oliver Hartkopp
In-Reply-To: <4AE1C00B.5010008@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

On Fri, 2009-10-23 at 16:39 +0200, Tilman Schmidt wrote:

> Strange. Then what are the two separate functions netif_rx() and
> netif_rx_ni() for?

netif_rx_ni() disables preemption.

> > This really should be obvious. You're fixing the warning at the source
> > of the warning, rather than the source of the problem.
> 
> Good idea. So please do tell us where the source of the problem is.

You use netif_rx_ni() instead of netif_rx() at whatever place that
causes this problem.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
From: Cyrill Gorcunov @ 2009-10-23 15:18 UTC (permalink / raw)
  To: netdev
  Cc: Denys Fedoryschenko, Michal Ostrowski, Eric Dumazet, linux-ppp,
	paulus, mostrows
In-Reply-To: <20091020190821.GO5181@lenovo>

[Cyrill Gorcunov - Tue, Oct 20, 2009 at 11:08:21PM +0400]
...
| 
| Just to update status of the issue. The key moment is that pppoe_flush_dev
| may be called asynchronously (especially via sysfs on dev entry, for example
| we retrieve mtu of device and while we at it other process may update mtu
| via sysfs). So I'm returning pppoe_hash_lock back which should eliminate a
| number of lock complains and make locking scheme easier. All-in-one: I'm
| working on it. Just need some more time.
| 
| 	-- Cyrill

Another status update -- the patch is under testing stage. Hope we will
reveal proper patch soon (in a few days I guess).

	-- Cyrill

^ permalink raw reply

* [PATCH] net: Cleanup redundant tests on unsigned
From: Roel Kluin @ 2009-10-23 15:21 UTC (permalink / raw)
  To: David S. Miller, netdev, Henner Eisen, linux-x25, Andrew Morton

If there is data, the unsigned skb->len is greater than 0.

rt.sigdigits is unsigned as well, so the test `>= 0' is
always true, the other part of the test catches wrapped
values.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
 net/x25/x25_in.c    |    2 +-
 net/x25/x25_route.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/x25/x25_in.c b/net/x25/x25_in.c
index 7d7c3ab..96d9227 100644
--- a/net/x25/x25_in.c
+++ b/net/x25/x25_in.c
@@ -114,7 +114,7 @@ static int x25_state1_machine(struct sock *sk, struct sk_buff *skb, int frametyp
 			/*
 			 *	Copy any Call User Data.
 			 */
-			if (skb->len >= 0) {
+			if (skb->len > 0) {
 				skb_copy_from_linear_data(skb,
 					      x25->calluserdata.cuddata,
 					      skb->len);
diff --git a/net/x25/x25_route.c b/net/x25/x25_route.c
index 2c999cc..66961ea 100644
--- a/net/x25/x25_route.c
+++ b/net/x25/x25_route.c
@@ -190,7 +190,7 @@ int x25_route_ioctl(unsigned int cmd, void __user *arg)
 		goto out;
 
 	rc = -EINVAL;
-	if (rt.sigdigits < 0 || rt.sigdigits > 15)
+	if (rt.sigdigits > 15)
 		goto out;
 
 	dev = x25_dev_get(rt.device);

^ permalink raw reply related

* Re: [PATCH] net: Fix RPF to work with policy routing
From: Ben Greear @ 2009-10-23 15:34 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev, atis, eric.dumazet, zenczykowski
In-Reply-To: <1256295075.6264.59.camel@dogo.mojatatu.com>

jamal wrote:
> with the ipt or skbedit actions or via netfilter i could
> set marks which could be as trivial as "set mark X if packet 
> came in via eth0 or eth1 and mark Y if they came in via gre0"
>   
I implemented something similar while allowing for virtual router like
applications.  I had to add a mark very early in the pkt rx logic in dev.c,
and had to add a 'skb_default_mark' member to the netdevice because
the route lookup is done before the normal iptables logic ran.  Without
this, if a flow already existed for pkts coming in eth1, if the packet came
back in eth2, it would use eth1's flow.

I'll dig out the patch if anyone is interested...

Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply

* [PATCH net-next-2.6] ipip: convert hash tables locking to RCU
From: Eric Dumazet @ 2009-10-23 15:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

IPIP tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
already must wait for a RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ipip.c |   49 ++++++++++++++++++++++++++--------------------
 1 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 6a55392..3bd6998 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -134,7 +134,13 @@ static void ipip_fb_tunnel_init(struct net_device *dev);
 static void ipip_tunnel_init(struct net_device *dev);
 static void ipip_tunnel_setup(struct net_device *dev);
 
-static DEFINE_RWLOCK(ipip_lock);
+/*
+ * Locking : hash tables are protected by RCU and a spinlock
+ */
+static DEFINE_SPINLOCK(ipip_lock);
+
+#define for_each_ip_tunnel_rcu(start) \
+	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
 static struct ip_tunnel * ipip_tunnel_lookup(struct net *net,
 		__be32 remote, __be32 local)
@@ -144,20 +150,21 @@ static struct ip_tunnel * ipip_tunnel_lookup(struct net *net,
 	struct ip_tunnel *t;
 	struct ipip_net *ipn = net_generic(net, ipip_net_id);
 
-	for (t = ipn->tunnels_r_l[h0^h1]; t; t = t->next) {
+	for_each_ip_tunnel_rcu(ipn->tunnels_r_l[h0 ^ h1])
 		if (local == t->parms.iph.saddr &&
 		    remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
 			return t;
-	}
-	for (t = ipn->tunnels_r[h0]; t; t = t->next) {
+
+	for_each_ip_tunnel_rcu(ipn->tunnels_r[h0])
 		if (remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
 			return t;
-	}
-	for (t = ipn->tunnels_l[h1]; t; t = t->next) {
+
+	for_each_ip_tunnel_rcu(ipn->tunnels_l[h1])
 		if (local == t->parms.iph.saddr && (t->dev->flags&IFF_UP))
 			return t;
-	}
-	if ((t = ipn->tunnels_wc[0]) != NULL && (t->dev->flags&IFF_UP))
+
+	t = rcu_dereference(ipn->tunnels_wc[0]);
+	if (t && (t->dev->flags&IFF_UP))
 		return t;
 	return NULL;
 }
@@ -193,9 +200,9 @@ static void ipip_tunnel_unlink(struct ipip_net *ipn, struct ip_tunnel *t)
 
 	for (tp = ipip_bucket(ipn, t); *tp; tp = &(*tp)->next) {
 		if (t == *tp) {
-			write_lock_bh(&ipip_lock);
+			spin_lock_bh(&ipip_lock);
 			*tp = t->next;
-			write_unlock_bh(&ipip_lock);
+			spin_unlock_bh(&ipip_lock);
 			break;
 		}
 	}
@@ -205,10 +212,10 @@ static void ipip_tunnel_link(struct ipip_net *ipn, struct ip_tunnel *t)
 {
 	struct ip_tunnel **tp = ipip_bucket(ipn, t);
 
+	spin_lock_bh(&ipip_lock);
 	t->next = *tp;
-	write_lock_bh(&ipip_lock);
-	*tp = t;
-	write_unlock_bh(&ipip_lock);
+	rcu_assign_pointer(*tp, t);
+	spin_unlock_bh(&ipip_lock);
 }
 
 static struct ip_tunnel * ipip_tunnel_locate(struct net *net,
@@ -267,9 +274,9 @@ static void ipip_tunnel_uninit(struct net_device *dev)
 	struct ipip_net *ipn = net_generic(net, ipip_net_id);
 
 	if (dev == ipn->fb_tunnel_dev) {
-		write_lock_bh(&ipip_lock);
+		spin_lock_bh(&ipip_lock);
 		ipn->tunnels_wc[0] = NULL;
-		write_unlock_bh(&ipip_lock);
+		spin_unlock_bh(&ipip_lock);
 	} else
 		ipip_tunnel_unlink(ipn, netdev_priv(dev));
 	dev_put(dev);
@@ -318,7 +325,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 
 	err = -ENOENT;
 
-	read_lock(&ipip_lock);
+	rcu_read_lock();
 	t = ipip_tunnel_lookup(dev_net(skb->dev), iph->daddr, iph->saddr);
 	if (t == NULL || t->parms.iph.daddr == 0)
 		goto out;
@@ -333,7 +340,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 		t->err_count = 1;
 	t->err_time = jiffies;
 out:
-	read_unlock(&ipip_lock);
+	rcu_read_unlock();
 	return err;
 }
 
@@ -351,11 +358,11 @@ static int ipip_rcv(struct sk_buff *skb)
 	struct ip_tunnel *tunnel;
 	const struct iphdr *iph = ip_hdr(skb);
 
-	read_lock(&ipip_lock);
+	rcu_read_lock();
 	if ((tunnel = ipip_tunnel_lookup(dev_net(skb->dev),
 					iph->saddr, iph->daddr)) != NULL) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-			read_unlock(&ipip_lock);
+			rcu_read_unlock();
 			kfree_skb(skb);
 			return 0;
 		}
@@ -374,10 +381,10 @@ static int ipip_rcv(struct sk_buff *skb)
 		nf_reset(skb);
 		ipip_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
-		read_unlock(&ipip_lock);
+		rcu_read_unlock();
 		return 0;
 	}
-	read_unlock(&ipip_lock);
+	rcu_read_unlock();
 
 	return -1;
 }

^ permalink raw reply related

* [PATCH] net: Cleanup redundant tests on unsigned
From: Roel Kluin @ 2009-10-23 15:59 UTC (permalink / raw)
  To: David S. Miller, netdev, Andrew Morton

optlen is unsigned so the `< 0' test is never true.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
 net/can/raw.c          |    2 --
 net/compat.c           |    3 ---
 net/ipv4/ip_sockglue.c |    2 +-
 3 files changed, 1 insertions(+), 6 deletions(-)

Or should the tests in net/can/raw.c or net/compat.c
be replaced by something else?

diff --git a/net/can/raw.c b/net/can/raw.c
index b5e8979..9e18c2a 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -424,8 +424,6 @@ static int raw_setsockopt(struct socket *sock, int level, int optname,
 
 	if (level != SOL_CAN_RAW)
 		return -EINVAL;
-	if (optlen < 0)
-		return -EINVAL;
 
 	switch (optname) {
 
diff --git a/net/compat.c b/net/compat.c
index a407c3a..e373995 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -390,9 +390,6 @@ asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
 	int err;
 	struct socket *sock;
 
-	if (optlen < 0)
-		return -EINVAL;
-
 	if ((sock = sockfd_lookup(fd, &err))!=NULL)
 	{
 		err = security_socket_setsockopt(sock,level,optname);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index e982b5c..15dbf30 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -480,7 +480,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 	case IP_OPTIONS:
 	{
 		struct ip_options *opt = NULL;
-		if (optlen > 40 || optlen < 0)
+		if (optlen > 40)
 			goto e_inval;
 		err = ip_options_get_from_user(sock_net(sk), &opt,
 					       optval, optlen);

^ permalink raw reply related

* Re: [Patch] sctp: remove deprecated SCTP_GET_*_OLD stuffs
From: Vlad Yasevich @ 2009-10-23 15:58 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Amerigo Wang, linux-kernel, netdev, akpm
In-Reply-To: <20091022214439.GA2635@merkur.ravnborg.org>



Sam Ravnborg wrote:
> On Thu, Oct 22, 2009 at 04:53:30PM -0400, Vlad Yasevich wrote:
>>> diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
>>> index be2334a..0991f1b 100644
>>> --- a/include/net/sctp/user.h
>>> +++ b/include/net/sctp/user.h
>>> @@ -131,14 +131,6 @@ enum sctp_optname {
>>>  #define SCTP_SOCKOPT_BINDX_REM	SCTP_SOCKOPT_BINDX_REM
>>>  	SCTP_SOCKOPT_PEELOFF, 	/* peel off association. */
>>>  #define SCTP_SOCKOPT_PEELOFF	SCTP_SOCKOPT_PEELOFF
>>> -	SCTP_GET_PEER_ADDRS_NUM_OLD, 	/* Get number of peer addresss. */
>>> -#define SCTP_GET_PEER_ADDRS_NUM_OLD	SCTP_GET_PEER_ADDRS_NUM_OLD
>>> -	SCTP_GET_PEER_ADDRS_OLD, 	/* Get all peer addresss. */
>>> -#define SCTP_GET_PEER_ADDRS_OLD	SCTP_GET_PEER_ADDRS_OLD
>>> -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, 	/* Get number of local addresss. */
>>> -#define SCTP_GET_LOCAL_ADDRS_NUM_OLD	SCTP_GET_LOCAL_ADDRS_NUM_OLD
>>> -	SCTP_GET_LOCAL_ADDRS_OLD, 	/* Get all local addresss. */
>>> -#define SCTP_GET_LOCAL_ADDRS_OLD	SCTP_GET_LOCAL_ADDRS_OLD
>>>  	SCTP_SOCKOPT_CONNECTX_OLD, /* CONNECTX old requests. */
>> After running the regression suite against this patch I find that we can't
>> remove the enum values.  Removing the enums changes the value for the remainder
>> of the definitions and breaks binary compatibility for applications that use
>> those trailing options.
>>
>> You should be ok with removing the #defines and actual code that uses them,
>> but not the enums.  You can even rename the enums, but we must preserve
>> numeric ordering.
> 
> If we really depend on the actual value of an enum as in this case,
> then e should assign them direct to better document this.
> 
> 	Sam
> 

I agree.  I have a patch that converts the enum to just a #define section that
I'll apply on top of this removal patch and document the deletion.

-vlad

^ permalink raw reply

* [PATCH] atm: Cleanup redundant tests on unsigned
From: Roel Kluin @ 2009-10-23 16:09 UTC (permalink / raw)
  To: Chas Williams, netdev, Andrew Morton

The variables are unsigned so the `< 0' test always fails, the
other part of the test catches wrapped values.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
 drivers/atm/fore200e.c |    4 ++--
 drivers/atm/he.c       |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index f766cc4..bc53fed 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -2906,8 +2906,8 @@ fore200e_proc_read(struct atm_dev *dev, loff_t* pos, char* page)
 	u32 media_index    = FORE200E_MEDIA_INDEX(fore200e->bus->read(&fore200e->cp_queues->media_type));
 	u32 oc3_index;
 
-	if ((media_index < 0) || (media_index > 4))
-	    media_index = 5;
+	if (media_index > 4)
+		media_index = 5;
 	
 	switch (fore200e->loop_mode) {
 	    case ATM_LM_NONE:    oc3_index = 0;
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 7066703..e906658 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2739,7 +2739,7 @@ he_ioctl(struct atm_dev *atm_dev, unsigned int cmd, void __user *arg)
 			spin_lock_irqsave(&he_dev->global_lock, flags);
 			switch (reg.type) {
 				case HE_REGTYPE_PCI:
-					if (reg.addr < 0 || reg.addr >= HE_REGMAP_SIZE) {
+					if (reg.addr >= HE_REGMAP_SIZE) {
 						err = -EINVAL;
 						break;
 					}

^ permalink raw reply related

* Re: [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Eric Dumazet @ 2009-10-23 16:02 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: Jay Vosburgh, netdev@vger.kernel.org
In-Reply-To: <20091023140845.GA24564@spaans.fox.local>

Jasper Spaans a écrit :
> Modify bonding hash transmit policies to use the psource MAC address of
> the packet instead of the MAC address configured for the bonding device.
> 
> The old sitation conflicts with the documentation.
> 
> Signed-off-by: Jasper Spaans <spaans@fox-it.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply

* Re: [PATCH] Remove bond_dev from xmit_hash_policy call.
From: Eric Dumazet @ 2009-10-23 16:05 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: Jay Vosburgh, netdev@vger.kernel.org
In-Reply-To: <20091023140924.GA24611@spaans.fox.local>

Jasper Spaans a écrit :
> Now that the bonding device is no longer used in determining the device to
> which to send packets, it can be dropped from the argument list of the various
> xmit_hash_policy calls.
> 
> Signed-off-by: Jasper Spaans <spaans@fox-it.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Next step would be to use reciprocal divide :)
I'll take care of this eventually.

^ permalink raw reply

* Re: [PATCH] atm: Cleanup redundant tests on unsigned
From: Eric Dumazet @ 2009-10-23 16:08 UTC (permalink / raw)
  To: Roel Kluin; +Cc: Chas Williams, netdev, Andrew Morton
In-Reply-To: <4AE1D553.8010406@gmail.com>

Roel Kluin a écrit :
> The variables are unsigned so the `< 0' test always fails, the
> other part of the test catches wrapped values.
> 
> Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
> ---

I believe such patches were posted in the past.

General consensus is that compiler is able to do this optimization for us,
and reader doesnt have to ask to himself : "Is the test safe enough ?"


^ permalink raw reply

* [PATCH net-next-2.6] gre: convert hash tables locking to RCU
From: Eric Dumazet @ 2009-10-23 16:14 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

GRE tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
already must wait for a RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_gre.c |   40 +++++++++++++++++++++++-----------------
 1 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 89ff9d5..40f0439 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -156,8 +156,13 @@ struct ipgre_net {
 #define tunnels_r	tunnels[2]
 #define tunnels_l	tunnels[1]
 #define tunnels_wc	tunnels[0]
+/*
+ * Locking : hash tables are protected by RCU and a spinlock
+ */
+static DEFINE_SPINLOCK(ipgre_lock);
 
-static DEFINE_RWLOCK(ipgre_lock);
+#define for_each_ip_tunnel_rcu(start) \
+	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
 /* Given src, dst and key, find appropriate for input tunnel. */
 
@@ -175,7 +180,7 @@ static struct ip_tunnel * ipgre_tunnel_lookup(struct net_device *dev,
 		       ARPHRD_ETHER : ARPHRD_IPGRE;
 	int score, cand_score = 4;
 
-	for (t = ign->tunnels_r_l[h0^h1]; t; t = t->next) {
+	for_each_ip_tunnel_rcu(ign->tunnels_r_l[h0 ^ h1]) {
 		if (local != t->parms.iph.saddr ||
 		    remote != t->parms.iph.daddr ||
 		    key != t->parms.i_key ||
@@ -200,7 +205,7 @@ static struct ip_tunnel * ipgre_tunnel_lookup(struct net_device *dev,
 		}
 	}
 
-	for (t = ign->tunnels_r[h0^h1]; t; t = t->next) {
+	for_each_ip_tunnel_rcu(ign->tunnels_r[h0 ^ h1]) {
 		if (remote != t->parms.iph.daddr ||
 		    key != t->parms.i_key ||
 		    !(t->dev->flags & IFF_UP))
@@ -224,7 +229,7 @@ static struct ip_tunnel * ipgre_tunnel_lookup(struct net_device *dev,
 		}
 	}
 
-	for (t = ign->tunnels_l[h1]; t; t = t->next) {
+	for_each_ip_tunnel_rcu(ign->tunnels_l[h1]) {
 		if ((local != t->parms.iph.saddr &&
 		     (local != t->parms.iph.daddr ||
 		      !ipv4_is_multicast(local))) ||
@@ -250,7 +255,7 @@ static struct ip_tunnel * ipgre_tunnel_lookup(struct net_device *dev,
 		}
 	}
 
-	for (t = ign->tunnels_wc[h1]; t; t = t->next) {
+	for_each_ip_tunnel_rcu(ign->tunnels_wc[h1]) {
 		if (t->parms.i_key != key ||
 		    !(t->dev->flags & IFF_UP))
 			continue;
@@ -276,8 +281,9 @@ static struct ip_tunnel * ipgre_tunnel_lookup(struct net_device *dev,
 	if (cand != NULL)
 		return cand;
 
-	if (ign->fb_tunnel_dev->flags & IFF_UP)
-		return netdev_priv(ign->fb_tunnel_dev);
+	dev = ign->fb_tunnel_dev;
+	if (dev->flags & IFF_UP)
+		return netdev_priv(dev);
 
 	return NULL;
 }
@@ -311,10 +317,10 @@ static void ipgre_tunnel_link(struct ipgre_net *ign, struct ip_tunnel *t)
 {
 	struct ip_tunnel **tp = ipgre_bucket(ign, t);
 
+	spin_lock_bh(&ipgre_lock);
 	t->next = *tp;
-	write_lock_bh(&ipgre_lock);
-	*tp = t;
-	write_unlock_bh(&ipgre_lock);
+	rcu_assign_pointer(*tp, t);
+	spin_unlock_bh(&ipgre_lock);
 }
 
 static void ipgre_tunnel_unlink(struct ipgre_net *ign, struct ip_tunnel *t)
@@ -323,9 +329,9 @@ static void ipgre_tunnel_unlink(struct ipgre_net *ign, struct ip_tunnel *t)
 
 	for (tp = ipgre_bucket(ign, t); *tp; tp = &(*tp)->next) {
 		if (t == *tp) {
-			write_lock_bh(&ipgre_lock);
+			spin_lock_bh(&ipgre_lock);
 			*tp = t->next;
-			write_unlock_bh(&ipgre_lock);
+			spin_unlock_bh(&ipgre_lock);
 			break;
 		}
 	}
@@ -476,7 +482,7 @@ static void ipgre_err(struct sk_buff *skb, u32 info)
 		break;
 	}
 
-	read_lock(&ipgre_lock);
+	rcu_read_lock();
 	t = ipgre_tunnel_lookup(skb->dev, iph->daddr, iph->saddr,
 				flags & GRE_KEY ?
 				*(((__be32 *)p) + (grehlen / 4) - 1) : 0,
@@ -494,7 +500,7 @@ static void ipgre_err(struct sk_buff *skb, u32 info)
 		t->err_count = 1;
 	t->err_time = jiffies;
 out:
-	read_unlock(&ipgre_lock);
+	rcu_read_unlock();
 	return;
 }
 
@@ -573,7 +579,7 @@ static int ipgre_rcv(struct sk_buff *skb)
 
 	gre_proto = *(__be16 *)(h + 2);
 
-	read_lock(&ipgre_lock);
+	rcu_read_lock();
 	if ((tunnel = ipgre_tunnel_lookup(skb->dev,
 					  iph->saddr, iph->daddr, key,
 					  gre_proto))) {
@@ -647,13 +653,13 @@ static int ipgre_rcv(struct sk_buff *skb)
 		ipgre_ecn_decapsulate(iph, skb);
 
 		netif_rx(skb);
-		read_unlock(&ipgre_lock);
+		rcu_read_unlock();
 		return(0);
 	}
 	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 
 drop:
-	read_unlock(&ipgre_lock);
+	rcu_read_unlock();
 drop_nolock:
 	kfree_skb(skb);
 	return(0);

^ permalink raw reply related

* Re: NOHZ: local_softirq_pending 08
From: Tilman Schmidt @ 2009-10-23 16:33 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <1256308311.12174.38.camel@johannes.local>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote:
>> Johannes Berg schrieb:
>>> So you've verified that the entire i4l stack can cope with being called
>>> twice on the same CPU, from different contexts?
>> What makes you think so?
> 
> I thought I'd explained this in my other email. *sigh*
[snip]

Ah, I see. You misunderstood my posting. I did not propose that
patch as a definitive and verified solution, but rather as a
request for comments from the people who know and maintain the
code in question. I thought that was clear from the facts that
- - I didn't include "[PATCH]" in the subject line
- - I didn't add a "Signed-off-by" line
- - I wrote "fixed the messages", not "solved the problem"
- - I explicitly wrote "Comments?" and "Adding i4l people to CC"

Apparently all that was still not clear enough. Sorry about that.
So let me try to make my concern as explicit as possible:

- - The patch I posted had the effect that the test which reliably
  triggered the local_softirq_pending message before did not do
  so anymore.

- - To me, this seems to indicate that the netif_rx(skb) call in
  line 1177 of source file drivers/isdn/i4l/isdn_ppp.c is indeed
  involved in the problem.

- - Now I'm asking people who know more than myself about the
  ramifications of that message (ie., you) and/or the code I
  narrowed it down to (ie., the ISDN4Linux maintainers - which
  is why I added them to the CC list) to have a look and determine
  how to fix the problem properly.

- - This would of course include, in finis, the verification you
  mistakenly assumed I might have done already.

I hope that's clear enough. If you have any questions, feel free
to ask.

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4drJQ3+did9BuFsRAmstAJ94UF/LupINlYpjbxzz9xoiN5w34wCfflRz
YfR/fXt3HasrxUSP29REOnE=
=VQ/C
-----END PGP SIGNATURE-----

^ permalink raw reply

* [PATCH net-next-2.6] ip6tnl: convert hash tables locking to RCU
From: Eric Dumazet @ 2009-10-23 16:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

ip6_tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
already must wait for a RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/ip6_tunnel.c |   44 ++++++++++++++++++++++------------------
 1 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c595bbe..670c291 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -88,8 +88,10 @@ struct ip6_tnl_net {
 	struct ip6_tnl **tnls[2];
 };
 
-/* lock for the tunnel lists */
-static DEFINE_RWLOCK(ip6_tnl_lock);
+/*
+ * Locking : hash tables are protected by RCU and a spinlock
+ */
+static DEFINE_SPINLOCK(ip6_tnl_lock);
 
 static inline struct dst_entry *ip6_tnl_dst_check(struct ip6_tnl *t)
 {
@@ -130,6 +132,9 @@ static inline void ip6_tnl_dst_store(struct ip6_tnl *t, struct dst_entry *dst)
  *   else %NULL
  **/
 
+#define for_each_ip6_tunnel_rcu(start) \
+	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
+
 static struct ip6_tnl *
 ip6_tnl_lookup(struct net *net, struct in6_addr *remote, struct in6_addr *local)
 {
@@ -138,13 +143,14 @@ ip6_tnl_lookup(struct net *net, struct in6_addr *remote, struct in6_addr *local)
 	struct ip6_tnl *t;
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 
-	for (t = ip6n->tnls_r_l[h0 ^ h1]; t; t = t->next) {
+	for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[h0 ^ h1]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
-	if ((t = ip6n->tnls_wc[0]) != NULL && (t->dev->flags & IFF_UP))
+	t = rcu_dereference(ip6n->tnls_wc[0]);
+	if (t && (t->dev->flags & IFF_UP))
 		return t;
 
 	return NULL;
@@ -186,10 +192,10 @@ ip6_tnl_link(struct ip6_tnl_net *ip6n, struct ip6_tnl *t)
 {
 	struct ip6_tnl **tp = ip6_tnl_bucket(ip6n, &t->parms);
 
+	spin_lock_bh(&ip6_tnl_lock);
 	t->next = *tp;
-	write_lock_bh(&ip6_tnl_lock);
-	*tp = t;
-	write_unlock_bh(&ip6_tnl_lock);
+	rcu_assign_pointer(*tp, t);
+	spin_unlock_bh(&ip6_tnl_lock);
 }
 
 /**
@@ -204,9 +210,9 @@ ip6_tnl_unlink(struct ip6_tnl_net *ip6n, struct ip6_tnl *t)
 
 	for (tp = ip6_tnl_bucket(ip6n, &t->parms); *tp; tp = &(*tp)->next) {
 		if (t == *tp) {
-			write_lock_bh(&ip6_tnl_lock);
+			spin_lock_bh(&ip6_tnl_lock);
 			*tp = t->next;
-			write_unlock_bh(&ip6_tnl_lock);
+			spin_unlock_bh(&ip6_tnl_lock);
 			break;
 		}
 	}
@@ -313,9 +319,9 @@ ip6_tnl_dev_uninit(struct net_device *dev)
 	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 
 	if (dev == ip6n->fb_tnl_dev) {
-		write_lock_bh(&ip6_tnl_lock);
+		spin_lock_bh(&ip6_tnl_lock);
 		ip6n->tnls_wc[0] = NULL;
-		write_unlock_bh(&ip6_tnl_lock);
+		spin_unlock_bh(&ip6_tnl_lock);
 	} else {
 		ip6_tnl_unlink(ip6n, t);
 	}
@@ -409,7 +415,7 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	   in trouble since we might need the source address for further
 	   processing of the error. */
 
-	read_lock(&ip6_tnl_lock);
+	rcu_read_lock();
 	if ((t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->daddr,
 					&ipv6h->saddr)) == NULL)
 		goto out;
@@ -482,7 +488,7 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	*msg = rel_msg;
 
 out:
-	read_unlock(&ip6_tnl_lock);
+	rcu_read_unlock();
 	return err;
 }
 
@@ -693,23 +699,23 @@ static int ip6_tnl_rcv(struct sk_buff *skb, __u16 protocol,
 	struct ip6_tnl *t;
 	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 
-	read_lock(&ip6_tnl_lock);
+	rcu_read_lock();
 
 	if ((t = ip6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr,
 					&ipv6h->daddr)) != NULL) {
 		if (t->parms.proto != ipproto && t->parms.proto != 0) {
-			read_unlock(&ip6_tnl_lock);
+			rcu_read_unlock();
 			goto discard;
 		}
 
 		if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-			read_unlock(&ip6_tnl_lock);
+			rcu_read_unlock();
 			goto discard;
 		}
 
 		if (!ip6_tnl_rcv_ctl(t)) {
 			t->dev->stats.rx_dropped++;
-			read_unlock(&ip6_tnl_lock);
+			rcu_read_unlock();
 			goto discard;
 		}
 		secpath_reset(skb);
@@ -727,10 +733,10 @@ static int ip6_tnl_rcv(struct sk_buff *skb, __u16 protocol,
 		t->dev->stats.rx_packets++;
 		t->dev->stats.rx_bytes += skb->len;
 		netif_rx(skb);
-		read_unlock(&ip6_tnl_lock);
+		rcu_read_unlock();
 		return 0;
 	}
-	read_unlock(&ip6_tnl_lock);
+	rcu_read_unlock();
 	return 1;
 
 discard:

^ permalink raw reply related

* udp: break from the lookup when hitting the maximum score value
From: Lucian Adrian Grijincu @ 2009-10-23 16:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: opurdila, netdev

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

Before udp hashes were converted to rcu in
	udp: introduce struct udp_table and multiple spinlocks
	645ca708f936b2fbeb79e52d7823e3eb2c0905f8
we stopped searching in list upon hitting the maximum score value (which is 
9).

This got removed in the conversion to rcu.
I'm not sure whether this was intentional or it just slipped by.

As far as I understand it this does not interfere with the lockless rcu: there 
is another score check the result will have to pass and if it doesn't have a 
score of 9 (which will be the value of badness) we'll just restart the lookup.

Even if the node was deleted from the chain and reclaimed at a later time, if 
at the second score test we have value 9 again, we can still return with this 
result.

Am I missing something?

-- 
Lucian

[-- Attachment #2: score-9.patch --]
[-- Type: text/plain, Size: 643 bytes --]

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d0d436d..69464b9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -294,6 +294,11 @@ begin:
 	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
 		score = compute_score(sk, net, saddr, hnum, sport,
 				      daddr, dport, dif);
+		if (score == 9) {
+			result = sk;
+			badness = score;
+			goto skip_end_of_nulls_check;
+		}
 		if (score > badness) {
 			result = sk;
 			badness = score;
@@ -307,6 +312,7 @@ begin:
 	if (get_nulls_value(node) != hash)
 		goto begin;
 
+skip_end_of_nulls_check:
 	if (result) {
 		if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
 			result = NULL;

^ permalink raw reply related

* Re: [Patch] sctp: remove deprecated SCTP_GET_*_OLD stuffs
From: Sam Ravnborg @ 2009-10-23 16:39 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Amerigo Wang, linux-kernel, netdev, akpm
In-Reply-To: <4AE1D2AA.8060200@hp.com>

On Fri, Oct 23, 2009 at 11:58:34AM -0400, Vlad Yasevich wrote:
> 
> 
> Sam Ravnborg wrote:
> > On Thu, Oct 22, 2009 at 04:53:30PM -0400, Vlad Yasevich wrote:
> >>> diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
> >>> index be2334a..0991f1b 100644
> >>> --- a/include/net/sctp/user.h
> >>> +++ b/include/net/sctp/user.h
> >>> @@ -131,14 +131,6 @@ enum sctp_optname {
> >>>  #define SCTP_SOCKOPT_BINDX_REM	SCTP_SOCKOPT_BINDX_REM
> >>>  	SCTP_SOCKOPT_PEELOFF, 	/* peel off association. */
> >>>  #define SCTP_SOCKOPT_PEELOFF	SCTP_SOCKOPT_PEELOFF
> >>> -	SCTP_GET_PEER_ADDRS_NUM_OLD, 	/* Get number of peer addresss. */
> >>> -#define SCTP_GET_PEER_ADDRS_NUM_OLD	SCTP_GET_PEER_ADDRS_NUM_OLD
> >>> -	SCTP_GET_PEER_ADDRS_OLD, 	/* Get all peer addresss. */
> >>> -#define SCTP_GET_PEER_ADDRS_OLD	SCTP_GET_PEER_ADDRS_OLD
> >>> -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, 	/* Get number of local addresss. */
> >>> -#define SCTP_GET_LOCAL_ADDRS_NUM_OLD	SCTP_GET_LOCAL_ADDRS_NUM_OLD
> >>> -	SCTP_GET_LOCAL_ADDRS_OLD, 	/* Get all local addresss. */
> >>> -#define SCTP_GET_LOCAL_ADDRS_OLD	SCTP_GET_LOCAL_ADDRS_OLD
> >>>  	SCTP_SOCKOPT_CONNECTX_OLD, /* CONNECTX old requests. */
> >> After running the regression suite against this patch I find that we can't
> >> remove the enum values.  Removing the enums changes the value for the remainder
> >> of the definitions and breaks binary compatibility for applications that use
> >> those trailing options.
> >>
> >> You should be ok with removing the #defines and actual code that uses them,
> >> but not the enums.  You can even rename the enums, but we must preserve
> >> numeric ordering.
> > 
> > If we really depend on the actual value of an enum as in this case,
> > then e should assign them direct to better document this.
> > 
> > 	Sam
> > 
> 
> I agree.  I have a patch that converts the enum to just a #define section that
> I'll apply on top of this removal patch and document the deletion.

If you keep the enum then you will have a have extras:
- a debugger will understand the symbols and display the correct value
- sparse may trigger a warning if you try to assign a non-valid value
  (a value which is not included in the enum)

But that may not matter much in this case - just wanted to highligt it.

	Sam

^ permalink raw reply

* Re: [PATCH 1/2] alchemy: add au1000-eth platform device
From: Sergei Shtylyov @ 2009-10-23 16:51 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Ralf Baechle, linux-mips, Manuel Lauss, David Miller, netdev
In-Reply-To: <200910171048.24962.florian@openwrt.org>

Hello.

Florian Fainelli wrote:

> Please find below and updated version, hopefully addressing most if not all
> of your comments.

    Thanks. I still have some comments on the code testing the NI2 bit. :-)

> --
> From: Florian Fainelli <florian@openwrt.org>
> Subject: [PATCH 1/2] alchemy: add au1000-eth platform device (v3)
> 
> This patch makes the board code register the au1000-eth
> platform device. The au1000-eth platform data can be
> overriden with the au1xxx_override_eth_cfg function
> like it has to be done for the Bosporus board which uses
> a different MAC/PHY setup.
> 
> Changes from v2:
> - declared the au1000-eth second driver instance platform_data
> - made the override function generic and pass it the port number too
> 
> Changes from v1:
> - remove per-board platform.c file
> - add an override function to pass custom eth0 platform_data PHY settings
> 
> Signed-off-by: Florian Fainelli <florian@openwrt.org>
> ---
> diff --git a/arch/mips/alchemy/common/platform.c b/arch/mips/alchemy/common/platform.c
> index 195e5b3..167b24e 100644
> --- a/arch/mips/alchemy/common/platform.c
> +++ b/arch/mips/alchemy/common/platform.c
[...]
>  static int __init au1xxx_platform_init(void)
> @@ -354,6 +438,12 @@ static int __init au1xxx_platform_init(void)
>  	for (i = 0; au1x00_uart_data[i].flags; i++)
>  		au1x00_uart_data[i].uartclk = uartclk;
>  
> +#ifndef CONFIG_SOC_AU1100
> +	/* Register second MAC if enabled in pinfunc */
> +	if (!(au_readl(SYS_PINFUNC) & (u32)(SYS_PF_NI2)) >> 4)

    Parens around SYS_PF_NI2 not needed. Shift not needed too.

> +		platform_device_register(&au1xxx_eth1_device);
> +#endif
> +
>  	return platform_add_devices(au1xxx_platform_devices,
>  				    ARRAY_SIZE(au1xxx_platform_devices));
>  }
> diff --git a/arch/mips/alchemy/devboards/db1x00/board_setup.c b/arch/mips/alchemy/devboards/db1x00/board_setup.c
> index 64eb26f..f938924 100644
> --- a/arch/mips/alchemy/devboards/db1x00/board_setup.c
> +++ b/arch/mips/alchemy/devboards/db1x00/board_setup.c
> @@ -32,6 +32,7 @@
>  #include <linux/interrupt.h>
>  
>  #include <asm/mach-au1x00/au1000.h>
> +#include <asm/mach-au1x00/au1xxx_eth.h>
>  #include <asm/mach-db1x00/db1x00.h>
>  #include <asm/mach-db1x00/bcsr.h>
>  
> @@ -101,6 +102,22 @@ void __init board_setup(void)
>  	printk(KERN_INFO "AMD Alchemy Au1100/Db1100 Board\n");
>  #endif
>  #ifdef CONFIG_MIPS_BOSPORUS
> +	struct au1000_eth_platform_data eth0_pdata;

    You can't declare data like that, amidst the code -- gcc will emit a 
warning which would be fatal with -Werror in Makefile. Do it inside a block 
instead.

> +
> +	/*
> +	 * Micrel/Kendin 5 port switch attached to MAC0,
> +	 * MAC0 is associated with PHY address 5 (== WAN port)
> +	 * MAC1 is not associated with any PHY, since it's connected directly
> +	 * to the switch.
> +	 * no interrupts are used
> +	 */
> +	eth0_pdata.phy1_search_mac0 = 0;
> +	eth0_pdata.phy_static_config = 1;
> +	eth0_pdata.phy_addr = 5;
> +	eth0_pdata.phy_busid = 0;

    Why noyt have an initializer instead (and why not make eth0_data 
*static* too)?

WBR, Sergei

^ permalink raw reply

* Re: [Patch] sctp: remove deprecated SCTP_GET_*_OLD stuffs
From: Vlad Yasevich @ 2009-10-23 16:57 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Amerigo Wang, linux-kernel, netdev, akpm
In-Reply-To: <20091023163938.GA3858@merkur.ravnborg.org>



Sam Ravnborg wrote:
> On Fri, Oct 23, 2009 at 11:58:34AM -0400, Vlad Yasevich wrote:
>>
>> Sam Ravnborg wrote:
>>> On Thu, Oct 22, 2009 at 04:53:30PM -0400, Vlad Yasevich wrote:
>>>>> diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
>>>>> index be2334a..0991f1b 100644
>>>>> --- a/include/net/sctp/user.h
>>>>> +++ b/include/net/sctp/user.h
>>>>> @@ -131,14 +131,6 @@ enum sctp_optname {
>>>>>  #define SCTP_SOCKOPT_BINDX_REM	SCTP_SOCKOPT_BINDX_REM
>>>>>  	SCTP_SOCKOPT_PEELOFF, 	/* peel off association. */
>>>>>  #define SCTP_SOCKOPT_PEELOFF	SCTP_SOCKOPT_PEELOFF
>>>>> -	SCTP_GET_PEER_ADDRS_NUM_OLD, 	/* Get number of peer addresss. */
>>>>> -#define SCTP_GET_PEER_ADDRS_NUM_OLD	SCTP_GET_PEER_ADDRS_NUM_OLD
>>>>> -	SCTP_GET_PEER_ADDRS_OLD, 	/* Get all peer addresss. */
>>>>> -#define SCTP_GET_PEER_ADDRS_OLD	SCTP_GET_PEER_ADDRS_OLD
>>>>> -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, 	/* Get number of local addresss. */
>>>>> -#define SCTP_GET_LOCAL_ADDRS_NUM_OLD	SCTP_GET_LOCAL_ADDRS_NUM_OLD
>>>>> -	SCTP_GET_LOCAL_ADDRS_OLD, 	/* Get all local addresss. */
>>>>> -#define SCTP_GET_LOCAL_ADDRS_OLD	SCTP_GET_LOCAL_ADDRS_OLD
>>>>>  	SCTP_SOCKOPT_CONNECTX_OLD, /* CONNECTX old requests. */
>>>> After running the regression suite against this patch I find that we can't
>>>> remove the enum values.  Removing the enums changes the value for the remainder
>>>> of the definitions and breaks binary compatibility for applications that use
>>>> those trailing options.
>>>>
>>>> You should be ok with removing the #defines and actual code that uses them,
>>>> but not the enums.  You can even rename the enums, but we must preserve
>>>> numeric ordering.
>>> If we really depend on the actual value of an enum as in this case,
>>> then e should assign them direct to better document this.
>>>
>>> 	Sam
>>>
>> I agree.  I have a patch that converts the enum to just a #define section that
>> I'll apply on top of this removal patch and document the deletion.
> 
> If you keep the enum then you will have a have extras:
> - a debugger will understand the symbols and display the correct value
> - sparse may trigger a warning if you try to assign a non-valid value
>   (a value which is not included in the enum)
> 
> But that may not matter much in this case - just wanted to highligt it.

Yep, but this is just socket option definitions.  The enum type is not even used
anywhere, so no help from sparse.  I think it's here just for sequential numbering.

Thanks
-vlad
> 
> 	Sam
> 

^ permalink raw reply

* Re: udp: break from the lookup when hitting the maximum score value
From: Eric Dumazet @ 2009-10-23 16:57 UTC (permalink / raw)
  To: Lucian Adrian Grijincu; +Cc: opurdila, netdev
In-Reply-To: <200910231936.16022.lgrijincu@ixiacom.com>

Lucian Adrian Grijincu a écrit :
> Before udp hashes were converted to rcu in
> 	udp: introduce struct udp_table and multiple spinlocks
> 	645ca708f936b2fbeb79e52d7823e3eb2c0905f8
> we stopped searching in list upon hitting the maximum score value (which is 
> 9).
> 
> This got removed in the conversion to rcu.
> I'm not sure whether this was intentional or it just slipped by.
> 
> As far as I understand it this does not interfere with the lockless rcu: there 
> is another score check the result will have to pass and if it doesn't have a 
> score of 9 (which will be the value of badness) we'll just restart the lookup.
> 
> Even if the node was deleted from the chain and reclaimed at a later time, if 
> at the second score test we have value 9 again, we can still return with this 
> result.
> 
> Am I missing something?
> 
> 

This was intentional.

This never happens in practice and slowdown lookups.

(To reach score 9, your UDP socket must be connected, and bound to a device)
Most developpers dont even know UDP socket can be connected...

We added large hashtables in commit f86dcc5aa8c7908f2c287e7a211228df599e3e71
(udp: dynamically size hash tables at boot time), so average chain length should
be small anyway...




^ permalink raw reply

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Karol Lewandowski @ 2009-10-23 16:58 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, David Rientjes, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List, netdev,
	linux-kernel, linux-mm@kvack.org
In-Reply-To: <1256221356-26049-1-git-send-email-mel@csn.ul.ie>

On Thu, Oct 22, 2009 at 03:22:31PM +0100, Mel Gorman wrote:

[Cut everything but my bug]
> [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
> 	Karol Lewandows reported that e100 fails to allocate order-5
> 	GFP_ATOMIC when loading firmware during resume. This has started
> 	happening relatively recent.


> Test 1: Verify your problem occurs on 2.6.32-rc5 if you can

Yes, bug is still there.


> Test 2: Apply the following two patches and test again
> 
>   1/5 page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
>   2/5 page allocator: Do not allow interrupts to use ALLOC_HARDER
> 
> 
> 	These patches correct problems introduced by me during the 2.6.31-rc1
> 	merge window. The patches were not meant to introduce any functional
> 	changes but two were missed.
> 
> 	If your problem goes away with just these two patches applied,
> 	please tell me.

Likewise.


> Test 3: If you are getting allocation failures, try with the following patch
> 
>   3/5 vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
> 
> 	This is a functional change that causes kswapd to notice sooner
> 	when high-order watermarks have been hit. There have been a number
> 	of changes in page reclaim since 2.6.30 that might have delayed
> 	when kswapd kicks in for higher orders
> 
> 	If your problem goes away with these three patches applied, please
> 	tell me

No, problem doesn't go away with these patches (1+2+3).  However, from
my testing this particular patch makes it way, way harder to trigger
allocation failures (but these are still present).

This bothers me - should I test following patches with or without
above patch?  This patch makes bug harder to find, IMVHO it doesn't
fix the real problem.

(Rest not tested yet.)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: Irq architecture for multi-core network driver.
From: Jesse Brandeburg @ 2009-10-23 17:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Daney, Chris Friesen, netdev, Linux Kernel Mailing List,
	linux-mips
In-Reply-To: <m13a5apmm0.fsf@fess.ebiederm.org>

On Fri, Oct 23, 2009 at 12:59 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> David Daney <ddaney@caviumnetworks.com> writes:
>> Certainly this is one mode of operation that should be supported, but I would
>> also like to be able to go for raw throughput and have as many cores as possible
>> reading from a single queue (like I currently have).
>
> I believe will detect false packet drops and ask for unnecessary
> retransmits if you have multiple cores processing a single queue,
> because you are processing the packets out of order.

So, the way the default linux kernel configures today's many core
server systems is to leave the affinity mask by default at 0xffffffff,
and most current Intel hardware based on 5000 (older core cpus), or
5500 chipset (used with Core i7 processors) that I have seen will
allow for round robin interrupts by default.  This kind of sucks for
the above unless you run irqbalance or set smp_affinity by hand.

Yes, I know Arjan and others will say you should always run
irqbalance, but some people don't and some distros don't ship it
enabled by default (or their version doesn't work for one reason or
another)  The question is should the kernel work better by default
*without* irqbalance loaded, or does it not matter?

I don't believe we should re-enable the kernel irq balancer, but
should we consider only setting a single bit in each new interrupt's
irq affinity?  Doing it with a random spread for the initial affinity
would be better than setting them all to one.

^ permalink raw reply

* Re: [PATCH] Remove bond_dev from xmit_hash_policy call.
From: Jay Vosburgh @ 2009-10-23 16:24 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: Eric Dumazet, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <20091023140924.GA24611@spaans.fox.local>

Jasper Spaans <spaans@fox-it.com> wrote:

>Now that the bonding device is no longer used in determining the device to
>which to send packets, it can be dropped from the argument list of the various
>xmit_hash_policy calls.
>
>Signed-off-by: Jasper Spaans <spaans@fox-it.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>


>---
> drivers/net/bonding/bond_3ad.c  |   11 +++++------
> drivers/net/bonding/bond_main.c |   11 ++++-------
> drivers/net/bonding/bonding.h   |    3 +--
> 3 files changed, 10 insertions(+), 15 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c3fa31c..3cd8153 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1956,7 +1956,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
> 	struct port *port, *prev_port, *temp_port;
> 	struct aggregator *aggregator, *new_aggregator, *temp_aggregator;
> 	int select_new_active_agg = 0;
>-	
>+
> 	// find the aggregator related to this slave
> 	aggregator = &(SLAVE_AD_INFO(slave).aggregator);
>
>@@ -2024,7 +2024,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
>
> 				// clear the aggregator
> 				ad_clear_agg(aggregator);
>-				
>+
> 				if (select_new_active_agg) {
> 					ad_agg_selection_logic(__get_first_agg(port));
> 				}
>@@ -2075,7 +2075,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
> 			}
> 		}
> 	}
>-	port->slave=NULL;	
>+	port->slave=NULL;
> }
>
> /**
>@@ -2301,7 +2301,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
> }
>
> /*
>- * set link state for bonding master: if we have an active 
>+ * set link state for bonding master: if we have an active
>  * aggregator, we're up, if not, we're down.  Presumes that we cannot
>  * have an active aggregator if there are no slaves with link up.
>  *
>@@ -2395,7 +2395,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	slave_agg_no = bond->xmit_hash_policy(skb, dev, slaves_in_agg);
>+	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>
> 	bond_for_each_slave(bond, slave, i) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>@@ -2468,4 +2468,3 @@ out:
>
> 	return ret;
> }
>-
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 3f05267..13058c5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3657,8 +3657,7 @@ void bond_unregister_arp(struct bonding *bond)
>  * Hash for the output device based upon layer 2 and layer 3 data. If
>  * the packet is not IP mimic bond_xmit_hash_policy_l2()
>  */
>-static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
>-				     struct net_device *bond_dev, int count)
>+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
> 	struct iphdr *iph = ip_hdr(skb);
>@@ -3676,8 +3675,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
>  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>  * altogether not IP, mimic bond_xmit_hash_policy_l2()
>  */
>-static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
>-				    struct net_device *bond_dev, int count)
>+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
> 	struct iphdr *iph = ip_hdr(skb);
>@@ -3701,8 +3699,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
> /*
>  * Hash for the output device based upon layer 2 data
>  */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
>-				   struct net_device *bond_dev, int count)
>+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>
>@@ -4295,7 +4292,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
> 	if (!BOND_IS_OK(bond))
> 		goto out;
>
>-	slave_no = bond->xmit_hash_policy(skb, bond_dev, bond->slave_cnt);
>+	slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
>
> 	bond_for_each_slave(bond, slave, i) {
> 		slave_no--;
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 6824771..02e1f9e 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -204,7 +204,7 @@ struct bonding {
> #endif /* CONFIG_PROC_FS */
> 	struct   list_head bond_list;
> 	struct   dev_mc_list *mc_list;
>-	int      (*xmit_hash_policy)(struct sk_buff *, struct net_device *, int);
>+	int      (*xmit_hash_policy)(struct sk_buff *, int);
> 	__be32   master_ip;
> 	u16      flags;
> 	u16      rr_tx_counter;
>@@ -370,4 +370,3 @@ static inline void bond_unregister_ipv6_notifier(void)
> #endif
>
> #endif /* _LINUX_BONDING_H */
>-
>-- 
>1.6.0.4
>
>
>-- 
>Fox-IT Experts in IT Security!
>T: +31 (0) 15 284 79 99
>KvK Haaglanden 27301624
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Jay Vosburgh @ 2009-10-23 16:23 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: Eric Dumazet, netdev@vger.kernel.org, David S. Miller
In-Reply-To: <20091023140845.GA24564@spaans.fox.local>

Jasper Spaans <spaans@fox-it.com> wrote:

>Modify bonding hash transmit policies to use the psource MAC address of
>the packet instead of the MAC address configured for the bonding device.
>
>The old sitation conflicts with the documentation.
>
>Signed-off-by: Jasper Spaans <spaans@fox-it.com>

	Looks good.

	Dave, please apply.  This may be suitable for -stable, as it's
pretty much a day one bug.

	-J

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

>---
> drivers/net/bonding/bond_main.c |    8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 69c5b15..3f05267 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3665,10 +3665,10 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
>
> 	if (skb->protocol == htons(ETH_P_IP)) {
> 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>-			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;
>+			(data->h_dest[5] ^ data->h_source[5])) % count;
> 	}
>
>-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
>+	return (data->h_dest[5] ^ data->h_source[5]) % count;
> }
>
> /*
>@@ -3695,7 +3695,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
>
> 	}
>
>-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
>+	return (data->h_dest[5] ^ data->h_source[5]) % count;
> }
>
> /*
>@@ -3706,7 +3706,7 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>
>-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
>+	return (data->h_dest[5] ^ data->h_source[5]) % count;
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>-- 
>1.6.0.4
>
>
>-- 
>Fox-IT Experts in IT Security!
>T: +31 (0) 15 284 79 99
>KvK Haaglanden 27301624
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox