Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: Fix RPF to work with policy routing
From: jamal @ 2009-10-23 10:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, atis, eric.dumazet, zenczykowski
In-Reply-To: <20091022.214943.105371652.davem@davemloft.net>

On Thu, 2009-10-22 at 21:49 -0700, David Miller wrote:

> Such a change has a built-in assumption, I think, that
> marks are symmetric.

Only if the admin said they are symetric (by jumping
a few hoops).  In otherwise, I may intentionaly want
them to be symetric and use policy routing. I cant today.

> Just because we ended up with mark X on input doesn't mean
> that the reverse path route exists with mark X too.
>
> In fact I can't even see a valid way to specify a mark for
> the RPF lookup.

with the ipt or skbedit actions or via netfilter i could
set marks which could be as trivial as "set mark X if packet 
came in via eth0 or eth1 and mark Y if they came in via gre0"

> Maybe you can convince me otherwise :-)

Ok, let me try ;-> 

First let me say that it is _non-trivial_ for a packet to be 
policy-routing-RPF-dropped even with this patch. Youd have to do at
least 3 things to achieve that goal:
a) mark the packet on ingress 
b) have a  blackhole route in the policy routing table as the fall
through match and 
c) turn on rpf.

If someone goes that far to install policies, their intent could be
judged to mean they are serious about using RPF with policy routing.
If any of the above 3 conditions are not true then things continue to
work as is today.
IOW, if i set all those 3 conditions above then my expectation is the
system should not let in a packet if the policy routing table says no.
My intent is not to use the main table or some default route in the main
table; i really meant to use that policy routing table.

cheers,
jamal

^ permalink raw reply

* [RFC] [PATCH] cxgb3: Set the rxq
From: Krishna Kumar @ 2009-10-23 11:13 UTC (permalink / raw)
  To: davem; +Cc: netdev, divy, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Set the rxq# for LRO when processing the last fragment of a
frame. This helps in fast txq selection for routing workloads.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---

 drivers/net/cxgb3/sge.c |    1 +
 1 file changed, 1 insertion(+)

diff -ruNp org/drivers/net/cxgb3/sge.c new/drivers/net/cxgb3/sge.c
--- org/drivers/net/cxgb3/sge.c	2009-10-19 11:58:16.000000000 +0530
+++ new/drivers/net/cxgb3/sge.c	2009-10-20 09:40:40.000000000 +0530
@@ -2135,6 +2135,7 @@ static void lro_add_page(struct adapter 
 	if (!complete)
 		return;
 
+	skb_record_rx_queue(skb, qs - &adap->sge.qs[0]);
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	cpl = qs->lro_va;
 

^ permalink raw reply

* [RFC] [PATCH] udp: Don't save dst in udpv6_sendmsg()
From: Krishna Kumar @ 2009-10-23 11:13 UTC (permalink / raw)
  To: davem; +Cc: netdev, Krishna Kumar

From: Krishna Kumar <krkumar2@in.ibm.com>

Since ip6_datagram_connect saves the dst entry, it is not
required to do the same in every iteration of udpv6_sendmsg.
It also breaks the txq# caching, which then gets updated
every time in dev_pick_tx only to get reset again here.
Update dst only if ip6_sk_dst_lookup changed the dst entry.

Performance: I ran netperf UDPv6 RR to use connected sockets.
Tested with a 70 min run, aggregate of 5 netperf runs for
each result.

------------------------  UDPv6 RR Test  ---------------------
#procs  Org TPS     New TPS (%)       Org SD    New SD (%)
--------------------------------------------------------------
1       119031      118793 (-0.19)      82         77 (-5.23)
2       217572      218607 (0.47)      315        300 (-4.65)
4       258463      258823 (0.13)     1334       1248 (-6.44)
8       318018      319425 (0.44)     5735       5500 (-4.09)
10      395531      401529 (1.51)     9901       9507 (-3.98)
12      453319      453492 (0.03)    15473       15131 (-2.21)
--------------------------------------------------------------

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
---
 net/ipv6/udp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -ruNp org/net/ipv6/udp.c new/net/ipv6/udp.c
--- org/net/ipv6/udp.c	2009-10-19 11:58:16.000000000 +0530
+++ new/net/ipv6/udp.c	2009-10-23 10:42:35.000000000 +0530
@@ -990,7 +990,8 @@ do_append_data:
 
 	if (dst) {
 		if (connected) {
-			ip6_dst_store(sk, dst,
+			if (__sk_dst_get(sk) != dst)
+				ip6_dst_store(sk, dst,
 				      ipv6_addr_equal(&fl.fl6_dst, &np->daddr) ?
 				      &np->daddr : NULL,
 #ifdef CONFIG_IPV6_SUBTREES

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Mel Gorman @ 2009-10-23 11:25 UTC (permalink / raw)
  To: David Rientjes
  Cc: Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Rafael J. Wysocki, David Miller, Reinette Chatre,
	Kalle Valo, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <alpine.DEB.2.00.0910230229010.28109-X6Q0R45D7oAcqpCFd4KODRPsWskHk0ljAL8bYrjMMd8@public.gmane.org>

On Fri, Oct 23, 2009 at 02:36:53AM -0700, David Rientjes wrote:
> On Fri, 23 Oct 2009, Mel Gorman wrote:
> 
> > > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?  
> > > By the patch description I was expecting kswapd to be woken up 
> > > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and 
> > > we're known to have just allocated at a higher order, not just when 
> > > current was oom killed (when we should already be freeing a _lot_ of 
> > > memory soon) or is doing a higher order allocation during direct reclaim.
> > > 
> > 
> > It was a somewhat arbitrary choice to have it trigger in the event high
> > priority allocations were happening frequently.
> > 
> 
> I don't quite understand, users of PF_MEMALLOC shouldn't be doing these 
> higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom 
> killer, we should be freeing a substantial amount of memory imminently 
> when it exits that waking up kswapd would be irrelevant.
> 

I agree. I think it's highly unlikely this patch will make any
difference but I wanted to eliminate it as a possibility. Patch 3 and 4
were previously one patch that were tested together.

> > > If this is moved to the fastpath, why is this wake_all_kswapd() and not
> > > wakeup_kswapd(preferred_zone, order)?  Do we need to kick kswapd in all 
> > > zones even though they may be free just because preferred_zone is now 
> > > below the watermark?
> > > 
> > 
> > It probably makes no difference as zones are checked for their watermarks
> > before any real work happens. However, even if this patch makes a difference,
> > I don't want to see it merged.  At best, it is an extremely heavy-handed
> > hack which is why I asked for it to be tested in isolation. It shouldn't
> > be necessary at all because sort of pre-emptive waking of kswapd was never
> > necessary before.
> > 
> 
> Ahh, that makes a ton more sense: this particular patch is a debugging 
> effort while the first two are candidates for 2.6.32 and -stable.  Gotcha.
> 

Yep.

> > > Wouldn't it be better to do this on page_zone(page) instead of 
> > > preferred_zone anyway?
> > > 
> > 
> > No. The preferred_zone is the zone we should be allocating from. If we
> > failed to allocate from it, it implies the watermarks are not being met
> > so we want to wake it.
> > 
> 
> Oops, I'm even more confused now :)  I thought the existing 
> wake_all_kswapd() in the slowpath was doing that and that this patch was 
> waking them prematurely because it speculates that a subsequent high 
> order allocation will fail unless memory is reclaimed. 


It should be doing that. This patch should be junk but because it was tested
previously, I needed to be sure it was actually junk.

> I thought we'd  
> want to reclaim from the zone we just did a high order allocation from so 
> that the fastpath could find the memory next time with ALLOC_WMARK_LOW.

The fastpath should be getting the pages it needs from the
preferred_zone. If it's not, we still want to get pages back in that
zone and the zone we actually ended up getting pages from.

It's probably best to ignore this patch except in the unlikely event Tobias
says it makes a difference to his testing. I'm hoping he's covered by patches
1+2 and maybe 3 and that patches 4 and 5 of this set get consigned to
the bit bucket.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Tobias Oetiker @ 2009-10-23 11:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Rientjes, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Rafael J. Wysocki, David Miller,
	Reinette Chatre, Kalle Valo, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List, netdev,
	linux-kernel, "linux-mm@kvack.org" <linu
In-Reply-To: <20091023112512.GW11778@csn.ul.ie>

Mel,

Today Mel Gorman wrote:

> On Fri, Oct 23, 2009 at 02:36:53AM -0700, David Rientjes wrote:
> > On Fri, 23 Oct 2009, Mel Gorman wrote:
> >
> > > > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?
> > > > By the patch description I was expecting kswapd to be woken up
> > > > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and
> > > > we're known to have just allocated at a higher order, not just when
> > > > current was oom killed (when we should already be freeing a _lot_ of
> > > > memory soon) or is doing a higher order allocation during direct reclaim.
> > > >
> > >
> > > It was a somewhat arbitrary choice to have it trigger in the event high
> > > priority allocations were happening frequently.
> > >
> >
> > I don't quite understand, users of PF_MEMALLOC shouldn't be doing these
> > higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom
> > killer, we should be freeing a substantial amount of memory imminently
> > when it exits that waking up kswapd would be irrelevant.
> >
>
> I agree. I think it's highly unlikely this patch will make any
> difference but I wanted to eliminate it as a possibility. Patch 3 and 4
> were previously one patch that were tested together.

hi hi ... I have tested '3 only' this morning, and the allocation
problems started again ... so for me 3 alone does not work while
3+4 does.

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: bridging + load balancing bonding
From: Jasper Spaans @ 2009-10-23 11:45 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev
In-Reply-To: <347.1256232960@death.nxdomain.ibm.com>

On Thu, Oct 22, 2009 at 07:36:00PM +0200, Jay Vosburgh wrote:
> 	By "packets from one flow" do you really mean that packets from
> a given "flow" (TCP connection, UDP "stream", etc) are not always
> delivered to the same bonding port?  I.e., that two packets from the
> same "flow" will be delivered to different ports?  I'm not sure how
> that's possible unless the source MAC in the packets changes during the
> course of the flow.
> 
> 	Or is your problem really that the balance algorithm on the
> bonding send side doesn't match the algorithm used on the other side of
> the IDS machines coming the other direction (and, thus, packets for a
> given flow going in one direction end up at a different IDS than the
> packets going the other direction)?

It's the second problem: traffic in one direction ends up at another
node than traffic in the other direction, even if between the same pair of
machines.

> 	Locally generated packets do, but he's got a bridge in there, so
> the traffic they're balancing is presumably not locally generated (i.e.,
> is being forwarded by the bridge, in which case they'll still bear the
> source MAC of the originating node on the subnet).  If the packets were
> being routed instead of bridged, then, yah, they'd have the bond's
> source MAC.

And in case of routing, the destination MAC will also be modified.. so worst
case (all traffic goes to one node), no balancing will happen. This also
affects non-IDS use of load balancing, I guess.

> >So your solution might be the right fix...
> 
> 	Yes, I think he's found a legitimate bug, one that only will
> manifest when balancing bridged traffic.  I had to think for a minute if
> this change would break anything, and I'm coming up empty.  Locally
> generated or routed traffic won't see a change, and bridged traffic will
> be correctly balanced according to the "source MAC XOR destination MAC"
> forumla described in the documentation.

I'll post a patch shortly.

Jasper
-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply

* Re: PATCH 23/10]Optimize the upload speed for PPP connection.
From: William Allen Simpson @ 2009-10-23 11:46 UTC (permalink / raw)
  To: fangxiaozhi 00110321; +Cc: netdev, linux-kernel, zihan, greg, haegar
In-Reply-To: <fae097f21b27.1b27fae097f2@huawei.com>

fangxiaozhi 00110321 wrote:
> From: fangxiaozhi <huananhu@huawei.com>
> 1. This patch is based on the kernel of 2.6.32-rc4 
> 2. In this patch, we enlarge the out buffer size to optimize the upload speed for the ppp connection. Then it can support the upload of HSUPA data cards.
> Signed-off-by: fangxiaozhi <huananhu@huawei.com>
> -----------------------------------------------------------------------------------------
> --- a/drivers/net/ppp_async.c	2009-10-12 05:43:56.000000000 +0800
> +++ b/drivers/net/ppp_async.c	2009-10-15 16:29:56.000000000 +0800
> @@ -36,7 +36,7 @@
>  
>  #define PPP_VERSION	"2.4.2"
>  
> -#define OBUFSIZE	256
> +#define OBUFSIZE	2048
>  
>  /* Structure for storing local state. */
>  struct asyncppp {
> 
Concur.  I'd go further than that, my code usually made room for at least
a full MTU (MRU) with HDLC escaping.  To minimize context switches, that
should be 3014 ((1500 MRU + 2 FCS + 4 header) * 2 escapes + 2 flags).

Even in the old days, when memory was tight, context switches and interrupt
time were more expensive, too.  PPP is supposed to scale to OC-192.

^ permalink raw reply

* Re: [PATCH 23/10] Optimize the upload speed for PPP connection.
From: William Allen Simpson @ 2009-10-23 11:51 UTC (permalink / raw)
  To: fangxiaozhi 00110321; +Cc: netdev, kernel, zihan, greg, haegar
In-Reply-To: <f98f8abb3f92.3f92f98f8abb@huawei.com>

fangxiaozhi 00110321 wrote:
> From: fangxiaozhi <huananhu@huawei.com>
> 1. This patch is based on the kernel of 2.6.32-rc4 
> 2. In this patch, we enlarge the out buffer size to optimize the upload speed for the ppp connection. Then it can support the upload of HSUPA data cards.
> Signed-off-by: fangxiaozhi <huananhu@huawei.com>
> -----------------------------------------------------------------------------------------
> --- a/drivers/net/ppp_async.c	2009-10-12 05:43:56.000000000 +0800
> +++ b/drivers/net/ppp_async.c	2009-10-15 16:29:56.000000000 +0800
> @@ -36,7 +36,7 @@
>  
>  #define PPP_VERSION	"2.4.2"
>  
> -#define OBUFSIZE	256
> +#define OBUFSIZE	2048
>  
>  /* Structure for storing local state. */
>  struct asyncppp {
> 
Repeat.  I'd go further than that, my code usually made room for at least
a full MTU (MRU) with HDLC escaping.  To minimize context switches, that
should be 3014 ((1500 MRU + 2 FCS + 4 header) * 2 escapes + 2 flags).

Even in the old days, when memory was tight, context switches and interrupt
time were more expensive, too.  PPP is supposed to scale to OC-192.

^ permalink raw reply

* [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Jasper Spaans @ 2009-10-23 11:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <20091023114511.GA537@spaans.fox.local>

Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.

The old sitation conflicts with the documentation.
---
 drivers/net/bonding/bond_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 69c5b15..b140b52 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3668,7 +3668,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
 			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3695,7 +3695,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
 
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3706,7 +3706,7 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.6.0.4


-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply related

* Re: [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Eric Dumazet @ 2009-10-23 12:37 UTC (permalink / raw)
  To: Jasper Spaans; +Cc: netdev@vger.kernel.org
In-Reply-To: <20091023115834.GA2216@spaans.fox.local>

Jasper Spaans a écrit :
> Modify bonding hash transmit policies to use the psource MAC address of
> the packet instead of the MAC address configured for the bonding device.
> 
> The old sitation conflicts with the documentation.
> ---
>  drivers/net/bonding/bond_main.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 69c5b15..b140b52 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3668,7 +3668,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
>  			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;

You forgot one bond_dev->dev_addr[5] occurrence here


>  	}
>  
> -	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
> +	return (data->h_dest[5] ^ data->h_source[5]) % count;
>  }

Could you check if bond->xmit_hash_policy(skb, bond_dev, bond->slave_cnt);
could be replaced by bond->xmit_hash_policy(skb, bond->slave_cnt),
now that various implementations dont need bond_dev anymore ?

static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count) ...

Dont forget your "Signed-off-by: Jasper Spaans <spaans@fox-it.com>",
copied to "David S. Miller" <davem@davemloft.net>  and "Jay Vosburgh" <fubar@us.ibm.com>
(respectively network and bonding maintainers)

Thanks

^ permalink raw reply

* Re: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-10-23 13:34 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <4AE0ECCE.2020407@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote:

> --- a/drivers/isdn/i4l/isdn_ppp.c
> +++ b/drivers/isdn/i4l/isdn_ppp.c
> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
>  #endif /* CONFIG_IPPP_FILTER */
>  	skb->dev = dev;
>  	skb_reset_mac_header(skb);
> -	netif_rx(skb);
> +	if (in_interrupt())
> +		netif_rx(skb);
> +	else
> +		netif_rx_ni(skb);

So you've verified that the entire i4l stack can cope with being called
twice on the same CPU, from different contexts?

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH 4/5] page allocator: Pre-emptively wake kswapd when high-order watermarks are hit
From: Mel Gorman @ 2009-10-23 13:39 UTC (permalink / raw)
  To: Tobias Oetiker
  Cc: David Rientjes, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Rafael J. Wysocki, David Miller,
	Reinette Chatre, Kalle Valo, KOSAKI Motohiro, Mohamed Abbas,
	Jens Axboe, John W. Linville, Pekka Enberg,
	Bartlomiej Zolnierkiewicz, Greg Kroah-Hartman,
	Stephan von Krawczynski, Kernel Testers List,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
In-Reply-To: <alpine.DEB.2.00.0910231329550.26462-EjsAmf5DE5zIvOfxy3zmAzgUDZmNtoG9@public.gmane.org>

On Fri, Oct 23, 2009 at 01:31:10PM +0200, Tobias Oetiker wrote:
> Mel,
> 
> Today Mel Gorman wrote:
> 
> > On Fri, Oct 23, 2009 at 02:36:53AM -0700, David Rientjes wrote:
> > > On Fri, 23 Oct 2009, Mel Gorman wrote:
> > >
> > > > > Hmm, is this really supposed to be added to __alloc_pages_high_priority()?
> > > > > By the patch description I was expecting kswapd to be woken up
> > > > > preemptively whenever the preferred zone is below ALLOC_WMARK_LOW and
> > > > > we're known to have just allocated at a higher order, not just when
> > > > > current was oom killed (when we should already be freeing a _lot_ of
> > > > > memory soon) or is doing a higher order allocation during direct reclaim.
> > > > >
> > > >
> > > > It was a somewhat arbitrary choice to have it trigger in the event high
> > > > priority allocations were happening frequently.
> > > >
> > >
> > > I don't quite understand, users of PF_MEMALLOC shouldn't be doing these
> > > higher order allocations and if ALLOC_NO_WATERMARKS is by way of the oom
> > > killer, we should be freeing a substantial amount of memory imminently
> > > when it exits that waking up kswapd would be irrelevant.
> > >
> >
> > I agree. I think it's highly unlikely this patch will make any
> > difference but I wanted to eliminate it as a possibility. Patch 3 and 4
> > were previously one patch that were tested together.
> 
> hi hi ... I have tested '3 only' this morning, and the allocation
> problems started again ... so for me 3 alone does not work while
> 3+4 does.
> 

Hi,

What was the outcome of 1+2?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* [PATCH] Modify bonding hash transmit policies to use the packet's source MAC address
From: Jasper Spaans @ 2009-10-23 14:08 UTC (permalink / raw)
  To: Jay Vosburgh, Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <4AE1A37F.8010203@gmail.com>

Modify bonding hash transmit policies to use the psource MAC address of
the packet instead of the MAC address configured for the bonding device.

The old sitation conflicts with the documentation.

Signed-off-by: Jasper Spaans <spaans@fox-it.com>
---
 drivers/net/bonding/bond_main.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 69c5b15..3f05267 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3665,10 +3665,10 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ bond_dev->dev_addr[5])) % count;
+			(data->h_dest[5] ^ data->h_source[5])) % count;
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3695,7 +3695,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
 
 	}
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*
@@ -3706,7 +3706,7 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
-	return (data->h_dest[5] ^ bond_dev->dev_addr[5]) % count;
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.6.0.4


-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply related

* [PATCH] Remove bond_dev from xmit_hash_policy call.
From: Jasper Spaans @ 2009-10-23 14:09 UTC (permalink / raw)
  To: Jay Vosburgh, Eric Dumazet; +Cc: netdev@vger.kernel.org
In-Reply-To: <4AE1A37F.8010203@gmail.com>

Now that the bonding device is no longer used in determining the device to
which to send packets, it can be dropped from the argument list of the various
xmit_hash_policy calls.

Signed-off-by: Jasper Spaans <spaans@fox-it.com>
---
 drivers/net/bonding/bond_3ad.c  |   11 +++++------
 drivers/net/bonding/bond_main.c |   11 ++++-------
 drivers/net/bonding/bonding.h   |    3 +--
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c3fa31c..3cd8153 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1956,7 +1956,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 	struct port *port, *prev_port, *temp_port;
 	struct aggregator *aggregator, *new_aggregator, *temp_aggregator;
 	int select_new_active_agg = 0;
-	
+
 	// find the aggregator related to this slave
 	aggregator = &(SLAVE_AD_INFO(slave).aggregator);
 
@@ -2024,7 +2024,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 
 				// clear the aggregator
 				ad_clear_agg(aggregator);
-				
+
 				if (select_new_active_agg) {
 					ad_agg_selection_logic(__get_first_agg(port));
 				}
@@ -2075,7 +2075,7 @@ void bond_3ad_unbind_slave(struct slave *slave)
 			}
 		}
 	}
-	port->slave=NULL;	
+	port->slave=NULL;
 }
 
 /**
@@ -2301,7 +2301,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
 }
 
 /*
- * set link state for bonding master: if we have an active 
+ * set link state for bonding master: if we have an active
  * aggregator, we're up, if not, we're down.  Presumes that we cannot
  * have an active aggregator if there are no slaves with link up.
  *
@@ -2395,7 +2395,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, dev, slaves_in_agg);
+	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
 
 	bond_for_each_slave(bond, slave, i) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
@@ -2468,4 +2468,3 @@ out:
 
 	return ret;
 }
-
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3f05267..13058c5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3657,8 +3657,7 @@ void bond_unregister_arp(struct bonding *bond)
  * Hash for the output device based upon layer 2 and layer 3 data. If
  * the packet is not IP mimic bond_xmit_hash_policy_l2()
  */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
-				     struct net_device *bond_dev, int count)
+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 	struct iphdr *iph = ip_hdr(skb);
@@ -3676,8 +3675,7 @@ static int bond_xmit_hash_policy_l23(struct sk_buff *skb,
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
  * altogether not IP, mimic bond_xmit_hash_policy_l2()
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
-				    struct net_device *bond_dev, int count)
+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 	struct iphdr *iph = ip_hdr(skb);
@@ -3701,8 +3699,7 @@ static int bond_xmit_hash_policy_l34(struct sk_buff *skb,
 /*
  * Hash for the output device based upon layer 2 data
  */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
-				   struct net_device *bond_dev, int count)
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
@@ -4295,7 +4292,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 	if (!BOND_IS_OK(bond))
 		goto out;
 
-	slave_no = bond->xmit_hash_policy(skb, bond_dev, bond->slave_cnt);
+	slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
 
 	bond_for_each_slave(bond, slave, i) {
 		slave_no--;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 6824771..02e1f9e 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -204,7 +204,7 @@ struct bonding {
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
 	struct   dev_mc_list *mc_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, struct net_device *, int);
+	int      (*xmit_hash_policy)(struct sk_buff *, int);
 	__be32   master_ip;
 	u16      flags;
 	u16      rr_tx_counter;
@@ -370,4 +370,3 @@ static inline void bond_unregister_ipv6_notifier(void)
 #endif
 
 #endif /* _LINUX_BONDING_H */
-
-- 
1.6.0.4


-- 
Fox-IT Experts in IT Security!
T: +31 (0) 15 284 79 99
KvK Haaglanden 27301624

^ permalink raw reply related

* Re: NOHZ: local_softirq_pending 08
From: Tilman Schmidt @ 2009-10-23 14:27 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <1256304869.12174.20.camel@johannes.local>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Fri, 2009-10-23 at 01:37 +0200, Tilman Schmidt wrote:
> 
>> --- a/drivers/isdn/i4l/isdn_ppp.c
>> +++ b/drivers/isdn/i4l/isdn_ppp.c
>> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
>>  #endif /* CONFIG_IPPP_FILTER */
>>  	skb->dev = dev;
>>  	skb_reset_mac_header(skb);
>> -	netif_rx(skb);
>> +	if (in_interrupt())
>> +		netif_rx(skb);
>> +	else
>> +		netif_rx_ni(skb);
> 
> So you've verified that the entire i4l stack can cope with being called
> twice on the same CPU, from different contexts?

What makes you think so?
Better yet, what do you propose?

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4b09Q3+did9BuFsRAqBvAKCbRI0iXQEyK3ztxkGHcqpbcceqbACgkagX
JF7nYd152ihp2uemIs/cB54=
=YOin
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: NOHZ: local_softirq_pending 08
From: Johannes Berg @ 2009-10-23 14:31 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, isdn4linux,
	i4ldeveloper, Karsten Keil
In-Reply-To: <4AE1BD3D.3020007@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 1369 bytes --]

On Fri, 2009-10-23 at 16:27 +0200, Tilman Schmidt wrote:

> >> --- a/drivers/isdn/i4l/isdn_ppp.c
> >> +++ b/drivers/isdn/i4l/isdn_ppp.c
> >> @@ -1174,7 +1174,10 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff
> >>  #endif /* CONFIG_IPPP_FILTER */
> >>  	skb->dev = dev;
> >>  	skb_reset_mac_header(skb);
> >> -	netif_rx(skb);
> >> +	if (in_interrupt())
> >> +		netif_rx(skb);
> >> +	else
> >> +		netif_rx_ni(skb);
> > 
> > So you've verified that the entire i4l stack can cope with being called
> > twice on the same CPU, from different contexts?
> 
> What makes you think so?

I thought I'd explained this in my other email. *sigh*

You're squelching a warning, which comes from the fact that you're
calling something that calls into netif_rx() with softirqs enabled. That
would seem to imply that potentially a softirq could at the same time
call into that code too.

Basically, what happens now is this:

disable softirqs
call into i4l/ppp
i4l/ppp code
call netif_rx()
more code
enable softirqs


You're changing it to

call into i4l/ppp
i4l/ppp code
call netif_rx_ni()
more code

so the entire chunks "i4l/ppp code" and "more code" are now no longer
protected against being interrupted by a softirq that runs the same
code, maybe for a different device, on the same CPU.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Tilman Schmidt @ 2009-10-23 14:39 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch,
	Oliver Hartkopp
In-Reply-To: <1256200021.12174.11.camel@johannes.local>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Johannes Berg schrieb:
> On Wed, 2009-10-21 at 23:39 +0200, Jarek Poplawski wrote:
> 
>> I'm not sure I can understand your question. This patch is mainly to
>> avoid using netif_rx()/netif_rx_ni() pair as a test of proper process
>> context handling; IMHO there're better tools for this (lockdep,
>> WARN_ON's).
> 
> I'm saying that it seems to me, as indicated by the API (and without
> proof otherwise that's how it is) the networking layer needs to have
> packets handed to it with softirqs disabled.

Strange. Then what are the two separate functions netif_rx() and
netif_rx_ni() for?

> This really should be obvious. You're fixing the warning at the source
> of the warning, rather than the source of the problem.

Good idea. So please do tell us where the source of the problem is.

Thanks,
Tilman

- --
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFK4cALQ3+did9BuFsRAnW8AKCP4ey+gT2RZBYpzx91PaXd11A/PwCgh35g
fhEbJs++1BRIQ3encV8fIm4=
=SSaA
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH] net: Adjust softirq raising in __napi_schedule
From: Johannes Berg @ 2009-10-23 14:46 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: Jarek Poplawski, David Miller, hidave.darkstar, linux-kernel,
	tglx, linux-wireless, linux-ppp, netdev, paulus, Michael Buesch,
	Oliver Hartkopp
In-Reply-To: <4AE1C00B.5010008@imap.cc>

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

On Fri, 2009-10-23 at 16:39 +0200, Tilman Schmidt wrote:

> Strange. Then what are the two separate functions netif_rx() and
> netif_rx_ni() for?

netif_rx_ni() disables preemption.

> > This really should be obvious. You're fixing the warning at the source
> > of the warning, rather than the source of the problem.
> 
> Good idea. So please do tell us where the source of the problem is.

You use netif_rx_ni() instead of netif_rx() at whatever place that
causes this problem.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
From: Cyrill Gorcunov @ 2009-10-23 15:18 UTC (permalink / raw)
  To: netdev
  Cc: Denys Fedoryschenko, Michal Ostrowski, Eric Dumazet, linux-ppp,
	paulus, mostrows
In-Reply-To: <20091020190821.GO5181@lenovo>

[Cyrill Gorcunov - Tue, Oct 20, 2009 at 11:08:21PM +0400]
...
| 
| Just to update status of the issue. The key moment is that pppoe_flush_dev
| may be called asynchronously (especially via sysfs on dev entry, for example
| we retrieve mtu of device and while we at it other process may update mtu
| via sysfs). So I'm returning pppoe_hash_lock back which should eliminate a
| number of lock complains and make locking scheme easier. All-in-one: I'm
| working on it. Just need some more time.
| 
| 	-- Cyrill

Another status update -- the patch is under testing stage. Hope we will
reveal proper patch soon (in a few days I guess).

	-- Cyrill

^ permalink raw reply

* [PATCH] net: Cleanup redundant tests on unsigned
From: Roel Kluin @ 2009-10-23 15:21 UTC (permalink / raw)
  To: David S. Miller, netdev, Henner Eisen, linux-x25, Andrew Morton

If there is data, the unsigned skb->len is greater than 0.

rt.sigdigits is unsigned as well, so the test `>= 0' is
always true, the other part of the test catches wrapped
values.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
 net/x25/x25_in.c    |    2 +-
 net/x25/x25_route.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/x25/x25_in.c b/net/x25/x25_in.c
index 7d7c3ab..96d9227 100644
--- a/net/x25/x25_in.c
+++ b/net/x25/x25_in.c
@@ -114,7 +114,7 @@ static int x25_state1_machine(struct sock *sk, struct sk_buff *skb, int frametyp
 			/*
 			 *	Copy any Call User Data.
 			 */
-			if (skb->len >= 0) {
+			if (skb->len > 0) {
 				skb_copy_from_linear_data(skb,
 					      x25->calluserdata.cuddata,
 					      skb->len);
diff --git a/net/x25/x25_route.c b/net/x25/x25_route.c
index 2c999cc..66961ea 100644
--- a/net/x25/x25_route.c
+++ b/net/x25/x25_route.c
@@ -190,7 +190,7 @@ int x25_route_ioctl(unsigned int cmd, void __user *arg)
 		goto out;
 
 	rc = -EINVAL;
-	if (rt.sigdigits < 0 || rt.sigdigits > 15)
+	if (rt.sigdigits > 15)
 		goto out;
 
 	dev = x25_dev_get(rt.device);

^ permalink raw reply related

* Re: [PATCH] net: Fix RPF to work with policy routing
From: Ben Greear @ 2009-10-23 15:34 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, netdev, atis, eric.dumazet, zenczykowski
In-Reply-To: <1256295075.6264.59.camel@dogo.mojatatu.com>

jamal wrote:
> with the ipt or skbedit actions or via netfilter i could
> set marks which could be as trivial as "set mark X if packet 
> came in via eth0 or eth1 and mark Y if they came in via gre0"
>   
I implemented something similar while allowing for virtual router like
applications.  I had to add a mark very early in the pkt rx logic in dev.c,
and had to add a 'skb_default_mark' member to the netdevice because
the route lookup is done before the normal iptables logic ran.  Without
this, if a flow already existed for pkts coming in eth1, if the packet came
back in eth2, it would use eth1's flow.

I'll dig out the patch if anyone is interested...

Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply

* [PATCH net-next-2.6] ipip: convert hash tables locking to RCU
From: Eric Dumazet @ 2009-10-23 15:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

IPIP tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
already must wait for a RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ipip.c |   49 ++++++++++++++++++++++++++--------------------
 1 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 6a55392..3bd6998 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -134,7 +134,13 @@ static void ipip_fb_tunnel_init(struct net_device *dev);
 static void ipip_tunnel_init(struct net_device *dev);
 static void ipip_tunnel_setup(struct net_device *dev);
 
-static DEFINE_RWLOCK(ipip_lock);
+/*
+ * Locking : hash tables are protected by RCU and a spinlock
+ */
+static DEFINE_SPINLOCK(ipip_lock);
+
+#define for_each_ip_tunnel_rcu(start) \
+	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
 static struct ip_tunnel * ipip_tunnel_lookup(struct net *net,
 		__be32 remote, __be32 local)
@@ -144,20 +150,21 @@ static struct ip_tunnel * ipip_tunnel_lookup(struct net *net,
 	struct ip_tunnel *t;
 	struct ipip_net *ipn = net_generic(net, ipip_net_id);
 
-	for (t = ipn->tunnels_r_l[h0^h1]; t; t = t->next) {
+	for_each_ip_tunnel_rcu(ipn->tunnels_r_l[h0 ^ h1])
 		if (local == t->parms.iph.saddr &&
 		    remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
 			return t;
-	}
-	for (t = ipn->tunnels_r[h0]; t; t = t->next) {
+
+	for_each_ip_tunnel_rcu(ipn->tunnels_r[h0])
 		if (remote == t->parms.iph.daddr && (t->dev->flags&IFF_UP))
 			return t;
-	}
-	for (t = ipn->tunnels_l[h1]; t; t = t->next) {
+
+	for_each_ip_tunnel_rcu(ipn->tunnels_l[h1])
 		if (local == t->parms.iph.saddr && (t->dev->flags&IFF_UP))
 			return t;
-	}
-	if ((t = ipn->tunnels_wc[0]) != NULL && (t->dev->flags&IFF_UP))
+
+	t = rcu_dereference(ipn->tunnels_wc[0]);
+	if (t && (t->dev->flags&IFF_UP))
 		return t;
 	return NULL;
 }
@@ -193,9 +200,9 @@ static void ipip_tunnel_unlink(struct ipip_net *ipn, struct ip_tunnel *t)
 
 	for (tp = ipip_bucket(ipn, t); *tp; tp = &(*tp)->next) {
 		if (t == *tp) {
-			write_lock_bh(&ipip_lock);
+			spin_lock_bh(&ipip_lock);
 			*tp = t->next;
-			write_unlock_bh(&ipip_lock);
+			spin_unlock_bh(&ipip_lock);
 			break;
 		}
 	}
@@ -205,10 +212,10 @@ static void ipip_tunnel_link(struct ipip_net *ipn, struct ip_tunnel *t)
 {
 	struct ip_tunnel **tp = ipip_bucket(ipn, t);
 
+	spin_lock_bh(&ipip_lock);
 	t->next = *tp;
-	write_lock_bh(&ipip_lock);
-	*tp = t;
-	write_unlock_bh(&ipip_lock);
+	rcu_assign_pointer(*tp, t);
+	spin_unlock_bh(&ipip_lock);
 }
 
 static struct ip_tunnel * ipip_tunnel_locate(struct net *net,
@@ -267,9 +274,9 @@ static void ipip_tunnel_uninit(struct net_device *dev)
 	struct ipip_net *ipn = net_generic(net, ipip_net_id);
 
 	if (dev == ipn->fb_tunnel_dev) {
-		write_lock_bh(&ipip_lock);
+		spin_lock_bh(&ipip_lock);
 		ipn->tunnels_wc[0] = NULL;
-		write_unlock_bh(&ipip_lock);
+		spin_unlock_bh(&ipip_lock);
 	} else
 		ipip_tunnel_unlink(ipn, netdev_priv(dev));
 	dev_put(dev);
@@ -318,7 +325,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 
 	err = -ENOENT;
 
-	read_lock(&ipip_lock);
+	rcu_read_lock();
 	t = ipip_tunnel_lookup(dev_net(skb->dev), iph->daddr, iph->saddr);
 	if (t == NULL || t->parms.iph.daddr == 0)
 		goto out;
@@ -333,7 +340,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 		t->err_count = 1;
 	t->err_time = jiffies;
 out:
-	read_unlock(&ipip_lock);
+	rcu_read_unlock();
 	return err;
 }
 
@@ -351,11 +358,11 @@ static int ipip_rcv(struct sk_buff *skb)
 	struct ip_tunnel *tunnel;
 	const struct iphdr *iph = ip_hdr(skb);
 
-	read_lock(&ipip_lock);
+	rcu_read_lock();
 	if ((tunnel = ipip_tunnel_lookup(dev_net(skb->dev),
 					iph->saddr, iph->daddr)) != NULL) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-			read_unlock(&ipip_lock);
+			rcu_read_unlock();
 			kfree_skb(skb);
 			return 0;
 		}
@@ -374,10 +381,10 @@ static int ipip_rcv(struct sk_buff *skb)
 		nf_reset(skb);
 		ipip_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
-		read_unlock(&ipip_lock);
+		rcu_read_unlock();
 		return 0;
 	}
-	read_unlock(&ipip_lock);
+	rcu_read_unlock();
 
 	return -1;
 }

^ permalink raw reply related

* [PATCH] net: Cleanup redundant tests on unsigned
From: Roel Kluin @ 2009-10-23 15:59 UTC (permalink / raw)
  To: David S. Miller, netdev, Andrew Morton

optlen is unsigned so the `< 0' test is never true.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
 net/can/raw.c          |    2 --
 net/compat.c           |    3 ---
 net/ipv4/ip_sockglue.c |    2 +-
 3 files changed, 1 insertions(+), 6 deletions(-)

Or should the tests in net/can/raw.c or net/compat.c
be replaced by something else?

diff --git a/net/can/raw.c b/net/can/raw.c
index b5e8979..9e18c2a 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -424,8 +424,6 @@ static int raw_setsockopt(struct socket *sock, int level, int optname,
 
 	if (level != SOL_CAN_RAW)
 		return -EINVAL;
-	if (optlen < 0)
-		return -EINVAL;
 
 	switch (optname) {
 
diff --git a/net/compat.c b/net/compat.c
index a407c3a..e373995 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -390,9 +390,6 @@ asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
 	int err;
 	struct socket *sock;
 
-	if (optlen < 0)
-		return -EINVAL;
-
 	if ((sock = sockfd_lookup(fd, &err))!=NULL)
 	{
 		err = security_socket_setsockopt(sock,level,optname);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index e982b5c..15dbf30 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -480,7 +480,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 	case IP_OPTIONS:
 	{
 		struct ip_options *opt = NULL;
-		if (optlen > 40 || optlen < 0)
+		if (optlen > 40)
 			goto e_inval;
 		err = ip_options_get_from_user(sock_net(sk), &opt,
 					       optval, optlen);

^ permalink raw reply related

* Re: [Patch] sctp: remove deprecated SCTP_GET_*_OLD stuffs
From: Vlad Yasevich @ 2009-10-23 15:58 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Amerigo Wang, linux-kernel, netdev, akpm
In-Reply-To: <20091022214439.GA2635@merkur.ravnborg.org>



Sam Ravnborg wrote:
> On Thu, Oct 22, 2009 at 04:53:30PM -0400, Vlad Yasevich wrote:
>>> diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
>>> index be2334a..0991f1b 100644
>>> --- a/include/net/sctp/user.h
>>> +++ b/include/net/sctp/user.h
>>> @@ -131,14 +131,6 @@ enum sctp_optname {
>>>  #define SCTP_SOCKOPT_BINDX_REM	SCTP_SOCKOPT_BINDX_REM
>>>  	SCTP_SOCKOPT_PEELOFF, 	/* peel off association. */
>>>  #define SCTP_SOCKOPT_PEELOFF	SCTP_SOCKOPT_PEELOFF
>>> -	SCTP_GET_PEER_ADDRS_NUM_OLD, 	/* Get number of peer addresss. */
>>> -#define SCTP_GET_PEER_ADDRS_NUM_OLD	SCTP_GET_PEER_ADDRS_NUM_OLD
>>> -	SCTP_GET_PEER_ADDRS_OLD, 	/* Get all peer addresss. */
>>> -#define SCTP_GET_PEER_ADDRS_OLD	SCTP_GET_PEER_ADDRS_OLD
>>> -	SCTP_GET_LOCAL_ADDRS_NUM_OLD, 	/* Get number of local addresss. */
>>> -#define SCTP_GET_LOCAL_ADDRS_NUM_OLD	SCTP_GET_LOCAL_ADDRS_NUM_OLD
>>> -	SCTP_GET_LOCAL_ADDRS_OLD, 	/* Get all local addresss. */
>>> -#define SCTP_GET_LOCAL_ADDRS_OLD	SCTP_GET_LOCAL_ADDRS_OLD
>>>  	SCTP_SOCKOPT_CONNECTX_OLD, /* CONNECTX old requests. */
>> After running the regression suite against this patch I find that we can't
>> remove the enum values.  Removing the enums changes the value for the remainder
>> of the definitions and breaks binary compatibility for applications that use
>> those trailing options.
>>
>> You should be ok with removing the #defines and actual code that uses them,
>> but not the enums.  You can even rename the enums, but we must preserve
>> numeric ordering.
> 
> If we really depend on the actual value of an enum as in this case,
> then e should assign them direct to better document this.
> 
> 	Sam
> 

I agree.  I have a patch that converts the enum to just a #define section that
I'll apply on top of this removal patch and document the deletion.

-vlad

^ permalink raw reply

* [PATCH] atm: Cleanup redundant tests on unsigned
From: Roel Kluin @ 2009-10-23 16:09 UTC (permalink / raw)
  To: Chas Williams, netdev, Andrew Morton

The variables are unsigned so the `< 0' test always fails, the
other part of the test catches wrapped values.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
---
 drivers/atm/fore200e.c |    4 ++--
 drivers/atm/he.c       |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index f766cc4..bc53fed 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -2906,8 +2906,8 @@ fore200e_proc_read(struct atm_dev *dev, loff_t* pos, char* page)
 	u32 media_index    = FORE200E_MEDIA_INDEX(fore200e->bus->read(&fore200e->cp_queues->media_type));
 	u32 oc3_index;
 
-	if ((media_index < 0) || (media_index > 4))
-	    media_index = 5;
+	if (media_index > 4)
+		media_index = 5;
 	
 	switch (fore200e->loop_mode) {
 	    case ATM_LM_NONE:    oc3_index = 0;
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index 7066703..e906658 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2739,7 +2739,7 @@ he_ioctl(struct atm_dev *atm_dev, unsigned int cmd, void __user *arg)
 			spin_lock_irqsave(&he_dev->global_lock, flags);
 			switch (reg.type) {
 				case HE_REGTYPE_PCI:
-					if (reg.addr < 0 || reg.addr >= HE_REGMAP_SIZE) {
+					if (reg.addr >= HE_REGMAP_SIZE) {
 						err = -EINVAL;
 						break;
 					}

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox