Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/3] UCC TDM driver for MPC83xx platforms
From: Kim Phillips @ 2008-01-14 18:00 UTC (permalink / raw)
  To: Aggrwal Poonam, Andrew Morton
  Cc: sfr, rubini, linux-ppcdev, netdev, linux-kernel, Gala Kumar,
	Barkowski Michael, Kalra Ashish, Cutler Richard
In-Reply-To: <FBA61160C48B8D438F3323FEFB4EF2C26E7653@zin33exm24.fsl.freescale.net>

On Thu, 10 Jan 2008 21:41:20 -0700
"Aggrwal Poonam" <Poonam.Aggrwal@freescale.com> wrote:

> Hello  All
> 
> I am waiting for more feedback on the patches.
> 
> If there are no objections please consider them for 2.6.25.
> 
if this isn't going to go through Alessandro Rubini/misc drivers, can
it go through the akpm/mm tree?

Kim

^ permalink raw reply

* Re: [PATCH 9/9] fix sparse warnings
From: Robert Olsson @ 2008-01-14 17:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Robert Olsson, Stephen Hemminger, David Miller, Robert Olsson,
	netdev
In-Reply-To: <478B9D0B.5040301@cosmosbay.com>


Eric Dumazet writes:

 > > Thats 231173/241649 = 96% with the current Internet routing.
 > >
 > > How about if would have a fastpath and store one entry direct in the 
 > > leaf struct this to avoid loading the leaf_info list in most cases?
 > >
 > > One could believe that both lookup and dump could improve.
 > >
 > You mean to include one "leaf_info" inside leaf structure, so that we 
 > can access it without cache line miss ?

 Yes.

 Cheers
					--ro

^ permalink raw reply

* Re: [patch 5/7] netxen: fix race in interrupt / napi
From: Dhananjay Phadke @ 2008-01-14 18:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev
In-Reply-To: <4789414B.40800@garzik.org>

Ok, I will respin the failed patches.

Thanks,
-Dhananjay

On Sat, 12 Jan 2008, Jeff Garzik wrote:

> patch conflicted with
> 
> commit 1706287f6eb58726a9a0e5cbbde87f49757615e3
> Author: David S. Miller <davem@davemloft.net>
> Date:   Mon Jan 7 20:51:29 2008 -0800
> 
>     [NETXEN]: Fix ->poll() done logic.
> 
>     If work_done >= budget we should always elide the NAPI
>     completion.
> 
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 

^ permalink raw reply

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Rick Jones @ 2008-01-14 17:46 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: LKML, netdev
In-Reply-To: <1200280292.3151.24.camel@ymzhang>

>>*) netperf/netserver support CPU affinity within themselves with the 
>>global -T option to netperf.  Is the result with taskset much different? 
>>   The equivalent to the above would be to run netperf with:
>>
>>./netperf -T 0,7 ..
> 
> I checked the source codes and didn't find this option.
> I use netperf V2.3 (I found the number in the makefile).

Indeed, that version pre-dates the -T option.  If you weren't already 
chasing a regression I'd suggest an upgrade to 2.4.mumble.  Once you are 
at a point where changing another variable won't muddle things you may 
want to consider upgrading.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH/RFC] synchronize_rcu(): high latency on idle system
From: Stephen Hemminger @ 2008-01-14 17:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev, linux-kerne
In-Reply-To: <200801131634.17677.ak@suse.de>

On Sun, 13 Jan 2008 16:34:17 +0100
Andi Kleen <ak@suse.de> wrote:

> 
> > I think it should be in netdev_unregister_kobject().  But that would
> > only get rid of one of the two calls to synchronize_rcu() in the unregister_netdev.
> 
> Would be already an improvement.
> 
> > The other synchronize_rcu() is for qdisc's and not sure if that one can
> > be removed?
> 
> The standard way to remove such calls is to set a "deleted" flag in the object,
> then check and ignore such objects in the reader and finally remove the object with
> call_rcu
> 
> I have not checked if that is really feasible for qdiscs.
> 
> -Andi

Actually, the synchronize_rcu() is now acting a barrier between two sections
in the current unregister process. It can't be removed.

But, an alternative unregister_and_free_netdev() could be created that uses
call_rcu.  Basically:

void unregistr_and_free_netdev() {
    do stuff before barrier...
    setup rcu callback
    call_rcu();
}

static void netdev_after_rcu() {
    rtnl_lock();
    do stuff after barier
    rtnl_unlock();
    free_netdev
}

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* occasionally corrupted network stats in /proc/net/dev
From: Mark Seger @ 2008-01-14 17:20 UTC (permalink / raw)
  To: netdev

I had posted the following on linux-net and haven't see any responses 
possibly because nobody had any or that list is obsolete.  I have been 
told this is the current list for everything networking on linux so I 
thought I'd try again...

I suspect the answer will be that it is what it is, but here's the 
deal.  I have a tool I use for monitoring network traffic among other 
things - see http://collectl.sourceforge.net/ - and one of its benefits  
is that you can run it continuously as a daemon (similar to sar) and 
generate data in a format suitable for plotting.  This means that you 
can automate your entire network monitoring infrastructure at fairly 
fine granularity, down to second if you like.  Actually 1-second level 
monitoring will provide incorrect data on earlier kernels because the 
stats aren't updated on 1 second boundaries and you need to monitor at 
an interval of 0.9765 seconds, but that's a different story which is 
explained at http://collectl.sourceforge.net/NetworkStats.html

But more importantly, I've found that occasionally (not that often) 
there is bogus data reported from /proc/net/dev.  While I don't have a 
lot of details on this it seems to only show up in 64 bit kernels.  Look 
at the following samples taken at 1 second intervals:

 eth0:135115809 1024897    0    0    0     0          0         9 
135458926  910340    0    0    0     0       0          0
 eth0:135118023 1024923    0    0    0     0          0         9 
135460952  910363    0    0    0     0       0          0
 eth0:        0  884620    0    0    0     0          0    909397   
9687563 1049736    0    0    0     0       0          0
 eth0:135121189 1024957    0    0    0     0          0         9 
135464222  910400    0    0    0     0       0          0
 eth0:135129565 1024995    0    0    0     0          0         9 
135473687  910435    0    0    0     0       0          0

see the middle sample?  When I look at the change between samples it 
generates a really big number since the difference is assumed to be 
caused a counter wrapping.  The problem is it's not always 
straightforward when there is bad data.  For example if the original and 
bogus values are close enough it's not even clear there is a problem.

So the obvious question is, is there any way to prevent the bogus data 
from getting reported?   If not, is there any way to set the values to 
something to indicate that the correct values can't be determined?  
Clearly this problem would be visible to any tool that looks at /proc 
but since many tools are not automated or don't take it to the level I 
do, nobody probably notices.  As for the counter update frequency, even 
though they now appear to be updated closer to a 1 second boundary it 
also means tools that can monitor at sub-second intervals will report 
incorrect data since the counters only change once a second.

-mark


^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Eric Dumazet @ 2008-01-14 16:56 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Ray Lee, netdev, linux-kernel
In-Reply-To: <478B8473.6080506@nortel.com>

Chris Friesen a écrit :
> Ray Lee wrote:
>> On Jan 10, 2008 9:24 AM, Chris Friesen <cfriesen@nortel.com> wrote:
>
>>> After a recent userspace app change, we've started seeing packets being
>>> dropped by the ethernet hardware (e1000, NAPI is enabled).  The
>>> error/dropped/fifo counts are going up in ethtool:
>
>> Can you reproduce it with a simple userspace cpu hog? (Two, really,
>> one per cpu.)
>> Can you reproduce it with the newer e1000?
>
> Hmm...good questions and I haven't checked either.  The first one is 
> relatively straightforward.  The second is a bit trickier...last time 
> I tried the latest e1000 driver the card wouldn't boot (we use netboot).
>
>> Can you reproduce it with git head?
>
> Unfortunately, I don't think I'll be able to try this.  We require 
> kernel mods for our userspace to run, and I doubt I'd be able to get 
> the time to port all the changes forward to git head.
>
>> If the answer to the first one is yes, the last no, then bisect until
>> you get a kernel that doesn't show the problem. Backport the fix,
>> unless the fix happens to be CFS. However, I suspect that your
>> userpace app is just starving the system from time to time.
>
> It's conceivable that userspace is starving the kernel, but we have do 
> about 45% idle on one cpu, and 7-10% idle on the other.
>
> We also have an odd situation where on an initial test run after 
> bootup we have 18-24% idle on cpu1, but resetting the test tool drops 
> that to the 7-10% I mentioned above.
>
> Based on profiling and instrumentation it seems like the cost of 
> sctp_endpoint_lookup_assoc() more than triples, which means that the 
> amount of time that bottom halves are disabled in that function also 
> triples.
Any idea of the size of sctp hash size you have ?
(your dmesg probably includes a message starting with SCTP: Hash tables 
configured... 

How many concurrent sctp sockets are handled ?

Maybe sctp_assoc_hashfn() is too weak for your use, and some chains are 
*really* long.





^ permalink raw reply

* Re: [PATCH 2/9] get rid of unused revision element
From: Stephen Hemminger @ 2008-01-14 16:35 UTC (permalink / raw)
  To: David Miller; +Cc: Robert.Olsson, robert.olsson, netdev, stephen.hemminger
In-Reply-To: <20080114.040657.117757802.davem@davemloft.net>

On Mon, 14 Jan 2008 04:06:57 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Robert Olsson <Robert.Olsson@data.slu.se>
> Date: Mon, 14 Jan 2008 12:44:32 +0100
> 
> >  The idea was to have a selective flush of route cache entries when
> >  a fib insert/delete happened. From what I remember you added another/
> >  better solution. Just a list with route cache entries pointing to parent 
> >  route. So yes this was obsoleted by your/our effort to avoid total 
> >  flushing of the route cache. Unfinished work.
> 
> Yes, that's right.  The synchronization was very hard.
> 
> But there is another issue, see below....
> 
> >  According to  http://bgpupdates.potaroo.net/instability/bgpupd.html
> >  (last in page) we currently flush the route cache 2.80 times per second. 
> >  when using full Internet routing with Linux. Maybe we're forced to pick 
> >  up this thread again someday.
> 
> This proves we need to solve this problem.
> 
> The reason I've never gone back to that work is that I didn't
> want to do it while we still had multiple FIB data structure
> implementations.
> 
> Someone needs to go over whatever deficiencies exist in fib_trie
> vs. fib_hash so that we can delete fib_hash and move over to using
> fib_trie always.  It makes no sense to implement everything
> interfacing into that code twice.
> 
> There was a full consensus that this was the way to move forward,
> we just need the dirty work to be done.
> 
> If someone wants to show their gratitude for my getting rid of
> the multipath cached routing code, the above work would be a
> great way to do so (hint hint) :-)

I will be glad to get this working.  Is there any point in doing the a
small systems version as well?

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: [Bugme-new] [Bug 9721] New: wake on lan fails with sky2 module
From: Stephen Hemminger @ 2008-01-14 16:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: supersud501, Rafael J. Wysocki, netdev, linux-acpi, bugme-daemon
In-Reply-To: <20080113112712.e93f07a4.akpm@linux-foundation.org>

On Sun, 13 Jan 2008 11:27:12 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sun, 13 Jan 2008 16:08:38 +0100 supersud501 <supersud501@yahoo.de> wrote:
> 
> > 
> > 
> > supersud501 wrote:
> > > 
> > > 
> > > Rafael J. Wysocki wrote:
> > > 
> > >>
> > >> Since it seems to be 100% reproducible, it would be very helpful if 
> > >> you could
> > >> use git-bisect to identify the offending commit.
> > >>
> > > 
> > > allright, bisect found the offending commit, here's what i've done:
> > > 
> > > first i started bisect with the following command (since i assumed it is 
> > > a net-driver problem):
> > > 
> > > git-bisect start 'v2.6.24-rc6' 'v2.6.23' '--' 'drivers/net/'
> > > 
> > > after building many kernels and saying good/bad if wol worked/didn't 
> > > work etc. it identified the following commit:
> > > 
> > > # bad: [ac93a3946b676025fa55356180e8321639744b31] sky2: enable PCI 
> > > config writes
> > > 
> > > and refs/bisect/bad gives:
> > > 
> > > 14:16:53 /usr/src/linux-2.6/.git # cat refs/bisect/bad
> > > ac93a3946b676025fa55356180e8321639744b31
> > > 
> > > 
> > > need some more info?
> > > 
> > 
> > i just checked it: commented out the passage of the commit in kernel 
> > 2.6.24-rc7-git4 and compiled it: wol WORKS. so this one line is causing 
> > my wol-disturbance...
> > 
> > 
> 
> So simply reverting this:
> 
> commit ac93a3946b676025fa55356180e8321639744b31
> Author: Stephen Hemminger <shemminger@linux-foundation.org>
> Date:   Mon Nov 5 15:52:08 2007 -0800
> 
>     sky2: enable PCI config writes
>     
>     On some boards, PCI configuration space access is turned off by default.
>     The 2.6.24 driver doesn't turn it on, and should have.
>     
>     Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
>     Signed-off-by: Jeff Garzik <jeff@garzik.org>
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index c27c7d6..4f41a94 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2791,6 +2791,9 @@ static void sky2_reset(struct sky2_hw *hw)
>  	sky2_write8(hw, B0_CTST, CS_RST_SET);
>  	sky2_write8(hw, B0_CTST, CS_RST_CLR);
>  
> +	/* allow writes to PCI config */
> +	sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
> +
>  	/* clear PCI errors, if any */
>  	pci_read_config_word(pdev, PCI_STATUS, &status);
>  	status |= PCI_STATUS_ERROR_BITS;
> 
> fixes this regression?
> 
> If so, we should revert that change.
> 
> > but i noticed another "bug" on 2.6.24-rc7-git with sky2: dmesg shows a 
> > lot of lines every 5 seconds:
> > 
> > [...]
> > [  357.400462] sky2 0000:02:00.0: error interrupt status=0xc0000000
> > [  362.442039] printk: 41 messages suppressed.
> > [  362.442043] sky2 0000:02:00.0: error interrupt status=0x80000000
> > [  367.439151] printk: 18 messages suppressed.
> > [  367.439156] sky2 0000:02:00.0: error interrupt status=0x80000000
> > [  372.436267] printk: 30 messages suppressed.
> > [  372.436271] sky2 0000:02:00.0: error interrupt status=0x80000000
> > [  377.350236] printk: 19 messages suppressed.
> > [...]
> > 
> > since i do not notice any errors (yet) i'll wait till next rc, maybe it 
> > will be gone then...
> 
> That's not good.  is this new behaviour?
> 

No, reverting that change will break other systems (including mine).

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: Netconf at conf.au 2008?
From: martin f. krafft @ 2008-01-14 16:39 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20080114.200634.103807161.yoshfuji@linux-ipv6.org>

[-- Attachment #1: Type: text/plain, Size: 751 bytes --]

also sprach YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> [2008.01.14.1206 +0100]:
> Very confusing to me...

FYI:
http://lists.alioth.debian.org/pipermail/netconf-devel/2008-January/000214.html

However, I am not entirely convinced. I think
conference/tool/protocol are far apart enough so that the name does
not clash.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"perhaps debian is concerned more about technical excellence rather
 than ease of use by breaking software. in the former we may excel.
 in the latter we have to concede the field to microsoft. guess
 where i want to go today?"
                                                 -- manoj srivastava
 
spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
From: Paul Moore @ 2008-01-14 16:36 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <7197.1200327338@turing-police.cc.vt.edu>

On Monday 14 January 2008 11:15:38 am Valdis.Kletnieks@vt.edu wrote:
> On Sun, 13 Jan 2008 02:35:33 EST, Valdis.Kletnieks@vt.edu said:
> > I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail
> > is listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject
> > mail it has just fetched from an outside server via IMAP - it will often
> > just hang and not make any further progress. Looking at netstat shows
> > something interesting:
> >
> > % netstat -n -a -A inet | grep 25
> > tcp        0   5108 127.0.0.1:59355             127.0.0.1:25             
> >   ESTABLISHED
>
> The IPv6 is apparently a red herring - this morning I'm seeing the same
> problem with another totally separate pair of programs that are IPv4-only,
> hanging on loopback.

Are you still only seeing these problems on loopback?  I can't help but wonder 
if this is the skb_clone() problem where it wasn't copying skb->iif causing 
SELinux to silently drop the packets.  Then again, I'm not sure if there is a 
clone operation in the code path are going down.  From what I can remember I 
only saw clones on some of the multicast stuff but I'm still learning some of 
the darker corners of the stack.

If you've got some spare cycles, the kernel below should both have the 
clone/iif fix (it's in Linus' tree now) as well as some printks when errors 
occur so packet's are no longer silently dropped by SELinux.

 * git://git.infradead.org/users/pcmoore/lblnet-2.6_testing

-- 
paul moore
linux security @ hp

^ permalink raw reply

* Re: 2.6.24-rc6-mm1 - oddness with IPv4/v6 mapped sockets hanging...
From: Valdis.Kletnieks @ 2008-01-14 16:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, netdev
In-Reply-To: <30887.1200209733@turing-police.cc.vt.edu>

[-- Attachment #1: Type: text/plain, Size: 684 bytes --]

On Sun, 13 Jan 2008 02:35:33 EST, Valdis.Kletnieks@vt.edu said:

> I'm seeing problems with Sendmail on 24-rc6-mm1, where the main Sendmail is
> listening on ::1/25, and Fetchmail connects to 127.0.0.1:25 to inject mail it
> has just fetched from an outside server via IMAP - it will often just hang and
> not make any further progress. Looking at netstat shows something interesting:
> 
> % netstat -n -a -A inet | grep 25
> tcp        0   5108 127.0.0.1:59355             127.0.0.1:25                ESTABLISHED 

The IPv6 is apparently a red herring - this morning I'm seeing the same problem
with another totally separate pair of programs that are IPv4-only, hanging
on loopback.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-14 15:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20080111.175310.212714342.davem@davemloft.net>

David Miller wrote:
> From: "Chris Friesen" <cfriesen@nortel.com>
> Date: Fri, 11 Jan 2008 08:59:26 -0600

>>I'd love to work on newer kernels, but we have a commitment to our 
>>customers to support multiple releases for a significant amount of time.

> And by asking here for people to dig into it for you, you are asking
> people for free help providing that support.

I'm not asking people to spend significant amounts of time...more like 
if anyone has any ideas "off the top of their heads".

> That's why there is such negative backlash to asking questions about
> such ancient kernel here, you're asking us to do your work, for free.

I hadn't realized that you felt this strongly about asking questions 
regarding old kernels.

How close to bleeding edge do we need to be for it to be considered 
acceptable to ask questions on netdev?

Given that the embedded space tends to be perpetually stuck on older 
kernels (our "current" release is based on 2.6.14) do you have any 
suggestion on how we can obtain information that would be available on 
netdev if we were using newer kernels?

Chris

^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-14 15:49 UTC (permalink / raw)
  To: Ray Lee; +Cc: netdev, linux-kernel
In-Reply-To: <2c0942db0801112137k3f3f885ek212d5cbaecb7fea0@mail.gmail.com>

Ray Lee wrote:
> On Jan 10, 2008 9:24 AM, Chris Friesen <cfriesen@nortel.com> wrote:

>>After a recent userspace app change, we've started seeing packets being
>>dropped by the ethernet hardware (e1000, NAPI is enabled).  The
>>error/dropped/fifo counts are going up in ethtool:

> Can you reproduce it with a simple userspace cpu hog? (Two, really,
> one per cpu.)
> Can you reproduce it with the newer e1000?

Hmm...good questions and I haven't checked either.  The first one is 
relatively straightforward.  The second is a bit trickier...last time I 
tried the latest e1000 driver the card wouldn't boot (we use netboot).

> Can you reproduce it with git head?

Unfortunately, I don't think I'll be able to try this.  We require 
kernel mods for our userspace to run, and I doubt I'd be able to get the 
time to port all the changes forward to git head.

> If the answer to the first one is yes, the last no, then bisect until
> you get a kernel that doesn't show the problem. Backport the fix,
> unless the fix happens to be CFS. However, I suspect that your
> userpace app is just starving the system from time to time.

It's conceivable that userspace is starving the kernel, but we have do 
about 45% idle on one cpu, and 7-10% idle on the other.

We also have an odd situation where on an initial test run after bootup 
we have 18-24% idle on cpu1, but resetting the test tool drops that to 
the 7-10% I mentioned above.

Based on profiling and instrumentation it seems like the cost of 
sctp_endpoint_lookup_assoc() more than triples, which means that the 
amount of time that bottom halves are disabled in that function also 
triples.

^ permalink raw reply

* [PATCH 0/8 2.6.25] a set of small code cleanups
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: David Miller; +Cc: devel, netdev

This set contains a set of small improvements found in IPv4 code during
preparations to support ARP in the network namespace. These fixes are
mostly independent except 2 last ones.

Signed-off-by: Denis V. Lunev <den@openvz.org>

^ permalink raw reply

* [PATCH 4/8 net-2.6.25] [ARP] Remove overkill checks from neigh_param_alloc.
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

Valid network device is always passed into neigh_param_alloc, so remove
extra checking for dev == NULL. Additionally, cleanup bogus netns assignment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
 net/core/neighbour.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index af49137..32f1a23 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1301,10 +1301,7 @@ struct neigh_parms *neigh_parms_alloc(struct net_device *dev,
 	struct neigh_parms *p, *ref;
 	struct net *net;
 
-	net = &init_net;
-	if (dev)
-		net = dev->nd_net;
-
+	net = dev->nd_net;
 	ref = lookup_neigh_params(tbl, net, 0);
 	if (!ref)
 		return NULL;
@@ -1316,15 +1313,14 @@ struct neigh_parms *neigh_parms_alloc(struct net_device *dev,
 		INIT_RCU_HEAD(&p->rcu_head);
 		p->reachable_time =
 				neigh_rand_reach_time(p->base_reachable_time);
-		if (dev) {
-			if (dev->neigh_setup && dev->neigh_setup(dev, p)) {
-				kfree(p);
-				return NULL;
-			}
 
-			dev_hold(dev);
-			p->dev = dev;
+		if (dev->neigh_setup && dev->neigh_setup(dev, p)) {
+			kfree(p);
+			return NULL;
 		}
+
+		dev_hold(dev);
+		p->dev = dev;
 		p->net = hold_net(net);
 		p->sysctl_table = NULL;
 		write_lock_bh(&tbl->lock);
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 7/8 net-2.6.25] [IPV4] Remove extra argument from arp_ignore.
From: Denis V. Lunev @ 2008-01-14 15:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

arp_ignore has two arguments: dev & in_dev. dev is used for inet_confirm_addr
calling only.

inet_confirm_addr, in turn, either gets in_dev from the device passed or
iterates over all network devices if the device passed is NULL. It seems
logical to directly pass in_dev into inet_confirm_addr.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/linux/inetdevice.h |    2 +-
 net/ipv4/arp.c             |   11 +++++------
 net/ipv4/devinet.c         |   17 ++++++-----------
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index b3c5081..45f3731 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -135,7 +135,7 @@ extern int		devinet_ioctl(unsigned int cmd, void __user *);
 extern void		devinet_init(void);
 extern struct in_device	*inetdev_by_index(int);
 extern __be32		inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
-extern __be32		inet_confirm_addr(const struct net_device *dev, __be32 dst, __be32 local, int scope);
+extern __be32		inet_confirm_addr(struct in_device *in_dev, __be32 dst, __be32 local, int scope);
 extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix, __be32 mask);
 
 static __inline__ int inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index e944d98..f38e1a9 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -382,8 +382,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 		read_unlock_bh(&neigh->lock);
 }
 
-static int arp_ignore(struct in_device *in_dev, struct net_device *dev,
-		      __be32 sip, __be32 tip)
+static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
 {
 	int scope;
 
@@ -403,7 +402,7 @@ static int arp_ignore(struct in_device *in_dev, struct net_device *dev,
 	case 3:	/* Do not reply for scope host addresses */
 		sip = 0;
 		scope = RT_SCOPE_LINK;
-		dev = NULL;
+		in_dev = NULL;
 		break;
 	case 4:	/* Reserved */
 	case 5:
@@ -415,7 +414,7 @@ static int arp_ignore(struct in_device *in_dev, struct net_device *dev,
 	default:
 		return 0;
 	}
-	return !inet_confirm_addr(dev, sip, tip, scope);
+	return !inet_confirm_addr(in_dev, sip, tip, scope);
 }
 
 static int arp_filter(__be32 sip, __be32 tip, struct net_device *dev)
@@ -807,7 +806,7 @@ static int arp_process(struct sk_buff *skb)
 	if (sip == 0) {
 		if (arp->ar_op == htons(ARPOP_REQUEST) &&
 		    inet_addr_type(&init_net, tip) == RTN_LOCAL &&
-		    !arp_ignore(in_dev,dev,sip,tip))
+		    !arp_ignore(in_dev, sip, tip))
 			arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
 				 dev->dev_addr, sha);
 		goto out;
@@ -825,7 +824,7 @@ static int arp_process(struct sk_buff *skb)
 				int dont_send = 0;
 
 				if (!dont_send)
-					dont_send |= arp_ignore(in_dev,dev,sip,tip);
+					dont_send |= arp_ignore(in_dev,sip,tip);
 				if (!dont_send && IN_DEV_ARPFILTER(in_dev))
 					dont_send |= arp_filter(sip,tip,dev);
 				if (!dont_send)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 03db15b..dc1665a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -968,24 +968,19 @@ static __be32 confirm_addr_indev(struct in_device *in_dev, __be32 dst,
 
 /*
  * Confirm that local IP address exists using wildcards:
- * - dev: only on this interface, 0=any interface
+ * - in_dev: only on this interface, 0=any interface
  * - dst: only in the same subnet as dst, 0=any dst
  * - local: address, 0=autoselect the local address
  * - scope: maximum allowed scope value for the local address
  */
-__be32 inet_confirm_addr(const struct net_device *dev, __be32 dst, __be32 local, int scope)
+__be32 inet_confirm_addr(struct in_device *in_dev,
+			 __be32 dst, __be32 local, int scope)
 {
 	__be32 addr = 0;
-	struct in_device *in_dev;
-
-	if (dev) {
-		rcu_read_lock();
-		if ((in_dev = __in_dev_get_rcu(dev)))
-			addr = confirm_addr_indev(in_dev, dst, local, scope);
-		rcu_read_unlock();
+	struct net_device *dev;
 
-		return addr;
-	}
+	if (in_dev != NULL)
+		return confirm_addr_indev(in_dev, dst, local, scope);
 
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 6/8 net-2.6.25] [ARP] neigh_parms_put(destroy) are essentially local to core/neighbour.c.
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

Make them static.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/neighbour.h |    7 -------
 net/core/neighbour.c    |   11 ++++++++++-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a0d42a5..ebbfb50 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -213,7 +213,6 @@ extern struct neighbour 	*neigh_event_ns(struct neigh_table *tbl,
 
 extern struct neigh_parms	*neigh_parms_alloc(struct net_device *dev, struct neigh_table *tbl);
 extern void			neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms);
-extern void			neigh_parms_destroy(struct neigh_parms *parms);
 extern unsigned long		neigh_rand_reach_time(unsigned long base);
 
 extern void			pneigh_enqueue(struct neigh_table *tbl, struct neigh_parms *p,
@@ -254,12 +253,6 @@ static inline void __neigh_parms_put(struct neigh_parms *parms)
 	atomic_dec(&parms->refcnt);
 }
 
-static inline void neigh_parms_put(struct neigh_parms *parms)
-{
-	if (atomic_dec_and_test(&parms->refcnt))
-		neigh_parms_destroy(parms);
-}
-
 static inline struct neigh_parms *neigh_parms_clone(struct neigh_parms *parms)
 {
 	atomic_inc(&parms->refcnt);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 9b0b773..41394db 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -55,6 +55,8 @@
 
 #define PNEIGH_HASHMASK		0xF
 
+static inline void neigh_parms_put(struct neigh_parms *parms);
+
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
@@ -1348,7 +1350,7 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 	NEIGH_PRINTK1("neigh_parms_release: not found\n");
 }
 
-void neigh_parms_destroy(struct neigh_parms *parms)
+static void neigh_parms_destroy(struct neigh_parms *parms)
 {
 	release_net(parms->net);
 	if (parms->dev)
@@ -1356,6 +1358,13 @@ void neigh_parms_destroy(struct neigh_parms *parms)
 	kfree(parms);
 }
 
+static inline void neigh_parms_put(struct neigh_parms *parms)
+{
+	if (atomic_dec_and_test(&parms->refcnt))
+		neigh_parms_destroy(parms);
+}
+
+
 static struct lock_class_key neigh_table_proxy_queue_class;
 
 void neigh_table_init_no_netlink(struct neigh_table *tbl)
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 8/8 net-2.6.25] [NETNS] Process inet_confirm_addr in the correct namespace.
From: Denis V. Lunev @ 2008-01-14 15:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

inet_confirm_addr can be called with NULL in_dev from arp_ignore iff
scope is RT_SCOPE_LINK.

Lets always pass the device and check for RT_SCOPE_LINK scope inside
inet_confirm_addr. This let us take network namespace from in_device a
need for an additional argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/arp.c     |    1 -
 net/ipv4/devinet.c |    6 ++++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index f38e1a9..b715ec0 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -402,7 +400,6 @@ static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
 	case 3:	/* Do not reply for scope host addresses */
 		sip = 0;
 		scope = RT_SCOPE_LINK;
-		in_dev = NULL;
 		break;
 	case 4:	/* Reserved */
 	case 5:
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index dc1665a..4569c69 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -978,13 +978,15 @@ __be32 inet_confirm_addr(struct in_device *in_dev,
 {
 	__be32 addr = 0;
 	struct net_device *dev;
+	struct net *net;
 
-	if (in_dev != NULL)
+	if (scope != RT_SCOPE_LINK)
 		return confirm_addr_indev(in_dev, dst, local, scope);
 
+	net = in_dev->dev->nd_net;
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev(net, dev) {
 		if ((in_dev = __in_dev_get_rcu(dev))) {
 			addr = confirm_addr_indev(in_dev, dst, local, scope);
 			if (addr)
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 1/8 net-2.6.25] [ARP] Move inet_addr_type call after simple error checks in arp_contructor.
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

The neighbour entry will be destroyed in the case of error, so it is pointless
to perform constly routing table lookup in this case.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/arp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index b715ec0..49c24ff 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -235,8 +235,6 @@ static int arp_constructor(struct neighbour *neigh)
 	struct in_device *in_dev;
 	struct neigh_parms *parms;
 
-	neigh->type = inet_addr_type(&init_net, addr);
-
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev == NULL) {
@@ -244,6 +242,8 @@ static int arp_constructor(struct neighbour *neigh)
 		return -EINVAL;
 	}
 
+	neigh->type = inet_addr_type(&init_net, addr);
+
 	parms = in_dev->arp_parms;
 	__neigh_parms_put(neigh->parms);
 	neigh->parms = neigh_parms_clone(parms);
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 3/8 net-2.6.25] [IPV4] fib_rules_unregister is essentially void.
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

fib_rules_unregister is called only after successful register and the return
code is never checked.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/fib_rules.h |    2 +-
 net/core/fib_rules.c    |   21 ++++-----------------
 2 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 88f870f..34349f9 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -104,7 +104,7 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
 }
 
 extern int fib_rules_register(struct net *, struct fib_rules_ops *);
-extern int fib_rules_unregister(struct net *, struct fib_rules_ops *);
+extern void fib_rules_unregister(struct net *, struct fib_rules_ops *);
 extern void                     fib_rules_cleanup_ops(struct fib_rules_ops *);
 
 extern int			fib_rules_lookup(struct fib_rules_ops *,
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 0eecf4c..42ccaf5 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -115,29 +115,16 @@ void fib_rules_cleanup_ops(struct fib_rules_ops *ops)
 }
 EXPORT_SYMBOL_GPL(fib_rules_cleanup_ops);
 
-int fib_rules_unregister(struct net *net, struct fib_rules_ops *ops)
+void fib_rules_unregister(struct net *net, struct fib_rules_ops *ops)
 {
-	int err = 0;
-	struct fib_rules_ops *o;
 
 	spin_lock(&net->rules_mod_lock);
-	list_for_each_entry(o, &net->rules_ops, list) {
-		if (o == ops) {
-			list_del_rcu(&o->list);
-			fib_rules_cleanup_ops(ops);
-			goto out;
-		}
-	}
-
-	err = -ENOENT;
-out:
+	list_del_rcu(&ops->list);
+	fib_rules_cleanup_ops(ops);
 	spin_unlock(&net->rules_mod_lock);
 
 	synchronize_rcu();
-	if (!err)
-		release_net(net);
-
-	return err;
+	release_net(net);
 }
 
 EXPORT_SYMBOL_GPL(fib_rules_unregister);
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 2/8 net-2.6.25] [NETNS] Make arp code network namespace consistent.
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

Some calls in the arp.c have network namespace as an argument. Getting
init_net inside these functions is simply inconsistent. Fix this.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/arp.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 49c24ff..0db7d49 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -969,13 +969,13 @@ static int arp_req_set_public(struct net *net, struct arpreq *r,
 	if (mask && mask != htonl(0xFFFFFFFF))
 		return -EINVAL;
 	if (!dev && (r->arp_flags & ATF_COM)) {
-		dev = dev_getbyhwaddr(&init_net, r->arp_ha.sa_family,
+		dev = dev_getbyhwaddr(net, r->arp_ha.sa_family,
 				r->arp_ha.sa_data);
 		if (!dev)
 			return -ENODEV;
 	}
 	if (mask) {
-		if (pneigh_lookup(&arp_tbl, &init_net, &ip, dev, 1) == NULL)
+		if (pneigh_lookup(&arp_tbl, net, &ip, dev, 1) == NULL)
 			return -ENOBUFS;
 		return 0;
 	}
@@ -1084,7 +1084,7 @@ static int arp_req_delete_public(struct net *net, struct arpreq *r,
 	__be32 mask = ((struct sockaddr_in *)&r->arp_netmask)->sin_addr.s_addr;
 
 	if (mask == htonl(0xFFFFFFFF))
-		return pneigh_delete(&arp_tbl, &init_net, &ip, dev);
+		return pneigh_delete(&arp_tbl, net, &ip, dev);
 
 	if (mask)
 		return -EINVAL;
@@ -1162,7 +1162,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	rtnl_lock();
 	if (r.arp_dev[0]) {
 		err = -ENODEV;
-		if ((dev = __dev_get_by_name(&init_net, r.arp_dev)) == NULL)
+		if ((dev = __dev_get_by_name(net, r.arp_dev)) == NULL)
 			goto out;
 
 		/* Mmmm... It is wrong... ARPHRD_NETROM==0 */
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 5/8 net-2.6.25] [ARP] Remove forward declaration of neigh_changeaddr.
From: Denis V. Lunev @ 2008-01-14 14:59 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <478B78CB.4030404@openvz.org>

No need for this. It is declared in the neighbour.h

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/core/neighbour.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2eab6a5..9b0b773 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -59,7 +59,6 @@ static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
-void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 
 static struct neigh_table *neigh_tables;
 #ifdef CONFIG_PROC_FS
-- 
1.5.3.rc5


^ permalink raw reply related

* Re: [PATCH 1/2 net-2.6.25] [IPV4] Remove extra argument from arp_ignore.
From: Denis V. Lunev @ 2008-01-14 14:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, devel
In-Reply-To: <478B750D.6060407@sw.ru>

Denis V. Lunev wrote:
> David Miller wrote:
>> This patch series is numbered, but your other patch series sent a few
>> moments ago had no sequence numbers in the subject lines or changelog.
>>
>> How can I know what order to apply those in and do they need to go in
>> before or after this set?
>>
>> I shouldn't have to ask questions like this, so please help avoid
>> confusion of this nature in the future.
>>
>> Thanks.
>>
> 
> non-numbered patches do not intersect with each other.
> 
> numbered ones depends on each other but not from the rest.
> 
> Sorry for inconvenience :(

I'll resend them for you convenience. Sorry again, I forget that git
checks indexes and this can be a problem.

^ permalink raw reply

* Re: [PATCH 1/2 net-2.6.25] [IPV4] Remove extra argument from arp_ignore.
From: Denis V. Lunev @ 2008-01-14 14:43 UTC (permalink / raw)
  To: David Miller; +Cc: den, netdev, devel
In-Reply-To: <20080114.063753.42458530.davem@davemloft.net>

David Miller wrote:
> This patch series is numbered, but your other patch series sent a few
> moments ago had no sequence numbers in the subject lines or changelog.
> 
> How can I know what order to apply those in and do they need to go in
> before or after this set?
> 
> I shouldn't have to ask questions like this, so please help avoid
> confusion of this nature in the future.
> 
> Thanks.
> 

non-numbered patches do not intersect with each other.

numbered ones depends on each other but not from the rest.

Sorry for inconvenience :(

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox