Netdev List


* Re: why does promote_secondaries default to off?
From: martin f krafft @ 2008-01-11 17:43 UTC (permalink / raw)
  To: netdev discussion list
In-Reply-To: <4787A863.3060506@fr.ibm.com>


also sprach Daniel Lezcano <dlezcano@fr.ibm.com> [2008.01.11.1833 +0100]:
> This tweak is "recent" (2.6.16 as far as I remember), so I suppose
> the reason is to not puzzle people with a changed default
> behavior.

Your instant and helpful responses are most appreciated!

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
a common mistake that people make
when trying to design something completely foolproof
was to underestimate the ingenuity of complete fools.
                                 -- douglas adams, "mostly harmless"
 
spamtraps: madduck.bogus@madduck.net


* Re: e1000 performance issue in 4 simultaneous links
From: Denys Fedoryshchenko @ 2008-01-11 17:36 UTC (permalink / raw)
  To: netdev
In-Reply-To: <47879DE4.8080603@cosmosbay.com>

Maybe it would be a good idea to use sysstat?

http://perso.wanadoo.fr/sebastien.godard/

For example:

visp-1 ~ # mpstat -P ALL 1
Linux 2.6.24-rc7-devel (visp-1)         01/11/08

19:27:57     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
19:27:58     all    0.00    0.00    0.00    0.00    0.00    2.51    0.00   97.49   7707.00
19:27:58       0    0.00    0.00    0.00    0.00    0.00    4.00    0.00   96.00   1926.00
19:27:58       1    0.00    0.00    0.00    0.00    0.00    1.01    0.00   98.99   1926.00
19:27:58       2    0.00    0.00    0.00    0.00    0.00    5.00    0.00   95.00   1927.00
19:27:58       3    0.00    0.00    0.00    0.00    0.00    0.99    0.00   99.01   1927.00
19:27:58       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00      0.00
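
A minimal sketch of how the per-CPU rows above could be compared in a script, assuming the column order printed in this message (time, CPU, %user, %nice, %sys, %iowait, %irq, %soft, %steal, %idle, intr/s). The helper names are hypothetical, and other sysstat versions may print different columns.

```python
# Hypothetical helpers; the column layout is assumed from the mpstat output above:
# time CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s


def busy_percent(row):
    """Sum the seven non-idle percentage columns of one mpstat row."""
    fields = row.split()
    return sum(float(value) for value in fields[2:9])


def busiest_cpu(rows):
    """Return (cpu_label, busy_percent) for the most loaded CPU, skipping 'all'."""
    per_cpu = [
        (row.split()[1], busy_percent(row))
        for row in rows
        if row.split()[1] != "all"
    ]
    return max(per_cpu, key=lambda item: item[1])


# Rows copied from the sample output, with the wrapped lines rejoined.
rows = [
    "19:27:58 all 0.00 0.00 0.00 0.00 0.00 2.51 0.00 97.49 7707.00",
    "19:27:58 0 0.00 0.00 0.00 0.00 0.00 4.00 0.00 96.00 1926.00",
    "19:27:58 1 0.00 0.00 0.00 0.00 0.00 1.01 0.00 98.99 1926.00",
    "19:27:58 2 0.00 0.00 0.00 0.00 0.00 5.00 0.00 95.00 1927.00",
    "19:27:58 3 0.00 0.00 0.00 0.00 0.00 0.99 0.00 99.01 1927.00",
    "19:27:58 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
]

print(busiest_cpu(rows))
```

Summing the non-idle columns rather than computing 100 - %idle avoids misreading a row such as CPU 4, where every column including %idle is 0.00.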



> >>     
> >>> When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
> >>> of transfer rate. If I run 4 netperf against 4 different interfaces, I
> >>> get around 720 * 10^6 bits/sec.
> >>>       
> >> I hope this explanation makes sense, but what it comes down to is that
> >> combining hardware round robin balancing with NAPI is a BAD IDEA.  In
> >> general the behavior of hardware round robin balancing is bad and I'm
> >> sure it is causing all sorts of other performance issues that you may
> >> not even be aware of.
> >>     
> > I've made another test removing the ppc IRQ Round Robin scheme, bound
> > each interface (eth6, eth7, eth16 and eth17) to different CPUs (CPU1,
> > CPU2, CPU3 and CPU4) and I also get around 720 * 10^6 bits/s on
> > average.
> >
> > Take a look at the interrupt table this time: 
> >
> > io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
> > 277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
> > 278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
> > 323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
> > 324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17
> >
> >
> > I also tried to bind all 4 interface IRQs to a single CPU (CPU0)
> > using the noirqdistrib boot parameter, and the performance was a little
> > worse.
> >
> > Rick, 
> >   The 2 interface test that I showed in my first email was run on two
> > different NICs. Also, I am running netperf with the following command
> > "netperf -H <hostname> -T 0,8" while netserver is running without any
> > arguments at all. Also, running vmstat in parallel shows that there is no
> > bottleneck in the CPU. Take a look: 
> >
> > procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> >  2  0      0 6714732  16168 227440    0    0     8     2  203   21  0  1 98  0  0
> >  0  0      0 6715120  16176 227440    0    0     0    28 16234  505  0 16 83  0  1
> >  0  0      0 6715516  16176 227440    0    0     0     0 16251  518  0 16 83  0  1
> >  1  0      0 6715252  16176 227440    0    0     0     1 16316  497  0 15 84  0  1
> >  0  0      0 6716092  16176 227440    0    0     0     0 16300  520  0 16 83  0  1
> >  0  0      0 6716320  16180 227440    0    0     0     1 16354  486  0 15 84  0  1
> >  
> >
> >   
> If your machine has 8 CPUs, then your vmstat output shows a bottleneck :)
> 
> (100/8 = 12.5), so I guess one of your CPUs is full
> 


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



* Re: [PATCH 0/4] Pull request for 'ipg-fixes' branch
From: Francois Romieu @ 2008-01-11 17:22 UTC (permalink / raw)
  To: linux; +Cc: davem, akpm, jeff, netdev
In-Reply-To: <20080111015851.25008.qmail@science.horizon.com>

linux@horizon.com <linux@horizon.com> :
[...]
> I notice that the vendor-supplied driver doesn't have these bugs.

The M in POMS stands for "my".

[...]
> Would you be interested in some cleanup patches ?

Yes.

> In particular, I think I can get rid of tx->lock entirely, or at least
> take it off the fast path. All it's protecting is the write to
> sp->tx_current, and a few judicious memory barriers can deal with that.

I have done a kind of memory barrier trick for the r8169 in the past but
it is not clear that I would do it again. Today I would argue more strongly
in favor of similar locking across different drivers. The tg3 driver
is a good model imho.

Anyway you have been here for some time so I see no reason to kill any
different/new locking scheme you could come up with.

Off until Sunday.

-- 
Ueimor


* Re: why does promote_secondaries default to off?
From: Daniel Lezcano @ 2008-01-11 17:33 UTC (permalink / raw)
  To: netdev discussion list
In-Reply-To: <20080111172641.GA22449@piper.oerlikon.madduck.net>

martin f krafft wrote:
> also sprach Daniel Lezcano <dlezcano@fr.ibm.com> [2008.01.11.1813 +0100]:
>> There is a tweak in /proc/sys which activates secondaries promotion when a
>> primary is deleted.
>>
>> /proc/sys/net/ipv4/conf/all/promote_secondaries
>>
>> I think it changes the behavior to the one you wish.
> 
> Totally. That would have been the last place I had looked.
> Thank you!
> 
> Do you have any idea why this isn't on by default?

This tweak is "recent" (2.6.16 as far as I remember), so I suppose the
reason is to not puzzle people with a changed default behavior.


* Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
From: Vince Fuller @ 2008-01-11 17:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Vince Fuller, netdev, linux-kernel
In-Reply-To: <p73bq7smx1t.fsf@bingen.suse.de>

On Fri, Jan 11, 2008 at 12:17:02PM +0100, Andi Kleen wrote:
> Vince Fuller <vaf@cisco.com> writes:
> 
> > from Vince Fuller <vaf@vaf.net>
> >
> > This set of diffs modifies the 2.6.20 kernel to enable use of the 240/4
> > (aka "class-E") address space as consistent with the Internet Draft
> > draft-fuller-240space-00.txt.
> 
> Wouldn't it be wise to at least wait for it becoming an RFC first? 

There is reasonable consensus on making use of 240/4; some applications,
such as ISAKMP and automatic IPv6-to-IPv4 tunneling, still need to determine
whether they should treat the space as "public" or "private", but that shouldn't
affect whether kernel support is added.

Solaris recently added support for 240/4 and OSX already has it. I thought
the Linux kernel developers might appreciate having patches to do likewise.

I leave it up to you, the developers, to decide if you want to use these
patches.
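
For readers unfamiliar with the notation, a small sketch of what "240/4" covers, using Python's standard ipaddress module. The function name is hypothetical; the block only illustrates the address range and says nothing about how the kernel patch itself treats these addresses.

```python
import ipaddress

# 240/4 is shorthand for 240.0.0.0/4, the former "class E" block.
CLASS_E = ipaddress.ip_network("240.0.0.0/4")


def in_240_slash_4(address):
    """Return True if the dotted-quad string falls inside 240.0.0.0/4."""
    return ipaddress.ip_address(address) in CLASS_E


print(CLASS_E[0], CLASS_E[-1], CLASS_E.num_addresses)
print(in_240_slash_4("240.0.0.1"), in_240_slash_4("239.255.255.255"))
```

A /4 prefix leaves 28 host bits, so the block spans 2**28 addresses, from 240.0.0.0 through 255.255.255.255.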

	--Vince


* why does promote_secondaries default to off? (was: iproute2: removing primary address removes secondaries)
From: martin f krafft @ 2008-01-11 17:26 UTC (permalink / raw)
  To: netdev discussion list
In-Reply-To: <4787A3AB.4000205@fr.ibm.com>


also sprach Daniel Lezcano <dlezcano@fr.ibm.com> [2008.01.11.1813 +0100]:
> There is a tweak in /proc/sys which activates secondaries promotion when a
> primary is deleted.
>
> /proc/sys/net/ipv4/conf/all/promote_secondaries
>
> I think it changes the behavior to the one you wish.

Totally. That would have been the last place I had looked.
Thank you!

Do you have any idea why this isn't on by default?

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

"i never go without my dinner. no one ever does, except vegetarians
 and people like that."
                                                        -- oscar wilde

spamtraps: madduck.bogus@madduck.net


* Simple question about LARTC theory
From: slavon @ 2008-01-11 17:17 UTC (permalink / raw)
  To: netdev

Hello all.
Sorry for the off-topic post. I am subscribed only to netdev@vger.kernel.org;
when I tried to send this to lartc@vger.kernel.org I got "Undelivered Mail
Returned to Sender". May I ask a small off-topic question here? Many people on
this list know LARTC "in code", and I hope that will help with my idea. Thanks.

Simple Question

Legend
[] - qdisc
() - class
** - filter

[htb 1:0 root] *match X FLOWID 3:5*
(1:2 htb)(2:3 htb)(3:5 htb)[sfq 5]
(1:6 htb)(6:7 htb)(7:8 htb)[sfq 8]

The packet goes:
IN -> [htb 1:0] -> (class 1:2 - GREEN) -> (class 2:3 GREEN) -> (class
3:5 - GREEN) -> [sfq 5] -> OUT

Then I create:

[prio 3 bound 10:0] *match X flowid 10:2*
+(10:1 htb) -- [sfq 101]
+(10:2 htb) -- [sfq 102]
+(10:3 htb) -- [sfq 103]

How can I add a filter to [sfq 5] and [sfq 8] so that a packet leaving
them goes to [prio 3 bound 10:0] and is processed by its filters?

flowid works only if it can see both the beginning and the end of the
chain; I need something like a GOTO. If I give [prio 3 bound 10:0] a
parent ID, flowid finds the path, but I need [prio 3 bound 10:0] to have
more than one parent.

I looked at "link", but as far as I understand it works only for hash tables.
I looked at classid, but that goes to class 10:X, not to [prio 3 bound 10:0],
and the filters are not processed.

Or do I misunderstand the theory?

What do I need? I need 3 groups in tc.
The 1st group gets all traffic and does HTB shaping (protection against ICMP
and UDP storms):
a) icmp rate 100mbs ceil 500mbs
b) udp rate 100mbs ceil 500mbs
c) other rate 300mbs ceil 500mbs
All have prio = 0 so that the ceil rate works normally.

The 2nd group does prio (icmp and udp must go first because they have no
check for transmission):
icmp = 1
udp = 2
other = 3

The 3rd group does per-IP speed limiting (shaping); this part is ready.

I want everything that leaves group 1 to go to the group 2 filters, and
everything that leaves group 2 to go to group 3.

Thanks. Slavon


* Re: iproute2: removing primary address removes secondaries
From: Daniel Lezcano @ 2008-01-11 17:13 UTC (permalink / raw)
  To: netdev discussion list
In-Reply-To: <20080111163155.GA17637@piper.oerlikon.madduck.net>

martin f krafft wrote:
> Dear list,
> 
> When I add an address to an interface whose network prefix is the
> same as that of an address already bound to the interface, the new
> address becomes a secondary address. As per
> http://www.policyrouting.org/iproute2.doc.html:
> 
>   "secondary --- this address is not used when selecting the default
>   source address for outgoing packets. An IP address becomes
>   secondary if another address within the same prefix (network)
>   already exists. The first address within the prefix is primary and
>   is the tag address for the group of all the secondary addresses.
>   When the primary address is deleted all of the secondaries are
>   purged too."
> 
> In the following, I want to argue that this is not necessary.
> I think that removal of a primary address should cause the next
> address to be promoted to be the default source address and the
> link-scoped route to be retained. This comes out of
> http://bugs.debian.org/429689; the maintainer asked me to turn
> directly to this list.
> 
> If I add an address to a device with 'ip add', ip also implicitly
> adds a link-scoped route according to the netmask. It only does this
> for primary addresses, so if I add a second address within the same
> network, the route is not duplicated.
> 
> Thus, the net effect on the routing table is the same for the
> following two commands:
> 
>   ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/12 dev eth0
>   ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/32 dev eth0
>                                                         ^^^^
> In the first case, the .200 address becomes a secondary of the .100
> address. In the second case, they are both primaries. In both cases,
> only one /12 link-scoped route will be created.
> 
> However, in both cases, if I remove the .100 address, the .200 is
> affected: if it's secondary, it ceases to exist, and if it's
> primary (i.e. in the /32 case), then the host can no longer use it
> to communicate to hosts in the same link segment, only to hosts on
> the other side of the default gateway.
> 
> I thus question the point of purging secondary addresses. Obviously,
> only one address can be primary (it is used as source address for
> packets leaving the machine by the respective route). But if the
> primary address is removed, the next secondary should be promoted
> and the route should *not* be deleted.
> 
> Comments?
> 
> Cheers,

There is a tweak in /proc/sys which activates secondaries promotion when
a primary is deleted.

/proc/sys/net/ipv4/conf/all/promote_secondaries

I think it changes the behavior to the one you wish.

Regards
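
A minimal sketch of reading and toggling the knob named above from a script, assuming a Linux host where that /proc path exists. The function names are hypothetical; writing to the real file needs root, and the commonly used equivalent from a shell would be sysctl with the key net.ipv4.conf.all.promote_secondaries.

```python
from pathlib import Path

# Path quoted in the message above; present only on Linux with IPv4 enabled.
PROMOTE_SECONDARIES = Path("/proc/sys/net/ipv4/conf/all/promote_secondaries")


def promotion_enabled(path=PROMOTE_SECONDARIES):
    """Return True if the knob at `path` currently holds a non-zero value."""
    return int(path.read_text().strip()) != 0


def set_promotion(enabled, path=PROMOTE_SECONDARIES):
    """Write 1 or 0 to the knob; on the real /proc file this requires root."""
    path.write_text("1\n" if enabled else "0\n")
```

Both helpers take the path as a parameter so they can be tried against an ordinary file before touching /proc; a change made this way does not persist across reboots.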


* iproute2: removing primary address removes secondaries
From: martin f krafft @ 2008-01-11 16:31 UTC (permalink / raw)
  To: netdev discussion list


Dear list,

When I add an address to an interface whose network prefix is the
same as that of an address already bound to the interface, the new
address becomes a secondary address. As per
http://www.policyrouting.org/iproute2.doc.html:

  "secondary --- this address is not used when selecting the default
  source address for outgoing packets. An IP address becomes
  secondary if another address within the same prefix (network)
  already exists. The first address within the prefix is primary and
  is the tag address for the group of all the secondary addresses.
  When the primary address is deleted all of the secondaries are
  purged too."

In the following, I want to argue that this is not necessary.
I think that removal of a primary address should cause the next
address to be promoted to be the default source address and the
link-scoped route to be retained. This comes out of
http://bugs.debian.org/429689; the maintainer asked me to turn
directly to this list.

If I add an address to a device with 'ip add', ip also implicitly
adds a link-scoped route according to the netmask. It only does this
for primary addresses, so if I add a second address within the same
network, the route is not duplicated.

Thus, the net effect on the routing table is the same for the
following two commands:

  ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/12 dev eth0
  ip a a 172.16.0.100/12 dev eth0 && ip a a 172.16.0.200/32 dev eth0
                                                        ^^^^
In the first case, the .200 address becomes a secondary of the .100
address. In the second case, they are both primaries. In both cases,
only one /12 link-scoped route will be created.
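
The two cases can be modelled in a few lines, as a simplified sketch of the rule quoted from the iproute2 documentation: an address is treated as secondary when an earlier address on the interface has the same network and prefix length. The function name is hypothetical and this is a model of the described behaviour, not the kernel's code.

```python
import ipaddress


def classify(addresses):
    """Label each address primary or secondary, in the order it was added."""
    seen_networks = set()
    labels = []
    for text in addresses:
        # ip_interface("172.16.0.100/12").network is 172.16.0.0/12
        network = ipaddress.ip_interface(text).network
        if network in seen_networks:
            labels.append((text, "secondary"))
        else:
            seen_networks.add(network)
            labels.append((text, "primary"))
    return labels


print(classify(["172.16.0.100/12", "172.16.0.200/12"]))
print(classify(["172.16.0.100/12", "172.16.0.200/32"]))
```

In the first call both addresses fall in 172.16.0.0/12, so the second one is labelled secondary; in the second call the /32 forms its own network, so both are labelled primary.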

However, in both cases, if I remove the .100 address, the .200 is
affected: if it's secondary, it ceases to exist, and if it's
primary (i.e. in the /32 case), then the host can no longer use it
to communicate to hosts in the same link segment, only to hosts on
the other side of the default gateway.

I thus question the point of purging secondary addresses. Obviously,
only one address can be primary (it is used as source address for
packets leaving the machine by the respective route). But if the
primary address is removed, the next secondary should be promoted
and the route should *not* be deleted.

Comments?

Cheers,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

microsoft: for when quality, reliability, and security
           just aren't that important!

spamtraps: madduck.bogus@madduck.net


* Re: [PATCH 1/5] spidernet: add missing initialization
From: Linas Vepstas @ 2008-01-11 16:48 UTC (permalink / raw)
  To: Jens Osterkamp; +Cc: Ishizaki Kou, netdev, cbe-oss-dev, Jeff Garzik
In-Reply-To: <200801111344.35652.jens@de.ibm.com>

Hi,

On 11/01/2008, Jens Osterkamp <jens@de.ibm.com> wrote:
> Hi Ishizaki,
>
> Linas has left the company and is no longer doing kernel related stuff,
> so I suggest, given Jeff is ok with that, that the two of us take over
> spidernet maintainership.
>
> Jens
>
> ---
>
> Change maintainership for spidernet.
>
> Signed-off-by: Jens Osterkamp <jens@de.ibm.com>

Fine with me ...

Acked-by: Linas Vepstas <linasvepstas@gmail.com>

> Index: linux-2.6/MAINTAINERS
> ===================================================================
> --- linux-2.6.orig/MAINTAINERS  2008-01-11 13:32:04.000000000 +0100
> +++ linux-2.6/MAINTAINERS       2008-01-11 13:41:32.000000000 +0100
> @@ -3613,8 +3613,10 @@
>  S:     Supported
>
>  SPIDERNET NETWORK DRIVER for CELL
> -P:     Linas Vepstas
> -M:     linas@austin.ibm.com
> +P:     Ishizaki Kou
> +M:     kou.ishizaki@toshiba.co.jp
> +P:     Jens Osterkamp
> +M:     jens@de.ibm.com
>  L:     netdev@vger.kernel.org
>  S:     Supported
>
>


* Re: e1000 performance issue in 4 simultaneous links
From: Eric Dumazet @ 2008-01-11 16:48 UTC (permalink / raw)
  To: Breno Leitao; +Cc: Brandeburg, Jesse, rick.jones2, netdev
In-Reply-To: <1200068444.9349.20.camel@cafe>

Breno Leitao wrote:
> On Thu, 2008-01-10 at 12:52 -0800, Brandeburg, Jesse wrote:
>   
>> Breno Leitao wrote:
>>     
>>> When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
>>> of transfer rate. If I run 4 netperf against 4 different interfaces, I
>>> get around 720 * 10^6 bits/sec.
>>>       
>> I hope this explanation makes sense, but what it comes down to is that
>> combining hardware round robin balancing with NAPI is a BAD IDEA.  In
>> general the behavior of hardware round robin balancing is bad and I'm
>> sure it is causing all sorts of other performance issues that you may
>> not even be aware of.
>>     
> I've made another test removing the ppc IRQ Round Robin scheme, bound
> each interface (eth6, eth7, eth16 and eth17) to different CPUs (CPU1,
> CPU2, CPU3 and CPU4) and I also get around 720 * 10^6 bits/s on
> average.
>
> Take a look at the interrupt table this time: 
>
> io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
> 277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
> 278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
> 323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
> 324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17
>
>
> I also tried to bind all 4 interface IRQs to a single CPU (CPU0)
> using the noirqdistrib boot parameter, and the performance was a little
> worse.
>
> Rick, 
>   The 2 interface test that I showed in my first email was run on two
> different NICs. Also, I am running netperf with the following command
> "netperf -H <hostname> -T 0,8" while netserver is running without any
> arguments at all. Also, running vmstat in parallel shows that there is no
> bottleneck in the CPU. Take a look: 
>
> procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  2  0      0 6714732  16168 227440    0    0     8     2  203   21  0  1 98  0  0
>  0  0      0 6715120  16176 227440    0    0     0    28 16234  505  0 16 83  0  1
>  0  0      0 6715516  16176 227440    0    0     0     0 16251  518  0 16 83  0  1
>  1  0      0 6715252  16176 227440    0    0     0     1 16316  497  0 15 84  0  1
>  0  0      0 6716092  16176 227440    0    0     0     0 16300  520  0 16 83  0  1
>  0  0      0 6716320  16180 227440    0    0     0     1 16354  486  0 15 84  0  1
>  
>
>   
If your machine has 8 CPUs, then your vmstat output shows a bottleneck :)

(100/8 = 12.5), so I guess one of your CPUs is full
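
The arithmetic behind that remark, as a small sketch. It assumes the 8-CPU count suggested above (the /proc/interrupts rows do show eight counter columns) and takes the roughly 16% "sy" figure from the quoted vmstat samples; the function names are hypothetical.

```python
def single_cpu_share(ncpus):
    """Percentage of the machine-wide total that one fully busy CPU represents."""
    return 100.0 / ncpus


def busy_cpu_equivalents(busy_percent, ncpus):
    """Convert an aggregate busy percentage into a number of fully busy CPUs."""
    return busy_percent / single_cpu_share(ncpus)


print(single_cpu_share(8))
print(busy_cpu_equivalents(16, 8))
```

vmstat averages over all CPUs, so an aggregate of 16% system time on 8 CPUs is already more than the 12.5% that a single saturated CPU would contribute; per-CPU tools are needed to see which CPU it is.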







* RE: e1000 performance issue in 4 simultaneous links
From: Breno Leitao @ 2008-01-11 16:20 UTC (permalink / raw)
  To: Brandeburg, Jesse, rick.jones2; +Cc: netdev
In-Reply-To: <36D9DB17C6DE9E40B059440DB8D95F5204275B04@orsmsx418.amr.corp.intel.com>

On Thu, 2008-01-10 at 12:52 -0800, Brandeburg, Jesse wrote:
> Breno Leitao wrote:
> > When I run netperf in just one interface, I get 940.95 * 10^6 bits/sec
> > of transfer rate. If I run 4 netperf against 4 different interfaces, I
> > get around 720 * 10^6 bits/sec.
> 
> I hope this explanation makes sense, but what it comes down to is that
> combining hardware round robin balancing with NAPI is a BAD IDEA.  In
> general the behavior of hardware round robin balancing is bad and I'm
> sure it is causing all sorts of other performance issues that you may
> not even be aware of.
I've made another test removing the ppc IRQ Round Robin scheme, bound
each interface (eth6, eth7, eth16 and eth17) to different CPUs (CPU1,
CPU2, CPU3 and CPU4) and I also get around 720 * 10^6 bits/s on
average.

Take a look at the interrupt table this time: 

io-dolphins:~/leitao # cat /proc/interrupts  | grep eth[1]*[67]
277:         15    1362450         13         14         13         14         15         18   XICS      Level     eth6
278:         12         13    1348681         19         13         15         10         11   XICS      Level     eth7
323:         11         18         17    1348426         18         11         11         13   XICS      Level     eth16
324:         12         16         11         19    1402709         13         14         11   XICS      Level     eth17
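
A minimal sketch of how the table above can be checked mechanically: for each row, find the CPU column with the largest counter. It assumes the row layout shown here (IRQ number, one counter per CPU, controller, trigger type, device name); the function name is hypothetical.

```python
def dominant_cpu(line):
    """Return (device, cpu_index, count) for one /proc/interrupts row."""
    fields = line.split()
    counts = []
    # fields[0] is the IRQ label such as "277:"; counters follow until the
    # first non-numeric field (the interrupt controller name).
    for field in fields[1:]:
        if not field.isdigit():
            break
        counts.append(int(field))
    cpu = max(range(len(counts)), key=counts.__getitem__)
    return fields[-1], cpu, counts[cpu]


table = [
    "277: 15 1362450 13 14 13 14 15 18 XICS Level eth6",
    "278: 12 13 1348681 19 13 15 10 11 XICS Level eth7",
    "323: 11 18 17 1348426 18 11 11 13 XICS Level eth16",
    "324: 12 16 11 19 1402709 13 14 11 XICS Level eth17",
]

for row in table:
    print(dominant_cpu(row))
```

With these rows the dominant columns are 1, 2, 3 and 4 for eth6, eth7, eth16 and eth17, which matches the CPU1 to CPU4 binding described in the message.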


I also tried to bind all 4 interface IRQs to a single CPU (CPU0)
using the noirqdistrib boot parameter, and the performance was a little
worse.

Rick, 
  The 2 interface test that I showed in my first email was run on two
different NICs. Also, I am running netperf with the following command
"netperf -H <hostname> -T 0,8" while netserver is running without any
arguments at all. Also, running vmstat in parallel shows that there is no
bottleneck in the CPU. Take a look: 

procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 6714732  16168 227440    0    0     8     2  203   21  0  1 98  0  0
 0  0      0 6715120  16176 227440    0    0     0    28 16234  505  0 16 83  0  1
 0  0      0 6715516  16176 227440    0    0     0     0 16251  518  0 16 83  0  1
 1  0      0 6715252  16176 227440    0    0     0     1 16316  497  0 15 84  0  1
 0  0      0 6716092  16176 227440    0    0     0     0 16300  520  0 16 83  0  1
 0  0      0 6716320  16180 227440    0    0     0     1 16354  486  0 15 84  0  1
 

Thanks!

-- 
Breno Leitao <leitao@linux.vnet.ibm.com>



* Re: [PROCFS] [NETNS] issue with /proc/net entries
From: Benjamin Thery @ 2008-01-11 16:00 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, linux-kernel
In-Reply-To: <m1ir21v4jv.fsf@ebiederm.dsl.xmission.com>

Eric W. Biederman wrote:
> Benjamin Thery <benjamin.thery@bull.net> writes:
> 
>> Hi Eric,
>>
>> While testing the current network namespace stuff merged in net-2.6.25,
>> I bumped into the following problem with the /proc/net/ entries.
>> It doesn't always display the actual data of the current namespace,
>> but sometime displays data from other namespaces.
>>
>> I bisected the problem to the commit:
>> "proc: remove/Fix proc generic d_revalidate"
>> 3790ee4bd86396558eedd86faac1052cb782e4e1
>>
>> The problem: If a process in a particular network namespace changes
>> current directory to /proc/net, then processes in other network
>> namespaces trying to look at /proc/net entries will see data from the
>> first namespace (the one with CWD /proc/net). (See test case below).
>>
>> As you comments in the commit suggest, you seem to be aware of some
>> issues when CONFIG_NET_NS=y. Is it one of these corner cases you
>> identified? Any idea on how we can fix it?
> 
> Yes.  It isn't especially hard.   I have most of it in my queue
> I just need to get the silly patches out of there.
> 
> Essentially we need to fix the caching of proc_generic entries,
> So that we can have a proper d_revalidate implementation.
> 
> To get d_revalidate and the caching correct for /proc/net will take
> just a bit more work.  We need to make /proc/net a symlink
> to something like /proc/self/net so that we don't get excess
> revalidates when switching between different processes.
> 
> Or else we can't properly implement the case you have described.
> Where being in the directory causes the wrong version of /proc/net
> to show up. Changing the contents of the dentry for /proc/net
> should only happen during unshare.  Not when we switch between
> processes or else we get into the d_revalidate leaks mount points
> problem again.
> 
> We also need the check to see if something is mounted on top of
> us before we call drop the dentry.  But if we don't even try until
> we know the dentry is invalid it should not be too bad.

Thanks for all the details.
I'll put this issue on my "netns current limitations" list until
it's solved.

Benjamin


> 
> Eric
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com


* doubt in e1000_io_write()
From: Jeba Anandhan @ 2008-01-11 15:13 UTC (permalink / raw)
  To: netdev

Hi all,
I have a question about e1000_io_write().

void
e1000_io_write(struct e1000_hw *hw, unsigned long port, uint32_t value)
{
        outl(value, port);
}

kernel version: 2.6.12.3

Why is the hw structure passed into e1000_io_write() even though it is
not used?

Thanks
Jeba


* Re: questions on NAPI processing latency and dropped network packets
From: Chris Friesen @ 2008-01-11 14:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20080110.172049.118174993.davem@davemloft.net>

David Miller wrote:

> You have to be kidding, coming here for help with a nearly
> 4 year old kernel.

I figured it couldn't hurt to ask...if I can't ask the original authors, 
who else is there?

I'd love to work on newer kernels, but we have a commitment to our 
customers to support multiple releases for a significant amount of time.

Chris


* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: jamal @ 2008-01-11 14:59 UTC (permalink / raw)
  To: mahatma; +Cc: netdev
In-Reply-To: <4787A663.4030204@bspu.unibel.by>

On Fri, 2008-11-01 at 15:24 -0200, Dzianis Kahanovich wrote:
> jamal wrote:

> > tc qdisc add dev XXX ingress
> > tc filter add dev XXX parent ffff: protocol ip prio 5 \
> > u32 blah bleh \
> > flowid 1:12 action ipt -j mark --set-mark 13 
> 
> Yes, I do so. But there are simple:
> ---
> if [[ $[TC_INDEX2MARK] == 0 ]] ; then
>   c=${c//action ipt -j MARK --set-mark /flowid :}
> fi
> $c
> ---

I didn't quite understand what you have above. Does your script above
read the flowid and set the MARK to some dynamic value based on flowid?
If that's what you are doing, it sounds sensible and much more clever
than what is posted. And it doesn't require any kernel patch.

> Simplest:
> --- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
> +++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
> @@ -222,6 +222,16 @@
> -   			skb->tc_index = TC_H_MIN(res.classid);
> +   			skb->tc_index = TC_H_MIN(mark=res.classid);

Just write a metaset action and you can have all sorts of policies on
what tc_index, mark etc. you want. It is something that's needed in any
case.
When we did tc_index it made sense then because it was for "tc" to use
some default policy. Enforcing policies in the kernel is not the best
thing to do; as an example you want to specify the policy for mark to
be: classid major>>16|minor. I am sure you have good reasons; however,
for the next person who wants to set it to major>>8|minor for their own
good reason, there's a conflict.
My offer to help you is still open.

cheers,
jamal


* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: Dzianis Kahanovich @ 2008-01-11 17:24 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1200001167.4443.38.camel@localhost>

jamal wrote:

>> To "classid x:y" = "mark=mark&x|y" ("classid :y" = "-j MARK --set-mark y", etc).
>>
>> --- linux-2.6.23-gentoo-r2/net/sched/Kconfig
>> +++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
>> @@ -222,6 +222,16 @@
> [..]
>>   			skb->tc_index = TC_H_MIN(res.classid);
>> +#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
>> +			skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
>> +#endif
>>   		default:
> 
> 
> Please either use ipt action and netfilter fwmarker for this activity or

Sorry. This is only an unsuccessful attempt to popularize my working solution.
Really I just use "#define tc_index mark" (in skbuff.h or sch_ingress.c) or 
something like this:

--- linux-2.6.23-gentoo-r2/net/sched/Kconfig
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/Kconfig
@@ -222,6 +222,16 @@
  	  To compile this code as a module, choose M here: the
  	  module will be called sch_ingress.

+config NET_SCH_INGRESS_TC2MARK
+	bool "ingress tc_index -> mark"
+	depends on NET_SCH_INGRESS && NET_CLS_ACT
+	---help---
+	  This enables access to "mark" value via "tc_index" alias
+	  in ingress and unify this values (usage example: set "flowid :2"
+	  in ingress and use it value as "mark" in any way - netfilter, etc).
+	
+	  But tc_index may be undefined - use "flowid :0".
+
  comment "Classification"

  config NET_CLS
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -18,6 +18,9 @@
  #include <net/netlink.h>
  #include <net/pkt_sched.h>

+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+#define tc_index mark
+#endif

  #undef DEBUG_INGRESS



> create a new action. 
> If you choose the later (example because you want to dynamically compute
> the mark), look at net/sched/act_simple.c to start from and i can help
> you if you have any questions.
>  
> If you want to use ipt action, the syntax would be something like:
> 
> ---
> tc qdisc add dev XXX ingress
> tc filter add dev XXX parent ffff: protocol ip prio 5 \
> u32 blah bleh \
> flowid 1:12 action ipt -j mark --set-mark 13 

Yes, I do so. But there are simple:
---
if [[ $[TC_INDEX2MARK] == 0 ]] ; then
  c=${c//action ipt -j MARK --set-mark /flowid :}
fi
$c
---

Simplest:
--- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
+++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
@@ -222,6 +222,16 @@
-   			skb->tc_index = TC_H_MIN(res.classid);
+   			skb->tc_index = TC_H_MIN(skb->mark = res.classid);


-- 
WBR,
Denis Kaganovich,  mahatma@eu.by  http://mahatma.bspu.unibel.by

^ permalink raw reply

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Jarek Poplawski @ 2008-01-11 14:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Eric Dumazet, Herbert Xu, davem, dipankar, netdev
In-Reply-To: <20080110235111.GF9586@linux.vnet.ibm.com>

On Thu, Jan 10, 2008 at 03:51:11PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > Eric Dumazet wrote, On 01/09/2008 11:37 AM:
> > ...
> > > [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> > ...
> > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > > index d337706..28484f3 100644
> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> > >  			break;
> > >  		rcu_read_unlock_bh();
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > >  
> > >  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> > >  {
> > > -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> > > +	struct rt_cache_iter_state *st = seq->private;
> > >  
> > >  	r = r->u.dst.rt_next;
> > >  	while (!r) {
> > > @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> > >  		rcu_read_lock_bh();
> > >  		r = rt_hash_table[st->bucket].chain;
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > 
> > It seems this optimization could've a side effect: if during such a
> > loop updates are done, and r is seen !NULL during while() check, but
> > NULL after rcu_dereference(), the listing/counting could stop too
> > soon. So, IMHO, probably the first version of this patch is more
> > reliable. (Or alternatively additional check is needed before return.)
> 
> Looks to me like "r" is a local variable (argument list), so there
> should not be any possibility of it being changed by some other
> task, right?

It seems words can be stronger than logic (in some cases)...
Once I forget what a dereference is usually for, it's all right!

Thanks,
Jarek P.

^ permalink raw reply

* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: Dzianis Kahanovich @ 2008-01-11 17:37 UTC (permalink / raw)
  To: netdev
In-Reply-To: <47865613.1000902@trash.net>

Patrick McHardy wrote:

>> --- linux-2.6.23-gentoo-r2/net/sched/sch_ingress.c
>> +++ linux-2.6.23-gentoo-r2.fixed/net/sched/sch_ingress.c
>> @@ -161,2 +161,5 @@
>>              skb->tc_index = TC_H_MIN(res.classid);
>> +#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
>> +            skb->mark = (skb->mark&(res.classid>>16))|TC_H_MIN(res.classid);
>> +#endif
>>          default:
> 
> 
> Behaviour like this shouldn't depend on compile-time options.

I would also like to drop the dependency on NET_CLS_ACT, but I am unsure how 
this code behaves without NET_CLS_ACT.

But the option does reduce code.
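
For readers following the quoted hunk, the mark update can be sketched in
user space. This is a minimal illustration, not the patch itself:
ingress_classid_to_mark() is a hypothetical name, and the two TC_H_*
macros are repeated here with the values used in linux/pkt_sched.h. The
major half of the classid acts as a mask on the old mark, and the minor
half is OR-ed in.

```c
#include <stdint.h>

/* Same values as in linux/pkt_sched.h. */
#define TC_H_MIN_MASK	0x0000FFFFU
#define TC_H_MIN(h)	((h) & TC_H_MIN_MASK)

/* Hypothetical helper mirroring the expression in the quoted hunk. */
static uint32_t ingress_classid_to_mark(uint32_t old_mark, uint32_t classid)
{
	/* classid >> 16 is the major number, used as a mask on the old mark */
	return (old_mark & (classid >> 16)) | TC_H_MIN(classid);
}
```

With "flowid :2" the major number is 0, so the old mark is cleared and the
result is simply 2.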

-- 
WBR,
Denis Kaganovich,  mahatma@eu.by  http://mahatma.bspu.unibel.by

^ permalink raw reply

* Re: [PATCH 1/5] spidernet: add missing initialization
From: Jens Osterkamp @ 2008-01-11 12:44 UTC (permalink / raw)
  To: Ishizaki Kou; +Cc: linasvepstas, netdev, cbe-oss-dev, Jeff Garzik
In-Reply-To: <20080111.153859.-1300526764.kouish@swc.toshiba.co.jp>

On Friday 11 January 2008, Ishizaki Kou wrote:
> This patch fixes initialization of "aneg_count" and "medium" fields in
> spider_net_card so that the spidernet driver correctly sets "link status".
> 
> Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>

Hi Ishizaki,

Linas has left the company and is no longer doing kernel related stuff,
so I suggest, given Jeff is ok with that, that the two of us take over
spidernet maintainership.

Jens

---

Change maintainership for spidernet.

Signed-off-by: Jens Osterkamp <jens@de.ibm.com>

Index: linux-2.6/MAINTAINERS
===================================================================
--- linux-2.6.orig/MAINTAINERS	2008-01-11 13:32:04.000000000 +0100
+++ linux-2.6/MAINTAINERS	2008-01-11 13:41:32.000000000 +0100
@@ -3613,8 +3613,10 @@
 S:	Supported
 
 SPIDERNET NETWORK DRIVER for CELL
-P:	Linas Vepstas
-M:	linas@austin.ibm.com
+P:	Ishizaki Kou
+M:	kou.ishizaki@toshiba.co.jp
+P:	Jens Osterkamp
+M:	jens@de.ibm.com
 L:	netdev@vger.kernel.org
 S:	Supported
 

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply

* Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-11 12:41 UTC (permalink / raw)
  To: andi; +Cc: vaf, netdev, linux-kernel, yoshfuji
In-Reply-To: <p73bq7smx1t.fsf@bingen.suse.de>

In article <p73bq7smx1t.fsf@bingen.suse.de> (at Fri, 11 Jan 2008 12:17:02 +0100), Andi Kleen <andi@firstfloor.org> says:

> Vince Fuller <vaf@cisco.com> writes:
> 
> > from Vince Fuller <vaf@vaf.net>
> >
> > This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
> > (aka "class-E") address space as consistent with the Internet Draft
> > draft-fuller-240space-00.txt.
> 
> Wouldn't it be wise to at least wait for it becoming an RFC first? 

I think so, too.

There was no positive consensus on this draft
at the intarea meeting in Vancouver, right?

We cannot / should not enable that space until we have reached
a consensus on it.

--yoshfuji

^ permalink raw reply

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Jarek Poplawski @ 2008-01-11 12:31 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111103742.GA26740@gondor.apana.org.au>

On Fri, Jan 11, 2008 at 09:37:42PM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> > 
> > It looks like I'm really too lazy and/or these self-documenting features
> > of RCU are a bit overrated: one can never be sure which pointer is
> > really RCU protected without checking a few places?! So, after looking
> > at this rt_cache_get_next() and this patch only, it looks like the
> > third candidate after seq->private and rtable...
> 
> Perhaps we could introduce a sparse attribute for it?

I hope I won't be cursed by all those forced to do additional writing,
so I'd only admit that after this patch there should be no problem
with identifying RCU protected data properly (maybe only this kind
of rcu_dereference() needs some popularization).

Jarek P.

^ permalink raw reply

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
From: Jarek Poplawski @ 2008-01-11 11:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev
In-Reply-To: <20080111103852.GB26740@gondor.apana.org.au>

On Fri, Jan 11, 2008 at 09:38:52PM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
> > 
> > So, IOW: strictly speaking you are right, r can't change here, but I
> > meant r vs. the returned value! Before the patch the returned value
> > couldn't be NULL unless all elements of the list were looped. After
> > this patch it seems possible...
> 
> Since rcu_dereference(r) is always the same as r this patch cannot
> change the value returned.

Right!!! (But, you mean: "always the same as r" for local r, I hope...)

So, my moronness's selfdocumenting features are not overrated at all!

Thanks again,
Jarek P.
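
The point settled above can be shown with a small user-space model. It
is a sketch under assumptions: struct rtable_model,
rcu_dereference_model() and return_local() are hypothetical names, and
the memory barrier of the real macro is only noted in a comment. What
remains is a single read of the pointer, so applied to a local copy it
returns exactly the value already held.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hash-chain entry. */
struct rtable_model {
	struct rtable_model *next;
	int id;
};

/* Model of the read side: one volatile load of the pointer value. */
static struct rtable_model *
rcu_dereference_model(struct rtable_model *const *pp)
{
	struct rtable_model *p = *(struct rtable_model *const volatile *)pp;

	/* the kernel macro additionally orders dependent loads here */
	return p;
}

/* r is a by-value argument: no other task can change this copy. */
static struct rtable_model *return_local(struct rtable_model *r)
{
	return rcu_dereference_model(&r);
}
```

return_local(&entry) yields &entry and return_local(NULL) yields NULL, so
a non-NULL r seen by the while() check cannot turn into NULL at the return.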

^ permalink raw reply

* rp_filter and ip rule break ipsec policy
From: Marco Berizzi @ 2008-01-11 11:23 UTC (permalink / raw)
  To: netdev

Hello everybody.
AFAIK ipsec policies aren't related to routing
tables: if there is an ipsec policy to deliver
traffic, for example, from 192.168.0.0/16 to
10.0.0.0/8, xfrm will eat the packets, ignoring
the routing table.

Here is the ipsec gateway schema:


     [-] cisco ISP router default gateway for
      |         the linux box ip=cisco-genova
      |
      |
      |
      |  _____ eth0 ip=osw-genova
      | /
      |/
   +--+--+
   |     |
   |     +---- eth1 dmz-genova/28 ip=osw-genova
   |     |
   +--+--+
      |
      |
      |------- eth2 172.23.0.0/23 ip=172.23.1.8


Take a look:

# ip ru sh
0:      from all lookup local
601:    from all to x.y.z.214 iif eth2 lookup test
32766:  from all lookup main
32767:  from all lookup default

# ip r sh table test
default via 172.23.1.254 dev eth2  metric 1

When I insert rule #601, packets to
x.y.z.214 aren't eaten by xfrm anymore. This
happens when rp_filter is set to 1 on eth0.
Disabling rp_filter on eth0 resolves the problem:
xfrm eats the packets again.
Is this the expected behaviour? Why should
rp_filter break the ipsec policy when rule #601
is inserted?

I have enabled log_martians on eth0, and when
rp_filter is set to 1 I see these messages:

martian source 172.23.1.4 from x.y.z.214, on dev eth0
ll header: 00:30:05:cb:27:c1:00:1b:54:fb:fd:78:08:00
martian source 172.23.1.4 from x.y.z.214, on dev eth0
ll header: 00:30:05:cb:27:c1:00:1b:54:fb:fd:78:08:00

# ip x p
src x.y.z.214 dst 172.23.0.0/23
        dir in priority 2376 ptype main
        tmpl src osw-napoli dst osw-genova
                proto comp reqid 16390 mode tunnel
                level use
        tmpl src 0.0.0.0 dst 0.0.0.0
                proto esp reqid 16389 mode transport
src 172.23.0.0/23 dst x.y.z.214
        dir out priority 2376 ptype main
        tmpl src osw-genova dst osw-napoli
                proto comp reqid 16390 mode tunnel
        tmpl src 0.0.0.0 dst 0.0.0.0
                proto esp reqid 16389 mode transport
src x.y.z.214 dst 172.23.0.0/23
        dir fwd priority 2376 ptype main
        tmpl src osw-napoli dst osw-genova
                proto comp reqid 16390 mode tunnel
                level use
        tmpl src 0.0.0.0 dst 0.0.0.0
                proto esp reqid 16389 mode transport

Here are the other routing tables:

# ip r sh table main
cisco-genova dev eth0  scope link
dmz-genova/28 dev eth1  proto kernel  scope link  src osw-genova
172.23.0.0/23 dev eth2  proto kernel  scope link  src 172.23.1.8
127.0.0.0/8 dev lo  scope link
default via cisco-genova dev eth0  metric 1

# ip r sh table local
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast dmz-genova dev eth0  proto kernel  scope link  src osw-genova
broadcast dmz-genova dev eth1  proto kernel  scope link  src osw-genova
broadcast broadcast-genova dev eth0  proto kernel  scope link  src osw-genova
broadcast broadcast-genova dev eth1  proto kernel  scope link  src osw-genova
local osw-genova dev eth0  proto kernel  scope host  src osw-genova
local osw-genova dev eth1  proto kernel  scope host  src osw-genova
broadcast 172.23.0.0 dev eth2  proto kernel  scope link  src 172.23.1.8
broadcast 172.23.1.255 dev eth2  proto kernel  scope link  src 172.23.1.8
local 172.23.1.8 dev eth2  proto kernel  scope host  src 172.23.1.8
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
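
One possible reading of the martian log lines, offered as an assumption
and not a confirmed diagnosis: the reverse-path check looks up a route
back to the packet's source, and that lookup is assumed here to carry the
forwarding output interface (eth2) as its input interface. Under that
assumption rule 601 matches the reverse lookup, the answer is eth2
and not eth0, and the packet is logged as martian before xfrm sees it.
The model below is a sketch only; every name in it is hypothetical and
the routing tables are reduced to the two defaults shown in this message.

```c
#include <stdbool.h>

enum dev_model { ETH0, ETH1, ETH2 };

/* Key of the reverse lookup: is the destination x.y.z.214, and which iif. */
struct rp_key {
	bool to_peer;
	enum dev_model iif;
};

/* Policy routing reduced to rule 601 plus the main table default. */
static enum dev_model reverse_route_dev(struct rp_key key, bool rule_601)
{
	if (rule_601 && key.to_peer && key.iif == ETH2)
		return ETH2;	/* table test: default via 172.23.1.254 dev eth2 */
	return ETH0;		/* table main: default via cisco-genova dev eth0 */
}

/* Strict check: the route back to the source must use the arrival device. */
static bool rp_filter_accepts(enum dev_model in_dev, enum dev_model fwd_oif,
			      bool rule_601)
{
	struct rp_key key = { .to_peer = true, .iif = fwd_oif };

	return reverse_route_dev(key, rule_601) == in_dev;
}
```

In this model a packet from x.y.z.214 arriving on eth0 and forwarded
towards eth2 is accepted without rule 601 and rejected with it, which
matches the behaviour reported above.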



^ permalink raw reply

* Re: [PATCH 001/001] ipv4: enable use of 240/4 address space
From: Andi Kleen @ 2008-01-11 11:17 UTC (permalink / raw)
  To: Vince Fuller; +Cc: netdev, linux-kernel
In-Reply-To: <20080108011057.GA21168@cisco.com>

Vince Fuller <vaf@cisco.com> writes:

> from Vince Fuller <vaf@vaf.net>
>
> This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
> (aka "class-E") address space as consistent with the Internet Draft
> draft-fuller-240space-00.txt.

Wouldn't it be wise to at least wait for it becoming an RFC first? 

-Andi

^ permalink raw reply
