Netdev List


* Re: [PATCH] vlan tag match
From: Stephen Hemminger @ 2008-02-01  5:50 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev
In-Reply-To: <47A2AF6A.1060200@trash.net>

On Fri, 01 Feb 2008 06:34:34 +0100
Patrick McHardy <kaber@trash.net> wrote:

> Stephen Hemminger wrote:
> > Provide a way to use tc filters on vlan tag even if tag is buried in
> > skb due to hardware acceleration.
> 
> Looks reasonable. Would you like to add the same feature to the
> flow classifier?
> 

Yes, that would be good.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: e1000 full-duplex TCP performance well below wire speed
From: Bill Fink @ 2008-02-01  6:27 UTC (permalink / raw)
  To: Bruce Allen
  Cc: Kok, Auke, Brandeburg, Jesse, netdev, Carsten Aulbert,
	Henning Fehrmann, Bruce Allen
In-Reply-To: <Pine.LNX.4.63.0801311340270.14403@trinity.phys.uwm.edu>

On Thu, 31 Jan 2008, Bruce Allen wrote:

> >> Based on the discussion in this thread, I am inclined to believe that
> >> lack of PCI-e bus bandwidth is NOT the issue.  The theory is that the
> >> extra packet handling associated with TCP acknowledgements is pushing
> >> the PCI-e x1 bus past its limits.  However the evidence seems to show
> >> otherwise:
> >>
> >> (1) Bill Fink has reported the same problem on a NIC with a 133 MHz
> >> 64-bit PCI connection.  That connection can transfer data at 8Gb/s.
> >
> > That was even a PCI-X connection, which is known to have extremely good latency
> > numbers, IIRC better than PCI-e? (?) which could account for a lot of the
> > latency-induced lower performance...
> >
> > also, 82573's are _not_ a server part and were not designed for this 
> > usage. 82546's are and that really does make a difference.
> 
> I'm confused.  It DOESN'T make a difference! Using 'server grade' 82546's 
> on a PCI-X bus, Bill Fink reports the SAME loss of throughput with TCP 
> full duplex that we see on a 'consumer grade' 82573 attached to a PCI-e x1 
> bus.
> 
> Just like us, when Bill goes from TCP to UDP, he gets wire speed back.

Good.  I thought it was just me who was confused by Auke's reply.  :-)

Yes, I get the same type of reduced TCP performance behavior on a
bidirectional test that Bruce has seen, even though I'm using the
better 82546 GigE NIC on a faster 64-bit/133-MHz PCI-X bus.  I also
don't think bus bandwidth is an issue, but I am curious if there
are any known papers on typical PCI-X/PCI-E bus overhead on network
transfers, either bulk data transfers with large packets or more
transaction or video based applications using smaller packets.
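
As a rough cross-check of the bus-bandwidth argument, the theoretical peak
rates can be computed directly.  This is only a sketch: it assumes a
first-generation PCI-e lane (2.5 GT/s with 8b/10b encoding) and ignores all
protocol and arbitration overhead, which is the part still in question here.

```python
# Theoretical peak bus rates, ignoring all protocol overhead.

def parallel_bus_gbps(clock_mhz: float, width_bits: int) -> float:
    """Peak rate of a parallel bus (shared by both directions)."""
    return clock_mhz * 1e6 * width_bits / 1e9

def pcie_lane_gbps(gigatransfers: float = 2.5, encoding: float = 8 / 10) -> float:
    """Peak payload rate of one PCI-e lane, per direction."""
    return gigatransfers * encoding

pcix = parallel_bus_gbps(133, 64)   # 64-bit/133-MHz PCI-X
pcie_x1 = pcie_lane_gbps()          # assumed first-generation x1 link
full_duplex_gige = 2 * 1.0          # 1 Gb/s in each direction

print(f"PCI-X 64/133: {pcix:.3f} Gb/s shared")
print(f"PCI-e x1:     {pcie_x1:.1f} Gb/s per direction")
print(f"GigE duplex:  {full_duplex_gige:.1f} Gb/s total")
```

On paper both buses exceed what full-duplex GigE needs, so any shortfall
would have to come from per-transaction overhead rather than raw bandwidth.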

I started musing if once one side's transmitter got the upper hand,
it might somehow defer the processing of received packets, causing
the resultant ACKs to be delayed and thus further slowing down the
other end's transmitter.  I began to wonder if the txqueuelen could
have an effect on the TCP performance behavior.  I normally have
the txqueuelen set to 10000 for 10-GigE testing, so decided to run
a test with txqueuelen set to 200 (actually settled on this value
through some experimentation).  Here is a typical result:

[bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.6.79
tx:  1120.6345 MB /  10.07 sec =  933.4042 Mbps 12 %TX 9 %RX 0 retrans
rx:  1104.3081 MB /  10.09 sec =  917.7365 Mbps 12 %TX 11 %RX 0 retrans

This is significantly better, but there was more variability in the
results.  The above was with TSO enabled.  I also then ran a test
with TSO disabled, with the following typical result:

[bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.6.79
tx:  1119.4749 MB /  10.05 sec =  934.2922 Mbps 13 %TX 9 %RX 0 retrans
rx:  1131.7334 MB /  10.05 sec =  944.8437 Mbps 15 %TX 12 %RX 0 retrans

This was a little better yet and getting closer to expected results.
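
The Mbps column in the nuttcp output can be cross-checked from the other two
columns.  A minimal sketch, assuming that nuttcp's "MB" means 2^20 bytes and
"Mbps" means 10^6 bits per second; a small residual is expected because the
elapsed time is printed rounded to two decimals.  The figures are the ones
reported above.

```python
MIB = 1 << 20  # assumption: nuttcp "MB" is 2**20 bytes

def mbps(megabytes: float, seconds: float) -> float:
    """Throughput in 10**6 bits per second."""
    return megabytes * MIB * 8 / seconds / 1e6

# (label, MB transferred, seconds, Mbps reported by nuttcp)
runs = [
    ("TSO on  tx", 1120.6345, 10.07, 933.4042),
    ("TSO on  rx", 1104.3081, 10.09, 917.7365),
    ("TSO off tx", 1119.4749, 10.05, 934.2922),
    ("TSO off rx", 1131.7334, 10.05, 944.8437),
]

for label, mb, secs, reported in runs:
    print(f"{label}: computed {mbps(mb, secs):7.1f}, reported {reported:7.1f}")

# Aggregate bidirectional throughput for each configuration.
tso_on_total = runs[0][3] + runs[1][3]
tso_off_total = runs[2][3] + runs[3][3]
print(f"aggregate: TSO on {tso_on_total:.1f} Mbps, TSO off {tso_off_total:.1f} Mbps")
```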

Jesse Brandeburg mentioned in another post that there were known
performance issues with the version of the e1000 driver I'm using.
I recognized that the kernel/driver versions I was using were rather
old, but it was what I had available to do a quick test with.  Those
particular systems are in a remote location so I have to be careful
with messing with their network drivers.  I do have some other test
systems at work that I might be able to try with newer kernels
and/or drivers or maybe even with other vendors' GigE NICs, but
I won't be back to work until early next week sometime.

						-Bill

^ permalink raw reply

* Re: [IPROUTE 02/02]: Add flow classifier support
From: Stephen Hemminger @ 2008-02-01  6:28 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Linux Netdev List
In-Reply-To: <47A20C63.6030806@trash.net>

applied both to git
-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: [PATCH] Disable TSO for non standard qdiscs
From: Glen Turner @ 2008-02-01  6:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Waskiewicz Jr, Peter P, Patrick McHardy, Stephen Hemminger,
	netdev
In-Reply-To: <20080131193406.GH4671@one.firstfloor.org>

On Thu, 2008-01-31 at 20:34 +0100, Andi Kleen wrote:

> The philosophical problem I have with this suggestion is that I expect
> that the large majority of users will be more happy with disabled TSO
> if they use non standard qdiscs and defaults that do not fit 
> the majority use case are bad.

I wouldn't be so fast to assume that all users need an exact playout
rate, as people seem to do fine with the 8Kbps playout steps in Cisco
IOS.  A nerd-knob which expresses user's preference in the
accuracy/performance trade-off would be nice.

The problem with ethtool is that it's a non-obvious nerd knob.  At
the least the ethtool documentation should be updated to indicate that 
activating TSO affects tc accuracy.

Best wishes, Glen
[a network engineer]

^ permalink raw reply

* [PATCH] add if_addrlabel.h to sanitized headers
From: Stephen Hemminger @ 2008-02-01  6:37 UTC (permalink / raw)
  Cc: netdev
In-Reply-To: <20080128.210222.07062540.yoshfuji@linux-ipv6.org>

if_addrlabel.h is needed for iproute2 usage.

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
---
 include/linux/Kbuild |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 85b2482..ce9d0fd 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -218,6 +218,7 @@ unifdef-y += i2c-dev.h
 unifdef-y += icmp.h
 unifdef-y += icmpv6.h
 unifdef-y += if_addr.h
+unifdef-y += if_addrlabel.h
 unifdef-y += if_arp.h
 unifdef-y += if_bridge.h
 unifdef-y += if_ec.h
-- 
1.5.3.8


^ permalink raw reply related

* Re: [PATCH] Disable TSO for non standard qdiscs
From: Patrick McHardy @ 2008-02-01  6:46 UTC (permalink / raw)
  To: Glen Turner; +Cc: Andi Kleen, Waskiewicz Jr, Peter P, Stephen Hemminger, netdev
In-Reply-To: <1201847738.17656.5.camel@roma.44ansell.gdt.id.au>

Glen Turner wrote:
> On Thu, 2008-01-31 at 20:34 +0100, Andi Kleen wrote:
> 
>> The philosophical problem I have with this suggestion is that I expect
>> that the large majority of users will be more happy with disabled TSO
>> if they use non standard qdiscs and defaults that do not fit 
>> the majority use case are bad.
> 
> I wouldn't be so fast to assume that all users need an exact playout
> rate, as people seem to do fine with the 8Kbps playout steps in Cisco
> IOS.  A nerd-knob which expresses user's preference in the
> accuracy/performance trade-off would be nice.
> 
> The problem with ethtool is that it's a non-obvious nerd knob.  At
> the least the ethtool documentation should be updated to indicate that 
> activating TSO affects tc accuracy.


I agree with Andi, most users neither know nor care about TSO.
It should work properly by default and optimizations should
be explicitly configured. This is especially true if you
consider the common userbase of qdiscs - which is mostly
slow DSL lines, cablemodems etc.

^ permalink raw reply

* Re: [2.6 patch] rtnetlink.c: #if 0 no longer used functions
From: Patrick McHardy @ 2008-02-01  7:00 UTC (permalink / raw)
  To: David Miller; +Cc: bunk, netdev
In-Reply-To: <20080131.171751.192428440.davem@davemloft.net>

David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Wed, 30 Jan 2008 21:04:33 +0100
> 
>> Adrian Bunk wrote:
>>> This patch #if 0's the following no longer used functions:
>>> - rtattr_parse()
>>> - rtattr_strlcpy()
>>> - __rtattr_parse_nested_compat()
>>>   
>> Please remove them instead.
> 
> Agreed.

The rtattr_parse_nested_compat macro can also go away.

^ permalink raw reply

* Re: [PATCH] Disable TSO for non standard qdiscs
From: Andi Kleen @ 2008-02-01  7:46 UTC (permalink / raw)
  To: Glen Turner
  Cc: Andi Kleen, Waskiewicz Jr, Peter P, Patrick McHardy,
	Stephen Hemminger, netdev
In-Reply-To: <1201847738.17656.5.camel@roma.44ansell.gdt.id.au>

> The problem with ethtool is that it's a non-obvious nerd knob.  At
> the least the ethtool documentation should be updated to indicate that 
> activating TSO affects tc accuracy.

TSO tends to be activated by default in the driver; very few people who use it
even know that ethtool exists or what TSO is.

-Andi

^ permalink raw reply

* Re: [PATCH] Disable TSO for non standard qdiscs
From: Patrick McHardy @ 2008-02-01  7:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Glen Turner, Waskiewicz Jr, Peter P, Stephen Hemminger, netdev
In-Reply-To: <20080201074600.GA12644@one.firstfloor.org>

Andi Kleen wrote:
>> The problem with ethtool is that it's a non-obvious nerd knob.  At
>> the least the ethtool documentation should be updated to indicate that 
>> activating TSO affects tc accuracy.
> 
> TSO tends to be activated by default in the driver; very few people who use it
> even know that ethtool exists or what TSO is.


Indeed. As an example of an unknowing user, this discussion made me
check whether my cablemodem device (on which I'm using HFSC) uses
TSO :)


^ permalink raw reply

* Re: [PATCH] Disable TSO for non standard qdiscs
From: Jarek Poplawski @ 2008-02-01  7:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Waskiewicz Jr, Peter P, Patrick McHardy, Stephen Hemminger,
	netdev
In-Reply-To: <47A2540D.8090003@gmail.com>

On 01-02-2008 00:04, Jarek Poplawski wrote:
...
> ...On the other hand, with this DSL argument from the sub-thread you
> could be quite right: if this "everyone" wants to use one NIC for
> both high speed local network and such a DSL, then learning ethtool
> could be not enough...

...But, on the other hand, in this case the implementation seems to be
wrong: probably all locally created packets will still be treated
the same - or am I missing something?

Jarek P.

^ permalink raw reply

* Re: [1/2] POHMELFS - network filesystem with local coherent cache.
From: Evgeniy Polyakov @ 2008-02-01  7:39 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-kernel, netdev, linux-fsdevel
In-Reply-To: <Pine.LNX.4.64.0802010159460.4864@fbirervta.pbzchgretzou.qr>

Hi.

On Fri, Feb 01, 2008 at 02:04:39AM +0100, Jan Engelhardt (jengelh@computergmbh.de) wrote:
> >POHMELFS stands for Parallel Optimized Host Message Exchange
> >Layered File System. It allows to mount remote servers to local
> >directory via network. This filesystem supports local caching
> >and writeback flushing.
> >POHMELFS is a brick in a future distributed filesystem.
> 
> A brick is usually something that is in the way -
> Or you also say "the user has bricked his machine"
> when it's quite unusable :)
> Hope you did not mean /that/.

No, this is a brick as in a building block :)

> >This set includes two patches:
> > * network filesystem with write-through cache (slow, but works with
> > 	remote userspace server)
> > * hack to show how local cache works and how faster it is compared
> > 	to async NFS (see below). hack disables writeback flush and
> >	performs local allocation of the objects only.
> >
> >Now, some vaporware aka food for thoughts and your brains.
> >
> >A small benchmark of the local cached mode (above hack):
> >
> >$ time tar -xf /home/zbr/threading.tar
> >
> >	POHMELFS	NFS v3 (async)
> >real    0m0.043s	0m1.679s
> >
> >Which is damn 40 times!
> 
> Needs a bigger data set to compare. But what is much more
> important: does it use a single port for networking, or some
> firewall-unfriendly-by-default multiple dynamic-port-allocation
> like NFS?

It uses a single port, configurable at mount time.
POHMELFS client can connect to different addresses (including ipv6) and
via different protocols (like sctp). Metadata server will provide that
information dynamically, so pohmelfs client will be able to connect to
different nodes and perform operations in parallel.

> >Next task is to think about how to generically solve the problem with
> >syncing local changes with remote server, when remote server maintains inodes with
> >completely different numbers.
> >This, among others, will allow offline work with automatic syncing after reconnect.
> 
> What will happen when both nodes change an inode in disconnected state?
> Which inode wins out?

Whichever comes online first. The second node will be told that there is a
merge collision, which has to be resolved by hand.

> >This is not intended for inclusion, CRFS by Zach Brown is a bit ahead of POHMELFS,
> >but it is not generic enough (because of above problem), works only with BTRFS,
> >and was closed by Oracle so far :)
> 
> btrfs is all we need :p

Well, at least it has some very interesting ideas.
Although there are things which are not that good imho, time will
show, maybe there will be another state-of-the-art filesystem at the
moment...

This was for information.

> Where's the parallelism that is advertised by the "POH" in pohmelfs?

First, clients work with local caches and sync them either via writeback
or via a cache coherency algorithm. This work is effectively parallel.
Second, pohmelfs, as part of a distributed filesystem, is developed as a
transport layer that eliminates the mount operation for each different node,
so that when a client asks for data, the request is just sent to a different
server. This allows parallel transactions. Essentially it looks like
mounting different remote servers to a virtual directory and working with it,
except that connection setup is done not at mount time, but at
run time.

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: e1000 full-duplex TCP performance well below wire speed
From: Bruce Allen @ 2008-02-01  7:54 UTC (permalink / raw)
  To: Bill Fink
  Cc: Kok, Auke, Brandeburg, Jesse, netdev, Carsten Aulbert,
	Henning Fehrmann, Bruce Allen
In-Reply-To: <20080201012732.232b7859.billfink@mindspring.com>

Hi Bill,

> I started musing if once one side's transmitter got the upper hand, it 
> might somehow defer the processing of received packets, causing the 
> resultant ACKs to be delayed and thus further slowing down the other 
> end's transmitter.  I began to wonder if the txqueuelen could have an 
> effect on the TCP performance behavior.  I normally have the txqueuelen 
> set to 10000 for 10-GigE testing, so decided to run a test with 
> txqueuelen set to 200 (actually settled on this value through some 
> experimentation).  Here is a typical result:
>
> [bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.6.79
> tx:  1120.6345 MB /  10.07 sec =  933.4042 Mbps 12 %TX 9 %RX 0 retrans
> rx:  1104.3081 MB /  10.09 sec =  917.7365 Mbps 12 %TX 11 %RX 0 retrans
>
> This is significantly better, but there was more variability in the
> results.  The above was with TSO enabled.  I also then ran a test
> with TSO disabled, with the following typical result:
>
> [bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.6.79
> tx:  1119.4749 MB /  10.05 sec =  934.2922 Mbps 13 %TX 9 %RX 0 retrans
> rx:  1131.7334 MB /  10.05 sec =  944.8437 Mbps 15 %TX 12 %RX 0 retrans
>
> This was a little better yet and getting closer to expected results.

We'll also try changing txqueuelen.  I have not looked, but I suppose that 
this is set to the default value of 1000.  We'd be delighted to see 
full-duplex performance that was consistent and greater than 900 Mb/s x 2.

> I do have some other test systems at work that I might be able to try 
> with newer kernels and/or drivers or maybe even with other vendors' GigE 
> NICs, but I won't be back to work until early next week sometime.

Bill, we'd be happy to give you root access to a couple of our systems 
here if you want to do additional testing.  We can put the latest drivers 
on them (and reboot if/as needed).  If you want to do this, please just 
send an ssh public key to Carsten.

Cheers,
 	Bruce

^ permalink raw reply

* RE: [PATCH] Disable TSO for non standard qdiscs
From: Waskiewicz Jr, Peter P @ 2008-02-01  9:28 UTC (permalink / raw)
  To: Jarek Poplawski, Andi Kleen; +Cc: Patrick McHardy, Stephen Hemminger, netdev
In-Reply-To: <20080201074239.GA2897@ff.dom.local>

> ...But, on the other hand, in this case the implementation seems to be
> wrong: probably all locally created packets will still be 
> treated the same - or am I missing something?
> 
> Jarek P.

The TCP layer will generate TSO packets based on the kernel socket
features associated with the flow.  So if you have two devices, one
supporting TSO, the other not, then the flows associated with the
non-TSO device will not have their packets built for TSO.  This has no
bearing on the device supporting TSO, whose feature flags will
propagate into the kernel socket for that flow, and cause any TCP flows
to that device to be TSO packets.  So in a nutshell, disabling TSO is on
a per-device level, not a global switch.
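
As a toy illustration of the per-device behaviour described above (not kernel
code; the class and field names are made up for the example, and the device
names are hypothetical), each flow simply takes the offload capability of the
device its route points at:

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    tso: bool  # per-device offload setting, as toggled with ethtool

@dataclass
class Flow:
    dev: Device

    def builds_tso_packets(self) -> bool:
        # The flow inherits the capability from its own output device only.
        return self.dev.tso

eth0 = Device("eth0", tso=True)    # hypothetical 10GbE LAN port
eth1 = Device("eth1", tso=False)   # hypothetical port facing a DSL modem

lan_flow, dsl_flow = Flow(eth0), Flow(eth1)
print(lan_flow.builds_tso_packets())  # True
print(dsl_flow.builds_tso_packets())  # False
```

Turning TSO off on eth1 in this model leaves flows over eth0 untouched, which
is the point being made: the switch is per device, not global.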

-PJ Waskiewicz

^ permalink raw reply

* RE: [PATCH] Disable TSO for non standard qdiscs
From: Waskiewicz Jr, Peter P @ 2008-02-01  9:37 UTC (permalink / raw)
  To: Patrick McHardy, Andi Kleen; +Cc: Glen Turner, Stephen Hemminger, netdev
In-Reply-To: <47A2C985.30409@trash.net>

> Indeed. As an example of an unknowing user, this discussion 
> made me check whether my cablemodem device (on which I'm 
> using HFSC) uses TSO :)

The TSO defer logic is based on your congestion window and current
window size.  So the actual frame sizes hitting your NIC attached to
your DSL probably aren't anywhere near 64KB, but probably more in line
with whatever your window size is for DSL.
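
A simplified way to picture that bound (the real defer logic in the TCP stack
is more involved, and the window values below are hypothetical): one TSO
super-packet cannot be larger than what the congestion window and the
receiver window allow in flight, nor larger than the 64KB ceiling.

```python
MAX_TSO_BYTES = 64 * 1024  # the 64KB ceiling mentioned above

def tso_burst_upper_bound(cwnd_segments: int, mss: int, rwnd_bytes: int) -> int:
    """Upper bound on one TSO super-packet: it cannot exceed what the
    congestion window or the receiver window allows to be in flight."""
    return min(cwnd_segments * mss, rwnd_bytes, MAX_TSO_BYTES)

# Hypothetical slow link: small congestion window.
slow = tso_burst_upper_bound(cwnd_segments=4, mss=1448, rwnd_bytes=65535)
# Hypothetical fast LAN: large windows, so the 64KB ceiling applies.
fast = tso_burst_upper_bound(cwnd_segments=200, mss=1448, rwnd_bytes=1 << 20)
print(slow, fast)  # 5792 65536
```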

The bottom line is TSO saves CPU cycles.  If we want to make it go away
because of a traffic shaping qdisc interfering, then that's fine.  I
just don't think a TSO option should be added to the scheduler layer,
since it already exists in the ethtool layer.  Asking a user to type
'ethtool -K <devicename> tso off' is probably going to be much easier
than setting an option on your qdisc through tc to turn TSO back on.

I think we're having more of a disagreement of what is considered the
"normal case" user.  If you are on a slow link, such as a DSL/cable
line, your TCP window/congestion window aren't going to be big enough to
generate large TSO's, so what is the issue?  But disabling TSO, say on a
10 GbE link, can cut throughput by half (I have data on 8-core machines
with 10 GbE with/without TSO if you're interested).  Even a
single-core machine with a 1GbE link can take bad performance hits.  So
this is why I'm so concerned about a proposal to turn off TSO outside of
the current established methods of using ethtool.  Rather than educating
the user about how to turn TSO back on using tc if they want it, educate
them why they may want to consider turning TSO off in certain
configurations.  And I don't consider any user effectively using a TBF
qdisc someone incapable of understanding how to use ethtool.

Cheers,

-PJ Waskiewicz

^ permalink raw reply

* Re: [PATCH] Disable TSO for non standard qdiscs
From: Patrick McHardy @ 2008-02-01  9:56 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: Andi Kleen, Glen Turner, Stephen Hemminger, netdev
In-Reply-To: <D5C1322C3E673F459512FB59E0DDC32904737A42@orsmsx414.amr.corp.intel.com>

Waskiewicz Jr, Peter P wrote:
>> Indeed. As an example of an unknowing user, this discussion 
>> made me check whether my cablemodem device (on which I'm 
>> using HFSC) uses TSO :)
> 
> The TSO defer logic is based on your congestion window and current
> window size.  So the actual frame sizes hitting your NIC attached to
> your DSL probably aren't anywhere near 64KB, but probably more in line
> with whatever your window size is for DSL.
> 
> The bottom line is TSO saves CPU cycles.  If we want to make it go away
> because of a traffic shaping qdisc interfering, then that's fine.  I
> just don't think a TSO option should be added to the scheduler layer,
> since it already exists in the ethtool layer.  Asking a user to type
> 'ethtool -K <devicename> tso off' is probably going to be much easier
> than setting an option on your qdisc through tc to turn TSO back on.
> 
> I think we're having more of a disagreement of what is considered the
> "normal case" user.  If you are on a slow link, such as a DSL/cable
> line, your TCP window/congestion window aren't going to be big enough to
> generate large TSO's, so what is the issue?  But disabling TSO, say on a
> 10 GbE link, can cut throughput by half (I have data on 8-core machines
> with 10 GbE with/without TSO if you're interested).  Even a
> single-core machine with a 1GbE link can take bad performance hits.  So
> this is why I'm so concerned about a proposal to turn off TSO outside of
> the current established methods of using ethtool.  Rather than educating
> the user about how to turn TSO back on using tc if they want it, educate
> them why they may want to consider turning TSO off in certain
> configurations.  And I don't consider any user effectively using a TBF
> qdisc someone incapable of understanding how to use ethtool.


We don't want to disable TSO for cases where it makes sense, but
who is using TBF on 10GbE? The point is that most users of qdiscs
which are incapable of dealing with TSO without hacks or special
configuration probably don't care, and 10GbE users know about
ethtool *and* don't use TBF or HTB (which are probably the only
qdiscs which actually have problems, maybe also CBQ).
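
To put numbers on why a shaper on a slow line is sensitive to TSO, here is a
sketch of how long a single packet occupies the link at the shaped rate.  The
1 Mbit/s rate is a hypothetical uplink, and 64KB is taken as the worst-case
TSO super-packet; a qdisc that accounts per packet sees the second case as
one indivisible burst.

```python
def serialization_ms(packet_bytes: int, rate_bps: float) -> float:
    """Time needed to clock one packet out at the given rate."""
    return packet_bytes * 8 / rate_bps * 1000

RATE = 1_000_000  # hypothetical 1 Mbit/s shaped uplink

mtu_frame = serialization_ms(1500, RATE)       # ordinary MTU-sized packet
tso_frame = serialization_ms(64 * 1024, RATE)  # worst-case TSO super-packet
print(f"1500 B: {mtu_frame:.1f} ms, 64 KB: {tso_frame:.1f} ms")
```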


^ permalink raw reply

* oops with ipcomp
From: Beschorner Daniel @ 2008-02-01 10:09 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu

One more issue with 2.6.24, some hours after I reactivated ipcomp with
Herb's 2 patches.
The httpd log shows an HTTP request via the ESP tunnel at oops time.
Don't know whether it is for network or compression guys, so I started
posting here.
Daniel

Unable to handle kernel paging request at ffffc200000fb000 RIP: 
 [<ffffffff8031b8f0>] deflate_slow+0x40/0x400
PGD 7f845067 PUD 7f846067 PMD 7f847067 PTE 0
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in:
Pid: 9136, comm: httpd Not tainted 2.6.24 #2
RIP: 0010:[<ffffffff8031b8f0>]  [<ffffffff8031b8f0>] deflate_slow+0x40/0x400
RSP: 0018:ffff81002ad35938  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffffc200000b9000 RCX: 00000000000408d8
RDX: ffffc200000ba728 RSI: 0000000000000000 RDI: 0000000000005f65
RBP: 00000000000008d4 R08: 0000000000003dae R09: 0000000000001800
R10: 0000000000000010 R11: ffffc200000b94bc R12: 00000000000001ad
R13: 0000000000000005 R14: 0000000000000000 R15: ffffc20000097000
FS:  00002b00bb68b190(0000) GS:ffffffff805a8000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc200000fb000 CR3: 000000002ac82000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process httpd (pid: 9136, threadinfo ffff81002ad34000, task ffff81007d2d4080)
Stack:  ffff810042f3f710 ffff81007dfb0700 0000000000000005 ffffc200000b9000
 ffff81007de89000 ffffffff8031c25d 0000000000000000 ffff81007dfb0700
 ffff81007dfb06c0 ffff81007de890a8 000000000000010a ffffffff802ff351
Call Trace:
 [<ffffffff8031c25d>] zlib_deflate+0x10d/0x330
 [<ffffffff802ff351>] deflate_compress+0x91/0xb0
 [<ffffffff804771b8>] ipcomp_output+0x98/0x1e0
 [<ffffffff80489ef6>] xfrm_output+0x116/0x1e0
 [<ffffffff80482dc4>] xfrm4_output_finish2+0x44/0x1e0
 [<ffffffff80483075>] xfrm4_output+0x55/0x60
 [<ffffffff80445989>] ip_queue_xmit+0x209/0x450
 [<ffffffff8049b0d0>] thread_return+0x3d/0x54d
 [<ffffffff8023b094>] lock_timer_base+0x34/0x70
 [<ffffffff80456dcf>] tcp_transmit_skb+0x40f/0x7c0
 [<ffffffff80458aae>] __tcp_push_pending_frames+0x11e/0x940
 [<ffffffff8044cb8e>] tcp_sendmsg+0x81e/0xc40
 [<ffffffff80291e3f>] dput+0x1f/0x130
 [<ffffffff80410b01>] sock_aio_write+0x111/0x120
 [<ffffffff804109f0>] sock_aio_write+0x0/0x120
 [<ffffffff8027f95b>] do_sync_readv_writev+0xcb/0x110
 [<ffffffff80246850>] autoremove_wake_function+0x0/0x30
 [<ffffffff8027fb99>] do_sync_read+0xd9/0x120
 [<ffffffff80287941>] permission+0x61/0x100
 [<ffffffff8027f7bd>] rw_copy_check_uvector+0x9d/0x130
 [<ffffffff802800a2>] do_readv_writev+0xe2/0x210
 [<ffffffff8027e1ba>] do_filp_open+0x3a/0x50
 [<ffffffff802806e3>] sys_writev+0x53/0x90
 [<ffffffff8020bb3e>] system_call+0x7e/0x83


Code: 0f b6 14 0a 31 d0 23 43 74 48 8b 53 60 89 43 68 89 c0 0f b7 
RIP  [<ffffffff8031b8f0>] deflate_slow+0x40/0x400
 RSP <ffff81002ad35938>
CR2: ffffc200000fb000
---[ end trace cfeb10aa23b54939 ]---

^ permalink raw reply

* [PATCH] ieee80211: fix section mismatch warning
From: Sam Ravnborg @ 2008-02-01 11:52 UTC (permalink / raw)
  To: netdev, Johannes Berg, John W. Linville, David S. Miller

Fix the following warnings:
WARNING: net/built-in.o(.init.text+0xd6c0): Section mismatch in reference from the function ieee80211_init() to the function .exit.text:rc80211_simple_exit()
WARNING: net/built-in.o(.init.text+0xd6c5): Section mismatch in reference from the function ieee80211_init() to the function .exit.text:rc80211_pid_exit()

The fix was simple - I just did as modpost told me and removed the
wrong __exit annotation of rc80211_simple_exit and rc80211_pid_exit.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
---

With this patch my allyesconfig build on x86 (64 bit)
is section mismatch clean in net/

	Sam

diff --git a/net/mac80211/rc80211_pid_algo.c b/net/mac80211/rc80211_pid_algo.c
index 554c4ba..c339571 100644
--- a/net/mac80211/rc80211_pid_algo.c
+++ b/net/mac80211/rc80211_pid_algo.c
@@ -538,7 +538,7 @@ int __init rc80211_pid_init(void)
 	return ieee80211_rate_control_register(&mac80211_rcpid);
 }
 
-void __exit rc80211_pid_exit(void)
+void rc80211_pid_exit(void)
 {
 	ieee80211_rate_control_unregister(&mac80211_rcpid);
 }
diff --git a/net/mac80211/rc80211_simple.c b/net/mac80211/rc80211_simple.c
index 934676d..9a78b11 100644
--- a/net/mac80211/rc80211_simple.c
+++ b/net/mac80211/rc80211_simple.c
@@ -389,7 +389,7 @@ int __init rc80211_simple_init(void)
 	return ieee80211_rate_control_register(&mac80211_rcsimple);
 }
 
-void __exit rc80211_simple_exit(void)
+void rc80211_simple_exit(void)
 {
 	ieee80211_rate_control_unregister(&mac80211_rcsimple);
 }
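
The rule behind the warning can be pictured with a toy model (this is not
the real modpost tool, whose checks are more extensive): code placed in
.exit.text may be discarded when built in, so a reference to it from code
that is kept, here .init.text, is flagged.  The function names are the ones
from the warnings; the checker itself is invented for illustration.

```python
# Toy model of the section mismatch check (not the real modpost).
DISCARDED_WHEN_BUILT_IN = {".exit.text"}

def section_mismatches(sections, references):
    """Return (caller, callee) pairs where code outside a discardable
    section references a function inside one."""
    return [
        (caller, callee)
        for caller, callee in references
        if sections[callee] in DISCARDED_WHEN_BUILT_IN
        and sections[caller] not in DISCARDED_WHEN_BUILT_IN
    ]

references = [
    ("ieee80211_init", "rc80211_simple_exit"),
    ("ieee80211_init", "rc80211_pid_exit"),
]

before = {  # with the __exit annotations
    "ieee80211_init": ".init.text",
    "rc80211_simple_exit": ".exit.text",
    "rc80211_pid_exit": ".exit.text",
}
# Without __exit the two functions land in ordinary .text.
after = dict(before, rc80211_simple_exit=".text", rc80211_pid_exit=".text")

print(len(section_mismatches(before, references)))  # 2
print(len(section_mismatches(after, references)))   # 0
```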

^ permalink raw reply related

* Re: [PATCH] Disable TSO for non standard qdiscs
From: jamal @ 2008-02-01 12:06 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Waskiewicz Jr, Peter P, Andi Kleen, Glen Turner,
	Stephen Hemminger, netdev
In-Reply-To: <47A2ECE9.5000103@trash.net>

On Fri, 2008-02-01 at 10:56 +0100, Patrick McHardy wrote:

> We don't want to disable TSO for cases where it makes sense, but
> who is using TBF on 10GbE? The point is that most users of qdiscs
> which are incapable of dealing with TSO without hacks or special
> configuration probably don't care, and 10GbE users know about
> ethtool *and* don't use TBF or HTB (which are probably the only
> qdiscs which actually have problems, maybe also CBQ).

Right - Essentially it is a usability issue:
People who know how to use TSO (Peter for example) will be clueful
enough to turn it on. Which means the default should be to protect the
clueless and turn it off.
On Andi's approach:
Turning TSO off at netdev registration time with a warning would be
cleaner IMO. Or alternatively introducing a kernel-config "I know what
TSO is" option which is then used at netdev registration. From a
usability perspective it would make more sense to just keep ethtool as
the only way to configure TSO. 

[I recently spent a few days helping someone debug a problem with IFB
because he was redirecting packets from a TSO netdevice and occasionally
some multi-packet would be missed in the calculation; my answer was "turn
off TSO"; so there are more use cases for this TSO issue]. 

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] ieee80211: fix section mismatch warning
From: Johannes Berg @ 2008-02-01 12:07 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: netdev, John W. Linville, David S. Miller, linux-wireless
In-Reply-To: <20080201115206.GA12678@uranus.ravnborg.org>


On Fri, 2008-02-01 at 12:52 +0100, Sam Ravnborg wrote:
> Fix the following warnings:
> WARNING: net/built-in.o(.init.text+0xd6c0): Section mismatch in reference from the function ieee80211_init() to the function .exit.text:rc80211_simple_exit()
> WARNING: net/built-in.o(.init.text+0xd6c5): Section mismatch in reference from the function ieee80211_init() to the function .exit.text:rc80211_pid_exit()
> 
> The fix was simple - I just did as modpost told me and removed the
> wrong __exit annotation of rc80211_simple_exit and rc80211_pid_exit.

Heh, I just sent the same patch.

> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

Acked-by: Johannes Berg <johannes@sipsolutions.net>

> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: John W. Linville <linville@tuxdriver.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
> 
> With this patch my allyesconfig build on x86 (64 bit)
> is section mismatch clean in net/
> 
> 	Sam
> 
> diff --git a/net/mac80211/rc80211_pid_algo.c b/net/mac80211/rc80211_pid_algo.c
> index 554c4ba..c339571 100644
> --- a/net/mac80211/rc80211_pid_algo.c
> +++ b/net/mac80211/rc80211_pid_algo.c
> @@ -538,7 +538,7 @@ int __init rc80211_pid_init(void)
>  	return ieee80211_rate_control_register(&mac80211_rcpid);
>  }
>  
> -void __exit rc80211_pid_exit(void)
> +void rc80211_pid_exit(void)
>  {
>  	ieee80211_rate_control_unregister(&mac80211_rcpid);
>  }
> diff --git a/net/mac80211/rc80211_simple.c b/net/mac80211/rc80211_simple.c
> index 934676d..9a78b11 100644
> --- a/net/mac80211/rc80211_simple.c
> +++ b/net/mac80211/rc80211_simple.c
> @@ -389,7 +389,7 @@ int __init rc80211_simple_init(void)
>  	return ieee80211_rate_control_register(&mac80211_rcsimple);
>  }
>  
> -void __exit rc80211_simple_exit(void)
> +void rc80211_simple_exit(void)
>  {
>  	ieee80211_rate_control_unregister(&mac80211_rcsimple);
>  }
> 


^ permalink raw reply

* Re: Slow OOM in netif_RX function
From: Ivan Dichev @ 2008-02-01 12:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Andi Kleen, netdev
In-Reply-To: <20080125141204.GA25510@ghostprotocols.net>

Arnaldo Carvalho de Melo wrote:
> Em Fri, Jan 25, 2008 at 02:21:08PM +0100, Andi Kleen escreveu:
>   
>> "Ivan H. Dichev" <idichev@obs.bg> writes:
>>     
>>> What could happen if I put different Lan card in every slot?
>>> In ex. to-private -> 3com
>>>       to-inet    -> VIA
>>>       to-dmz     -> rtl8139
>>> And then to look which RX function is consuming the memory.
>>> (boomerang_rx, rtl8139_rx, ... etc) 
>>>       
>> The problem is unlikely to be in the driver (these are both
>> well tested ones) but more likely your complicated iptables setup somehow
>> triggers a skb leak.
>>
>> There are unfortunately no shrink wrapped debug mechanisms in the kernel
>> for leaks like this (ok you could enable CONFIG_NETFILTER_DEBUG 
>> and see if it prints something interesting, but that's a long shot).
>>
>> If you wanted to write a custom debugging patch I would do something like this:
>>
>> - Add two new integer fields to struct sk_buff: a time stamp and an integer field
>> - Fill the time stamp with jiffies in alloc_skb and clear the integer field
>> - In __kfree_skb clear the time stamp
>> - For all the ipt target modules in net/ipv4/netfilter/*.c you use, change their
>> ->target functions to put a unique value into the integer field you added.
>> - Do the same for the pkt_to_tuple functions for all conntrack modules
>>
>> Then, when you observe the leak, take a crash dump using kdump on the router
>> and use crash to dump all the slab objects for the skbuff_head_cache.
>> Then look for any that have an old time stamp and check what value they
>> have in the integer field. The netfilter function that set that unique value
>> likely triggered the leak somehow.
>>     
>
> I wrote some systemtap scripts that do parts of what you suggest, and at
> least for the timestamp there was no need to add a new field to struct
> sk_buff, I just reuse skb->timestamp, as it is only used when we use a
> packet sniffer. Here it is for reference, but it needs some tapsets I
> wrote, so I'll publish this git repo in git.kernel.org, perhaps it can
> be useful in this case as a starting point. Find another unused field
> (hint: I know that at least 4 bytes on 64 bits is present as a hole) and
> you're done, no need to rebuild the kernel :)
>
> http://git.kernel.org/?p=linux/kernel/git/acme/nettaps.git
>
> - Arnaldo
>   
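For reference, the tagging scheme Andi describes above can be modeled in
userspace as follows. This is only a rough sketch, not kernel code: the
struct and function names (tracked_buf, mark_alloc, mark_free, set_tag,
report_stale) are hypothetical, a plain counter stands in for jiffies, and
an array stands in for the slab objects one would pull out of a crash dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two debug fields added to struct sk_buff. */
struct tracked_buf {
	unsigned long alloc_stamp;	/* time of allocation, 0 once freed */
	unsigned int last_tag;		/* unique id of the last hook that touched it */
};

/* Models the alloc_skb change: stamp the time, clear the tag. */
static void mark_alloc(struct tracked_buf *b, unsigned long now)
{
	assert(now != 0);	/* 0 is reserved to mean "freed" in this model */
	b->alloc_stamp = now;
	b->last_tag = 0;
}

/* Models the __kfree_skb change: clear the time stamp. */
static void mark_free(struct tracked_buf *b)
{
	b->alloc_stamp = 0;
}

/* Models a ->target or pkt_to_tuple hook writing its unique value. */
static void set_tag(struct tracked_buf *b, unsigned int tag)
{
	b->last_tag = tag;
}

/*
 * Models the pass over the dumped objects: print every live object that
 * is older than max_age together with its tag, and return how many such
 * objects were found.
 */
static size_t report_stale(const struct tracked_buf *bufs, size_t n,
			   unsigned long now, unsigned long max_age)
{
	size_t i, stale = 0;

	for (i = 0; i < n; i++) {
		if (bufs[i].alloc_stamp == 0)
			continue;	/* already freed, not a leak */
		if (now - bufs[i].alloc_stamp > max_age) {
			printf("stale object %zu: tag %u\n",
			       i, bufs[i].last_tag);
			stale++;
		}
	}
	return stale;
}
```

If most of the stale objects carry the same tag, the hook that writes that
tag is the first place to look.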
Thanks to everyone for the ideas.
I am not a kernel guru, so writing a patch is difficult. This is a production
server, so it is quite difficult to debug (only at night).
I removed some of the more exotic iptables modules (recent, ulog, string),
but it had no effect.
Since we can reach OOM, most of the memory will be filled by the
leak at that point, so we are thinking of taking a dump and analyzing it.
We have looked at the "crash" tool, and we will see what we can do with
it. Meanwhile, do you have any hints or ideas?
Thanks a lot.

Ivan Dichev


^ permalink raw reply

* Re: Slow OOM in netif_RX function
From: Eric Dumazet @ 2008-02-01 13:16 UTC (permalink / raw)
  To: Ivan Dichev; +Cc: Arnaldo Carvalho de Melo, Andi Kleen, netdev
In-Reply-To: <47A315DC.3070101@obs.bg>

Ivan Dichev wrote:
> Arnaldo Carvalho de Melo wrote:
>   
>> On Fri, Jan 25, 2008 at 02:21:08PM +0100, Andi Kleen wrote:
>>   
>>     
>>> "Ivan H. Dichev" <idichev@obs.bg> writes:
>>>     
>>>       
>>>> What would happen if I put a different LAN card in every slot?
>>>> E.g. to-private -> 3com
>>>>      to-inet    -> VIA
>>>>      to-dmz     -> rtl8139
>>>> And then look at which RX function is consuming the memory
>>>> (boomerang_rx, rtl8139_rx, ... etc.).
>>>>       
>>>>         
>>> The problem is unlikely to be in the driver (these are both
>>> well tested ones) but more likely your complicated iptables setup somehow
>>> triggers a skb leak.
>>>
>>> There are unfortunately no shrink wrapped debug mechanisms in the kernel
>>> for leaks like this (ok you could enable CONFIG_NETFILTER_DEBUG 
>>> and see if it prints something interesting, but that's a long shot).
>>>
>>> If you wanted to write a custom debugging patch I would do something like this:
>>>
>>> - Add two new integer fields to struct sk_buff: a time stamp and an integer field
>>> - Fill the time stamp with jiffies in alloc_skb and clear the integer field
>>> - In __kfree_skb clear the time stamp
>>> - For all the ipt target modules in net/ipv4/netfilter/*.c you use, change their
>>> ->target functions to put a unique value into the integer field you added.
>>> - Do the same for the pkt_to_tuple functions for all conntrack modules
>>>
>>> Then, when you observe the leak, take a crash dump using kdump on the router
>>> and use crash to dump all the slab objects for the skbuff_head_cache.
>>> Then look for any that have an old time stamp and check what value they
>>> have in the integer field. The netfilter function that set that unique value
>>> likely triggered the leak somehow.
>>>     
>>>       
>> I wrote some systemtap scripts that do parts of what you suggest, and at
>> least for the timestamp there was no need to add a new field to struct
>> sk_buff, I just reuse skb->timestamp, as it is only used when we use a
>> packet sniffer. Here it is for reference, but it needs some tapsets I
>> wrote, so I'll publish this git repo in git.kernel.org, perhaps it can
>> be useful in this case as a starting point. Find another unused field
>> (hint: I know that at least 4 bytes on 64 bits is present as a hole) and
>> you're done, no need to rebuild the kernel :)
>>
>> http://git.kernel.org/?p=linux/kernel/git/acme/nettaps.git
>>
>> - Arnaldo
>>   
>>     
> Thanks to everyone for the ideas.
> I am not a kernel guru, so writing a patch is difficult. This is a production
> server, so it is quite difficult to debug (only at night).
> I removed some of the more exotic iptables modules (recent, ulog, string),
> but it had no effect.
> Since we can reach OOM, most of the memory will be filled by the
> leak at that point, so we are thinking of taking a dump and analyzing it.
> We have looked at the "crash" tool, and we will see what we can do with
> it. Meanwhile, do you have any hints or ideas?
> Thanks a lot.
>
>   
I understand you don't want to share your exact firewall rules.

Maybe you could at least post the following information:

# cat /proc/slabinfo
# lsmod
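
When comparing two slabinfo snapshots taken some hours apart, the line of
interest is most likely the one for skbuff_head_cache. A minimal sketch of
pulling the counters out of one line, assuming the slabinfo 2.x layout where
each data line starts with the cache name followed by active_objs, num_objs
and objsize (struct slab_stat and both function names are hypothetical, made
up for this example):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The leading columns of one /proc/slabinfo data line. */
struct slab_stat {
	char name[64];
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects allocated in total */
	unsigned long objsize;		/* size of one object in bytes */
};

/*
 * Parse one line of /proc/slabinfo.
 * Returns 0 on success, -1 for the version/header lines or a malformed line.
 */
static int parse_slabinfo_line(const char *line, struct slab_stat *out)
{
	assert(line != NULL && out != NULL);

	/* Skip "slabinfo - version: ..." and the "# name ..." header. */
	if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
		return -1;

	if (sscanf(line, "%63s %lu %lu %lu", out->name, &out->active_objs,
		   &out->num_objs, &out->objsize) != 4)
		return -1;

	return 0;
}

/* Approximate number of bytes held by live objects in this cache. */
static unsigned long slab_bytes_in_use(const struct slab_stat *s)
{
	return s->active_objs * s->objsize;
}
```

A cache whose active_objs keeps growing between snapshots while traffic is
roughly constant is the likely home of the leak.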





^ permalink raw reply

* [PATCH for 2.6.25 1/2] [NET] ucc_geth: fix module removal
From: Anton Vorontsov @ 2008-02-01 13:22 UTC (permalink / raw)
  To: Li Yang, Jeff Garzik; +Cc: netdev, linuxppc-dev

- uccf should be set to NULL to avoid a double free on
  subsequent calls;
- the ind_hash_q and group_hash_q lists should be initialized in the
  probe() function instead of struct_init() (called by open()),
  otherwise there will be an oops if ucc_geth_driver is removed
  prior to 'ifconfig ethX up';
- add unregister_netdev();
- reorder geth_remove() steps.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---

Hi Li,

You kinda promised that these two patches would hit 2.6.25... ;-)

I've rebased the patches so they apply cleanly on the current tree.

Thanks,

 drivers/net/ucc_geth.c |   17 ++++++++++-------
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 4ffd873..e41da46 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2084,8 +2084,10 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 	if (!ugeth)
 		return;
 
-	if (ugeth->uccf)
+	if (ugeth->uccf) {
 		ucc_fast_free(ugeth->uccf);
+		ugeth->uccf = NULL;
+	}
 
 	if (ugeth->p_thread_data_tx) {
 		qe_muram_free(ugeth->thread_dat_tx_offset);
@@ -2305,10 +2307,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
 	ug_info = ugeth->ug_info;
 	uf_info = &ug_info->uf_info;
 
-	/* Create CQs for hash tables */
-	INIT_LIST_HEAD(&ugeth->group_hash_q);
-	INIT_LIST_HEAD(&ugeth->ind_hash_q);
-
 	if (!((uf_info->bd_mem_part == MEM_PART_SYSTEM) ||
 	      (uf_info->bd_mem_part == MEM_PART_MURAM))) {
 		if (netif_msg_probe(ugeth))
@@ -3990,6 +3988,10 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
 	ugeth = netdev_priv(dev);
 	spin_lock_init(&ugeth->lock);
 
+	/* Create CQs for hash tables */
+	INIT_LIST_HEAD(&ugeth->group_hash_q);
+	INIT_LIST_HEAD(&ugeth->ind_hash_q);
+
 	dev_set_drvdata(device, dev);
 
 	/* Set the dev->base_addr to the gfar reg region */
@@ -4040,9 +4042,10 @@ static int ucc_geth_remove(struct of_device* ofdev)
 	struct net_device *dev = dev_get_drvdata(device);
 	struct ucc_geth_private *ugeth = netdev_priv(dev);
 
-	dev_set_drvdata(device, NULL);
-	ucc_geth_memclean(ugeth);
+	unregister_netdev(dev);
 	free_netdev(dev);
+	ucc_geth_memclean(ugeth);
+	dev_set_drvdata(device, NULL);
 
 	return 0;
 }
-- 
1.5.2.2


^ permalink raw reply related

* [PATCH for 2.6.25 2/2] [NET] ucc_geth: add support for netpoll
From: Anton Vorontsov @ 2008-02-01 13:22 UTC (permalink / raw)
  To: Li Yang, Jeff Garzik; +Cc: netdev, linuxppc-dev

This patch adds netpoll support for the QE UCC Gigabit Ethernet
driver. Tested using netconsole and KGDBoE.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---

Just resending this.

 drivers/net/ucc_geth.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index e41da46..fba0811 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3666,6 +3666,23 @@ static irqreturn_t ucc_geth_irq_handler(int irq, void *info)
 	return IRQ_HANDLED;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Polling 'interrupt' - used by things like netconsole to send skbs
+ * without having to re-enable interrupts. It's not called while
+ * the interrupt routine is executing.
+ */
+static void ucc_netpoll(struct net_device *dev)
+{
+	struct ucc_geth_private *ugeth = netdev_priv(dev);
+	int irq = ugeth->ug_info->uf_info.irq;
+
+	disable_irq(irq);
+	ucc_geth_irq_handler(irq, dev);
+	enable_irq(irq);
+}
+#endif /* CONFIG_NET_POLL_CONTROLLER */
+
 /* Called when something needs to use the ethernet device */
 /* Returns 0 for success. */
 static int ucc_geth_open(struct net_device *dev)
@@ -4008,6 +4025,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
 #ifdef CONFIG_UGETH_NAPI
 	netif_napi_add(dev, &ugeth->napi, ucc_geth_poll, UCC_GETH_DEV_WEIGHT);
 #endif				/* CONFIG_UGETH_NAPI */
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = ucc_netpoll;
+#endif
 	dev->stop = ucc_geth_close;
 //    dev->change_mtu = ucc_geth_change_mtu;
 	dev->mtu = 1500;
-- 
1.5.2.2

^ permalink raw reply related

* kernel panic on 2.6.24 with esfq patch applied
From: Denys Fedoryshchenko @ 2008-02-01 13:25 UTC (permalink / raw)
  To: netdev

Hi

This is probably a bug related to ESFQ; I will now unload the module and
test more. It may turn out to be unrelated, though, so if it is not too much
trouble, please take a look.

Feb  1 09:08:50 SERVER [12380.067104] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
Feb  1 09:08:50 SERVER [12380.067140] printing eip: c01f10ed *pde = 00000000
Feb  1 09:08:50 SERVER [12380.067162] Oops: 0000 [#1] SMP
Feb  1 09:08:50 SERVER [12380.067181] Modules linked in: netconsole configfs iTCO_wdt
    nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre sch_esfq
    xt_tcpudp ipt_TTL ipt_ttl xt_NOTRACK iptable_raw iptable_mangle ifb e1000e
    em_nbyte cls_tcindex act_gact cls_rsvp sch_htb cls_fw act_mirred em_u32 sch_red
    sch_sfq sch_tbf sch_teql cls_basic act_police sch_gred act_pedit sch_hfsc
    cls_rsvp6 sch_ingress em_meta em_text act_ipt sch_dsmark sch_prio sch_netem
    act_simple cls_u32 em_cmp sch_cbq cls_route xt_TCPMSS iptable_nat
    nf_conntrack_ipv4 ipt_LOG ipt_MASQUERADE ipt_REDIRECT nf_nat nf_conntrack
    nfnetlink iptable_filter ip_tables x_tables 8021q tun tulip r8169 sky2
    via_velocity via_rhine sis900 ne2k_pci 8390 skge tg3 8139too e1000 e100
    usb_storage mtdblock mtd_blkdevs usbhid uhci_hcd ehci_hcd ohci_hcd usbcore
Feb  1 09:08:50 SERVER [12380.067515]
Feb  1 09:08:50 SERVER [12380.067530] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26)
Feb  1 09:08:50 SERVER [12380.067550] EIP: 0060:[<c01f10ed>] EFLAGS: 00010086 CPU: 0
Feb  1 09:08:50 SERVER [12380.067571] EIP is at rb_erase+0x110/0x22f
Feb  1 09:08:50 SERVER [12380.067589] EAX: f52bbea0 EBX: 00000000 ECX: 00000000 EDX: f52bbea0
Feb  1 09:08:50 SERVER [12380.067608] ESI: f717df50 EDI: c1fed000 EBP: c1fecf80 ESP: c037fda8
Feb  1 09:08:50 SERVER [12380.067628]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Feb  1 09:08:50 SERVER [12380.067647] Process swapper (pid: 0, ti=c037e000 task=c03533a0 task.ti=c037e000)
Feb  1 09:08:50 SERVER [12380.067668] Stack: 00000001 c1fed000 c1fecf78 00000002 00000001 c0134663 c1fed000 c1fecf78
Feb  1 09:08:50 SERVER [12380.067714]        c1fecf40 c013515b 00000000 4f3f473e 000002d0 ffffffff 7fffffff 4f3f473e
Feb  1 09:08:50 SERVER [12380.067760]        000002d0 00000000 c1fec120 c037ff84 c037fe70 f76ae880 c0113963 c1ff5f78
Feb  1 09:08:50 SERVER [12380.067806] Call Trace:
Feb  1 09:08:50 SERVER [12380.067839]  [<c0134663>] __remove_hrtimer+0x5d/0x64
Feb  1 09:08:50 SERVER [12380.067861]  [<c013515b>] hrtimer_interrupt+0x10c/0x19a
Feb  1 09:08:50 SERVER [12380.067883]  [<c0113963>] smp_apic_timer_interrupt+0x6f/0x80
Feb  1 09:08:50 SERVER [12380.067905]  [<c0105838>] apic_timer_interrupt+0x28/0x30
Feb  1 09:08:50 SERVER [12380.067928]  [<c02be6d7>] _spin_lock_irqsave+0x13/0x27
Feb  1 09:08:50 SERVER [12380.067949]  [<c0134bc7>] lock_hrtimer_base+0x15/0x2f
Feb  1 09:08:50 SERVER [12380.067970]  [<c0134ca0>] hrtimer_start+0x16/0xf4
Feb  1 09:08:50 SERVER [12380.067991]  [<c027ec43>] qdisc_watchdog_schedule+0x1e/0x21
Feb  1 09:08:50 SERVER [12380.068013]  [<f89f8fe6>] htb_dequeue+0x6ef/0x6fb [sch_htb]
Feb  1 09:08:50 SERVER [12380.068036]  [<c028ac4d>] ip_rcv+0x1fc/0x237
Feb  1 09:08:50 SERVER [12380.068057]  [<c0135297>] hrtimer_get_next_event+0xae/0xbb
Feb  1 09:08:50 SERVER [12380.068078]  [<c0135297>] hrtimer_get_next_event+0xae/0xbb
Feb  1 09:08:50 SERVER [12380.068099]  [<c0136e26>] getnstimeofday+0x2b/0xb5
Feb  1 09:08:50 SERVER [12380.068118]  [<c0138d70>] clockevents_program_event+0xe0/0xee
Feb  1 09:08:50 SERVER [12380.068140]  [<c027da0e>] __qdisc_run+0x2a/0x163
Feb  1 09:08:50 SERVER [12380.068161]  [<c02722d8>] net_tx_action+0xa8/0xcc
Feb  1 09:08:50 SERVER [12380.068180]  [<c027ec65>] qdisc_watchdog+0x0/0x1b
Feb  1 09:08:50 SERVER [12380.068199]  [<c027ec7d>] qdisc_watchdog+0x18/0x1b
Feb  1 09:08:50 SERVER [12380.068218]  [<c0135007>] run_hrtimer_softirq+0x4e/0x96
Feb  1 09:08:50 SERVER [12380.068241]  [<c0126a82>] __do_softirq+0x5d/0xc1
Feb  1 09:08:50 SERVER [12380.068260]  [<c0126b18>] do_softirq+0x32/0x36
Feb  1 09:08:50 SERVER [12380.068279]  [<c0126d6a>] irq_exit+0x38/0x6b
Feb  1 09:08:50 SERVER [12380.068298]  [<c0113968>] smp_apic_timer_interrupt+0x74/0x80
Feb  1 09:08:50 SERVER [12380.068319]  [<c0105838>] apic_timer_interrupt+0x28/0x30
Feb  1 09:08:50 SERVER [12380.068343]  [<c0103243>] mwait_idle_with_hints+0x3c/0x40
Feb  1 09:08:50 SERVER [12380.068365]  [<c0103247>] mwait_idle+0x0/0xa
Feb  1 09:08:50 SERVER [12380.068384]  [<c010357e>] cpu_idle+0x98/0xb9
Feb  1 09:08:50 SERVER [12380.068403]  [<c03848c2>] start_kernel+0x2d7/0x2df
Feb  1 09:08:50 SERVER [12380.068422]  [<c03840e0>] unknown_bootoption+0x0/0x195
Feb  1 09:08:50 SERVER [12380.068444]  =======================
Feb  1 09:08:50 SERVER [12380.068460] Code: 01 00 00 8b 4e 08 39 d9 0f 85 85 00 00 00 8b 4e 04
    8b 01 a8 01 75 14 83 c8 01 89 ea 89 01 89 f0 83 26 fe e8 1e fd ff ff 8b 4e 04
    <8b> 59 08 85 db 74 06 8b 03 a8 01 74 15 8b 41 04 85 c0 0f 84 c6
Feb  1 09:08:50 SERVER [12380.068753] EIP: [<c01f10ed>] rb_erase+0x110/0x22f SS:ESP 0068:c037fda8
Feb  1 09:08:50 SERVER [12380.068978] Kernel panic - not syncing: Fatal exception in interrupt


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.


^ permalink raw reply

* Protocol handler for Marvell DSA EtherType packets
From: Jesper Dangaard Brouer @ 2008-02-01 13:28 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: David S. Miller

Hi Netdev

I am writing a new protocol handler using dev_add_pack() (for a Marvell
switch chip handling DSA (Distributed Switch Architecture) EtherType
packets).

My protocol handler works and I get the skb. But I want to remove the
DSA Headers and send the packet back for normal processing on a
device. (I actually just want to be able to tcpdump these packets on
the device).

I'm removing the headers by:
  skb_pull(skb, sizeof(struct dsa_header));

I'm trying to retransmit it by:
  netif_rx(skb);

But it seems that I just retransmit the same packet without removing
the DSA headers.

Any hints about which functions I should use to remove the DSA header?
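
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: skb_pull() only advances skb->data, so anything that looks at the
frame from the MAC header onwards (a packet capture, for instance) still
sees the DSA bytes in place, and skb->protocol still names the DSA
EtherType. The usual way around that kind of problem is to slide the two MAC
addresses forward over the header, so that the frame really does start later
in the buffer. The userspace sketch below models that move on a plain byte
buffer. The frame layout (destination and source address, then hdr_len bytes
to strip, then the inner EtherType) and the names MAC_LEN,
strip_switch_header and inner_ethertype are assumptions made for the
example, not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6	/* bytes per Ethernet address */

/*
 * Remove hdr_len bytes that sit directly after the two MAC addresses.
 * Returns a pointer to the new start of the frame (inside the same
 * buffer) and shrinks *len accordingly, or returns NULL if the frame is
 * too short to contain the addresses, the header and an inner EtherType.
 */
static unsigned char *strip_switch_header(unsigned char *frame, size_t *len,
					  size_t hdr_len)
{
	assert(frame != NULL && len != NULL);

	if (*len < 2 * MAC_LEN + hdr_len + 2)
		return NULL;

	/* Source and destination overlap, so memmove() rather than memcpy(). */
	memmove(frame + hdr_len, frame, 2 * MAC_LEN);
	*len -= hdr_len;
	return frame + hdr_len;
}

/* Read the EtherType (network byte order) that follows the addresses. */
static unsigned int inner_ethertype(const unsigned char *frame)
{
	return ((unsigned int)frame[2 * MAC_LEN] << 8) | frame[2 * MAC_LEN + 1];
}
```

In kernel terms the equivalent would presumably also involve resetting the
MAC header and setting skb->protocol to the inner EtherType before calling
netif_rx(); I have not tested that.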

-- 
Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply
