Netdev List
* Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
From: Evgeniy Polyakov @ 2006-03-07  7:44 UTC
  To: Ingo Oeser
  Cc: David S. Miller, jengelh, christopher.leech, linux-kernel, netdev
In-Reply-To: <200603061844.07439.netdev@axxeo.de>

On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser (netdev@axxeo.de) wrote:
> Evgeniy Polyakov wrote:
> > On Sat, Mar 04, 2006 at 01:41:44PM -0800, David S. Miller (davem@davemloft.net) wrote:
> > > From: Jan Engelhardt <jengelh@linux01.gwdg.de>
> > > Date: Sat, 4 Mar 2006 19:46:22 +0100 (MET)
> > > 
> > > > Does this buy the normal standard desktop user anything?
> > > 
> > > Absolutely, it optimizes end-node performance.
> > 
> > It really depends on how it is used.
> > According to the investigation done for kevent-based FS AIO reading,
> > the get_user_pages() performance graph looks like a sqrt() function
> 
> Hmm, so I should resurrect my user page table walker abstraction?
> 
> There I would hand each page to a "recording" function, which
> can drop the page from the collection or coalesce it in the collector
> if your scatter gather implementation allows it.

It depends on where the performance growth stops.
At first glance it does not look like find_extend_vma();
more likely it is follow_page() faulting and thus __handle_mm_fault().
I cannot say for sure, but if that is true and performance growth
stops because of the increased number of faults and their processing,
your approach will hit this problem too, won't it?

> Regards
> 
> Ingo Oeser

-- 
	Evgeniy Polyakov


* Re: [PATCH, RESEND] Add MWI workaround for Tulip DC21143
From: Geert Uytterhoeven @ 2006-03-07  9:32 UTC
  To: Ralf Baechle
  Cc: Francois Romieu, Martin Michlmayr, netdev, Linux/MIPS Development
In-Reply-To: <20060307035824.GA24018@linux-mips.org>

On Tue, 7 Mar 2006, Ralf Baechle wrote:
> On Tue, Mar 07, 2006 at 12:15:30AM +0100, Francois Romieu wrote:
> 
> > [...]
> > > Does anyone have comments regarding this patch?  I received
> > > confirmation from a number of Debian users that this patch
> > > significantly improves the lockup situation on Cobalt, so
> > > it would be nice if it could go in.
> > 
> > I'll queue it with the pending de2104x fix(es ?) during my next
> > upkeep.
> 
> I'm just not convinced of having such a workaround as a build option.
> The average person building a kernel will probably not know if the
> option needs to be enabled or not.

Indeed, if it's mentioned in the errata of the chip, the driver should take
care of it.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
From: Ingo Oeser @ 2006-03-07  9:43 UTC
  To: Evgeniy Polyakov
  Cc: David S. Miller, jengelh, christopher.leech, linux-kernel, netdev
In-Reply-To: <20060307074438.GA22672@2ka.mipt.ru>

Evgeniy Polyakov wrote:
> On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser (netdev@axxeo.de) wrote:
> > Hmm, so I should resurrect my user page table walker abstraction?
> > 
> > There I would hand each page to a "recording" function, which
> > can drop the page from the collection or coalesce it in the collector
> > if your scatter gather implementation allows it.
> 
> It depends on where the performance growth stops.
> At first glance it does not look like find_extend_vma();
> more likely it is follow_page() faulting and thus __handle_mm_fault().
> I cannot say for sure, but if that is true and performance growth
> stops because of the increased number of faults and their processing,
> your approach will hit this problem too, won't it?

My approach reduced the number of loops performed and the amount
of memory needed, at the expense of doing more work in the main
loop of get_user_pages().

This was mitigated for the common case of getting just one page by 
providing a get_one_user_page() function.

The whole reason we need such multiple loops is that we have
no common container object for "IO vector + additional data".

So we always loop over the vector returned by
get_user_pages(). The bigger that vector,
the bigger the impact.

Maybe something as simple as providing get_user_pages() with some offsetof
and container_of hackery will work these days without the disadvantages
my old get_user_pages() work had.

The idea is that you provide a vector (like the arguments to calloc) and two
offsets into each element: one at which to store the page and one at which
to store the vma.

If an offset has a special value (e.g. LONG_MAX), nothing is stored there at all.
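
A minimal userspace sketch of that idea follows. It is not the real
get_user_pages() interface: struct page and struct vma here are empty
stand-ins, and store_into_vector(), struct io_entry and NO_STORE are
hypothetical names chosen only for illustration. It shows a collector
writing straight into a caller-defined element type via offsets, so no
second loop over a separate page array is needed.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct page / vm_area_struct. */
struct page { int id; };
struct vma  { int id; };

/* Sentinel offset meaning "do not store this item" (assumed: LONG_MAX). */
#define NO_STORE ((size_t)LONG_MAX)

/* Example caller-defined element: I/O vector entry plus additional data. */
struct io_entry {
	unsigned long len;   /* caller's own data, untouched by the collector */
	struct page *page;   /* filled in by the collector */
	struct vma *vma;     /* filled in by the collector */
};

/*
 * Store pages[i] and vmas[i] directly into element i of a caller-provided
 * vector of nmemb elements, each `size` bytes long, at the given offsets.
 * An offset equal to NO_STORE skips that store entirely.
 */
static void store_into_vector(void *vec, size_t nmemb, size_t size,
			      size_t page_off, size_t vma_off,
			      struct page **pages, struct vma **vmas)
{
	char *elem = vec;
	size_t i;

	for (i = 0; i < nmemb; i++, elem += size) {
		if (page_off != NO_STORE)
			*(struct page **)(elem + page_off) = pages[i];
		if (vma_off != NO_STORE)
			*(struct vma **)(elem + vma_off) = vmas[i];
	}
}
```

A caller would pass offsetof(struct io_entry, page) and either
offsetof(struct io_entry, vma) or NO_STORE, depending on whether it
wants the vma recorded.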

But if the performance problem really is get_user_pages() itself 
(and not its callers), then my approach won't help at all.


Regards

Ingo Oeser


* Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
From: Evgeniy Polyakov @ 2006-03-07 10:16 UTC
  To: Ingo Oeser
  Cc: David S. Miller, jengelh, christopher.leech, linux-kernel, netdev
In-Reply-To: <200603071043.59479.netdev@axxeo.de>


On Tue, Mar 07, 2006 at 10:43:59AM +0100, Ingo Oeser (netdev@axxeo.de) wrote:
> Evgeniy Polyakov wrote:
> > On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser (netdev@axxeo.de) wrote:
> > > Hmm, so I should resurrect my user page table walker abstraction?
> > > 
> > > There I would hand each page to a "recording" function, which
> > > can drop the page from the collection or coalesce it in the collector
> > > if your scatter gather implementation allows it.
> > 
> > It depends on where the performance growth stops.
> > At first glance it does not look like find_extend_vma();
> > more likely it is follow_page() faulting and thus __handle_mm_fault().
> > I cannot say for sure, but if that is true and performance growth
> > stops because of the increased number of faults and their processing,
> > your approach will hit this problem too, won't it?
> 
> My approach reduced the number of loops performed and the amount
> of memory needed, at the expense of doing more work in the main
> loop of get_user_pages().
> 
> This was mitigated for the common case of getting just one page by 
> providing a get_one_user_page() function.
> 
> The whole reason we need such multiple loops is that we have
> no common container object for "IO vector + additional data".
> 
> So we always loop over the vector returned by
> get_user_pages(). The bigger that vector,
> the bigger the impact.
> 
> Maybe something as simple as providing get_user_pages() with some offsetof
> and container_of hackery will work these days without the disadvantages
> my old get_user_pages() work had.
> 
> The idea is that you provide a vector (like the arguments to calloc) and two
> offsets into each element: one at which to store the page and one at which
> to store the vma.
> 
> If an offset has a special value (e.g. LONG_MAX), nothing is stored there at all.

You still need to find the VMA in one loop, and run through its (mm_struct's)
pages in a second loop.

> But if the performance problem really is get_user_pages() itself 
> (and not its callers), then my approach won't help at all.

It looks that way.
My test pseudocode is as follows:
fget_light();
igrab();
kzalloc(number_of_pages * sizeof(void *));
get_user_pages(number_of_pages);
... undo ...

I've attached two graphs of performance with and without
get_user_pages(): get_user_pages.png and kmalloc.png.

The vertical axis is the number of Mbytes per second pushed through the
above code; the horizontal axis is the number of pages in each run.
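
For reference, the vertical-axis value can be derived from a timed run
roughly as below. This is only a sketch: mbytes_per_second() is a
hypothetical helper, and it assumes 4096-byte pages and 1 Mbyte =
1024 * 1024 bytes, neither of which is stated above.

```c
#include <stddef.h>

/* Assumed page size; the real value depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Throughput in Mbytes per second for `runs` iterations that each
 * process `pages_per_run` pages in a total of `seconds` seconds.
 */
static double mbytes_per_second(size_t pages_per_run, size_t runs,
				double seconds)
{
	double bytes = (double)pages_per_run * (double)runs *
		       (double)SKETCH_PAGE_SIZE;

	if (seconds <= 0.0)
		return 0.0;	/* avoid dividing by zero on a bad timer */

	return bytes / (1024.0 * 1024.0) / seconds;
}
```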
 
> Regards
> 
> Ingo Oeser

-- 
	Evgeniy Polyakov

[-- Attachment #2: get_user_pages.png --]
[-- Type: image/png, Size: 5498 bytes --]

[-- Attachment #3: kmalloc.png --]
[-- Type: image/png, Size: 5816 bytes --]


* [PATCH] [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
From: Thomas Graf @ 2006-03-07 12:31 UTC
  To: David S. Miller; +Cc: netdev, netfilter-devel

The size of the skb carrying the netlink message is not
equivalent to the length of the actual netlink message
due to padding. ip_queue matches the length of the payload
against the original packet size to determine if packet
mangling is desired. Due to the above wrong assumption,
arbitrary packets may not be mangled, depending on their
original size.

Signed-off-by: Thomas Graf <tgraf@suug.ch>

Index: net-2.6/net/ipv4/netfilter/ip_queue.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_queue.c
+++ net-2.6/net/ipv4/netfilter/ip_queue.c
@@ -524,7 +524,7 @@ ipq_rcv_skb(struct sk_buff *skb)
 	write_unlock_bh(&queue_lock);
 	
 	status = ipq_receive_peer(NLMSG_DATA(nlh), type,
-	                          skblen - NLMSG_LENGTH(0));
+	                          nlmsglen - NLMSG_LENGTH(0));
 	if (status < 0)
 		RCV_SKB_FAIL(status);
 		
Index: net-2.6/net/ipv6/netfilter/ip6_queue.c
===================================================================
--- net-2.6.orig/net/ipv6/netfilter/ip6_queue.c
+++ net-2.6/net/ipv6/netfilter/ip6_queue.c
@@ -522,7 +522,7 @@ ipq_rcv_skb(struct sk_buff *skb)
 	write_unlock_bh(&queue_lock);
 	
 	status = ipq_receive_peer(NLMSG_DATA(nlh), type,
-	                          skblen - NLMSG_LENGTH(0));
+	                          nlmsglen - NLMSG_LENGTH(0));
 	if (status < 0)
 		RCV_SKB_FAIL(status);
 		


* Re: GigE on PowerMac G5
From: Andreas Schwab @ 2006-03-07 12:53 UTC
  To: Benjamin Herrenschmidt; +Cc: netdev, linuxppc64-dev
In-Reply-To: <1141650907.11221.61.camel@localhost.localdomain>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> At this point, all I can say is... does it work in OS X ?

Strange, OS X can't do it either.  Looks like I have a hardware problem.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [PATCH] [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
From: Patrick McHardy @ 2006-03-07 12:58 UTC
  To: Thomas Graf; +Cc: netdev, netfilter-devel, David S. Miller
In-Reply-To: <20060307123143.GO9559@postel.suug.ch>

Thomas Graf wrote:
> The size of the skb carrying the netlink message is not
> equivalent to the length of the actual netlink message
> due to padding. ip_queue matches the length of the payload
> against the original packet size to determine if packet
> mangling is desired. Due to the above wrong assumption,
> arbitrary packets may not be mangled, depending on their
> original size.

Looks good, thanks Thomas. I think this should also go in 2.4.
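
The length mismatch described in the patch can be illustrated with a
small standalone sketch. It assumes the sender pads the message to the
4-byte netlink alignment and that the netlink header is 16 bytes; the
SK_* macros and sk_* helpers are local, hypothetical names that only
mirror the arithmetic and are not the kernel's own definitions.

```c
/* Local stand-ins mirroring netlink's 4-byte alignment arithmetic. */
#define SK_ALIGNTO	4U
#define SK_ALIGN(len)	(((len) + SK_ALIGNTO - 1) & ~(SK_ALIGNTO - 1))
#define SK_HDRLEN	16U	/* assumed header size, already aligned */

/* Value carried in nlmsg_len: header plus unpadded payload. */
static unsigned int sk_nlmsg_len(unsigned int payload)
{
	return SK_HDRLEN + payload;
}

/* Bytes occupied in the buffer: nlmsg_len rounded up to the alignment. */
static unsigned int sk_skb_len(unsigned int payload)
{
	return SK_ALIGN(sk_nlmsg_len(payload));
}

/* Payload length as computed before the patch (from the padded size). */
static unsigned int sk_payload_from_skb(unsigned int payload)
{
	return sk_skb_len(payload) - SK_HDRLEN;
}

/* Payload length as computed after the patch (from nlmsg_len). */
static unsigned int sk_payload_from_nlmsg(unsigned int payload)
{
	return sk_nlmsg_len(payload) - SK_HDRLEN;
}
```

With a 21-byte payload the first computation yields 24 and the second
yields 21; only payload sizes that are already a multiple of four give
the same answer both ways, which is why the failure depends on the
original packet size.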


* Re: [Lse-tech] Re: [Patch 7/7] Generic netlink interface (delay accounting)
From: jamal @ 2006-03-07 14:38 UTC
  To: Shailabh Nagar; +Cc: netdev, linux-kernel, lse-tech
In-Reply-To: <440C6AAA.9030301@watson.ibm.com>

On Mon, 2006-06-03 at 12:00 -0500, Shailabh Nagar wrote:

> My design was to have the listener get both responses (what I call 
> replies in the code) as well as events (data sent on exit of pid)
> 

I think I may not be doing justice to this in my explanation, so let me
elaborate so we can be in sync.
Here is the classical way of doing things:

- Assume several apps in user space and a target in the kernel (this
could be reversed or combined in many ways, but for the sake of
simplicity/clarity make the above assumption).
- Suppose we have five user space apps A, B, C, D, E; these processes
would typically do one of the following classes of activities:

a) configure (ADD/NEW/DEL etc). This is issued towards the kernel to
set/create/delete/flush some scalar attribute or vector. These sorts of
commands are synchronous, i.e. you issue them and you expect a response
(which may indicate success/failure etc). The response is unicast; the
change they make may in turn cause an event, which may be multicast.

b) query (GET). This is issued towards the kernel to query the state of
configured items. This class of commands is also synchronous. There
are special cases of the query which dump everything in the target,
literally called "dumps". The response is unicast.

c) events. These are _asynchronous_ messages issued by the kernel to
indicate some happening in the kernel. The event may be caused by #a
above or any other activity in the kernel. Events are multicast.
To receive them you have to register for the multicast group. You do so
via sockets. You can register to many multicast groups.

For clarity, again assume we have a multicast group where announcements
of pids exiting are seen and C and D are registered to such a multicast
group.
Suppose process A exits. That would fall under #c above. C and D will be
notified.
Suppose B configures something in the kernel that forces the kernel to
have process E exit and that such an operation is successful. B will get
acknowledgement it succeeded (unicast). C and D will get notified
(multicast). 
Suppose C issued a GET to find details about a specific pid, then only C
will get that info back (message is unicast).
[A response message to a GET is typically designed to be the same as an
ADD message, i.e. one should be able to take exactly the same message,
change one or two things and shove it back into the kernel to configure].
Suppose D issued a GET with dump flag, then D will get the details of
all pids (message is unicast).
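
The delivery rules in the scenario above can be sketched as a toy
userspace simulation. Nothing here is real netlink API: struct sim and
the sim_*() helpers are hypothetical names, and the model only counts
which app would receive which kind of message.

```c
/* The five user space apps from the scenario. */
enum { APP_A, APP_B, APP_C, APP_D, APP_E, NUM_APPS };

struct sim {
	unsigned int members;			/* bitmask: apps in the exit group */
	unsigned int unicast[NUM_APPS];		/* unicast messages received */
	unsigned int multicast[NUM_APPS];	/* multicast messages received */
};

/* Register an app for the multicast group. */
static void sim_join(struct sim *s, int app)
{
	s->members |= 1U << app;
}

/* An event (class c) is delivered to every registered member. */
static void sim_event(struct sim *s)
{
	int i;

	for (i = 0; i < NUM_APPS; i++)
		if (s->members & (1U << i))
			s->multicast[i]++;
}

/* A response to a GET or dump (class b) goes only to the requester. */
static void sim_reply(struct sim *s, int requester)
{
	s->unicast[requester]++;
}

/*
 * A configure command (class a) whose effect triggers an event:
 * unicast acknowledgement to the requester, multicast event to members.
 */
static void sim_configure(struct sim *s, int requester)
{
	sim_reply(s, requester);
	sim_event(s);
}
```

Replaying the scenario (C and D join; A exits; B forces E to exit; C
issues a GET; D issues a dump) leaves C and D with two multicast
messages each, B, C and D with one unicast message each, and A and E
with nothing.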

Is this clear? Is there more than the above you need?

There are no hard rules on what you need to be multicasting; as an
example, you could send periodic (aka time-based) samples from the kernel
on a multicast channel and that would be received by all. It did seem
odd that you want to have a semi-promiscuous mode where a response to a
GET is multicast. If that is still what you want to achieve, then you
should.

> However, we could switch to the model you suggest and use a 
> multithreaded send/receive userspace utility.
> 

This is more of the classical way of doing things. 


> >There is a recent netlink addition to make sure
> >that events dont get sent if no listeners exist.
> >genetlink needs to be extended. For now assume such a thing exists.
> >  
> >
> Ok. Will this addition work for both unicast and multicast modes ?
> 

If you never open a connection to the kernel, nothing will be generated
towards user space. 
There are other techniques to rate limit event generation as well (one
such technique is a nagle-like algorithm used by xfrm).

> >
> Will this be necessary ? Isn't genl_rcv_msg() going to return a -EOPNOTSUPP
> automatically for us since we've not registered the command ?
>  

Yes; please remind me of this in your doc feedback.

> >
> >Also if you can provide feedback whether the doc i sent was any use
> >and what wasnt clear etc.
> >  
> >
> Will do.
> 

Also take a look at the excellent documentation Thomas Graf has put in
the kernel for all the utilities for manipulating netlink messages and
tell me if that should also be put in this doc (it is listed as a TODO).


cheers,
jamal


* Re: de2104x: interrupts before interrupt handler is registered
From: Martin Michlmayr @ 2006-03-07 14:57 UTC
  To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20060307051152.GA1244@deprecation.cyrius.com>

* Martin Michlmayr <tbm@cyrius.com> [2006-03-07 05:11]:
> * Francois Romieu <romieu@fr.zoreil.com> [2006-03-06 22:17]:
> > Not sure about this one, but...
> 
> It seems to help.  It's hard to say for sure because I don't have a
> foolproof way to reproduce this panic.  It _usually_ occurs after
> copying a few hundred MB but there's no clear trigger.  I've now copied
> a few GB around using a kernel with your patch and it hasn't crashed.

I'm pretty sure now that your patch helps.  I left the system running
overnight and it was still alive in the morning after transferring ~10
GB.  I do get all kinds of underrun messages (see below) but the data
got transferred alright.  I then rebooted with the kernel that doesn't
have your patch and it crashed after ~1 GB.


(this was at about 3 GB, but the same goes on and on; but the network
works.)

eth0      Link encap:Ethernet  HWaddr 00:80:C8:33:4F:96  
          inet addr:192.168.1.145  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::280:c8ff:fe33:4f96/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1199533 errors:7 dropped:0 overruns:7 frame:0
          TX packets:2344296 errors:396 dropped:252 overruns:396 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:64846004 (61.8 MiB)  TX bytes:3479989567 (3.2 GiB)
          Interrupt:10 Base address:0x2000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)


Adding 136512k swap on /dev/hda5.  Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb022
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb032
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb02a
eth0: tx err, status 0x7fffb01a
eth0: tx err, status 0x7fffb02a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 15/37/38
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 54 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 16/6/7
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb012
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 60 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 41/47/48
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 32 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 55 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 43 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 2 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: rx err, slot 6 status 0x508329 len 76
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002
NETDEV WATCHDOG: eth0: transmit timed out
eth0: NIC status fc660000 mode 7ffc2002 sia 45e1d1c8 desc 17/63/0
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb002
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb00a
eth0: tx err, status 0x7fffb002

-- 
Martin Michlmayr
http://www.cyrius.com/

^ permalink raw reply

* Re: de2104x: interrupts before interrupt handler is registered
From: Martin Michlmayr @ 2006-03-07 15:16 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, linux-kernel
In-Reply-To: <20060306191706.GA6947@deprecation.cyrius.com>

* Martin Michlmayr <tbm@cyrius.com> [2006-03-06 19:17]:
> There's another interrupt related bug in the driver, though.  I

There's yet another bug (or two).

I just got another kernel panic:
http://www.cyrius.com/tmp/de2104x_panic2.jpg (which I haven't been
able to reproduce so far; this was without your latest patch applied,
btw).  This happened when I was doing DHCP while my server was not
responding to DHCP.  I wonder if it's related to another issue I've
observed.

This card is a D-Link DE 530 with both a BNC and RJ-45 connector.
When I boot my machine without having the Ethernet cable plugged in,
Linux thinks there's a BNC connection.  When I plug in the cable, the
link light on the card goes on but Linux doesn't seem to notice - in
fact, when I then start DHCP again, the link light goes off again and
Linux talks about BNC being up... [FWIW, Linux 2.4 doesn't handle this
situation either.  Under 2.4 the link light doesn't even come up.]


dmesg: booting without the RJ-45 cable plugged in, doing DHCP, then
plugging the RJ-45 cable in and doing DHCP again:

hda: 4999680 sectors (2559 MB) w/256KiB Cache, CHS=4960/16/63, UDMA(33)
 hda: hda1 hda2 < hda5 hda6 >
ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
de0: SROM leaf offset 30, default media 10baseT auto
de0:   media block #0: 10baseT-FD
de0:   media block #1: BNC
de0:   media block #2: 10baseT-HD
eth0: 21041 at 0xb8802000, 00:80:c8:33:4f:96, IRQ 10
Probing IDE interface ide1...
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
input: PC Speaker as /class/input/input1
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 136512k swap on /dev/hda5.  Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: set link BNC
eth0:    mode 0x7ffc0000, sia 0x10c4,0xffffef09,0xfffff7fd,0xffff0006
eth0:    set mode 0x7ffc0000, set sia 0xef09,0xf7fd,0x6
eth0: link up, media BNC
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
eth0: disabling interface
eth0: timeout expired stopping DMA
ACPI: PCI interrupt for device 0000:00:0b.0 disabled
eth0: enabling interface
eth0: set link BNC
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef09,0xfffff7fd,0xffff0006
eth0:    set mode 0x7ffc0040, set sia 0xef09,0xf7fd,0x6
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: link up, media BNC
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present


As a comparison, this happens when I boot with the RJ-45 cable plugged
in:

ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ 10
de0: SROM leaf offset 30, default media 10baseT auto
de0:   media block #0: 10baseT-FD
de0:   media block #1: BNC
de0:   media block #2: 10baseT-HD
eth0: 21041 at 0xb8802000, 00:80:c8:33:4f:96, IRQ 10
Probing IDE interface ide1...
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Real Time Clock Driver v1.12ac
input: PC Speaker as /class/input/input1
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 136512k swap on /dev/hda5.  Priority:-1 extents:1 across:136512k
EXT3 FS on hda1, internal journal
device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
eth0: enabling interface
eth0: set link 10baseT auto
eth0:    mode 0x7ffc0040, sia 0x10c4,0xffffef01,0xffffffff,0xffff0008
eth0:    set mode 0x7ffc0040, set sia 0xef01,0xffff,0x8
eth0: link up, media 10baseT auto
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present

-- 
Martin Michlmayr
http://www.cyrius.com/

^ permalink raw reply

* Re: 2.6.16-rc5-mm2: IPW_QOS: two remarks
From: Adrian Bunk @ 2006-03-07 17:06 UTC (permalink / raw)
  To: Andreas Happe; +Cc: Andrew Morton, linux-kernel, linville, jgarzik, netdev
In-Reply-To: <200603050146.27529.andreashappe@snikt.net>

On Sun, Mar 05, 2006 at 01:46:26AM +0100, Andreas Happe wrote:
> On Friday 03 March 2006 16:26, Adrian Bunk wrote:
>...
> > - please add a help text
> 
> I could add some stuff about WMM to its help text, but I think someone more
> involved with the ipw2200-project should do that.

Even a short help text is better than no help text.

> andy
>...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

^ permalink raw reply

* [PATCH] IPoIB: Fix build now that destructor is in neigh_params
From: Roland Dreier @ 2006-03-07 19:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, davem, openib-general
In-Reply-To: <20060306211630.0df79464.akpm@osdl.org>

Dave, here's an incremental patch that fixes the IPoIB build (which is
broken in net-2.6.17 because of my screw-up, which left out the chunk
below).  I'll also send a full patch that can replace the "Move
destructor from neigh->ops to neigh_params" patch if you'd rather
replace it in your tree.

Thanks, and sorry about the screw-up.

---

Get rid of the last place in IPoIB where we clear
neigh->neighbour->ops->destructor.  This is broken now that the
destructor member has moved to neigh_parms.

Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index a2408d7..19fd173 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -115,7 +115,6 @@ static void ipoib_mcast_free(struct ipoi
 		if (neigh->ah)
 			ipoib_put_ah(neigh->ah);
 		*to_ipoib_neigh(neigh->neighbour) = NULL;
-		neigh->neighbour->ops->destructor = NULL;
 		kfree(neigh);
 	}
 

^ permalink raw reply related

* [PATCH] [NET]: Move destructor from neigh->ops to neigh_params
From: Roland Dreier @ 2006-03-07 19:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, davem, openib-general
In-Reply-To: <20060306211630.0df79464.akpm@osdl.org>

Here's the fixed version of the original patch.

---

From: Michael S. Tsirkin <mst@mellanox.co.il>

struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed.  We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup.  Two additional
patches in the patch series update ipoib to use this new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index c3b5f79..9d9cecd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -247,7 +247,6 @@ static void path_free(struct net_device 
 		if (neigh->ah)
 			ipoib_put_ah(neigh->ah);
 		*to_ipoib_neigh(neigh->neighbour) = NULL;
-		neigh->neighbour->ops->destructor = NULL;
 		kfree(neigh);
 	}
 
@@ -530,7 +529,6 @@ static void neigh_add_path(struct sk_buf
 err:
 	*to_ipoib_neigh(skb->dst->neighbour) = NULL;
 	list_del(&neigh->list);
-	neigh->neighbour->ops->destructor = NULL;
 	kfree(neigh);
 
 	++priv->stats.tx_dropped;
@@ -769,21 +767,9 @@ static void ipoib_neigh_destructor(struc
 		ipoib_put_ah(ah);
 }
 
-static int ipoib_neigh_setup(struct neighbour *neigh)
-{
-	/*
-	 * Is this kosher?  I can't find anybody in the kernel that
-	 * sets neigh->destructor, so we should be able to set it here
-	 * without trouble.
-	 */
-	neigh->ops->destructor = ipoib_neigh_destructor;
-
-	return 0;
-}
-
 static int ipoib_neigh_setup_dev(struct net_device *dev, struct neigh_parms *parms)
 {
-	parms->neigh_setup = ipoib_neigh_setup;
+	parms->neigh_destructor = ipoib_neigh_destructor;
 
 	return 0;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index a2408d7..19fd173 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -115,7 +115,6 @@ static void ipoib_mcast_free(struct ipoi
 		if (neigh->ah)
 			ipoib_put_ah(neigh->ah);
 		*to_ipoib_neigh(neigh->neighbour) = NULL;
-		neigh->neighbour->ops->destructor = NULL;
 		kfree(neigh);
 	}
 
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6fa9ae1..b0666d6 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -68,6 +68,7 @@ struct neigh_parms
 	struct net_device *dev;
 	struct neigh_parms *next;
 	int	(*neigh_setup)(struct neighbour *);
+	void	(*neigh_destructor)(struct neighbour *);
 	struct neigh_table *tbl;
 
 	void	*sysctl_table;
@@ -145,7 +146,6 @@ struct neighbour
 struct neigh_ops
 {
 	int			family;
-	void			(*destructor)(struct neighbour *);
 	void			(*solicit)(struct neighbour *, struct sk_buff*);
 	void			(*error_report)(struct neighbour *, struct sk_buff*);
 	int			(*output)(struct sk_buff*);
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 73370de..9d72817 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -289,7 +289,6 @@ static void clip_neigh_error(struct neig
 
 static struct neigh_ops clip_neigh_ops = {
 	.family =		AF_INET,
-	.destructor =		clip_neigh_destroy,
 	.solicit =		clip_neigh_solicit,
 	.error_report =		clip_neigh_error,
 	.output =		dev_queue_xmit,
@@ -346,6 +345,7 @@ static struct neigh_table clip_tbl = {
 
 	/* parameters are copied from ARP ... */
 	.parms = {
+		.neigh_destructor	= clip_neigh_destroy,
 		.tbl 			= &clip_tbl,
 		.base_reachable_time 	= 30 * HZ,
 		.retrans_time 		= 1 * HZ,
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index e68700f..3489e23 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -586,8 +586,8 @@ void neigh_destroy(struct neighbour *nei
 			kfree(hh);
 	}
 
-	if (neigh->ops && neigh->ops->destructor)
-		(neigh->ops->destructor)(neigh);
+	if (neigh->parms->neigh_destructor)
+		(neigh->parms->neigh_destructor)(neigh);
 
 	skb_queue_purge(&neigh->arp_queue);
 

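A minimal userspace sketch of the problem the changelog above describes,
assuming heavily simplified stand-ins for the kernel structures (only the
callback fields are modelled, and the two helper functions at the end are
hypothetical, not kernel code). It shows why a destructor stored in a
shared ops table also fires for neighbours on other devices, while one
stored in per-device parms does not:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields
 * relevant to the destructor callback are modelled. */
struct neighbour;

struct neigh_ops {		/* shared between neighbours of many devices */
	void (*destructor)(struct neighbour *);		/* old location */
};

struct neigh_parms {		/* one instance per net device */
	void (*neigh_destructor)(struct neighbour *);	/* new location */
};

struct neighbour {
	struct neigh_ops *ops;
	struct neigh_parms *parms;
};

static int ipoib_cleanups;	/* counts calls to the IPoIB callback */

static void ipoib_neigh_destructor(struct neighbour *n)
{
	(void)n;
	ipoib_cleanups++;
}

/* Old scheme: the callback is found through the shared ops table. */
static void neigh_destroy_old(struct neighbour *n)
{
	if (n->ops && n->ops->destructor)
		n->ops->destructor(n);
}

/* New scheme: the callback is found through the per-device parms. */
static void neigh_destroy_new(struct neighbour *n)
{
	if (n->parms->neigh_destructor)
		n->parms->neigh_destructor(n);
}

/* Hypothetical helper: IPoIB sets the callback through its own neighbour,
 * then an Ethernet neighbour sharing the same ops is destroyed.
 * Returns how often the IPoIB callback ran for the Ethernet neighbour. */
int shared_ops_leaks_callback(void)
{
	struct neigh_ops shared = { NULL };
	struct neigh_parms ib_parms = { NULL }, eth_parms = { NULL };
	struct neighbour ib = { &shared, &ib_parms };
	struct neighbour eth = { &shared, &eth_parms };

	ipoib_cleanups = 0;
	ib.ops->destructor = ipoib_neigh_destructor;
	neigh_destroy_old(&eth);
	return ipoib_cleanups;
}

/* Hypothetical helper: same setup, but the callback lives in ib0's parms. */
int per_device_parms_isolate_callback(void)
{
	struct neigh_ops shared = { NULL };
	struct neigh_parms ib_parms = { NULL }, eth_parms = { NULL };
	struct neighbour ib = { &shared, &ib_parms };
	struct neighbour eth = { &shared, &eth_parms };

	ipoib_cleanups = 0;
	ib.parms->neigh_destructor = ipoib_neigh_destructor;
	neigh_destroy_new(&eth);
	return ipoib_cleanups;
}
```

In this model the first helper reports one stray callback on the Ethernet
neighbour and the second reports none.
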
^ permalink raw reply related

* Re: Re: TSO and IPoIB performance degradation
From: Matt Leininger @ 2006-03-07 21:44 UTC (permalink / raw)
  To: Shirley Ma
  Cc: netdev, Linux Kernel Mailing List, openib-general,
	David S. Miller, Stephen Hemminger
In-Reply-To: <OF336D72E6.999D2A30-ON8725712A.00117C92-8825712A.00116629@us.ibm.com>

On Mon, 2006-03-06 at 19:13 -0800, Shirley Ma wrote:
> 
> > More likely you are getting hit by the fact that TSO prevents the
> > congestion window from increasing properly. This was fixed in 2.6.15
> > (around mid-November 2005).
> 
> Yep, I noticed the same problem. After updating to the new kernel, the
> performance is much better, but it's still lower than before.

 Here is an updated version of OpenIB IPoIB performance for various
kernels with and without one of the TSO patches.  The netperf
performance for the latest kernels has not improved the TSO performance
drop.

  Any comments or suggestions would be appreciated.

  - Matt

All benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)
patch 1 - remove changeset 314324121f9b94b2ca657a494cf2b9cb0e4a28cc

Kernel                OpenIB    msi_x  netperf (MB/s)  
2.6.16-rc5           in-kernel    1     367
2.6.15               in-kernel    1     382
2.6.14-rc4 patch 1   in-kernel    1     434 
2.6.14-rc4           in-kernel    1     385 
2.6.14-rc3           in-kernel    1     374 
2.6.13.2             svn3627      1     386 
2.6.13.2 patch 1     svn3627      1     446 
2.6.13.2             in-kernel    1     394 
2.6.13-rc3 patch 12  in-kernel    1     442 
2.6.13-rc3 patch 1   in-kernel    1     450 
2.6.13-rc3           in-kernel    1     395
2.6.12.5-lustre      in-kernel    1     399  
2.6.12.5 patch 1     in-kernel    1     464
2.6.12.5             in-kernel    1     402 
2.6.12               in-kernel    1     406 
2.6.12-rc6 patch 1   in-kernel    1     470 
2.6.12-rc6           in-kernel    1     407
2.6.12-rc5           in-kernel    1     405 
2.6.12-rc5 patch 1   in-kernel    1     474
2.6.12-rc4           in-kernel    1     470 
2.6.12-rc3           in-kernel    1     466 
2.6.12-rc2           in-kernel    1     469 
2.6.12-rc1           in-kernel    1     466
2.6.11               in-kernel    1     464 
2.6.11               svn3687      1     464 
2.6.9-11.ELsmp       svn3513      1     425  (Woody's results, 3.6Ghz EM64T)

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: Stephen Hemminger @ 2006-03-07 21:49 UTC (permalink / raw)
  To: Matt Leininger
  Cc: netdev, Linux Kernel Mailing List, openib-general,
	David S. Miller
In-Reply-To: <1141767891.6119.903.camel@localhost>

On Tue, 07 Mar 2006 13:44:51 -0800
Matt Leininger <mlleinin@hpcn.ca.sandia.gov> wrote:

> On Mon, 2006-03-06 at 19:13 -0800, Shirley Ma wrote:
> > 
> > > More likely you are getting hit by the fact that TSO prevents the
> > > congestion window from increasing properly. This was fixed in 2.6.15
> > > (around mid-November 2005).
> > 
> > Yep, I noticed the same problem. After updating to the new kernel, the
> > performance is much better, but it's still lower than before.
> 
>  Here is an updated version of OpenIB IPoIB performance for various
> kernels with and without one of the TSO patches.  The netperf
> performance for the latest kernels has not improved the TSO performance
> drop.
> 
>   Any comments or suggestions would be appreciated.
> 
>   - Matt

Configuration information? like did you increase the tcp_rmem, tcp_wmem?
Tcpdump traces of what is being sent and available window?
Is IB using NAPI or just doing netif_rx()?

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: Michael S. Tsirkin @ 2006-03-07 21:53 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, Linux Kernel Mailing List, openib-general,
	David S. Miller
In-Reply-To: <20060307134907.733d3d27@localhost.localdomain>

Quoting r. Stephen Hemminger <shemminger@osdl.org>:
> Is IB using NAPI or just doing netif_rx()?

No, IPoIB doesn't use NAPI.

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies

^ permalink raw reply

* Re: [PATCH] IPoIB: Fix build now that destructor is in neigh_params
From: David S. Miller @ 2006-03-07 22:53 UTC (permalink / raw)
  To: rdreier; +Cc: akpm, netdev, openib-general
In-Reply-To: <adawtf6rrh7.fsf_-_@cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Tue, 07 Mar 2006 11:21:08 -0800

> Dave, here's an incremental patch that fixes the IPoIB build (which is
> broken in net-2.6.17 because of my screw-up, which left out the chunk
> below).  I'll also send a full patch that can replace the "Move
> destructor from neigh->ops to neigh_params" patch if you'd rather
> replace it in your tree.
> 
> Thanks, and sorry about the screw-up.

Why not just put this into your -ipoib patch set in -mm?
The change was for IPOIB's sake anyways...

^ permalink raw reply

* Re: [PATCH] IPoIB: Fix build now that destructor is in neigh_params
From: Roland Dreier @ 2006-03-07 22:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: akpm, netdev, openib-general
In-Reply-To: <20060307.145350.77989471.davem@davemloft.net>

    David> Why not just put this into your -ipoib patch set in -mm?
    David> The change was for IPOIB's sake anyways...

OK, good idea.  I'll put this in my for-2.6.17 and for-mm queues.

 - R.

^ permalink raw reply

* Re: [PATCH] [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
From: David S. Miller @ 2006-03-07 23:01 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel
In-Reply-To: <440D838D.8000302@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 07 Mar 2006 13:58:53 +0100

> Thomas Graf wrote:
> > The size of the skb carrying the netlink message is not
> > equivalent to the length of the actual netlink message
> > due to padding. ip_queue matches the length of the payload
> > against the original packet size to determine if packet
> > mangling is desired. Due to the above wrong assumption,
> > arbitrary packets may not be mangled depending on their
> > original size.
> 
> Looks good, thanks Thomas. I think this should also go in 2.4.

Pushed to 2.6.16, 2.6.x stable, and 2.4.x.

Phew!

Thanks Thomas.
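
A small sketch of the length mismatch described in the quoted changelog,
assuming the usual netlink rule that messages are padded to a 4-byte
boundary. The SK_* macros and the two payload_from_* helpers are local,
hypothetical names so that the example builds without kernel headers;
this is not the ip_queue code itself:

```c
/* Local copies of the netlink alignment rule (4-byte alignment). */
#define SK_NLMSG_ALIGNTO	4U
#define SK_NLMSG_ALIGN(len) \
	(((len) + SK_NLMSG_ALIGNTO - 1) & ~(SK_NLMSG_ALIGNTO - 1))
#define SK_NLMSG_HDRLEN		16U	/* aligned size of struct nlmsghdr */

/* Value stored in nlmsg_len: header plus payload, without padding. */
unsigned int sk_nlmsg_length(unsigned int payload)
{
	return payload + SK_NLMSG_HDRLEN;
}

/* Space the message occupies in the buffer: the same, rounded up. */
unsigned int sk_nlmsg_space(unsigned int payload)
{
	return SK_NLMSG_ALIGN(sk_nlmsg_length(payload));
}

/* Payload size derived from the buffer length (the wrong assumption). */
unsigned int payload_from_buffer(unsigned int payload)
{
	return sk_nlmsg_space(payload) - SK_NLMSG_HDRLEN;
}

/* Payload size derived from the length field in the message header. */
unsigned int payload_from_header(unsigned int payload)
{
	return sk_nlmsg_length(payload) - SK_NLMSG_HDRLEN;
}
```

For a 61-byte payload the buffer-derived size is 64 while the
header-derived size is 61, so a comparison against the original packet
size only succeeds when the payload happens to be a multiple of four.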

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: Matt Leininger @ 2006-03-08  0:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, Linux Kernel Mailing List, openib-general,
	David S. Miller
In-Reply-To: <20060307134907.733d3d27@localhost.localdomain>

On Tue, 2006-03-07 at 13:49 -0800, Stephen Hemminger wrote:
> On Tue, 07 Mar 2006 13:44:51 -0800
> Matt Leininger <mlleinin@hpcn.ca.sandia.gov> wrote:
> 
> > On Mon, 2006-03-06 at 19:13 -0800, Shirley Ma wrote:
> > > 
> > > > More likely you are getting hit by the fact that TSO prevents the
> > > > congestion window from increasing properly. This was fixed in 2.6.15
> > > > (around mid-November 2005).
> > > 
> > > Yep, I noticed the same problem. After updating to the new kernel, the
> > > performance is much better, but it's still lower than before.
> > 
> >  Here is an updated version of OpenIB IPoIB performance for various
> > kernels with and without one of the TSO patches.  The netperf
> > performance for the latest kernels has not improved the TSO performance
> > drop.
> > 
> >   Any comments or suggestions would be appreciated.
> > 
> >   - Matt
> 
> Configuration information? like did you increase the tcp_rmem, tcp_wmem?
> Tcpdump traces of what is being sent and available window?
> Is IB using NAPI or just doing netif_rx()?

  I used the standard setting for tcp_rmem and tcp_wmem.   Here are a
few other runs that change those variables.  I was able to improve
performance by ~30MB/s to 403 MB/s, but this is still a ways from the
474 MB/s before the TSO patches.

 Thanks,

	- Matt

All benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)
patch 1 - remove changeset 314324121f9b94b2ca657a494cf2b9cb0e4a28cc
msi_x=1 for all tests

Kernel                OpenIB     netperf (MB/s)  
2.6.16-rc5           in-kernel    403      
tcp_wmem 4096 87380 16777216 tcp_rmem 4096 87380 16777216

2.6.16-rc5           in-kernel    395      
tcp_wmem 4096 102400 16777216 tcp_rmem 4096 102400 16777216

2.6.16-rc5           in-kernel    392      
tcp_wmem 4096 65536 16777216 tcp_rmem 4096 87380 16777216

2.6.16-rc5           in-kernel    394      
tcp_wmem 4096 131072 16777216 tcp_rmem 4096 102400 16777216

2.6.16-rc5           in-kernel    377      
tcp_wmem 4096 131072 16777216 tcp_rmem 4096 153600 16777216

2.6.16-rc5           in-kernel    377      
tcp_wmem 4096 131072 16777216 tcp_rmem 4096 131072 16777216

2.6.16-rc5           in-kernel    353      
tcp_wmem 4096 262144 16777216 tcp_rmem 4096 262144 16777216

2.6.16-rc5           in-kernel    305      
tcp_wmem 4096 262144 16777216 tcp_rmem 4096 524288 16777216

2.6.16-rc5           in-kernel    303      
tcp_wmem 4096 131072 16777216 tcp_rmem 4096 524288 16777216

2.6.16-rc5           in-kernel    290      
tcp_wmem 4096 524288 16777216 tcp_rmem 4096 524288 16777216

2.6.16-rc5           in-kernel    367      default tcp values

--------------------
All with standard tcp settings
Kernel                OpenIB     netperf (MB/s)  
2.6.16-rc5           in-kernel    367      
2.6.15               in-kernel    382
2.6.14-rc4 patch 12  in-kernel    436 
2.6.14-rc4 patch 1   in-kernel    434 
2.6.14-rc4           in-kernel    385 
2.6.14-rc3           in-kernel    374 
2.6.13.2             svn3627      386 
2.6.13.2 patch 1     svn3627      446 
2.6.13.2             in-kernel    394 
2.6.13-rc3 patch 12  in-kernel    442 
2.6.13-rc3 patch 1   in-kernel    450 
2.6.13-rc3           in-kernel    395
2.6.12.5-lustre      in-kernel    399  
2.6.12.5 patch 1     in-kernel    464
2.6.12.5             in-kernel    402 
2.6.12               in-kernel    406 
2.6.12-rc6 patch 1   in-kernel    470 
2.6.12-rc6           in-kernel    407
2.6.12-rc5           in-kernel    405 
2.6.12-rc5 patch 1   in-kernel    474
2.6.12-rc4           in-kernel    470 
2.6.12-rc3           in-kernel    466 
2.6.12-rc2           in-kernel    469 
2.6.12-rc1           in-kernel    466
2.6.11               in-kernel    464 
2.6.11               svn3687      464 
2.6.9-11.ELsmp       svn3513      425  (Woody's results, 3.6Ghz EM64T) 
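
To put the figures above in proportion, a short worked calculation,
using only numbers from the tables (474 MB/s for 2.6.12-rc5 with patch 1,
367 MB/s for 2.6.16-rc5 with default settings, 403 MB/s for the best
tuned run). The function names are illustrative only:

```c
/* Relative drop of `now` against `baseline`, in percent. */
double drop_percent(double baseline, double now)
{
	return (baseline - now) / baseline * 100.0;
}

/* Share of the lost throughput that a tuned run wins back, in percent. */
double recovered_percent(double baseline, double untuned, double tuned)
{
	return (tuned - untuned) / (baseline - untuned) * 100.0;
}
```

With these inputs, drop_percent(474, 367) is about 22.6, drop_percent(474,
403) is about 15.0, and recovered_percent(474, 367, 403) is about 33.6, so
the tcp_rmem/tcp_wmem tuning wins back roughly a third of the loss.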

^ permalink raw reply

* Re: de2104x: interrupts before interrupt handler is registered
From: Francois Romieu @ 2006-03-08  0:15 UTC (permalink / raw)
  To: Martin Michlmayr; +Cc: netdev, linux-kernel
In-Reply-To: <20060307051152.GA1244@deprecation.cyrius.com>

Martin Michlmayr <tbm@cyrius.com> :
[...]
> It seems to help.  It's hard to say for sure because I don't have a
> foolproof way to reproduce this panic.  It _usually_ occurs after
> copying a few hundred MB but there's no clear trigger.  I've now copied
> a few GB around using a kernel with your patch and it hasn't crashed.

netdev watchdog events appear in the dmesg of the patched driver.
The driver survived it. So I'd say that the patch does its job.

OTOH, if you ever saw the unpatched driver survive this event, yell now.

-- 
Ueimor

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: David S. Miller @ 2006-03-08  0:18 UTC (permalink / raw)
  To: mlleinin; +Cc: netdev, linux-kernel, openib-general, shemminger
In-Reply-To: <1141776697.6119.938.camel@localhost>

From: Matt Leininger <mlleinin@hpcn.ca.sandia.gov>
Date: Tue, 07 Mar 2006 16:11:37 -0800

>   I used the standard setting for tcp_rmem and tcp_wmem.   Here are a
> few other runs that change those variables.  I was able to improve
> performance by ~30MB/s to 403 MB/s, but this is still a ways from the
> 474 MB/s before the TSO patches.

How limited are the IPoIB devices, TX descriptor wise?

One side effect of the TSO changes is that one extra descriptor
will be used for outgoing packets.  This is because we have to
put the headers as well as the user data, into page based
buffers now.

Perhaps you can experiment with increasing the transmit descriptor
table size, if that's possible.
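
A rough sketch of why an extra descriptor per packet could matter on a
device with a small transmit ring. The ring size of 64 used in the note
below is a made-up figure for illustration, not a value taken from any
driver, and packets_in_flight() is a hypothetical helper:

```c
/* Packets that fit in a TX ring when each packet consumes
 * `descs_per_pkt` descriptors. */
unsigned int packets_in_flight(unsigned int ring_size,
			       unsigned int descs_per_pkt)
{
	if (descs_per_pkt == 0)
		return 0;
	return ring_size / descs_per_pkt;
}
```

With a hypothetical 64-entry ring, going from one to two descriptors per
packet halves the number of packets that can be queued (64 versus 32),
which is the kind of effect a larger descriptor table would offset.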

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: Roland Dreier @ 2006-03-08  1:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, openib-general, shemminger
In-Reply-To: <20060307.161808.60227862.davem@davemloft.net>

    David> How limited are the IPoIB devices, TX descriptor wise?

    David> One side effect of the TSO changes is that one extra
    David> descriptor will be used for outgoing packets.  This is
    David> because we have to put the headers as well as the user
    David> data, into page based buffers now.

We have essentially no limit on TX descriptors.  However I think
there's some confusion about TSO: IPoIB does _not_ do TSO -- generic
InfiniBand hardware does not have any TSO capability.  In the future
we might be able to implement TSO for certain hardware that does have
support, but even that requires some firmware help from the
HCA vendors, etc.  So right now the IPoIB driver does not do TSO.

The reason TSO comes up is that reverting the patch described below
helps (or helped at some point at least) IPoIB throughput quite a bit.
Clearly this was a bug fix so we can't revert it in general but I
think what Michael Tsirkin was suggesting at the beginning of this
thread is to do what the last paragraph of the changelog says -- find
some way to re-enable the trick.

diff-tree 3143241... (from e16fa6b...)
Author: David S. Miller <davem@davemloft.net>
Date:   Mon May 23 12:03:06 2005 -0700

    [TCP]: Fix stretch ACK performance killer when doing ucopy.

    When we are doing ucopy, we try to defer the ACK generation to
    cleanup_rbuf().  This works most of the time very well, but if the
    ucopy prequeue is large, this ACKing behavior kills performance.

    With TSO, it is possible to fill the prequeue so large that by the
    time the ACK is sent and gets back to the sender, most of the window
    has emptied of data and performance suffers significantly.

    This behavior does help in some cases, so we should think about
    re-enabling this trick in the future, using some kind of limit in
    order to avoid the bug case.

 - R.

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: David S. Miller @ 2006-03-08  1:23 UTC (permalink / raw)
  To: rdreier; +Cc: netdev, linux-kernel, openib-general, shemminger
In-Reply-To: <adaacc1raz9.fsf@cisco.com>

From: Roland Dreier <rdreier@cisco.com>
Date: Tue, 07 Mar 2006 17:17:30 -0800

> The reason TSO comes up is that reverting the patch described below
> helps (or helped at some point at least) IPoIB throughput quite a bit.

I wish you had started the thread by mentioning this specific
patch, we wasted an enormous amount of precious developer time
speculating and asking for arbitrary tests to be run in order
to narrow down the problem, yet you knew the specific change
that introduced the performance regression already...

This is a good example of how not to report a bug.

^ permalink raw reply

* Re: Re: TSO and IPoIB performance degradation
From: Roland Dreier @ 2006-03-08  1:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, openib-general, shemminger
In-Reply-To: <20060307.172336.107863253.davem@davemloft.net>

    David> I wish you had started the thread by mentioning this
    David> specific patch, we wasted an enormous amount of precious
    David> developer time speculating and asking for arbitrary tests
    David> to be run in order to narrow down the problem, yet you knew
    David> the specific change that introduced the performance
    David> regression already...

Sorry, you're right.  I was a little confused because I had a memory of
Michael's original email (http://lkml.org/lkml/2006/3/6/150) quoting a
changelog entry, but looking back at the message, it was quoting
something completely different and misleading.

I think the most interesting email in the old thread is
http://openib.org/pipermail/openib-general/2005-October/012482.html
which shows that reverting 314324121 (the "stretch ACK performance
killer" fix) gives ~400 Mbit/sec in extra IPoIB performance.

 - R.

^ permalink raw reply

