Netdev List
* Re: dcache leak in 2.6.16-git8 II
From: Al Viro @ 2006-03-30 10:12 UTC
  To: Andi Kleen; +Cc: Andrew Morton, bharata, linux-kernel, netdev
In-Reply-To: <20060330095048.GW27946@ftp.linux.org.uk>

On Thu, Mar 30, 2006 at 10:50:48AM +0100, Al Viro wrote:
> FWIW...  One thing that might be useful here:

Here's what I had in mind:

Allow explicitly marking allocated objects as "allocated here", so that they'll
show up that way for all slab debugging purposes.  New helpers:
	slab_charge_here(objp, cachep)
	slab_charge_caller(objp, cachep)
mark the object as allocated, respectively, at the place where
slab_charge_here() is called and by the caller of the function that calls
slab_charge_caller().

It's useful when the call chain leading to an allocation in a given cache always
ends the same way, making normal caller accounting uninformative.  E.g.
allocation of struct socket is always done via sock_alloc() => new_inode() =>
alloc_inode() => sock_alloc_inode() => kmem_cache_alloc().  The last step
has no chance to give any useful information about the caller; adding
slab_charge_caller() in sock_alloc() will give us a much more useful picture.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
----

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 3af03b1..6cc2f96 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -151,6 +151,16 @@ static inline void *kcalloc(size_t n, si
 extern void kfree(const void *);
 extern unsigned int ksize(const void *);
 
+#ifndef CONFIG_DEBUG_SLAB
+#define slab_set_creator(objp, cachep, address)
+#define slab_charge_here(objp, cachep)
+#else
+extern void slab_set_creator(void *objp, struct kmem_cache *cachep, void *address);
+extern void slab_charge_here(void *objp, struct kmem_cache *cachep);
+#endif
+#define slab_charge_caller(objp, cachep) \
+	slab_set_creator((objp), (cachep), __builtin_return_address(0))
+
 #ifdef CONFIG_NUMA
 extern void *kmem_cache_alloc_node(kmem_cache_t *, gfp_t flags, int node);
 extern void *kmalloc_node(size_t size, gfp_t flags, int node);
@@ -189,6 +199,10 @@ void kfree(const void *m);
 unsigned int ksize(const void *m);
 unsigned int kmem_cache_size(struct kmem_cache *c);
 
+#define slab_set_creator(objp, cachep, address)
+#define slab_charge_here(objp, cachep)
+#define slab_charge_caller(objp, cachep)
+
 static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
 {
 	return __kzalloc(n * size, flags);
diff --git a/mm/slab.c b/mm/slab.c
index 4cbf8bb..db21301 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3144,6 +3144,23 @@ void *kmem_cache_zalloc(struct kmem_cach
 }
 EXPORT_SYMBOL(kmem_cache_zalloc);
 
+#ifdef CONFIG_DEBUG_SLAB
+void slab_set_creator(void *objp, struct kmem_cache *cachep, void *address)
+{
+	if (cachep->flags & SLAB_STORE_USER)
+		*dbg_userword(cachep, objp) = address;
+}
+
+EXPORT_SYMBOL(slab_set_creator);
+
+void slab_charge_here(void *objp, struct kmem_cache *cachep)
+{
+	slab_set_creator(objp, cachep, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(slab_charge_here);
+
+#endif
+
 /**
  * kmem_ptr_validate - check if an untrusted pointer might
  *	be a slab entry.
diff --git a/net/socket.c b/net/socket.c
index fcd77ea..0c4d61b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -517,6 +517,9 @@ static struct socket *sock_alloc(void)
 	if (!inode)
 		return NULL;
 
+	slab_charge_caller(container_of(inode, struct socket_alloc, vfs_inode),
+			   sock_inode_cachep);
+
 	sock = SOCKET_I(inode);
 
 	inode->i_mode = S_IFSOCK|S_IRWXUGO;


* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-30 10:12 UTC
  To: Boris B. Zhmurov
  Cc: David S. Miller, jesse.brandeburg, nipsy, jrlundgren, cat,
	djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel
In-Reply-To: <442BAC99.2090404@kernelpanic.ru>

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

On Thu, Mar 30, 2006 at 10:02:01AM +0000, Boris B. Zhmurov wrote:
> 
> [zhmurov@builds linux-2.6.16]$ patch -p1 < 
> ../../../SOURCES/linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch 
> 
> patching file drivers/net/e1000/e1000_main.c
> Reversed (or previously applied) patch detected!  Assume -R? [n]
> 
> Herbert, is that patch already included in 2.6.16.1?

Not really.  It's just patch being silly (or too smart :)

Here it is again rediffed against 2.6.16.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: e1000-1.patch --]
[-- Type: text/plain, Size: 1210 bytes --]

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 84dcca3..847d168 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2932,13 +2932,6 @@
 		count++;
 #endif
 
-#ifdef NETIF_F_TSO
-	/* Controller Erratum workaround */
-	if (!skb->data_len && tx_ring->last_tx_tso &&
-		!skb_shinfo(skb)->tso_size)
-		count++;
-#endif
-
 	count += TXD_USE_COUNT(len, max_txd_pwr);
 
 	if (adapter->pcix_82544)
@@ -2957,9 +2950,6 @@
 				       max_txd_pwr);
 	if (adapter->pcix_82544)
 		count += nr_frags;
-
-	if (adapter->hw.tx_pkt_filtering && (adapter->hw.mac_type == e1000_82573) )
-		e1000_transfer_dhcp_info(adapter, skb);
 
 	local_irq_save(flags);
 	if (!spin_trylock(&tx_ring->tx_lock)) {
@@ -2967,6 +2957,16 @@
 		local_irq_restore(flags);
 		return NETDEV_TX_LOCKED;
 	}
+
+#ifdef NETIF_F_TSO
+	/* Controller Erratum workaround */
+	if (!skb->data_len && tx_ring->last_tx_tso &&
+		!skb_shinfo(skb)->tso_size)
+		count++;
+#endif
+
+	if (adapter->hw.tx_pkt_filtering && (adapter->hw.mac_type == e1000_82573) )
+		e1000_transfer_dhcp_info(adapter, skb);
 
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */


* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-30 10:02 UTC
  To: Herbert Xu
  Cc: David S. Miller, jesse.brandeburg, nipsy, jrlundgren, cat,
	djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
	jesse.brandeburg, E1000-devel
In-Reply-To: <20060330095245.GA2385@gondor.apana.org.au>

Hello, Herbert Xu.

On 30.03.2006 13:52 you said the following:

> On Wed, Mar 29, 2006 at 08:44:09PM -0800, David S. Miller wrote:
> 
>>Herbert do you see any holes here?
> 
> 
> Well I started from the beginning again, and found this.  This may be
> the smoking gun that we're after :)
> 
> The xmit routine is lockless but checks last_tx_tso outside the locked
> section.  So if a TSO packet wins a race against a non-TSO packet with
> last_tx_tso == 0 then we'll have memory corruption.
> 
> Everyone, please try this patch and let us know whether the problem
> goes away.
> 
> Thanks,



[zhmurov@builds linux-2.6.16]$ patch -p1 < 
../../../SOURCES/linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch 

patching file drivers/net/e1000/e1000_main.c
Reversed (or previously applied) patch detected!  Assume -R? [n]


Herbert, is that patch already included in 2.6.16.1?

-- 
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"



-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642


* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-30  9:52 UTC
  To: David S. Miller
  Cc: jesse.brandeburg, nipsy, jrlundgren, cat, djani22, yoseph.basri,
	bb, mykleb, olel, michal, chris, netdev, jesse.brandeburg,
	E1000-devel
In-Reply-To: <20060329.204409.109404254.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 706 bytes --]

On Wed, Mar 29, 2006 at 08:44:09PM -0800, David S. Miller wrote:
> 
> Herbert do you see any holes here?

Well I started from the beginning again, and found this.  This may be
the smoking gun that we're after :)

The xmit routine is lockless but checks last_tx_tso outside the locked
section.  So if a TSO packet wins a race against a non-TSO packet with
last_tx_tso == 0 then we'll have memory corruption.

Everyone, please try this patch and let us know whether the problem
goes away.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: e1000.patch --]
[-- Type: text/plain, Size: 1229 bytes --]

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 49cd096..96b7bef 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2821,13 +2821,6 @@
 		count++;
 #endif
 
-#ifdef NETIF_F_TSO
-	/* Controller Erratum workaround */
-	if (!skb->data_len && tx_ring->last_tx_tso &&
-	    !skb_shinfo(skb)->tso_size)
-		count++;
-#endif
-
 	count += TXD_USE_COUNT(len, max_txd_pwr);
 
 	if (adapter->pcix_82544)
@@ -2846,11 +2839,6 @@
 				       max_txd_pwr);
 	if (adapter->pcix_82544)
 		count += nr_frags;
-
-
-	if (adapter->hw.tx_pkt_filtering &&
-	    (adapter->hw.mac_type == e1000_82573))
-		e1000_transfer_dhcp_info(adapter, skb);
 
 	local_irq_save(flags);
 	if (!spin_trylock(&tx_ring->tx_lock)) {
@@ -2858,6 +2846,17 @@
 		local_irq_restore(flags);
 		return NETDEV_TX_LOCKED;
 	}
+
+#ifdef NETIF_F_TSO
+	/* Controller Erratum workaround */
+	if (!skb->data_len && tx_ring->last_tx_tso &&
+	    !skb_shinfo(skb)->tso_size)
+		count++;
+#endif
+
+	if (adapter->hw.tx_pkt_filtering &&
+	    (adapter->hw.mac_type == e1000_82573))
+		e1000_transfer_dhcp_info(adapter, skb);
 
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */


* Re: dcache leak in 2.6.16-git8 II
From: Al Viro @ 2006-03-30  9:50 UTC
  To: Andi Kleen; +Cc: Andrew Morton, bharata, linux-kernel, netdev
In-Reply-To: <200603300026.59131.ak@suse.de>

On Thu, Mar 30, 2006 at 12:26:58AM +0200, Andi Kleen wrote:
> dentry_cache      999168 1024594    208   19    1 : tunables  120   60    8 : slabdata  53926  53926      0 : shrinker stat 18522624 8871000
> 
> Hrm interesting is this one:
> 
> sock_inode_cache  996784 996805    704    5    1 : tunables   54   27    8 : slabdata 199361 199361      0
> 
> Most of the leaked dentries seem to be sockets. I didn't notice this earlier.

ITYM "all".  You've got 2384 non-socket dentries, which is about what I'd
expect on severely pressured busy system...
 
> This was with the debugging patches applied btw. 
> 
> So maybe we have a socket leak?

Looks like that.  Note: /proc/slab_allocators won't help here; all allocations
into that cache are done from sock_alloc_inode(), which is what will be shown.
Not useful...  Moreover, call chain is predictable several steps deeper than
that: sock_alloc_inode() (as ->alloc_inode()) from alloc_inode() from
new_inode() from sock_alloc().

FWIW...  One thing that might be useful here:

a) slab_set_creator(objp, cachep, address): a no-op unless DEBUG_SLAB_LEAK is
set; otherwise
void slab_set_creator(void *objp, struct kmem_cache *cachep, void *address)
{
        if (cachep->flags & SLAB_STORE_USER)
                *dbg_userword(cachep, objp) = address;
}
(has to be a function in mm/slab.c; exported).

b)
void slab_charge_here(void *objp, struct kmem_cache *cachep)
{
	slab_set_creator(objp, cachep, __builtin_return_address(0));
}
in mm/slab.c (exported)

c) #define slab_charge_caller(objp, cachep) \
	slab_set_creator((objp), (cachep), __builtin_return_address(0))


Then we can do the following: in sock_alloc() have
	slab_charge_caller(container_of(inode, struct socket_alloc, vfs_inode),
			   sock_inode_cachep);

and _then_ /proc/slab_allocators will charge these guys to callers of
sock_alloc(); if you'll need to pursue it further, you can always slap
more slab_charge_...() where needed.


* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Johan Lundgren @ 2006-03-30  9:49 UTC
  To: Brandeburg, Jesse
  Cc: nipsy, cat, djani22, yoseph.basri, bb, mykleb, olel, michal,
	chris, netdev, Jesse Brandeburg, davem, E1000-devel
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

Hi,

>What seems to cause this problem?

That I cannot say but the problem was fixed by removing one e1000 card
from the server (I initially had two e1000 cards installed in addition
to the two tg3 cards on the board).

Another fix was to disable TSO with ethtool.

>What motherboards are you using?

Supermicro H8DAE (dual Opteron)

>Are you all using iptables?  Are you all routing?

Iptables yes, routing no.

>none of you are using an 82571/2/3 (pci express)

Correct.

Regards,
Johan


On 3/30/06, Brandeburg, Jesse <jesse.brandeburg@intel.com> wrote:
> Hi all, I've identified you as people who have at some point in the past
> emailed one of the Linux lists with problems with e1000 and
> sk_forward_alloc.  It seems to be fairly widespread, but only seems to
> have appeared with recent kernel changes (after 2.6.12...)
>
> What I need from you is a reproducible test, and some information.  I
> have never been able to reproduce this, and I'm trying to isolate the
> problem a bit.  What motherboards are you using?  What seems to cause
> this problem?  Are you all using iptables?  Are you all routing? From
> the reports I assume none of you are using an 82571/2/3 (pci express)
>
> As far as I know e1000 has the same requirement as tg3 and some others
> where we have to modify the header of the skb in the case of transmits
> using TSO.  I don't see anywhere else that the driver modifies the skb.
> Tomorrow I'll generate a patch to try a more paranoid copying of the
> skb, I hope some of you can test.
>
> To do this we have code like so in e1000_tso:
>         if (skb_shinfo(skb)->tso_size) {
>                 if (skb_header_cloned(skb)) {
>                         err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
>                         if (err)
>                                 return err;
>                 }
>
>                 hdr_len = ((skb->h.raw - skb->data) +
>                            (skb->h.th->doff << 2));
>                 mss = skb_shinfo(skb)->tso_size;
>                 if (skb->protocol == ntohs(ETH_P_IP)) {
>                         skb->nh.iph->tot_len = 0;
>                         skb->nh.iph->check = 0;
>
> Thanks for your assistance
>
> Jesse
>




* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-30  8:39 UTC
  To: Brandeburg, Jesse
  Cc: nipsy, jrlundgren, cat, djani22, yoseph.basri, mykleb, olel,
	michal, chris, netdev, Jesse Brandeburg, davem, E1000-devel
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

[-- Attachment #1: Type: text/plain, Size: 2898 bytes --]

Hello, Brandeburg, Jesse.

On 30.03.2006 06:53 you said the following:

> Hi all, I've identified you as people who have at some point in the past
> emailed one of the Linux lists with problems with e1000 and
> sk_forward_alloc.  It seems to be fairly widespread, but only seems to
> have appeared with recent kernel changes (after 2.6.12...)
> 
> What I need from you is a reproducible test, and some information.  I
> have never been able to reproduce this, and I'm trying to isolate the
> problem a bit.  What motherboards are you using?  What seems to cause
> this problem?  Are you all using iptables?  Are you all routing? From
> the reports I assume none of you are using an 82571/2/3 (pci express)
> 
> As far as I know e1000 has the same requirement as tg3 and some others
> where we have to modify the header of the skb in the case of transmits
> using TSO.  I don't see anywhere else that the driver modifies the skb.
> Tomorrow I'll generate a patch to try a more paranoid copying of the
> skb, I hope some of you can test.


Jesse, I'd like to try your patches to help get rid of this annoying
problem. I want to say that this problem is 100% reproducible on my
heavily loaded web server based on RHEL4 with kernels 2.6.9 (RHEL4
original) through 2.6.15.7 (i.e. all releases from 2.6.9 to 2.6.15.7 are
affected).

  I have an ASUS 1U server with dual P4@2.8GHz processors with
HyperThreading enabled and 3GB RAM, but with 1GB RAM I have the same
problem, so it's not a RAM issue. This is a really heavily loaded server,
serving about 1000-1500 HTTP requests per second plus about 500-1000 FTP
requests per second.

  dmesg, lspci -vv, iptables -nL and ip route show are in the attached files.
Waiting for your instructions.


P.S. I use two e1000 adapters at the same time with advanced routing 
like this:

[root@msk4 ~]# cat /etc/rc.local |grep ip
# This script will be executed *after* all the other init scripts.
/sbin/ip rule add from 83.102.130.174 table NEW
/sbin/ip route add default via 83.102.130.173 dev eth0 table NEW



I also have some hardcoded sysctl options like this:

net.core.somaxconn=1024
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_max_tw_buckets=720000
net.core.rmem_default=215040
net.core.rmem_max=262144
net.core.wmem_default=215040
net.core.wmem_max=262144
net.core.optmem_max=81920
net.core.netdev_max_backlog=8192
net.ipv4.neigh.default.gc_thresh1=512
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh3=4096
net.ipv4.neigh.default.unres_qlen=64
net.ipv4.neigh.default.proxy_qlen=256
net.ipv4.tcp_rmem = 4096 131072 262144
net.ipv4.tcp_wmem = 4096 131072 262144
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_sack=0
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_window_scaling=0
net.ipv4.tcp_keepalive_probes=3
kernel.sem=250 32000 100 128



-- 
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"


[-- Attachment #2: dmesg --]
[-- Type: text/plain, Size: 19109 bytes --]

Linux version 2.6.15-7.0.bbel4smp (zhmurov@builds.kernelpanic.ru) (gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 SMP Tue Mar 28 14:52:13 MSD 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfffb000 (usable)
 BIOS-e820: 00000000bfffb000 - 00000000bffff000 (ACPI data)
 BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
2175MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f0830
On node 0 totalpages: 786427
  DMA zone: 4096 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 557051 pages, LIFO batch:31
DMI 2.3 present.
Using APIC driver default
ACPI: RSDP (v000 ASUS                                  ) @ 0x000f55a0
ACPI: RSDT (v001 ASUS   PR-DLSR  0x42302e31 MSFT 0x31313031) @ 0xbfffb000
ACPI: FADT (v001 ASUS   PR-DLSR  0x42302e31 MSFT 0x31313031) @ 0xbfffb145
ACPI: BOOT (v001 ASUS   PR-DLSR  0x42302e31 MSFT 0x31313031) @ 0xbfffb034
ACPI: SPCR (v001 ASUS   PR-DLSR  0x42302e31 MSFT 0x31313031) @ 0xbfffb05c
ACPI: MADT (v001 ASUS   PR-DLSR  0x42302e31 MSFT 0x31313031) @ 0xbfffb0a9
ACPI: DSDT (v001   ASUS PR-DLSR  0x00001000 MSFT 0x0100000b) @ 0x00000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x06] enabled)
Processor #6 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x03] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 3, version 17, address 0xfec01000, GSI 16-31
ACPI: IOAPIC (id[0x04] address[0xfec02000] gsi_base[32])
IOAPIC[2]: apic_id 4, version 17, address 0xfec02000, GSI 32-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at c4000000 (gap: c0000000:3ec00000)
Built 1 zonelists
Kernel command line: ro root=LABEL=/
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec01000)
mapped IOAPIC to ffffa000 (fec02000)
Initializing CPU#0
CPU 0 irqstacks, hard=c0391000 soft=c0371000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2790.958 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 3114260k/3145708k available (1599k kernel code, 30168k reserved, 685k data, 188k init, 2228204k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5589.74 BogoMIPS (lpj=27948717)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 00004400 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 05
Booting processor 1/1 eip 2000
CPU 1 irqstacks, hard=c0392000 soft=c0372000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5581.58 BogoMIPS (lpj=27907945)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 00004400 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Xeon(TM) CPU 2.80GHz stepping 05
Booting processor 2/6 eip 2000
CPU 2 irqstacks, hard=c0393000 soft=c0373000
Initializing CPU#2
Calibrating delay using timer specific routine.. 5581.69 BogoMIPS (lpj=27908471)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 00004400 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
CPU2: Intel P4/Xeon Extended MCE MSRs (12) available
CPU2: Thermal monitoring enabled
CPU2: Intel(R) Xeon(TM) CPU 2.80GHz stepping 05
Booting processor 3/7 eip 2000
CPU 3 irqstacks, hard=c0394000 soft=c0374000
Initializing CPU#3
Calibrating delay using timer specific routine.. 5581.78 BogoMIPS (lpj=27908908)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 00004400 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#3.
CPU3: Intel P4/Xeon Extended MCE MSRs (12) available
CPU3: Thermal monitoring enabled
CPU3: Intel(R) Xeon(TM) CPU 2.80GHz stepping 05
Total of 4 processors activated (22334.80 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...  failed.
...trying to set up timer as Virtual Wire IRQ... works.
checking TSC synchronization across 4 CPUs: passed.
Brought up 4 CPUs
checking if image is initramfs... it is
Freeing initrd memory: 656k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xf18e0, last bus=2
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 11 *12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKI] (IRQs 3 *4 5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKJ] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKK] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKL] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0
ACPI: PCI Interrupt Link [LNKM] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKN] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKO] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKP] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKQ] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKR] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKS] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKT] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKU] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKV] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKW] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKX] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKY] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKZ] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNK5] (IRQs 5 10 11 12 14 15) *0
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: _BBN and _CRS returns different value for \_SB_.PCI1. Select _CRS
ACPI: _BBN and _CRS returns different value for \_SB_.PCI2. Select _CRS
Boot video device is 0000:00:03.0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI _CRS 1 overrides _BBN 0
ACPI: PCI Root Bridge [PCI1] (0000:01)
PCI: Probing PCI hardware (bus 01)
ACPI: _BBN and _CRS returns different value for \_SB_.PCI1. Select _CRS
ACPI: _BBN and _CRS returns different value for \_SB_.PCI2. Select _CRS
ACPI: PCI Interrupt Routing Table [\_SB_.PCI1._PRT]
ACPI: PCI _CRS 2 overrides _BBN 0
ACPI: PCI Root Bridge [PCI2] (0000:02)
PCI: Probing PCI hardware (bus 02)
ACPI: _BBN and _CRS returns different value for \_SB_.PCI1. Select _CRS
ACPI: _BBN and _CRS returns different value for \_SB_.PCI2. Select _CRS
ACPI: PCI Interrupt Routing Table [\_SB_.PCI2._PRT]
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
Simple Boot Flag at 0x3a set to 0x80
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Linux agpgart interface v0.101 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller at PCI slot 0000:00:0f.1
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xa800-0xa807, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xa808-0xa80f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
Probing IDE interface ide1...
Probing IDE interface ide0...
Probing IDE interface ide1...
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 7, 524288 bytes)
TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
Starting balanced_irq
Using IPI Shortcut mode
Freeing unused kernel memory: 188k freed
SCSI subsystem initialized
megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
megaraid: 2.20.4.6 (Release Date: Mon Mar 07 12:27:22 EST 2005)
megaraid: probe new device 0x1000:0x1960:0x1000:0xa520: bus 1:slot 3:func 0
ACPI: PCI Interrupt 0000:01:03.0[A] -> GSI 27 (level, low) -> IRQ 16
megaraid: fw version:[1Z26] bios version:[G112]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
  Vendor: SDR       Model: GEM318            Rev: 0   
  Type:   Processor                          ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[0]: scanning scsi channel 2 [virtual] for logical drives
  Vendor: MegaRAID  Model: LD0 RAID5 70090R  Rev: 1Z26
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 143544320 512-byte hdwr sectors (73495 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 143544320 512-byte hdwr sectors (73495 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 sda: sda1 sda2 sda3 sda4 < sda5 >
sd 0:2:0:0: Attached scsi disk sda
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
 0:0:11:0: Attached scsi generic sg0 type 3
sd 0:2:0:0: Attached scsi generic sg1 type 0
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 18 (level, low) -> IRQ 17
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:01:02.0[A] -> GSI 24 (level, low) -> IRQ 18
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
piix4_smbus 0000:00:0f.0: Unusual config register value
piix4_smbus 0000:00:0f.0: Try using fix_hstcfg=1 if you experience problems
piix4_smbus 0000:00:0f.0: Illegal Interrupt configuration (or code out of date)!
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt Link [LNK5] BIOS reported IRQ 0, using IRQ 11
ACPI: PCI Interrupt Link [LNK5] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:0f.2[A] -> Link [LNK5] -> GSI 11 (level, low) -> IRQ 11
ohci_hcd 0000:00:0f.2: OHCI Host Controller
ohci_hcd 0000:00:0f.2: new USB bus registered, assigned bus number 1
ohci_hcd 0000:00:0f.2: irq 11, io mem 0xf5000000
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ibm_acpi: ec object not found
EXT3 FS on sda1, internal journal
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
program dmraid is using a deprecated SCSI ioctl, please convert it to SG_IO
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1052248k swap on /dev/sda2.  Priority:-1 extents:1 across:1052248k
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.4 (8192 buckets, 65536 max) - 212 bytes per conntrack
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
loop: loaded (max 8 devices)
TCP: Treason uncloaked! Peer 80.72.16.78:50766/80 shrinks window 2092777962:2092777963. Repaired.
TCP: Treason uncloaked! Peer 80.72.16.78:50769/80 shrinks window 2096690385:2096690386. Repaired.
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)

[-- Attachment #3: ip_route_show --]
[-- Type: text/plain, Size: 222 bytes --]

83.102.130.172/30 dev eth0  proto kernel  scope link  src 83.102.130.174 
83.102.130.176/30 dev eth1  proto kernel  scope link  src 83.102.130.178 
169.254.0.0/16 dev eth1  scope link 
default via 83.102.130.177 dev eth1 

[-- Attachment #4: iptables --]
[-- Type: text/plain, Size: 1650 bytes --]

Chain INPUT (policy DROP)
target     prot opt source               destination         
DROP       all  --  212.176.84.130       0.0.0.0/0           
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           limit: avg 100/sec burst 5 
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
DROP       all  --  127.0.0.1            0.0.0.0/0           
DROP       all  --  0.0.0.0/0            127.0.0.1           
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:25 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:80 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:3690 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:5432 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           multiport dports 20,21 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpts:64000:65500 
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state ESTABLISHED 

Chain FORWARD (policy DROP)
target     prot opt source               destination         

Chain OUTPUT (policy DROP)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            212.176.84.130      
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           limit: avg 100/sec burst 5 
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state ESTABLISHED 
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state NEW 

[-- Attachment #5: lspci_vv --]
[-- Type: text/plain, Size: 6906 bytes --]

00:00.0 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset) (rev 33)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.1 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.2 Host bridge: Broadcom CMIC-LE Host Bridge (GC-LE chipset)
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:02.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
	Subsystem: Intel Corporation 82540EM Gigabit Ethernet Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (63750ns min), Cache Line Size 08
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at f7000000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at d800 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4] PCI-X non-bridge device.
		Command: DPERE- ERO+ RBC=0 OST=0
		Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=2, DMOST=0, DMCRS=1, RSCEM-
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000

00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
	Subsystem: ATI Technologies Inc Rage XL
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (2000ns min), Cache Line Size 08
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: I/O ports at d400 [size=256]
	Region 2: Memory at f5800000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at febe0000 [disabled] [size=128K]
	Capabilities: [5c] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0f.0 ISA bridge: Broadcom CSB5 South Bridge (rev 93)
	Subsystem: Broadcom CSB5 South Bridge
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32

00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93) (prog-if 8a [Master SecP PriP])
	Subsystem: Broadcom CSB5 IDE Controller
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64, Cache Line Size 08
	Region 0: I/O ports at <ignored>
	Region 1: I/O ports at <ignored>
	Region 2: I/O ports at <ignored>
	Region 3: I/O ports at <ignored>
	Region 4: I/O ports at a800 [size=16]

00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05) (prog-if 10 [OHCI])
	Subsystem: Broadcom OSB4/CSB5 OHCI USB Controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (20000ns max), Cache Line Size 08
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at f5000000 (32-bit, non-prefetchable) [size=4K]

00:0f.3 Host bridge: Broadcom CSB5 LPC bridge
	Subsystem: Broadcom: Unknown device 0230
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:11.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Capabilities: [60] PCI-X non-bridge device.
		Command: DPERE- ERO- RBC=0 OST=4
		Status: Bus=0 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, DC=bridge, DMMRBC=0, DMOST=4, DMCRS=0, RSCEM-

00:11.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr+ DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Capabilities: [60] PCI-X non-bridge device.
		Command: DPERE- ERO- RBC=0 OST=4
		Status: Bus=0 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, DC=bridge, DMMRBC=0, DMOST=4, DMCRS=0, RSCEM-

01:02.0 Ethernet controller: Intel Corporation 82544GC Gigabit Ethernet Controller (LOM) (rev 02)
	Subsystem: Intel Corporation 82544GC Based Network Connection
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32 (63750ns min), Cache Line Size 08
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at f4800000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at f4000000 (32-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at a400 [size=32]
	Expansion ROM at feae0000 [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [e4] PCI-X non-bridge device.
		Command: DPERE- ERO+ RBC=0 OST=0
		Status: Bus=0 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple, DMMRBC=2, DMOST=0, DMCRS=1, RSCEM-
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000

01:03.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID (rev 01)
	Subsystem: LSI Logic / Symbios Logic MegaRAID ZCR SCSI 320-0 Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32, Cache Line Size 08
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
	[virtual] Expansion ROM at c4000000 [disabled] [size=32K]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-


^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Mark Nipper @ 2006-03-30  8:24 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: jrlundgren, cat, djani22, yoseph.basri, bb, mykleb, olel, michal,
	chris, netdev, Jesse Brandeburg, davem, E1000-devel
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

On 29 Mar 2006, Brandeburg, Jesse wrote:
> What I need from you is a reproducible test, and some information.  I
> have never been able to reproduce this, and I'm trying to isolate the
> problem a bit.  What motherboards are you using?  What seems to cause
> this problem?  Are you all using iptables?  Are you all routing? From
> the reports I assume none of you are using an 82571/2/3 (pci express)

        Unfortunately, my problem machine is a remote, leased
server, so I'd have to ask my provider for information on the
motherboard.  I have no specific idea what causes the problem as
the assertions simply show up after the fact in my logcheck
output.  I am not using iptables or routing.  And I'm fairly
certain the e1000 chip is just an integrated PCI device on the
motherboard.

> As far as I know e1000 has the same requirement as tg3 and some others
> where we have to modify the header of the skb in the case of transmits
> using TSO.  I don't see anywhere else that the driver modifies the skb.
> Tomorrow I'll generate a patch to try a more paranoid copying of the
> skb, I hope some of you can test.

        I'll be happy to test any patches you may have to narrow
down the problem.  I was actually considering running tcpdump or
ethereal or some such to try to capture the event on the network
side, but this probably isn't a wise idea considering it's a
production server and I do not have hands-on access to it.  A
patch which simply increased the verbosity of the event
(including counters and registers maybe?) would be preferable to
trying to capture an arbitrary amount of network traffic simply
waiting for the next time the assertion is triggered.

        Sorry for the real lack of data on this end.  But as I
said, any patch to help debug this is welcome.
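
One low-risk way to gather more data without capturing traffic is to tally
which assertion sites fire, and how often, from saved kernel logs.  What
follows is only a minimal sketch in C.  It assumes log lines of the form
"KERNEL: assertion (<expr>) failed at <file> (<line>)", as in the dmesg
output posted in this thread.  The names parse_assertion(), record_site()
and struct site are hypothetical helpers made up for illustration; they are
not part of the kernel or of logcheck.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One distinct assertion site and how many times it was seen. */
struct site {
	char file[64];
	int line;
	unsigned long count;
};

/*
 * Parse one log line of the assumed form
 *   KERNEL: assertion (<expr>) failed at <file> (<line>)
 * On a match, copy <file> into file (NUL-terminated), store <line> in
 * *lineno and return 1.  Return 0 for any other line.
 */
static int parse_assertion(const char *line, char *file, size_t file_len,
			   int *lineno)
{
	static const char prefix[] = "KERNEL: assertion (";
	static const char marker[] = ") failed at ";
	const char *p, *end;
	int parsed;
	size_t n;

	assert(line != NULL && file != NULL && lineno != NULL);

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	p = strstr(line, marker);
	if (p == NULL)
		return 0;
	p += sizeof(marker) - 1;	/* start of <file> */
	end = strchr(p, ' ');		/* the space before "(<line>)" */
	if (end == NULL)
		return 0;
	n = (size_t)(end - p);
	if (n == 0 || n >= file_len)
		return 0;
	if (sscanf(end, " (%d)", &parsed) != 1)
		return 0;
	memcpy(file, p, n);
	file[n] = '\0';
	*lineno = parsed;
	return 1;
}

/*
 * Count one hit for file:line in sites[0..*used).  A new site is appended
 * if there is room.  Return 1 on success, 0 if the table is full or the
 * file name does not fit.
 */
static int record_site(struct site *sites, size_t *used, size_t max,
		       const char *file, int line)
{
	size_t i;

	for (i = 0; i < *used; i++) {
		if (sites[i].line == line &&
		    strcmp(sites[i].file, file) == 0) {
			sites[i].count++;
			return 1;
		}
	}
	if (*used == max || strlen(file) >= sizeof(sites[0].file))
		return 0;
	strcpy(sites[*used].file, file);
	sites[*used].line = line;
	sites[*used].count = 1;
	(*used)++;
	return 1;
}
```

Running each line of a saved log through parse_assertion() and then
record_site(), for example from an fgets() loop, gives a per-site count
that can be compared between kernels or driver versions.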

-- 
Mark Nipper                                                e-contacts:
832 Tanglewood Drive                                nipsy@bitgnome.net
Bryan, Texas 77802-4013                     http://nipsy.bitgnome.net/
(979)575-3193                      AIM/Yahoo: texasnipsy ICQ: 66971617

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GG/IT d- s++:+ a- C++$ UBL++++$ P--->+++ L+++$ !E---
W++(--) N+ o K++ w(---) O++ M V(--) PS+++(+) PE(--)
Y+ PGP t+ 5 X R tv b+++@ DI+(++) D+ G e h r++ y+(**)
------END GEEK CODE BLOCK------

---begin random quote of the moment---
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
  -- Anthony de Boer
----end random quote of the moment----



^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Christiaan den Besten @ 2006-03-30  8:08 UTC (permalink / raw)
  To: Brandeburg, Jesse, nipsy, jrlundgren, cat, djani22, yoseph.basri,
	bb, mykleb, olel, michal
  Cc: netdev, Jesse Brandeburg, davem, E1000-devel, Brandeburg, Jesse
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

Hi!

Yes, we still see these errors, but then we have not changed the running kernel version for some time now ;)

---
Linux version 2.6.14-rc2-mm2 (root@localhost) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #2 SMP Thu Dec 15 19:06:21 CET 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
 BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cff70000 (usable)
 BIOS-e820: 00000000cff70000 - 00000000cff78000 (ACPI data)
 BIOS-e820: 00000000cff78000 - 00000000cff80000 (ACPI NVS)
 BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved)
 BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
3968MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f6820
NX (Execute Disable) protection: active
On node 0 totalpages: 1245184
  DMA zone: 4096 pages, LIFO batch:2
  DMA32 zone: 0 pages, LIFO batch:2
  Normal zone: 225280 pages, LIFO batch:64
  HighMem zone: 1015808 pages, LIFO batch:64
DMI present.
ACPI: RSDP (v000 PTLTD                                 ) @ 0x000f6880
ACPI: RSDT (v001 PTLTD    RSDT   0x06040000  LTP 0x00000000) @ 0xcff72d76
ACPI: FADT (v001 INTEL  LINDHRST 0x06040000 PTL  0x00000003) @ 0xcff77e20
ACPI: MADT (v001 PTLTD           APIC   0x06040000  LTP 0x00000000) @ 0xcff77e94
ACPI: BOOT (v001 PTLTD  $SBFTBL$ 0x06040000  LTP 0x00000001) @ 0xcff77f48
ACPI: SPCR (v001 PTLTD  $UCRTBL$ 0x06040000 PTL  0x00000001) @ 0xcff77f70
ACPI: MCFG (v001 PTLTD           MCFG   0x06040000  LTP 0x00000000) @ 0xcff77fc0
ACPI: SSDT (v001  PmRef    CpuPm 0x00003000 INTL 0x20030224) @ 0xcff72db2
ACPI: DSDT (v001  Intel LINDHRST 0x06040000 MSFT 0x0100000e) @ 0x00000000
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x06] enabled)
Processor #6 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x07] enabled)
Processor #7 15:4 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x03] address[0xfec80000] gsi_base[24])
IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47
ACPI: IOAPIC (id[0x04] address[0xfec80400] gsi_base[48])
IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71
ACPI: IOAPIC (id[0x05] address[0xfec84000] gsi_base[72])
IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95
ACPI: IOAPIC (id[0x08] address[0xfec84400] gsi_base[96])
IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 5 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at d1000000 (gap: d0000000:10000000)
Built 1 zonelists
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec80000)
mapped IOAPIC to ffffa000 (fec80400)
mapped IOAPIC to ffff9000 (fec84000)
mapped IOAPIC to ffff8000 (fec84400)
Initializing CPU#0
Kernel command line: ro root=LABEL=/
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2801.358 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4146392k/4980736k available (2757k kernel code, 45216k reserved, 783k data, 216k init, 3276224k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5603.74 BogoMIPS (lpj=2801871)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5599.54 BogoMIPS (lpj=2799774)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
CPU1: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Booting processor 2/6 eip 2000
Initializing CPU#2
Calibrating delay using timer specific routine.. 5571.21 BogoMIPS (lpj=2785607)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
CPU2: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Booting processor 3/7 eip 2000
Initializing CPU#3
Calibrating delay using timer specific routine.. 5599.55 BogoMIPS (lpj=2799778)
CPU: After generic identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 20100000 00000000 00000000 0000641d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 3
CPU: After all inits, caps: bfebfbff 20100000 00000000 00000080 0000641d 00000000 00000000
CPU3: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Total of 4 processors activated (22374.06 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
checking TSC synchronization across 4 CPUs: passed.
Brought up 4 CPUs
checking if image is initramfs... it is
Freeing initrd memory: 296k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd830, last bus=9
PCI: Using MMCONFIG
usbcore: registered new driver usbfs
usbcore: registered new driver hub
ACPI: Subsystem revision 20050916
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
PCI: PXH quirk detected, disabling MSI for SHPC device
Boot video device is 0000:09:01.0
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX0.PXH0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX0.PXH1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEY0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEZ0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEZ0.PXH0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEZ0.PXH1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIB._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 *7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 6 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 6 7 *10 11 14 15)
ACPI: Device [PRT] status [0000000c]: functional but not present; setting present
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
PCI: Bridge: 0000:01:00.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:01:00.2
  IO window: 2000-2fff
  MEM window: dd200000-dd2fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:02.0
  IO window: 2000-2fff
  MEM window: dd100000-dd2fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:04.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:06:01.0
  IO window: disabled.
  MEM window: dd400000-dd4fffff
  PREFETCH window: d1000000-d10fffff
PCI: Bridge: 0000:05:00.0
  IO window: disabled.
  MEM window: dd400000-dd4fffff
  PREFETCH window: d1000000-d10fffff
PCI: Bridge: 0000:05:00.2
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:06.0
  IO window: disabled.
  MEM window: dd300000-dd4fffff
  PREFETCH window: d1000000-d10fffff
PCI: Bridge: 0000:00:1e.0
  IO window: 3000-3fff
  MEM window: dd500000-deffffff
  PREFETCH window: d1100000-d11fffff
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:02.0 to 64
PCI: Setting latency timer of device 0000:01:00.0 to 64
PCI: Setting latency timer of device 0000:01:00.2 to 64
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:04.0 to 64
ACPI: PCI Interrupt 0000:00:06.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:06.0 to 64
PCI: Setting latency timer of device 0000:05:00.0 to 64
PCI: Setting latency timer of device 0000:05:00.2 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
Simple Boot Flag at 0x39 set to 0x1
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
SGI XFS with large block numbers, no debug enabled
Initializing Cryptographic API
ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: irq 17, io mem 0xdd001000
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
usbcore: registered new driver usbserial
drivers/usb/serial/usb-serial.c: USB Serial support registered for Generic
usbcore: registered new driver usbserial_generic
drivers/usb/serial/usb-serial.c: USB Serial Driver core v2.0
ACPI: Power Button (FF) [PWRF]
ACPI: Power Button (CM) [PWRB]
ACPI: CPU0 (power states: C1[C1])
ACPI: CPU2 (power states: C1[C1])
ACPI: CPU1 (power states: C1[C1])
ACPI: CPU3 (power states: C1[C1])
Real Time Clock Driver v1.12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
mice: PS/2 mouse device common for all mice
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
loop: loaded (max 8 devices)
Intel(R) PRO/1000 Network Driver - version 6.0.60-k2
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:03:02.0[A] -> GSI 54 (level, low) -> IRQ 18
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:03:02.1[B] -> GSI 55 (level, low) -> IRQ 19
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot 0000:00:1f.1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 20
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x14a0-0x14a7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x14a8-0x14af, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: SR244W, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
hda: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ACPI: PCI Interrupt 0000:07:0e.0[A] -> GSI 74 (level, low) -> IRQ 21
ARECA RAID ADAPTER0: 64BITS PCI BUS DMA ADDRESSING SUPPORTED
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.38 2005-10-4
scsi0 : ARECA ARC1160 PCI-X 16 PORTS SATA RAID CONTROLLER (RAID6-ENGINE Inside)
        Driver Version 1.20.00.12
  Vendor: Areca     Model: ARC-1160-VOL#00   Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: Areca     Model: DATA1             Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
  Vendor: Areca     Model: DATA2             Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 03
arcmsr device major number 254
libata version 1.12 loaded.
SCSI device sda: 312499712 512-byte hdwr sectors (160000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 312499712 512-byte hdwr sectors (160000 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 820310016 512-byte hdwr sectors (419999 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 820310016 512-byte hdwr sectors (419999 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
SCSI device sdc: 820310016 512-byte hdwr sectors (419999 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 820310016 512-byte hdwr sectors (419999 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1
Attached scsi disk sdc at scsi0, channel 0, id 2, lun 0
md: md driver 0.90.2 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 3.39
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 8, 1048576 bytes)
TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
ip_conntrack version 2.3 (8192 buckets, 65536 max) - 172 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 15
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
Starting balanced_irq
Using IPI Shortcut mode
Freeing unused kernel memory: 216k freed
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
EXT3 FS on sda1, internal journal
hda: packet command error: status=0x51 { DriveReady SeekComplete Error }
hda: packet command error: error=0x50 { LastFailedSense=0x05 }
ide: failed opcode was: unknown
cdrom: open failed.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
XFS mounting filesystem sdb1
Starting XFS recovery on filesystem: sdb1 (dev: sdb1)
Ending XFS recovery on filesystem: sdb1 (dev: sdb1)
XFS mounting filesystem sdc1
Starting XFS recovery on filesystem: sdc1 (dev: sdc1)
Ending XFS recovery on filesystem: sdc1 (dev: sdc1)
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
e1000: eth1: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
---

iptables : YES

routing : NO (3 routes at present)

traffic : A lot ;) ... 24x7 300mbit on eth0 and 450 mbit on eth1.

bye,
Chris

----- Original Message ----- 
From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
To: <nipsy@bitgnome.net>; <jrlundgren@gmail.com>; <cat@zip.com.au>; <djani22@dynamicweb.hu>; <yoseph.basri@gmail.com>; 
<bb@kernelpanic.ru>; <mykleb@no.ibm.com>; <olel@ans.pl>; <michal@feix.cz>; <chris@scorpion.nl>
Cc: <netdev@vger.kernel.org>; "Jesse Brandeburg" <jesse.brandeburg@gmail.com>; <davem@davemloft.net>; 
<E1000-devel@lists.sourceforge.net>; "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Sent: Thursday, March 30, 2006 4:53 AM
Subject: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...


Hi all, I've identified you as people who have at some point in the past
emailed one of the Linux lists with problems with e1000 and
sk_forward_alloc.  It seems to be fairly widespread, but only seems to
have appeared with recent kernel changes (after 2.6.12...)

What I need from you is a reproducible test, and some information.  I
have never been able to reproduce this, and I'm trying to isolate the
problem a bit.  What motherboards are you using?  What seems to cause
this problem?  Are you all using iptables?  Are you all routing? From
the reports I assume none of you are using an 82571/2/3 (PCI Express).

As far as I know e1000 has the same requirement as tg3 and some others
where we have to modify the header of the skb in the case of transmits
using TSO.  I don't see anywhere else that the driver modifies the skb.
Tomorrow I'll generate a patch to try a more paranoid copying of the
skb, I hope some of you can test.

To do this we have code like so in e1000_tso:
        if (skb_shinfo(skb)->tso_size) {
                if (skb_header_cloned(skb)) {
                        err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
                        if (err)
                                return err;
                }

                hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2));
                mss = skb_shinfo(skb)->tso_size;
                if (skb->protocol == ntohs(ETH_P_IP)) {
                        skb->nh.iph->tot_len = 0;
                        skb->nh.iph->check = 0;
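
For anyone following the arithmetic in that excerpt: hdr_len is the distance from the start of the packet data to the TCP header, plus the TCP header length, where doff counts 32-bit words. Below is a userspace sketch of the same calculation. The name tso_hdr_len and its parameters are made up for illustration; this is not driver code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not e1000 code. It mirrors
 *   hdr_len = (skb->h.raw - skb->data) + (skb->h.th->doff << 2)
 *
 * transport_off: bytes from the start of packet data to the TCP header
 * doff:          TCP data offset field, in 32-bit words (valid range 5..15)
 */
static size_t tso_hdr_len(size_t transport_off, unsigned int doff)
{
	assert(doff >= 5 && doff <= 15);
	return transport_off + ((size_t)doff << 2);
}
```

With a 14-byte Ethernet header and a 20-byte IPv4 header the TCP header starts at offset 34, so doff = 5 gives 54 bytes of headers and doff = 8 (e.g. with the timestamp option) gives 66.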

Thanks for your assistance

Jesse






^ permalink raw reply

* Re: [PATCH] deinline 200+ byte inlines in sock.h
From: David S. Miller @ 2006-03-30  8:08 UTC (permalink / raw)
  To: vda; +Cc: akpm, linux-kernel, netdev
In-Reply-To: <200603301026.24864.vda@ilport.com.ua>

From: Denis Vlasenko <vda@ilport.com.ua>
Date: Thu, 30 Mar 2006 10:26:24 +0300

> We have 200+ byte inlines in sock.h. That's way too large
> in my opinion.

In Linus's tree already.

Would you please check the relevant trees before re-posting patches?
Thanks a lot.

^ permalink raw reply

* Re: [PATCH 1/8] [I/OAT] DMA memcpy subsystem
From: Kumar Gala @ 2006-03-30  8:01 UTC (permalink / raw)
  To: Andrew Grover; +Cc: Chris Leech, linux kernel mailing list, netdev
In-Reply-To: <c0a09e5c0603291505h10f062d5qd6e1861ef052d07b@mail.gmail.com>


On Mar 29, 2006, at 5:05 PM, Andrew Grover wrote:

> On 3/28/06, Kumar Gala <galak@kernel.crashing.org> wrote:
>> Do you only get callback when a channel is available?
>
> Yes
>
>> How do you
>> decide whether to provide PIO to the client?
>
> The client is responsible for using any channels it gets, or falling
> back to memcpy() if it doesn't get any. (I don't understand how PIO
> comes into the picture..?)

I was under the impression that the DMA engine would provide a "sync"
CPU-based memcpy (PIO) if a real HW channel wasn't available.  If this is
left to the client, that's fine.  So how does the client know when it
should use normal memcpy()?
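
One way to picture the client-side fallback described in the quoted answer is the sketch below: use a channel if one has been handed over, otherwise copy on the CPU. All names here (sketch_chan, sketch_client, sketch_client_copy) are hypothetical and are not the API from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical channel type; the real one lives in the patch's dmaengine.h. */
struct sketch_chan {
	/* Returns 0 on success, non-zero if the copy could not be offloaded. */
	int (*copy)(struct sketch_chan *chan, void *dst, const void *src,
		    size_t len);
};

struct sketch_client {
	struct sketch_chan *chan;	/* NULL until a channel is handed to us */
};

/* Returns 1 if the copy was offloaded, 0 if it fell back to memcpy(). */
static int sketch_client_copy(struct sketch_client *c, void *dst,
			      const void *src, size_t len)
{
	if (c->chan && c->chan->copy(c->chan, dst, src, len) == 0)
		return 1;

	/* No channel (or the offload failed): plain CPU copy. */
	memcpy(dst, src, len);
	return 0;
}
```

In this sketch the client "knows" simply by checking whether it currently holds a channel; whether the real subsystem signals that differently is exactly the question above.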

>> A client should only request multiple channels to handle multiple
>> concurrent operations.
>
> Correct, if there aren't any CPU concurrency issues then 1 channel
> will use the device's full bandwidth (unless some other client has
> acquired the other channels and is using them, of course.)
>
>>> This gets around the problem of DMA clients registering (and  
>>> therefore
>>> not getting) channels simply because they init before the DMA device
>>> is discovered.
>>
>> What do you expect to happen in a system in which the channels are
>> over subscribed?
>>
>> Do you expect the DMA device driver to handle scheduling of channels
>> between multiple clients?
>
> It does the simplest thing that could possibly work right now:
> channels are allocated first come first serve. When there is a need,
> it should be straightforward to allow multiple clients to share DMA
> channels.

Sounds good for a start.  Have you given any thought to handling
priorities between clients?
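
The first-come-first-served policy from the quoted reply, and why oversubscription matters, can be shown with a toy allocator. This is only an illustration under assumed names (sketch_pool, sketch_chan_request, a fixed pool of 4 channels); it is not how the patch implements it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NCHAN 4	/* assumed pool size, for illustration only */

struct sketch_pool {
	int owner[SKETCH_NCHAN];	/* 0 = free, otherwise the client id */
};

/*
 * Hand the first free channel to client_id (must be > 0).
 * Returns the channel index, or -1 if every channel is already taken.
 */
static int sketch_chan_request(struct sketch_pool *p, int client_id)
{
	assert(client_id > 0);
	for (size_t i = 0; i < SKETCH_NCHAN; i++) {
		if (p->owner[i] == 0) {
			p->owner[i] = client_id;
			return (int)i;
		}
	}
	return -1;
}
```

If client 1 asks four times it ends up with every channel, and client 2 gets -1 no matter how important its traffic is. That is the case a priority scheme would have to address.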

I need to take a look at the latest patches.  How would you like
modifications submitted?

- k

^ permalink raw reply

* Re: [PATCH 1/9] [I/OAT] DMA memcpy subsystem
From: Kumar Gala @ 2006-03-30  8:01 UTC (permalink / raw)
  To: Chris Leech; +Cc: linux-kernel, netdev
In-Reply-To: <20060329225548.25585.73037.stgit@gitlost.site>


On Mar 29, 2006, at 4:55 PM, Chris Leech wrote:

> Provides an API for offloading memory copies to DMA devices
>
> Signed-off-by: Chris Leech <christopher.leech@intel.com>
> ---
>
>  drivers/Kconfig           |    2
>  drivers/Makefile          |    1
>  drivers/dma/Kconfig       |   13 +
>  drivers/dma/Makefile      |    1
>  drivers/dma/dmaengine.c   |  405 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dmaengine.h |  337 +++++++++++++++++++++++++++++++++++++
>  6 files changed, 759 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 9f5c0da..f89ac05 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -72,4 +72,6 @@ source "drivers/edac/Kconfig"
>
>  source "drivers/rtc/Kconfig"
>
> +source "drivers/dma/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 4249552..9b808a6 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -74,3 +74,4 @@ obj-$(CONFIG_SGI_SN)		+= sn/
>  obj-y				+= firmware/
>  obj-$(CONFIG_CRYPTO)		+= crypto/
>  obj-$(CONFIG_SUPERH)		+= sh/
> +obj-$(CONFIG_DMA_ENGINE)	+= dma/
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> new file mode 100644
> index 0000000..f9ac4bc
> --- /dev/null
> +++ b/drivers/dma/Kconfig
> @@ -0,0 +1,13 @@
> +#
> +# DMA engine configuration
> +#
> +
> +menu "DMA Engine support"
> +
> +config DMA_ENGINE
> +	bool "Support for DMA engines"
> +	---help---
> +	  DMA engines offload copy operations from the CPU to dedicated
> +	  hardware, allowing the copies to happen asynchronously.
> +
> +endmenu
> diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
> new file mode 100644
> index 0000000..10b7391
> --- /dev/null
> +++ b/drivers/dma/Makefile
> @@ -0,0 +1 @@
> +obj-y += dmaengine.o
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> new file mode 100644
> index 0000000..683456a
> --- /dev/null
> +++ b/drivers/dma/dmaengine.c
> @@ -0,0 +1,405 @@
> +/*
> + * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59
> + * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> + *
> + * The full GNU General Public License is included in this distribution in the
> + * file called COPYING.
> + */
> +
> +/*
> + * This code implements the DMA subsystem. It provides a HW-neutral interface
> + * for other kernel code to use asynchronous memory copy capabilities,
> + * if present, and allows different HW DMA drivers to register as providing
> + * this capability.
> + *
> + * Due to the fact we are accelerating what is already a relatively fast
> + * operation, the code goes to great lengths to avoid additional overhead,
> + * such as locking.
> + *
> + * LOCKING:
> + *
> + * The subsystem keeps two global lists, dma_device_list and dma_client_list.
> + * Both of these are protected by a spinlock, dma_list_lock.
> + *
> + * Each device has a channels list, which runs unlocked but is never modified
> + * once the device is registered, it's just setup by the driver.
> + *
> + * Each client has a channels list, it's only modified under the client->lock
> + * and in an RCU callback, so it's safe to read under rcu_read_lock().
> + *
> + * Each device has a kref, which is initialized to 1 when the device is
> + * registered. A kref_put is done for each class_device registered.  When the
> + * class_device is released, the corresponding kref_put is done in the release
> + * method. Every time one of the device's channels is allocated to a client,
> + * a kref_get occurs.  When the channel is freed, the corresponding kref_put
> + * happens. The device's release function does a completion, so
> + * unregister_device does a remove event, class_device_unregister, a kref_put
> + * for the first reference, then waits on the completion for all other
> + * references to finish.
> + *
> + * Each channel has an open-coded implementation of Rusty Russell's "bigref,"
> + * with a kref and a per_cpu local_t.  A single reference is set on an
> + * ADDED event, and removed with a REMOVE event.  Net DMA client takes an
> + * extra reference per outstanding transaction.  The release function does a
> + * kref_put on the device. -ChrisL
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/dmaengine.h>
> +#include <linux/hardirq.h>
> +#include <linux/spinlock.h>
> +#include <linux/percpu.h>
> +#include <linux/rcupdate.h>
> +
> +static DEFINE_SPINLOCK(dma_list_lock);
> +static LIST_HEAD(dma_device_list);
> +static LIST_HEAD(dma_client_list);
> +
> +/* --- sysfs implementation --- */
> +
> +static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
> +{
> +	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
> +	unsigned long count = 0;
> +	int i;
> +
> +	for_each_cpu(i)
> +		count += per_cpu_ptr(chan->local, i)->memcpy_count;
> +
> +	return sprintf(buf, "%lu\n", count);
> +}
> +
> +static ssize_t show_bytes_transferred(struct class_device *cd, char *buf)
> +{
> +	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
> +	unsigned long count = 0;
> +	int i;
> +
> +	for_each_cpu(i)
> +		count += per_cpu_ptr(chan->local, i)->bytes_transferred;
> +
> +	return sprintf(buf, "%lu\n", count);
> +}
> +

What is the utility of exporting memcpy_count and bytes_transferred
to userspace via sysfs?  Is this really for debug (and thus should be
under debugfs)?
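
For context, what the two quoted show_* functions do is sum a per-CPU counter at read time, so the copy path never has to share a counter between CPUs. A userspace sketch of that pattern follows; the struct and function names and the fixed CPU count are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* assumed; the kernel iterates with for_each_cpu() */

/* One private slot per CPU, so the hot path needs no lock or atomic. */
struct sketch_percpu {
	unsigned long memcpy_count;
	unsigned long bytes_transferred;
};

struct sketch_chan_stats {
	struct sketch_percpu local[SKETCH_NR_CPUS];
};

/* Slow path (a sysfs or debugfs read): fold all per-CPU slots into one total. */
static unsigned long sketch_total_memcpy(const struct sketch_chan_stats *s)
{
	unsigned long count = 0;

	for (size_t i = 0; i < SKETCH_NR_CPUS; i++)
		count += s->local[i].memcpy_count;
	return count;
}
```

The total is only a snapshot, since other CPUs may keep incrementing while the loop runs. That is fine for statistics, which is part of why it looks like debug output.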

> +static ssize_t show_in_use(struct class_device *cd, char *buf)
> +{
> +	struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
> +
> +	return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
> +}
> +
> +static struct class_device_attribute dma_class_attrs[] = {
> +	__ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
> +	__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
> +	__ATTR(in_use, S_IRUGO, show_in_use, NULL),
> +	__ATTR_NULL
> +};
> +

[snip]

- kumar

^ permalink raw reply

* Re: [PATCH] deinline some larger functions from netdevice.h
From: Andrew Morton @ 2006-03-30  7:28 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: linux-kernel, netdev, davem
In-Reply-To: <200603301021.48925.vda@ilport.com.ua>

Denis Vlasenko <vda@ilport.com.ua> wrote:
>
> Network folks did not comment on these two patches, let me try
>  submitting them to you instead.

They're both merged (one is in -linus, the other's in -davem).

^ permalink raw reply

* Re: [PATCH] deinline some larger functions from netdevice.h
From: David S. Miller @ 2006-03-30  7:26 UTC (permalink / raw)
  To: vda; +Cc: akpm, linux-kernel, netdev
In-Reply-To: <200603301021.48925.vda@ilport.com.ua>

From: Denis Vlasenko <vda@ilport.com.ua>
Date: Thu, 30 Mar 2006 10:21:48 +0300

> Network folks did not comment on these two patches, let me try
> submitting them to you instead.

It's in my tree if you would bother checking:

	kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git

^ permalink raw reply

* [PATCH] deinline 200+ byte inlines in sock.h
From: Denis Vlasenko @ 2006-03-30  7:26 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2342 bytes --]

We have 200+ byte inlines in sock.h. That's way too large
in my opinion.

Sizes in bytes (i386) and files where those inlines are used:

238 sock_queue_rcv_skb 2.6.16/net/x25/x25_in.o
238 sock_queue_rcv_skb 2.6.16/net/rose/rose_in.o
238 sock_queue_rcv_skb 2.6.16/net/packet/af_packet.o
238 sock_queue_rcv_skb 2.6.16/net/netrom/nr_in.o
238 sock_queue_rcv_skb 2.6.16/net/llc/llc_sap.o
238 sock_queue_rcv_skb 2.6.16/net/llc/llc_conn.o
238 sock_queue_rcv_skb 2.6.16/net/irda/af_irda.o
238 sock_queue_rcv_skb 2.6.16/net/ipx/af_ipx.o
238 sock_queue_rcv_skb 2.6.16/net/ipv6/udp.o
238 sock_queue_rcv_skb 2.6.16/net/ipv6/raw.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/udp.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/raw.o
238 sock_queue_rcv_skb 2.6.16/net/ipv4/ipmr.o
238 sock_queue_rcv_skb 2.6.16/net/econet/econet.o
238 sock_queue_rcv_skb 2.6.16/net/econet/af_econet.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/sco.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/l2cap.o
238 sock_queue_rcv_skb 2.6.16/net/bluetooth/hci_sock.o
238 sock_queue_rcv_skb 2.6.16/net/ax25/ax25_in.o
238 sock_queue_rcv_skb 2.6.16/net/ax25/af_ax25.o
238 sock_queue_rcv_skb 2.6.16/net/appletalk/ddp.o
238 sock_queue_rcv_skb 2.6.16/drivers/net/pppoe.o

276 sk_receive_skb 2.6.16/net/decnet/dn_nsp_in.o
276 sk_receive_skb 2.6.16/net/dccp/ipv6.o
276 sk_receive_skb 2.6.16/net/dccp/ipv4.o
276 sk_receive_skb 2.6.16/net/dccp/dccp_ipv6.o
276 sk_receive_skb 2.6.16/drivers/net/pppoe.o

209 sk_dst_check 2.6.16/net/ipv6/ip6_output.o
209 sk_dst_check 2.6.16/net/ipv4/udp.o
209 sk_dst_check 2.6.16/net/decnet/dn_nsp_out.o

Should I also attack sock_recv_timestamp() etc?

Large inlines with multiple callers:
Size  Uses Wasted Name and definition
===== ==== ====== ================================================
  238   21   4360 sock_queue_rcv_skb    include/net/sock.h
  109   10    801 sock_recv_timestamp   include/net/sock.h
  276    4    768 sk_receive_skb        include/net/sock.h
   94    8    518 __sk_dst_check        include/net/sock.h
  209    3    378 sk_dst_check          include/net/sock.h
  131    4    333 sk_setup_caps         include/net/sock.h
  152    2    132 sk_stream_alloc_pskb  include/net/sock.h
  125    2    105 sk_stream_writequeue_purge    include/net/sock.h
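
A note on reading the "Wasted" column: every row in the table is consistent with wasted = (uses - 1) * (size - 20). The 20 bytes is presumably an allowance for what an out-of-line call would cost at each site, but that is inferred from the numbers and not stated in the mail. A small sketch that reproduces the figures, with hypothetical names:

```c
#include <assert.h>

/*
 * Assumption inferred from the table, not documented anywhere in the thread:
 * each call site is credited 20 bytes for the call that would replace the
 * inline body.
 */
#define SKETCH_CALL_OVERHEAD 20u

/* Bytes saved if one out-of-line copy replaces `uses` inlined copies. */
static unsigned int sketch_wasted(unsigned int size, unsigned int uses)
{
	assert(uses >= 1 && size >= SKETCH_CALL_OVERHEAD);
	return (uses - 1) * (size - SKETCH_CALL_OVERHEAD);
}
```

For sock_queue_rcv_skb that is 20 * 218 = 4360, matching the first row.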

Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
--
vda

[-- Attachment #2: sock.patch --]
[-- Type: text/x-diff, Size: 5483 bytes --]

diff -urpN linux-2.6.16.org/include/net/sock.h linux-2.6.16.deinline/include/net/sock.h
--- linux-2.6.16.org/include/net/sock.h	Mon Mar 20 07:53:29 2006
+++ linux-2.6.16.deinline/include/net/sock.h	Mon Mar 27 09:55:12 2006
@@ -926,28 +926,7 @@ static inline void sock_put(struct sock 
 		sk_free(sk);
 }
 
-static inline int sk_receive_skb(struct sock *sk, struct sk_buff *skb)
-{
-	int rc = NET_RX_SUCCESS;
-
-	if (sk_filter(sk, skb, 0))
-		goto discard_and_relse;
-
-	skb->dev = NULL;
-
-	bh_lock_sock(sk);
-	if (!sock_owned_by_user(sk))
-		rc = sk->sk_backlog_rcv(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
-	bh_unlock_sock(sk);
-out:
-	sock_put(sk);
-	return rc;
-discard_and_relse:
-	kfree_skb(skb);
-	goto out;
-}
+extern int sk_receive_skb(struct sock *sk, struct sk_buff *skb);
 
 /* Detach socket from process context.
  * Announce socket dead, detach it from wait queue and inode.
@@ -1032,33 +1011,9 @@ sk_dst_reset(struct sock *sk)
 	write_unlock(&sk->sk_dst_lock);
 }
 
-static inline struct dst_entry *
-__sk_dst_check(struct sock *sk, u32 cookie)
-{
-	struct dst_entry *dst = sk->sk_dst_cache;
-
-	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
-		sk->sk_dst_cache = NULL;
-		dst_release(dst);
-		return NULL;
-	}
-
-	return dst;
-}
-
-static inline struct dst_entry *
-sk_dst_check(struct sock *sk, u32 cookie)
-{
-	struct dst_entry *dst = sk_dst_get(sk);
-
-	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
-		sk_dst_reset(sk);
-		dst_release(dst);
-		return NULL;
-	}
+extern struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie);
 
-	return dst;
-}
+extern struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie);
 
 static inline void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 {
@@ -1128,45 +1083,7 @@ extern void sk_reset_timer(struct sock *
 
 extern void sk_stop_timer(struct sock *sk, struct timer_list* timer);
 
-static inline int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
-{
-	int err = 0;
-	int skb_len;
-
-	/* Cast skb->rcvbuf to unsigned... It's pointless, but reduces
-	   number of warnings when compiling with -W --ANK
-	 */
-	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
-	    (unsigned)sk->sk_rcvbuf) {
-		err = -ENOMEM;
-		goto out;
-	}
-
-	/* It would be deadlock, if sock_queue_rcv_skb is used
-	   with socket lock! We assume that users of this
-	   function are lock free.
-	*/
-	err = sk_filter(sk, skb, 1);
-	if (err)
-		goto out;
-
-	skb->dev = NULL;
-	skb_set_owner_r(skb, sk);
-
-	/* Cache the SKB length before we tack it onto the receive
-	 * queue.  Once it is added it no longer belongs to us and
-	 * may be freed by other threads of control pulling packets
-	 * from the queue.
-	 */
-	skb_len = skb->len;
-
-	skb_queue_tail(&sk->sk_receive_queue, skb);
-
-	if (!sock_flag(sk, SOCK_DEAD))
-		sk->sk_data_ready(sk, skb_len);
-out:
-	return err;
-}
+extern int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 
 static inline int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
 {
diff -urpN linux-2.6.16.org/net/core/sock.c linux-2.6.16.deinline/net/core/sock.c
--- linux-2.6.16.org/net/core/sock.c	Mon Mar 20 07:53:29 2006
+++ linux-2.6.16.deinline/net/core/sock.c	Mon Mar 27 09:45:09 2006
@@ -187,6 +187,99 @@ static void sock_disable_timestamp(struc
 }
 
 
+int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+{
+	int err = 0;
+	int skb_len;
+
+	/* Cast skb->rcvbuf to unsigned... It's pointless, but reduces
+	   number of warnings when compiling with -W --ANK
+	 */
+	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
+	    (unsigned)sk->sk_rcvbuf) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* It would be deadlock, if sock_queue_rcv_skb is used
+	   with socket lock! We assume that users of this
+	   function are lock free.
+	*/
+	err = sk_filter(sk, skb, 1);
+	if (err)
+		goto out;
+
+	skb->dev = NULL;
+	skb_set_owner_r(skb, sk);
+
+	/* Cache the SKB length before we tack it onto the receive
+	 * queue.  Once it is added it no longer belongs to us and
+	 * may be freed by other threads of control pulling packets
+	 * from the queue.
+	 */
+	skb_len = skb->len;
+
+	skb_queue_tail(&sk->sk_receive_queue, skb);
+
+	if (!sock_flag(sk, SOCK_DEAD))
+		sk->sk_data_ready(sk, skb_len);
+out:
+	return err;
+}
+EXPORT_SYMBOL(sock_queue_rcv_skb);
+
+int sk_receive_skb(struct sock *sk, struct sk_buff *skb)
+{
+	int rc = NET_RX_SUCCESS;
+
+	if (sk_filter(sk, skb, 0))
+		goto discard_and_relse;
+
+	skb->dev = NULL;
+
+	bh_lock_sock(sk);
+	if (!sock_owned_by_user(sk))
+		rc = sk->sk_backlog_rcv(sk, skb);
+	else
+		sk_add_backlog(sk, skb);
+	bh_unlock_sock(sk);
+out:
+	sock_put(sk);
+	return rc;
+discard_and_relse:
+	kfree_skb(skb);
+	goto out;
+}
+EXPORT_SYMBOL(sk_receive_skb);
+
+struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
+{
+	struct dst_entry *dst = sk->sk_dst_cache;
+
+	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
+		sk->sk_dst_cache = NULL;
+		dst_release(dst);
+		return NULL;
+	}
+
+	return dst;
+}
+EXPORT_SYMBOL(__sk_dst_check);
+
+struct dst_entry *sk_dst_check(struct sock *sk, u32 cookie)
+{
+	struct dst_entry *dst = sk_dst_get(sk);
+
+	if (dst && dst->obsolete && dst->ops->check(dst, cookie) == NULL) {
+		sk_dst_reset(sk);
+		dst_release(dst);
+		return NULL;
+	}
+
+	return dst;
+}
+EXPORT_SYMBOL(sk_dst_check);
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.

^ permalink raw reply

* [PATCH] deinline some larger functions from netdevice.h
From: Denis Vlasenko @ 2006-03-30  7:21 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

Hi Andrew,

Network folks did not comment on these two patches, let me try
submitting them to you instead.

I hunted down some large inlines. This patch addresses those found
in netdevice.h.

On an allyesconfig'ured kernel:

Size  Uses Wasted Name and definition
===== ==== ====== ================================================
   95  162  12075 netif_wake_queue      include/linux/netdevice.h
  129   86   9265 dev_kfree_skb_any     include/linux/netdevice.h
  127   56   5885 netif_device_attach   include/linux/netdevice.h
   73   86   4505 dev_kfree_skb_irq     include/linux/netdevice.h
   46   60   1534 netif_device_detach   include/linux/netdevice.h
  119   16   1485 __netif_rx_schedule   include/linux/netdevice.h
  143    5    492 netif_rx_schedule     include/linux/netdevice.h
   81    7    366 netif_schedule        include/linux/netdevice.h

netif_wake_queue is big because __netif_schedule is a big inline:

static inline void __netif_schedule(struct net_device *dev)
{
        if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
                unsigned long flags;
                struct softnet_data *sd;

                local_irq_save(flags);
                sd = &__get_cpu_var(softnet_data);
                dev->next_sched = sd->output_queue;
                sd->output_queue = dev;
                raise_softirq_irqoff(NET_TX_SOFTIRQ);
                local_irq_restore(flags);
        }
}

static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
        if (netpoll_trap())
                return;
#endif
        if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
                __netif_schedule(dev);
}

By de-inlining __netif_schedule we are saving a lot of text
at each callsite of netif_wake_queue and netif_schedule.
__netif_rx_schedule is also big, and it makes more sense to keep
both of them out of line.
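
To illustrate why the body is worth keeping out of line: the function is a "schedule at most once" operation, an atomic test-and-set followed by a push onto a queue, and all of that gets duplicated at every inlined call site. Below is a userspace sketch of the same shape using C11 atomics. The sketch_* names are hypothetical, a plain global stands in for the per-CPU softnet_data queue, and the interrupt masking the kernel does is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_STATE_SCHED 0x1u	/* stand-in for __LINK_STATE_SCHED */

struct sketch_dev {
	atomic_uint state;
	struct sketch_dev *next_sched;
};

/* Stand-in for the per-CPU output queue head. */
static struct sketch_dev *sketch_output_queue;

/*
 * Out-of-line "schedule once": callers pay for a call instead of carrying
 * a copy of this body. Returns 1 if the device was queued by this call,
 * 0 if it was already scheduled.
 */
static int sketch_schedule(struct sketch_dev *dev)
{
	/* atomic_fetch_or plays the role of test_and_set_bit() here. */
	unsigned int old = atomic_fetch_or(&dev->state, SKETCH_STATE_SCHED);

	if (old & SKETCH_STATE_SCHED)
		return 0;

	/* The kernel does this with local interrupts disabled. */
	dev->next_sched = sketch_output_queue;
	sketch_output_queue = dev;
	return 1;
}
```

Calling it twice on the same device queues the device only once; the second call returns early after the atomic test.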

Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
instead... oh well.

netif_device_attach/detach are not hot paths, we can deinline them too.

Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
--
vda

[-- Attachment #2: netdevice.patch --]
[-- Type: text/x-diff, Size: 4491 bytes --]

diff -urpN linux-2.6.16.org/include/linux/netdevice.h linux-2.6.16.deinline2/include/linux/netdevice.h
--- linux-2.6.16.org/include/linux/netdevice.h	Mon Mar 20 07:53:29 2006
+++ linux-2.6.16.deinline2/include/linux/netdevice.h	Mon Mar 27 13:46:15 2006
@@ -594,20 +594,7 @@ DECLARE_PER_CPU(struct softnet_data,soft
 
 #define HAVE_NETIF_QUEUE
 
-static inline void __netif_schedule(struct net_device *dev)
-{
-	if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
-		unsigned long flags;
-		struct softnet_data *sd;
-
-		local_irq_save(flags);
-		sd = &__get_cpu_var(softnet_data);
-		dev->next_sched = sd->output_queue;
-		sd->output_queue = dev;
-		raise_softirq_irqoff(NET_TX_SOFTIRQ);
-		local_irq_restore(flags);
-	}
-}
+extern void __netif_schedule(struct net_device *dev);
 
 static inline void netif_schedule(struct net_device *dev)
 {
@@ -671,13 +658,7 @@ static inline void dev_kfree_skb_irq(str
 /* Use this variant in places where it could be invoked
  * either from interrupt or non-interrupt context.
  */
-static inline void dev_kfree_skb_any(struct sk_buff *skb)
-{
-	if (in_irq() || irqs_disabled())
-		dev_kfree_skb_irq(skb);
-	else
-		dev_kfree_skb(skb);
-}
+extern void dev_kfree_skb_any(struct sk_buff *skb);
 
 #define HAVE_NETIF_RX 1
 extern int		netif_rx(struct sk_buff *skb);
@@ -735,22 +716,9 @@ static inline int netif_device_present(s
 	return test_bit(__LINK_STATE_PRESENT, &dev->state);
 }
 
-static inline void netif_device_detach(struct net_device *dev)
-{
-	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
-	    netif_running(dev)) {
-		netif_stop_queue(dev);
-	}
-}
+extern void netif_device_detach(struct net_device *dev);
 
-static inline void netif_device_attach(struct net_device *dev)
-{
-	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
-	    netif_running(dev)) {
-		netif_wake_queue(dev);
- 		__netdev_watchdog_up(dev);
-	}
-}
+extern void netif_device_attach(struct net_device *dev);
 
 /*
  * Network interface message level settings
@@ -818,20 +786,7 @@ static inline int netif_rx_schedule_prep
  * already been called and returned 1.
  */
 
-static inline void __netif_rx_schedule(struct net_device *dev)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	dev_hold(dev);
-	list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
-	if (dev->quota < 0)
-		dev->quota += dev->weight;
-	else
-		dev->quota = dev->weight;
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
-	local_irq_restore(flags);
-}
+extern void __netif_rx_schedule(struct net_device *dev);
 
 /* Try to reschedule poll. Called by irq handler. */
 
diff -urpN linux-2.6.16.org/net/core/dev.c linux-2.6.16.deinline2/net/core/dev.c
--- linux-2.6.16.org/net/core/dev.c	Mon Mar 20 07:53:29 2006
+++ linux-2.6.16.deinline2/net/core/dev.c	Mon Mar 27 13:47:00 2006
@@ -1073,6 +1073,70 @@ void dev_queue_xmit_nit(struct sk_buff *
 	rcu_read_unlock();
 }
 
+
+void __netif_schedule(struct net_device *dev)
+{
+	if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
+		unsigned long flags;
+		struct softnet_data *sd;
+
+		local_irq_save(flags);
+		sd = &__get_cpu_var(softnet_data);
+		dev->next_sched = sd->output_queue;
+		sd->output_queue = dev;
+		raise_softirq_irqoff(NET_TX_SOFTIRQ);
+		local_irq_restore(flags);
+	}
+}
+EXPORT_SYMBOL(__netif_schedule);
+
+void __netif_rx_schedule(struct net_device *dev)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	dev_hold(dev);
+	list_add_tail(&dev->poll_list, &__get_cpu_var(softnet_data).poll_list);
+	if (dev->quota < 0)
+		dev->quota += dev->weight;
+	else
+		dev->quota = dev->weight;
+	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL(__netif_rx_schedule);
+
+void dev_kfree_skb_any(struct sk_buff *skb)
+{
+	if (in_irq() || irqs_disabled())
+		dev_kfree_skb_irq(skb);
+	else
+		dev_kfree_skb(skb);
+}
+EXPORT_SYMBOL(dev_kfree_skb_any);
+
+
+/* Hot-plugging. */
+void netif_device_detach(struct net_device *dev)
+{
+	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
+	    netif_running(dev)) {
+		netif_stop_queue(dev);
+	}
+}
+EXPORT_SYMBOL(netif_device_detach);
+
+void netif_device_attach(struct net_device *dev)
+{
+	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
+	    netif_running(dev)) {
+		netif_wake_queue(dev);
+ 		__netdev_watchdog_up(dev);
+	}
+}
+EXPORT_SYMBOL(netif_device_attach);
+
+
 /*
  * Invalidate hardware checksum when packet is to be mangled, and
  * complete checksum manually on outgoing path.

^ permalink raw reply

* Re: dcache leak in 2.6.16-git8 II
From: Balbir Singh @ 2006-03-30  6:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, bharata, linux-kernel, netdev
In-Reply-To: <200603300053.25235.ak@suse.de>

On Thu, Mar 30, 2006 at 12:53:24AM +0200, Andi Kleen wrote:
> On Thursday 30 March 2006 00:50, Andrew Morton wrote:
> 
> > It looks that way.  Didn't someone else report a sock_inode_cache leak?
> 
> Didn't see it.
>  
> > > I still got a copy of the /proc in case anybody wants more information.
> > 
> > We have this fancy new /proc/slab_allocators now, it might show something
> > interesting.  It needs CONFIG_DEBUG_SLAB_LEAK.
> 
> I didn't have that enabled unfortunately. I can try it on the next round.
> 
> -Andi
>

There is also a new sysctl to drop caches. It is called vm.drop_caches.
It will be interesting to see if it is able to free up some dcache memory
for you.

Balbir
 

^ permalink raw reply

* Re: [Patch 5/8] generic netlink interface for delay accounting
From: Balbir Singh @ 2006-03-30  6:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nagar, linux-kernel, netdev, tgraf, hadi
In-Reply-To: <20060329222629.0a730997.akpm@osdl.org>

On Wed, Mar 29, 2006 at 10:26:29PM -0800, Andrew Morton wrote:
> Balbir Singh <balbir@in.ibm.com> wrote:
> >
> > > The kmem_cache_free() can happen outside the lock.
> > 
> > 
> > kmem_cache_free() and setting to NULL outside the lock is prone to
> > race conditions. Consider the following scenario
> > 
> > A thread group T1 has exiting processes P1 and P2
> > 
> > P1 is exiting, finishes the delay accounting by calling taskstats_exit_pid()
> > and gives up the mutex and calls kmem_cache_free(), but before it can set
> > tsk->delays to NULL, we try to get statistics for the entire thread group.
> > This task will show up in the thread group with a dangling tsk->delays.
> 
> Yes, the `tsk->delays = NULL;' needs to happen inside the lock.  But the
> kmem_cache_free() does not.  It pointlessly increases the lock hold time.

Understood, will fix it.

> 
> > > > +	if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
> > > > +		u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
> > > > +		rc = fill_pid((pid_t)pid, NULL, &stats);
> > > 
> > > We shouldn't have a typecast here.  If it generates a warning then we need
> > > to get in there and find out why.
> > 
> > The reason for a typecast is that pid is passed as a u32 from userspace.
> > genetlink currently supports most unsigned types with little or no
> > support for signed types. We exchange data as u32 and do the correct
> > thing in the kernel. Would you like us to move away from this?
> > 
> 
> I think it's best to avoid the cast unless it's actually needed to avoid a
> warning or compile error, or to do special things with sign extension. 
> Because casts clutter up the code and can hide real bugs.  In this case the
> compiler should silently perform the conversion.

Yep, the compiler was doing it for me, but I tried to be smart and cast
things around. Will fix it.

Thanks,
Balbir

^ permalink raw reply

* Re: [Patch 5/8] generic netlink interface for delay accounting
From: Andrew Morton @ 2006-03-30  6:26 UTC (permalink / raw)
  To: balbir; +Cc: nagar, linux-kernel, netdev, tgraf, hadi
In-Reply-To: <20060330061005.GA18387@in.ibm.com>

Balbir Singh <balbir@in.ibm.com> wrote:
>
> > The kmem_cache_free() can happen outside the lock.
> 
> 
> kmem_cache_free() and setting to NULL outside the lock is prone to
> race conditions. Consider the following scenario
> 
> A thread group T1 has exiting processes P1 and P2
> 
> P1 is exiting, finishes the delay accounting by calling taskstats_exit_pid()
> and gives up the mutex and calls kmem_cache_free(), but before it can set
> tsk->delays to NULL, we try to get statistics for the entire thread group.
> This task will show up in the thread group with a dangling tsk->delays.

Yes, the `tsk->delays = NULL;' needs to happen inside the lock.  But the
kmem_cache_free() does not.  It pointlessly increases the lock hold time.
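
A minimal userspace sketch of that ordering, assuming a pthread mutex in
place of delayacct_exit_mutex and free() in place of kmem_cache_free().
struct task, struct delay_info and task_release_delays() are hypothetical
names for illustration, not taken from the patch:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-task delay accounting data. */
struct delay_info {
	unsigned long long blkio_delay;
};

struct task {
	struct delay_info *delays;
};

static pthread_mutex_t exit_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Detach the pointer while holding the lock, free it afterwards.
 * Anyone who takes exit_mutex sees either a valid pointer or NULL,
 * never a dangling one.  Returns 1 if something was freed, else 0.
 */
int task_release_delays(struct task *tsk)
{
	struct delay_info *old;

	pthread_mutex_lock(&exit_mutex);
	old = tsk->delays;
	tsk->delays = NULL;
	pthread_mutex_unlock(&exit_mutex);

	free(old);		/* free(NULL) is a no-op */
	return old != NULL;
}
```

This only helps if readers of tsk->delays take the same lock; the free
itself then no longer adds to the lock hold time.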

> > > +	if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
> > > +		u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
> > > +		rc = fill_pid((pid_t)pid, NULL, &stats);
> > 
> > We shouldn't have a typecast here.  If it generates a warning then we need
> > to get in there and find out why.
> 
> The reason for a typecast is that pid is passed as a u32 from userspace.
> genetlink currently supports most unsigned types with little or no
> support for signed types. We exchange data as u32 and do the correct
> thing in the kernel. Would you like us to move away from this?
> 

I think it's best to avoid the cast unless it's actually needed to avoid a
warning or compile error, or to do special things with sign extension. 
Because casts clutter up the code and can hide real bugs.  In this case the
compiler should silently perform the conversion.
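
As a rough illustration, assuming userspace types (uint32_t for u32) and a
hypothetical fill_pid_stub() standing in for fill_pid(), the call compiles
without any cast:

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for fill_pid(): accepts only positive pids. */
static int fill_pid_stub(pid_t pid)
{
	return pid > 0 ? 0 : -1;
}

/*
 * The unsigned 32-bit value read from the netlink attribute is
 * converted to pid_t implicitly at the call.  Values that do not fit
 * in pid_t convert in an implementation-defined way, with or without
 * an explicit cast, so the cast buys nothing here.
 */
int handle_pid_attr(uint32_t raw)
{
	return fill_pid_stub(raw);	/* no (pid_t) cast needed */
}
```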

^ permalink raw reply

* Re: [PATCH 2/9] I/OAT
From: Evgeniy Polyakov @ 2006-03-30  6:21 UTC (permalink / raw)
  To: Chris Leech; +Cc: linux-kernel, netdev
In-Reply-To: <1143672844.27644.5.camel@black-lazer.jf.intel.com>

On Wed, Mar 29, 2006 at 02:54:04PM -0800, Chris Leech (christopher.leech@intel.com) wrote:
> [I/OAT] Driver for the Intel(R) I/OAT DMA engine
> 
> From: Chris Leech <christopher.leech@intel.com>
> 
> Adds a new ioatdma driver
> 
> Signed-off-by: Chris Leech <christopher.leech@intel.com>

Let's do it again.
Could you please describe how struct ioat_dma_chan channels are freed?
For example when device is removed just after it has been added.

ioat_probe() -> enumerate_dma_channels() (failures are ok now) ->
kmalloc a lot of channels.

ioat_remove() -> dma_async_device_unregister(), which does not clean up
the ioat_dma_chan channels, only the clients.
It ends up in dma_async_device_cleanup() only.

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [Patch 5/8] generic netlink interface for delay accounting
From: Balbir Singh @ 2006-03-30  6:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shailabh Nagar, linux-kernel, netdev, tgraf, hadi
In-Reply-To: <20060329210406.08d1c929.akpm@osdl.org>

Hi, Andrew


On Wed, Mar 29, 2006 at 09:04:06PM -0800, Andrew Morton wrote:
> Shailabh Nagar <nagar@watson.ibm.com> wrote:
> >
> > delayacct-genetlink.patch
> > 
> > Create a generic netlink interface (NETLINK_GENERIC family),
> > called "taskstats", for getting delay and cpu statistics of
> > tasks and thread groups during their lifetime and when they exit.
> > 
> > 
> 
> It'd be best to have a netlink person review the netlinkisms here.

Jamal did review this code, his comments are available at
http://lkml.org/lkml/2006/3/26/71. His comments were very helpful in
doing the correct thing and understanding the design and usage of
genetlink.

> 
> > +static inline int delayacct_add_tsk(struct taskstats *d,
> > +				    struct task_struct *tsk)
> > +{
> > +	if (!tsk->delays)
> > +		return -EINVAL;
> > +	return __delayacct_add_tsk(d, tsk);
> > +}
> 
> hm.  It's a worry that this can return an error if delay accounting simply
> isn't enabled.

Yes, if CONFIG_TASKSTATS is enabled and the kernel is booted without
delayacct, a user space utility trying to extract statistics will get
an error.

> 
> > +struct taskstats {
> > +	/* Maintain 64-bit alignment while extending */
> > +	/* Version 1 */
> > +
> > +	/* XXX_count is number of delay values recorded.
> > +	 * XXX_total is corresponding cumulative delay in nanoseconds
> > +	 */
> > +
> > +#define TASKSTATS_NOCPUSTATS	1
> > +	__u64	cpu_count;
> > +	__u64	cpu_delay_total;	/* wait, while runnable, for cpu */
> > +	__u64	blkio_count;
> > +	__u64	blkio_delay_total;	/* sync,block io completion wait*/
> > +	__u64	swapin_count;
> > +	__u64	swapin_delay_total;	/* swapin page fault wait*/
> > +
> > +	__u64	cpu_run_total;		/* cpu running time
> > +					 * no count available/provided */
> > +};
> 
> What locking is used to make updates to these u64's appear to be atomic? 
> Maybe it's deliberately nonatomic.  Either way, it needs a comment.

These fields are protected by tsk->delays->lock. We will add comments
to indicate the same.

> 
> >  void __delayacct_tsk_exit(struct task_struct *tsk)
> >  {
> > +	/*
> > +	 * Protect against racing thread group exits
> > +	 */
> > +	mutex_lock(&delayacct_exit_mutex);
> > +	taskstats_exit_pid(tsk);
> >  	if (tsk->delays) {
> >  		kmem_cache_free(delayacct_cache, tsk->delays);
> >  		tsk->delays = NULL;
> >  	}
> > +	mutex_unlock(&delayacct_exit_mutex);
> >  }
> 
> hm, I wonder how contended that lock is likely to be.
>

It is taken for every exiting task. We did not measure the contention
using any tool.
 
> The kmem_cache_free() can happen outside the lock.


kmem_cache_free() and setting to NULL outside the lock is prone to
race conditions. Consider the following scenario

A thread group T1 has exiting processes P1 and P2

P1 is exiting, finishes the delay accounting by calling taskstats_exit_pid()
and gives up the mutex and calls kmem_cache_free(), but before it can set
tsk->delays to NULL, we try to get statistics for the entire thread group.
This task will show up in the thread group with a dangling tsk->delays.

> 
> > +#ifdef CONFIG_TASKSTATS
> > +int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
> > +{
> > +	nsec_t tmp;
> > +	struct timespec ts;
> > +	unsigned long t1,t2;
> > +
> > +	/* zero XXX_total,non-zero XXX_count implies XXX stat overflowed */
> > +
> > +	tmp = (nsec_t)d->cpu_run_total ;
> 
> stray space before semicolon.

Will fix it.

> 
> > +	tmp += (u64)(tsk->utime+tsk->stime)*TICK_NSEC;
> > +	d->cpu_run_total = (tmp < (nsec_t)d->cpu_run_total)? 0: tmp;
> 
> Missing space before ?, missing space before :

Will fix it.

> 
> > +	d->blkio_delay_total = (tmp < d->blkio_delay_total)? 0: tmp;
> > +	d->swapin_delay_total = (tmp < d->swapin_delay_total)? 0: tmp;
> 
> dittos.

Will fix it.

> 
> > ===================================================================
> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux-2.6.16/kernel/taskstats.c	2006-03-29 18:13:18.000000000 -0500
> > @@ -0,0 +1,292 @@
> > +/*
> > + * taskstats.c - Export per-task statistics to userland
> > + *
> > + * Copyright (C) Shailabh Nagar, IBM Corp. 2006
> > + *           (C) Balbir Singh,   IBM Corp. 2006
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/taskstats.h>
> > +#include <linux/delayacct.h>
> > +#include <net/genetlink.h>
> > +#include <asm/atomic.h>
> > +
> > +const int taskstats_version = TASKSTATS_VERSION;
> 
> Odd.  It'd be better to put TASKSTATS_VERSION in the header file, use that
> directly.

We will fix this.

> 
> > +
> > +static inline int fill_pid(pid_t pid, struct task_struct *pidtsk,
> > +			   struct taskstats *stats)
> > +{
> > +	int rc;
> > +	struct task_struct *tsk = pidtsk;
> > +
> > +	if (!pidtsk) {
> > +		read_lock(&tasklist_lock);
> > +		tsk = find_task_by_pid(pid);
> > +		if (!tsk) {
> > +			read_unlock(&tasklist_lock);
> > +			return -ESRCH;
> > +		}
> > +		get_task_struct(tsk);
> > +		read_unlock(&tasklist_lock);
> > +	} else
> > +		get_task_struct(tsk);
> > +
> > +	rc = delayacct_add_tsk(stats, tsk);
> > +	put_task_struct(tsk);
> > +
> > +	return rc;
> > +
> > +}
> 
> Has two callsites, should be uninlined.

Will do.

> 
> > +static inline int fill_tgid(pid_t tgid, struct task_struct *tgidtsk,
> > +			    struct taskstats *stats)
> > +{
> > +	int rc;
> > +	struct task_struct *tsk, *first;
> > +
> > +	first = tgidtsk;
> > +	read_lock(&tasklist_lock);
> > +	if (!first) {
> > +		first = find_task_by_pid(tgid);
> > +		if (!first) {
> > +			read_unlock(&tasklist_lock);
> > +			return -ESRCH;
> > +		}
> > +	}
> > +	tsk = first;
> > +	do {
> > +		rc = delayacct_add_tsk(stats, tsk);
> > +		if (rc)
> > +			break;
> > +	} while_each_thread(first, tsk);
> > +	read_unlock(&tasklist_lock);
> > +
> > +	return rc;
> > +}
> 
> Ditto.

Will do.

> 
> It's somewhat similar to fill_pid() - perhaps they can be combined, halving
> the overhead?

Yes, there is a lot of synergy between the two. The main difference is
the way we lock (using get/put_task_struct for pids and tasklist_lock
for tgids). Using a flag to do the correct thing looked a bit ugly,
so we split the functions.

> 
> > +static int taskstats_send_stats(struct sk_buff *skb, struct genl_info *info)
> > +{
> > +	int rc = 0;
> > +	struct sk_buff *rep_skb;
> > +	struct taskstats stats;
> > +	void *reply;
> > +	size_t size;
> > +	struct nlattr *na;
> > +
> > +	/*
> > +	 * Size includes space for nested attribute as well
> > +	 * The returned data is of the format
> > +	 * TASKSTATS_TYPE_AGGR_PID/TGID
> > +	 * --> TASKSTATS_TYPE_PID/TGID
> > +	 * --> TASKSTATS_TYPE_STATS
> > +	 */
> > +	size = nla_total_size(sizeof(u32)) +
> > +		nla_total_size(sizeof(struct taskstats)) + nla_total_size(0);
> > +
> > +	memset(&stats, 0, sizeof(stats));
> > +	rc = prepare_reply(info, TASKSTATS_CMD_NEW, &rep_skb, &reply, size);
> > +	if (rc < 0)
> > +		return rc;
> > +
> > +	if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
> > +		u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
> > +		rc = fill_pid((pid_t)pid, NULL, &stats);
> 
> We shouldn't have a typecast here.  If it generates a warning then we need
> to get in there and find out why.

The reason for a typecast is that pid is passed as a u32 from userspace.
genetlink currently supports most unsigned types with little or no
support for signed types. We exchange data as u32 and do the correct
thing in the kernel. Would you like us to move away from this?

> 
> > +		if (rc < 0)
> > +			goto err;
> > +
> > +		na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_PID);
> > +		NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_PID, pid);
> > +	} else if (info->attrs[TASKSTATS_CMD_ATTR_TGID]) {
> > +		u32 tgid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_TGID]);
> > +		rc = fill_tgid((pid_t)tgid, NULL, &stats);
> 
> Ditto.
> 
> > +		if (rc < 0)
> > +			goto err;
> > +
> > +		na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_TGID);
> > +		NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_TGID, tgid);
> > +	} else {
> > +		rc = -EINVAL;
> > +		goto err;
> > +	}
> > +
> > +	NLA_PUT_TYPE(rep_skb, struct taskstats, TASKSTATS_TYPE_STATS, stats);
> > +	nla_nest_end(rep_skb, na);
> > +
> > +	return send_reply(rep_skb, info->snd_pid, TASKSTATS_MSG_UNICAST);
> > +
> > +nla_put_failure:
> > +	return  genlmsg_cancel(rep_skb, reply);
> 
> Extra space.

Will fix.

> 
> > +err:
> > +	nlmsg_free(rep_skb);
> > +	return rc;
> > +}
> > +
> > +
> > +/* Send pid data out on exit */
> > +void taskstats_exit_pid(struct task_struct *tsk)
> > +{
> > +	int rc = 0;
> > +	struct sk_buff *rep_skb;
> > +	void *reply;
> > +	struct taskstats stats;
> > +	size_t size;
> > +	int is_thread_group = !thread_group_empty(tsk);
> > +	struct nlattr *na;
> > +
> > +	/*
> > +	 * tasks can start to exit very early. Ensure that the family
> > +	 * is registered before notifications are sent out
> > +	 */
> > +	if (!family_registered)
> > +		return;
> 
> This code risks evaluating thread_group_empty() even if !family_registered.
> The compiler will most likely sort that out in this case, but it's a risk
> when using these initialisers.
> 

Yes, this can be optimized and we can initialize it after the check for
!family_registered.

> > +
> > +static int __init taskstats_init(void)
> > +{
> > +	if (genl_register_family(&family))
> > +		return -EFAULT;
> 
> EFAULT?

It shouldn't be (Shailabh please comment). We will fix it.

> 
> > +        family_registered = 1;
> 
> whitespace broke.

Will fix

> 
> > +
> > +	if (genl_register_ops(&family, &taskstats_ops))
> > +		goto err;
> > +
> > +	return 0;
> > +err:
> > +	genl_unregister_family(&family);
> > +	family_registered = 0;
> > +	return -EFAULT;
> > +}
> > +
> > +late_initcall(taskstats_init);
> 
> Why late_initcall()?  (A comment would be appropriate)

We will add a comment.

Thanks for your detailed review,
Balbir

^ permalink raw reply

* Re: [Patch 5/8] generic netlink interface for delay accounting
From: Andrew Morton @ 2006-03-30  5:04 UTC (permalink / raw)
  To: Shailabh Nagar; +Cc: linux-kernel, netdev, tgraf, hadi
In-Reply-To: <442B2BB6.9020309@watson.ibm.com>

Shailabh Nagar <nagar@watson.ibm.com> wrote:
>
> delayacct-genetlink.patch
> 
> Create a generic netlink interface (NETLINK_GENERIC family),
> called "taskstats", for getting delay and cpu statistics of
> tasks and thread groups during their lifetime and when they exit.
> 
> 

It'd be best to have a netlink person review the netlinkisms here.

> +static inline int delayacct_add_tsk(struct taskstats *d,
> +				    struct task_struct *tsk)
> +{
> +	if (!tsk->delays)
> +		return -EINVAL;
> +	return __delayacct_add_tsk(d, tsk);
> +}

hm.  It's a worry that this can return an error if delay accounting simply
isn't enabled.

> +struct taskstats {
> +	/* Maintain 64-bit alignment while extending */
> +	/* Version 1 */
> +
> +	/* XXX_count is number of delay values recorded.
> +	 * XXX_total is corresponding cumulative delay in nanoseconds
> +	 */
> +
> +#define TASKSTATS_NOCPUSTATS	1
> +	__u64	cpu_count;
> +	__u64	cpu_delay_total;	/* wait, while runnable, for cpu */
> +	__u64	blkio_count;
> +	__u64	blkio_delay_total;	/* sync,block io completion wait*/
> +	__u64	swapin_count;
> +	__u64	swapin_delay_total;	/* swapin page fault wait*/
> +
> +	__u64	cpu_run_total;		/* cpu running time
> +					 * no count available/provided */
> +};

What locking is used to make updates to these u64's appear to be atomic? 
Maybe it's deliberately nonatomic.  Either way, it needs a comment.

>  void __delayacct_tsk_exit(struct task_struct *tsk)
>  {
> +	/*
> +	 * Protect against racing thread group exits
> +	 */
> +	mutex_lock(&delayacct_exit_mutex);
> +	taskstats_exit_pid(tsk);
>  	if (tsk->delays) {
>  		kmem_cache_free(delayacct_cache, tsk->delays);
>  		tsk->delays = NULL;
>  	}
> +	mutex_unlock(&delayacct_exit_mutex);
>  }

hm, I wonder how contended that lock is likely to be.

The kmem_cache_free() can happen outside the lock.

> +#ifdef CONFIG_TASKSTATS
> +int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
> +{
> +	nsec_t tmp;
> +	struct timespec ts;
> +	unsigned long t1,t2;
> +
> +	/* zero XXX_total,non-zero XXX_count implies XXX stat overflowed */
> +
> +	tmp = (nsec_t)d->cpu_run_total ;

stray space before semicolon.

> +	tmp += (u64)(tsk->utime+tsk->stime)*TICK_NSEC;
> +	d->cpu_run_total = (tmp < (nsec_t)d->cpu_run_total)? 0: tmp;

Missing space before ?, missing space before :

> +	d->blkio_delay_total = (tmp < d->blkio_delay_total)? 0: tmp;
> +	d->swapin_delay_total = (tmp < d->swapin_delay_total)? 0: tmp;

dittos.
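
For reference, a small standalone sketch of the wraparound check those
lines perform, written with the spacing asked for above.  add_or_zero()
is a hypothetical helper, and uint64_t is assumed in place of __u64:

```c
#include <stdint.h>

/*
 * Add delta to a running total.  Unsigned addition wraps on overflow,
 * so a result smaller than the old total means the sum did not fit;
 * in that case report 0, which the patch treats as "stat overflowed".
 */
uint64_t add_or_zero(uint64_t total, uint64_t delta)
{
	uint64_t tmp = total + delta;

	return (tmp < total) ? 0 : tmp;
}
```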

> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.16/kernel/taskstats.c	2006-03-29 18:13:18.000000000 -0500
> @@ -0,0 +1,292 @@
> +/*
> + * taskstats.c - Export per-task statistics to userland
> + *
> + * Copyright (C) Shailabh Nagar, IBM Corp. 2006
> + *           (C) Balbir Singh,   IBM Corp. 2006
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/taskstats.h>
> +#include <linux/delayacct.h>
> +#include <net/genetlink.h>
> +#include <asm/atomic.h>
> +
> +const int taskstats_version = TASKSTATS_VERSION;

Odd.  It'd be better to put TASKSTATS_VERSION in the header file, use that
directly.

> +
> +static inline int fill_pid(pid_t pid, struct task_struct *pidtsk,
> +			   struct taskstats *stats)
> +{
> +	int rc;
> +	struct task_struct *tsk = pidtsk;
> +
> +	if (!pidtsk) {
> +		read_lock(&tasklist_lock);
> +		tsk = find_task_by_pid(pid);
> +		if (!tsk) {
> +			read_unlock(&tasklist_lock);
> +			return -ESRCH;
> +		}
> +		get_task_struct(tsk);
> +		read_unlock(&tasklist_lock);
> +	} else
> +		get_task_struct(tsk);
> +
> +	rc = delayacct_add_tsk(stats, tsk);
> +	put_task_struct(tsk);
> +
> +	return rc;
> +
> +}

Has two callsites, should be uninlined.

> +static inline int fill_tgid(pid_t tgid, struct task_struct *tgidtsk,
> +			    struct taskstats *stats)
> +{
> +	int rc;
> +	struct task_struct *tsk, *first;
> +
> +	first = tgidtsk;
> +	read_lock(&tasklist_lock);
> +	if (!first) {
> +		first = find_task_by_pid(tgid);
> +		if (!first) {
> +			read_unlock(&tasklist_lock);
> +			return -ESRCH;
> +		}
> +	}
> +	tsk = first;
> +	do {
> +		rc = delayacct_add_tsk(stats, tsk);
> +		if (rc)
> +			break;
> +	} while_each_thread(first, tsk);
> +	read_unlock(&tasklist_lock);
> +
> +	return rc;
> +}

Ditto.

It's somewhat similar to fill_pid() - perhaps they can be combined, halving
the overhead?

> +static int taskstats_send_stats(struct sk_buff *skb, struct genl_info *info)
> +{
> +	int rc = 0;
> +	struct sk_buff *rep_skb;
> +	struct taskstats stats;
> +	void *reply;
> +	size_t size;
> +	struct nlattr *na;
> +
> +	/*
> +	 * Size includes space for nested attribute as well
> +	 * The returned data is of the format
> +	 * TASKSTATS_TYPE_AGGR_PID/TGID
> +	 * --> TASKSTATS_TYPE_PID/TGID
> +	 * --> TASKSTATS_TYPE_STATS
> +	 */
> +	size = nla_total_size(sizeof(u32)) +
> +		nla_total_size(sizeof(struct taskstats)) + nla_total_size(0);
> +
> +	memset(&stats, 0, sizeof(stats));
> +	rc = prepare_reply(info, TASKSTATS_CMD_NEW, &rep_skb, &reply, size);
> +	if (rc < 0)
> +		return rc;
> +
> +	if (info->attrs[TASKSTATS_CMD_ATTR_PID]) {
> +		u32 pid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_PID]);
> +		rc = fill_pid((pid_t)pid, NULL, &stats);

We shouldn't have a typecast here.  If it generates a warning then we need
to get in there and find out why.

> +		if (rc < 0)
> +			goto err;
> +
> +		na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_PID);
> +		NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_PID, pid);
> +	} else if (info->attrs[TASKSTATS_CMD_ATTR_TGID]) {
> +		u32 tgid = nla_get_u32(info->attrs[TASKSTATS_CMD_ATTR_TGID]);
> +		rc = fill_tgid((pid_t)tgid, NULL, &stats);

Ditto.

> +		if (rc < 0)
> +			goto err;
> +
> +		na = nla_nest_start(rep_skb, TASKSTATS_TYPE_AGGR_TGID);
> +		NLA_PUT_U32(rep_skb, TASKSTATS_TYPE_TGID, tgid);
> +	} else {
> +		rc = -EINVAL;
> +		goto err;
> +	}
> +
> +	NLA_PUT_TYPE(rep_skb, struct taskstats, TASKSTATS_TYPE_STATS, stats);
> +	nla_nest_end(rep_skb, na);
> +
> +	return send_reply(rep_skb, info->snd_pid, TASKSTATS_MSG_UNICAST);
> +
> +nla_put_failure:
> +	return  genlmsg_cancel(rep_skb, reply);

Extra space.

> +err:
> +	nlmsg_free(rep_skb);
> +	return rc;
> +}
> +
> +
> +/* Send pid data out on exit */
> +void taskstats_exit_pid(struct task_struct *tsk)
> +{
> +	int rc = 0;
> +	struct sk_buff *rep_skb;
> +	void *reply;
> +	struct taskstats stats;
> +	size_t size;
> +	int is_thread_group = !thread_group_empty(tsk);
> +	struct nlattr *na;
> +
> +	/*
> +	 * tasks can start to exit very early. Ensure that the family
> +	 * is registered before notifications are sent out
> +	 */
> +	if (!family_registered)
> +		return;

This code risks evaluating thread_group_empty() even if !family_registered.
The compiler will most likely sort that out in this case, but it's a risk
when using these initialisers.
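
A sketch of the safer shape, using hypothetical stand-ins (expensive_probe()
for thread_group_empty(), notify_exit() for taskstats_exit_pid()): declare
the variable first and assign it only after the early return.

```c
/* Hypothetical stand-ins; probe_calls counts how often the probe runs. */
int family_registered;
int probe_calls;

static int expensive_probe(void)
{
	probe_calls++;
	return 1;
}

int notify_exit(void)
{
	int is_thread_group;

	/* Bail out before doing any work that is only needed later. */
	if (!family_registered)
		return 0;

	is_thread_group = expensive_probe();
	return is_thread_group;
}
```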

> +
> +static int __init taskstats_init(void)
> +{
> +	if (genl_register_family(&family))
> +		return -EFAULT;

EFAULT?

> +        family_registered = 1;

whitespace broke.

> +
> +	if (genl_register_ops(&family, &taskstats_ops))
> +		goto err;
> +
> +	return 0;
> +err:
> +	genl_unregister_family(&family);
> +	family_registered = 0;
> +	return -EFAULT;
> +}
> +
> +late_initcall(taskstats_init);

Why late_initcall()?  (A comment would be appropriate)

^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: David S. Miller @ 2006-03-30  4:44 UTC (permalink / raw)
  To: jesse.brandeburg
  Cc: nipsy, jrlundgren, cat, djani22, yoseph.basri, bb, mykleb, olel,
	michal, chris, netdev, jesse.brandeburg, E1000-devel, herbert
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

From: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
Date: Wed, 29 Mar 2006 18:53:57 -0800

> To do this we have code like so in e1000_tso:
>         if (skb_shinfo(skb)->tso_size) {
>                 if (skb_header_cloned(skb)) {
>                         err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
>                         if (err)
>                                 return err;
>                 }

I was wondering if that call could somehow mess up the
sk->sk_forward_alloc value later on.

But it can't, sk_forward_alloc is modified based upon the
skb->truesize value, but pskb_expand_head() does not change that.

So the things left to check in the generic networking are the
skb_shinfo() contents and ->dataref handling.

I considered whether pskb_expand_head() could corrupt the TSO
information in skb_shinfo().  But that's clearly not the case because
pskb_expand_head() explicitly copies it over:

	memcpy(data + size, skb->end, sizeof(struct skb_shared_info));

And skb->end is set appropriately:

	skb->end      = data + size;

because skb_shinfo() is:

#define skb_shinfo(SKB)		((struct skb_shared_info *)((SKB)->end))

The only skb_shared_info that has to be explicitly setup is the
dataref, and pskb_expand_head() does that:

	atomic_set(&skb_shinfo(skb)->dataref, 1);

So that all checks out.
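
The layout argument above can be modelled in userspace roughly as follows.
This is only a sketch under simplifying assumptions: struct buf,
struct shared_info, buf_alloc() and buf_expand() are hypothetical names,
malloc()/free() replace the kernel allocator, and the 16-byte rounding
merely keeps the trailer aligned.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical trailer, standing in for struct skb_shared_info. */
struct shared_info {
	int dataref;
	unsigned short tso_size;
};

struct buf {
	unsigned char *head;
	unsigned char *end;	/* the trailer lives exactly here */
};

#define buf_shinfo(b)	((struct shared_info *)((b)->end))

/* Round the data size up so the trailer stays suitably aligned. */
static size_t data_align(size_t size)
{
	return (size + 15) & ~(size_t)15;
}

int buf_alloc(struct buf *b, size_t size)
{
	size = data_align(size);
	b->head = calloc(1, size + sizeof(struct shared_info));
	if (!b->head)
		return -1;
	b->end = b->head + size;
	buf_shinfo(b)->dataref = 1;
	return 0;
}

/* Grow the data area by 'extra' bytes, carrying the trailer along. */
int buf_expand(struct buf *b, size_t extra)
{
	size_t old = (size_t)(b->end - b->head);
	size_t size = data_align(old + extra);
	unsigned char *data = malloc(size + sizeof(struct shared_info));

	if (!data)
		return -1;
	memcpy(data, b->head, old);
	/* Copy the trailer over, so TSO-style fields survive the move. */
	memcpy(data + size, b->end, sizeof(struct shared_info));
	free(b->head);
	b->head = data;
	b->end = data + size;
	/* The new data area starts with a single reference. */
	buf_shinfo(b)->dataref = 1;
	return 0;
}
```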

I wonder if something funky is going on wrt. the skb_release_data()
that pskb_expand_head() does.  We have that SKB_DATAREF_SHIFT thingy,
which will trigger in this case.

	if (!skb->cloned ||
	    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
			       &skb_shinfo(skb)->dataref)) {

When we enqueue a new TCP frame we do skb_header_release() which goes:

	skb->nohdr = 1;
	atomic_add(1 << SKB_DATAREF_SHIFT, &skb_shinfo(skb)->dataref);

Presumably the dataref is "1" already when we get here and do this.
We will clone, the clone will set ->nohdr to 0 and will increment the
dataref.

So at this point the dataref should be:

   1 /* initial reference */
 + (1 << SKB_DATAREF_SHIFT) /* from skb_header_release() */
 + 1 /* from skb_clone */

This all works out because when the clone is freed up, skb->nohdr will
be zero, so we will subtract "1" from dataref.  Later when the ACK
arrives we'll free up the non-clone and this will have skb->nohdr set
to "1" and thus we'll subtract

	(1 << SKB_DATAREF_SHIFT) + 1

from dataref, as per skb_release_data().
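
The bookkeeping above can be checked with a tiny model.  This sketch uses
a plain int instead of an atomic_t, assumes SKB_DATAREF_SHIFT is 16, and
the ref_*() helpers are hypothetical names that only mirror the
arithmetic described here:

```c
#define SKB_DATAREF_SHIFT 16

/* Hypothetical model of the shared dataref counter. */
struct buf_ref {
	int dataref;
};

void ref_init(struct buf_ref *r)
{
	r->dataref = 1;				/* initial reference */
}

void ref_header_release(struct buf_ref *r)
{
	r->dataref += 1 << SKB_DATAREF_SHIFT;	/* payload-only user */
}

void ref_clone(struct buf_ref *r)
{
	r->dataref += 1;			/* the clone's reference */
}

/* Drop one holder's share; returns 1 when the data can be freed. */
int ref_release(struct buf_ref *r, int nohdr)
{
	r->dataref -= nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
	return r->dataref == 0;
}
```

Releasing the clone first (nohdr clear) and the original second (nohdr
set) brings the counter back to zero exactly on the second release.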

Although maybe relevant here, I just noticed that __skb_linearize()
does not clear skb->nohdr.  I bet that will cause a bunch of trouble
if the original SKB had skb->nohdr set, but I cannot see how that
can occur, we only send clones out to the device and those have
skb->nohdr clear (Herbert, double check this for me please).

Luckily that thing is used rarely: only in the dev_queue_xmit()
path, when the SKB is laid out in a way that the transmitting
device does not support, so it should not be relevant here.
Also I note that __skb_linearize() is not used at all outside
of net/core/dev.c, so we should mark it static at some point
soon.  In fact we should do that while fixing this fringe "nohdr"
bug in __skb_linearize().

All the other dataref accesses look safe.

Herbert do you see any holes here?



^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Phil Oester @ 2006-03-30  4:25 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: nipsy, jrlundgren, cat, djani22, yoseph.basri, bb, mykleb, olel,
	michal, chris, netdev, Jesse Brandeburg, davem, E1000-devel
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

On Wed, Mar 29, 2006 at 06:53:57PM -0800, Brandeburg, Jesse wrote:
> Hi all, I've identified you as people who have at some point in the past
> emailed one of the Linux lists with problems with e1000 and
> sk_forward_alloc.  It seems to be fairly widespread, but only seems to
> have appeared with recent kernel changes (after 2.6.12...)
> 
> What I need from you is a reproducible test, and some information.  I
> have never been able to reproduce this, and I'm trying to isolate the
> problem a bit.  What motherboards are you using?  What seems to cause
> this problem?  Are you all using iptables?  Are you all routing? From
> the reports I assume none of you are using an 82571/2/3 (pci express)

Unfortunately it happens randomly, so I have no reproducible test.
Dell 1850s and 2850s here, no iptables, routing, or pci express.
lspci reports:

82541GI/PI Gigabit Ethernet Controller (rev 05)

> As far as I know e1000 has the same requirement as tg3 and some others
> where we have to modify the header of the skb in the case of transmits
> using TSO.  I don't see anywhere else that the driver modifies the skb.
> Tomorrow I'll generate a patch to try a more paranoid copying of the
> skb, I hope some of you can test.

I'll certainly try it as long as it doesn't blow things up :)

Phil



^ permalink raw reply

* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Yoseph Basri @ 2006-03-30  4:02 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: nipsy, jrlundgren, cat, djani22, bb, mykleb, olel, michal, chris,
	netdev, Jesse Brandeburg, davem, E1000-devel
In-Reply-To: <C925F8B43D79CC49ACD0601FB68FF50C07461F59@orsmsx408>

Hi Jesse,

Thanks for your concern,

My server still sends warning messages about this KERNEL: assertion
(!sk_forward_alloc) failure after upgrading to kernel 2.6.12 or 2.6.15.

This is the dmesg output from the server:

Linux version 2.6.15.4 (root@xxxxx) (gcc version 3.3.4 (Debian
1:3.3.4-13)) #1 SMP Tue Feb 21 17:12:27 SGT 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009d400 (usable)
 BIOS-e820: 000000000009d400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000f7ffb300 (usable)
 BIOS-e820: 00000000f7ffb300 - 00000000f8000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000108000000 (usable)
Warning only 4GB will be used.
Use a PAE enabled kernel.
3200MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 0009d540
On node 0 totalpages: 1048576
  DMA zone: 4096 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 819200 pages, LIFO batch:31
DMI 2.3 present.
ACPI: RSDP (v000 IBM                                   ) @ 0x000fdfc0
ACPI: RSDT (v001 IBM    SERONYXP 0x00001000 IBM  0x45444f43) @ 0xf7ffff80
ACPI: FADT (v001 IBM    SERONYXP 0x00001000 IBM  0x45444f43) @ 0xf7ffff00
ACPI: MADT (v001 IBM    SERONYXP 0x00001000 IBM  0x45444f43) @ 0xf7fffe80
ACPI: ASF! (v016 IBM    SERONYXP 0x00000001 IBM  0x45444f43) @ 0xf7fffdc0
ACPI: DSDT (v001 IBM    SERGEODE 0x00001000 MSFT 0x0100000b) @ 0x00000000
ACPI: PM-Timer IO Port: 0x488
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
ACPI: IOAPIC (id[0x0c] address[0xfec02000] gsi_base[32])
IOAPIC[2]: apic_id 12, version 17, address 0xfec02000, GSI 32-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ7 used by override.
Enabling APIC mode:  Flat.  Using 3 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at f8800000 (gap: f8000000:06c00000)
Built 1 zonelists
Kernel command line: BOOT_IMAGE=linux-2.6.15.4 ro root=801
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec01000)
mapped IOAPIC to ffffa000 (fec02000)
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2794.685 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4025292k/4194304k available (1906k kernel code, 36796k
reserved, 650k data, 240k init, 3145708k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5593.74 BogoMIPS (lpj=11187486)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
00004400 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
00004400 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080
00004400 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5588.12 BogoMIPS (lpj=11176253)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
00004400 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000
00004400 00000000 00000000
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080
00004400 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Intel(R) Xeon(TM) CPU 2.80GHz stepping 07
Total of 2 processors activated (11181.86 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd7dc, last bus=8
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is 0000:00:06.0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Root Bridge [PCI1] (0000:02)
PCI: Probing PCI hardware (bus 02)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI1._PRT]
ACPI: PCI Root Bridge [PCI2] (0000:04)
PCI: Probing PCI hardware (bus 04)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI2._PRT]
ACPI: PCI Root Bridge [PCI3] (0000:06)
PCI: Probing PCI hardware (bus 06)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI3._PRT]
ACPI: PCI Root Bridge [PCI4] (0000:08)
PCI: Probing PCI hardware (bus 08)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI4._PRT]
ACPI: PCI Interrupt Link [LP00] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP01] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP02] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP03] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP04] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP05] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP06] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP07] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP08] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP09] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP0A] (IRQs *10)
ACPI: PCI Interrupt Link [LP0B] (IRQs *9)
ACPI: PCI Interrupt Link [LP0C] (IRQs *9)
ACPI: PCI Interrupt Link [LP0D] (IRQs *3)
ACPI: PCI Interrupt Link [LP0E] (IRQs *5)
ACPI: PCI Interrupt Link [LP0F] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP10] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP11] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP12] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP13] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP14] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP15] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP16] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP17] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP18] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP19] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1A] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1B] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1C] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1D] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1E] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LP1F] (IRQs) *0, disabled.
ACPI: PCI Interrupt Link [LPUS] (IRQs *11)
SCSI subsystem initialized
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
IA-32 Microcode Update Driver: v1.14 <tigran@veritas.com>
highmem bounce pool size: 64 pages
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Real Time Clock Driver v1.12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Intel(R) PRO/1000 Network Driver - version 6.1.16-k2
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:06:08.0[A] -> GSI 29 (level, low) -> IRQ 16
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:06:08.1[B] -> GSI 30 (level, low) -> IRQ 17
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
e100: Intel(R) PRO/100 Network Driver, 3.4.14-k4-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
hda: LG CD-ROM CRN-8245B, ATAPI CD/DVD-ROM drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Fusion MPT base driver 3.03.04
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.04
Fusion MPT base driver 3.03.04
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.04
ACPI: PCI Interrupt 0000:08:07.0[A] -> GSI 27 (level, low) -> IRQ 18
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=18
  Vendor: IBM-ESXS  Model: MAP3735NC     FN  Rev: B109
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2 sda3 < sda5 sda6 sda7 sda8 sda9 sda10 >
sd 0:0:0:0: Attached scsi disk sda
  Vendor: IBM-ESXS  Model: MAP3735NC     FN  Rev: B109
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdb: drive cache: write through
 sdb: sdb1 sdb2 sdb3 sdb4
sd 0:0:1:0: Attached scsi disk sdb
  Vendor: IBM-ESXS  Model: MAP3735NC     FN  Rev: B109
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdc: drive cache: write through
SCSI device sdc: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdc: drive cache: write through
 sdc: sdc1 sdc2 sdc3 sdc4
sd 0:0:2:0: Attached scsi disk sdc
  Vendor: IBM-ESXS  Model: MAP3735NC     FN  Rev: B109
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdd: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdd: drive cache: write through
SCSI device sdd: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdd: drive cache: write through
 sdd: sdd1 sdd2 sdd3 sdd4
sd 0:0:3:0: Attached scsi disk sdd
  Vendor: IBM-ESXS  Model: MAP3735NC     FN  Rev: B109
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sde: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sde: drive cache: write through
SCSI device sde: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sde: drive cache: write through
 sde: sde1 sde2 sde3 sde4
sd 0:0:4:0: Attached scsi disk sde
  Vendor: IBM-ESXS  Model: MAP3735NC     FN  Rev: B109
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdf: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdf: drive cache: write through
SCSI device sdf: 143374000 512-byte hdwr sectors (73407 MB)
SCSI device sdf: drive cache: write through
 sdf: sdf1 sdf2 sdf3 sdf4
sd 0:0:5:0: Attached scsi disk sdf
  Vendor: IBM       Model: 32P0032a S320  1  Rev: 1
  Type:   Processor                          ANSI SCSI revision: 02
ACPI: PCI Interrupt 0000:08:07.1[B] -> GSI 28 (level, low) -> IRQ 19
mptbase: Initiating ioc1 bringup
ioc1: 53C1030: Capabilities={Initiator}
scsi1 : ioc1: LSI53C1030, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=19
mice: PS/2 mouse device common for all mice
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 8, 1048576 bytes)
TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
input: AT Translated Set 2 keyboard as /class/input/input0
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
ip_conntrack version 2.4 (8192 buckets, 65536 max) - 176 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Starting balanced_irq
Using IPI Shortcut mode
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 240k freed
Adding 2104504k swap on /dev/sda2.  Priority:-1 extents:1 across:2104504k
EXT3 FS on sda1, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda10, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda9, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdc1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdc2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdc3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdc4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdd1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdd2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdd3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdd4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sde1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sde2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sde3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sde4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdf1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdf2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdf3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdf4, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex

Are you all using iptables?

YES

Are you all routing?

NO

From the reports I assume none of you are using an 82571/2/3 (pci express)?

I think so.

Thanks again.

YB



On 3/30/06, Brandeburg, Jesse <jesse.brandeburg@intel.com> wrote:
> Hi all, I've identified you as people who have at some point in the past
> emailed one of the Linux lists with problems with e1000 and
> sk_forward_alloc.  It seems to be fairly widespread, but only seems to
> have appeared with recent kernel changes (after 2.6.12...)
>
> What I need from you is a reproducible test, and some information.  I
> have never been able to reproduce this, and I'm trying to isolate the
> problem a bit.  What motherboards are you using?  What seems to cause
> this problem?  Are you all using iptables?  Are you all routing? From
> the reports I assume none of you are using an 82571/2/3 (pci express)
>
> As far as I know e1000 has the same requirement as tg3 and some others
> where we have to modify the header of the skb in the case of transmits
> using TSO.  I don't see anywhere else that the driver modifies the skb.
> Tomorrow I'll generate a patch to try a more paranoid copying of the
> skb, I hope some of you can test.
>
> To do this we have code like so in e1000_tso:
>         if (skb_shinfo(skb)->tso_size) {
>                 if (skb_header_cloned(skb)) {
>                         err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
>                         if (err)
>                                 return err;
>                 }
>
>                 hdr_len = ((skb->h.raw - skb->data) +
>                            (skb->h.th->doff << 2));
>                 mss = skb_shinfo(skb)->tso_size;
>                 if (skb->protocol == ntohs(ETH_P_IP)) {
>                         skb->nh.iph->tot_len = 0;
>                         skb->nh.iph->check = 0;
>
> Thanks for your assistance
>
> Jesse
>
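
For anyone following the quoted e1000_tso snippet, here is a minimal
userspace sketch of the header-length arithmetic it performs. It assumes
skb->data points at the start of the frame, so that (skb->h.raw - skb->data)
is the offset of the TCP header, and it relies on doff being counted in
32-bit words. The helper names tso_hdr_len and tso_segments are hypothetical
and are not part of the driver; the segment count is only an illustration of
how mss relates to the payload, not a description of what the hardware does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the e1000 driver: bytes of headers that
 * precede the TCP payload. transport_off is the offset of the TCP header
 * from the start of the frame; doff is the TCP data-offset field, which
 * counts 32-bit words, hence the shift by 2 to convert to bytes. */
static size_t tso_hdr_len(size_t transport_off, uint8_t doff)
{
	return transport_off + ((size_t)doff << 2);
}

/* Hypothetical helper: number of mss-sized chunks needed to carry
 * payload_len bytes, rounding up. Returns 0 when mss is 0. */
static size_t tso_segments(size_t payload_len, size_t mss)
{
	if (mss == 0)
		return 0;
	return (payload_len + mss - 1) / mss;
}
```

With a 14-byte Ethernet header and a 20-byte IP header the TCP header starts
at offset 34, so tso_hdr_len(34, 5) covers a TCP header without options and
tso_hdr_len(34, 8) covers one with 12 bytes of options.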


