* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Mark Nipper @ 2006-03-31 16:01 UTC
To: Boris B. Zhmurov
Cc: Christiaan den Besten, Mark Nipper, Herbert Xu, David S. Miller,
jesse.brandeburg, jrlundgren, cat, djani22, yoseph.basri, mykleb,
olel, michal, netdev, jesse.brandeburg, E1000-devel, Andi Kleen,
Jeff Garzik
In-Reply-To: <442D486D.909@kernelpanic.ru>
On 31 Mar 2006, Boris B. Zhmurov wrote:
> stream.c (279) -> stream.c (283)
> af_inet.c (148) -> af_inet.c (150)
That will be because the patches changed the line numbers
in the source, I believe. Nothing helpful, unfortunately.
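Since the line number in such a message presumably reflects where the check sits in the source file, two reports can be compared on assertion text and file name alone. A minimal sketch in C, assuming the message format seen in the logs in this thread; `parse_assertion` and `same_site` are names made up for illustration, not kernel functions:

```c
#include <stdio.h>
#include <string.h>

struct assert_site {
	char expr[64];   /* asserted expression, e.g. "!sk->sk_forward_alloc" */
	char file[128];  /* source file, e.g. "net/core/stream.c" */
	int line;        /* line number, which shifts whenever the file is patched */
};

/* Parse one log line; returns 1 on success, 0 if the line does not match. */
static int parse_assertion(const char *log, struct assert_site *out)
{
	/* Skip any syslog prefix such as "Mar 29 20:03:23 msk4 kernel: ". */
	const char *p = strstr(log, "KERNEL: assertion (");

	if (!p)
		return 0;
	return sscanf(p, "KERNEL: assertion (%63[^)]) failed at %127s (%d)",
		      out->expr, out->file, &out->line) == 3;
}

/* Two reports describe the same check if expression and file match,
 * regardless of the line number. */
static int same_site(const char *a, const char *b)
{
	struct assert_site sa, sb;

	if (!parse_assertion(a, &sa) || !parse_assertion(b, &sb))
		return 0;
	return strcmp(sa.expr, sb.expr) == 0 && strcmp(sa.file, sb.file) == 0;
}
```

Under this comparison, the stream.c (279) and stream.c (283) reports count as the same assertion, while the stream.c and af_inet.c reports stay distinct.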
--
Mark Nipper e-contacts:
832 Tanglewood Drive nipsy@bitgnome.net
Bryan, Texas 77802-4013 http://nipsy.bitgnome.net/
(979)575-3193 AIM/Yahoo: texasnipsy ICQ: 66971617
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GG/IT d- s++:+ a- C++$ UBL++++$ P--->+++ L+++$ !E---
W++(--) N+ o K++ w(---) O++ M V(--) PS+++(+) PE(--)
Y+ PGP t+ 5 X R tv b+++@ DI+(++) D+ G e h r++ y+(**)
------END GEEK CODE BLOCK------
---begin random quote of the moment---
"Whiskey-Tango-Foxtrot, over."
-- anonymous
----end random quote of the moment----
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 15:19 UTC
To: Boris B. Zhmurov
Cc: Christiaan den Besten, Mark Nipper, Herbert Xu, David S. Miller,
jesse.brandeburg, jrlundgren, cat, djani22, yoseph.basri, mykleb,
olel, michal, netdev, jesse.brandeburg, E1000-devel, Andi Kleen,
Jeff Garzik
In-Reply-To: <442D45EA.9010309@kernelpanic.ru>
Hello, Boris B. Zhmurov.
On 31.03.2006 19:08 you said the following:
> Hmm... with the latest debug patch I can't see any debug info:
But wait a minute. Two days ago, without Herbert's patches, the assertion
errors looked like this:
Mar 29 20:03:23 msk4 kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
Mar 29 20:03:23 msk4 kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
and after applying the patches, the errors look like this:
Mar 31 18:21:06 msk4 kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
Mar 31 18:21:06 msk4 kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (150)
stream.c (279) -> stream.c (283)
af_inet.c (148) -> af_inet.c (150)
Does it really matter?
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 15:08 UTC
To: Boris B. Zhmurov
Cc: Christiaan den Besten, Mark Nipper, Herbert Xu, David S. Miller,
jesse.brandeburg, jrlundgren, cat, djani22, yoseph.basri, mykleb,
olel, michal, netdev, jesse.brandeburg, E1000-devel, Andi Kleen,
Jeff Garzik
In-Reply-To: <442D2EF6.1040703@kernelpanic.ru>
Hello, Boris B. Zhmurov.
On 31.03.2006 17:30 you said the following:
> Herbert, with your second patch still no luck. After an hour of uptime I
> have assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
> again...
>
> Trying your debug patch.
Hmm... with the latest debug patch I can't see any debug info:
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (150)
Is it normal?
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 13:30 UTC
To: Christiaan den Besten
Cc: Mark Nipper, Herbert Xu, David S. Miller, jesse.brandeburg,
jrlundgren, cat, djani22, yoseph.basri, mykleb, olel, michal,
netdev, jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <045601c654c4$c3dece80$3d64880a@speedy>
Hello, Christiaan den Besten.
On 31.03.2006 17:12 you said the following:
> Hi !
>
>> P.S. I have another high-load server as a gateway. Same distro, same
>> kernels, but less memory (512MB lowmem). eth0 up - e100, eth1 up -
>> e1000. No errors at all! It kinda looks like the assertions happen on
>> systems where _only_ the e1000 interface _eth1_ is up.
>
>
> No, we have a couple of gateways asserting.
Yes, my mistake :( My server asserts with both eth0 and eth1 up...
Herbert, with your second patch still no luck. After an hour of uptime I
have assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (283)
again...
Trying your debug patch.
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Christiaan den Besten @ 2006-03-31 13:12 UTC
To: Boris B. Zhmurov
Cc: Mark Nipper, Herbert Xu, David S. Miller, jesse.brandeburg,
jrlundgren, cat, djani22, yoseph.basri, mykleb, olel, michal,
netdev, jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <442D24AA.8080609@kernelpanic.ru>
Hi !
> P.S. I have another high-load server as a gateway. Same distro, same kernels, but less memory (512MB lowmem). eth0 up - e100, eth1
> up - e1000. No errors at all! It kinda looks like the assertions happen on systems where _only_ the e1000 interface _eth1_ is up.
No, we have a couple of gateways asserting.
2x : Usenet feeder : Onboard eth0 and eth1 "Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)" ->
asserts (lots of disk activity (writes) as well, by the way ...). SMP, 4GB RAM. (2.6.14-mm2)
4x : Usenet cache : PCI-X eth0 "Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)" -> no asserts
(no disk activity). Has 2 extra onboard e1000s, but they are not used (Ethernet controller: Intel Corporation 82541GI/PI Gigabit
Ethernet Controller). SMP, 2GB RAM (2.6.15.1)
bye,
Chris
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 12:46 UTC
To: Boris B. Zhmurov
Cc: Mark Nipper, Herbert Xu, David S. Miller, jesse.brandeburg,
jrlundgren, cat, djani22, yoseph.basri, mykleb, olel, michal,
chris, netdev, jesse.brandeburg, E1000-devel, Andi Kleen,
Jeff Garzik
In-Reply-To: <442D1F26.8050601@kernelpanic.ru>
Hello, Boris B. Zhmurov.
On 31.03.2006 16:23 you said the following:
> Hello, Mark Nipper.
>
> On 31.03.2006 16:10 you said the following:
>
>> This unfortunately is not the case. I have two e1000
>> interfaces but only eth1 is up and in use. And I still had
>> assertions.
>
>
>
> Can you switch to eth0? There is no problem with _eth0_, my friend says.
P.S. I have another high-load server as a gateway. Same distro, same
kernels, but less memory (512MB lowmem). eth0 up - e100, eth1 up -
e1000. No errors at all! It kinda looks like the assertions happen on
systems where _only_ the e1000 interface _eth1_ is up.
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: JaniD++ @ 2006-03-31 12:45 UTC
To: Boris B. Zhmurov
Cc: davem, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, "Andi Kleen",
"Jeff Garzik"
In-Reply-To: <442D1B67.8000804@kernelpanic.ru>
----- Original Message -----
From: "Boris B. Zhmurov" <bb@kernelpanic.ru>
To: "Herbert Xu" <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>; <jesse.brandeburg@intel.com>;
<nipsy@bitgnome.net>; <jrlundgren@gmail.com>; <cat@zip.com.au>;
<djani22@dynamicweb.hu>; <yoseph.basri@gmail.com>; <mykleb@no.ibm.com>;
<olel@ans.pl>; <michal@feix.cz>; <chris@scorpion.nl>;
<netdev@vger.kernel.org>; <jesse.brandeburg@gmail.com>;
<E1000-devel@lists.sourceforge.net>; "Andi Kleen" <ak@suse.de>; "Jeff
Garzik" <jgarzik@pobox.com>
Sent: Friday, March 31, 2006 2:07 PM
Subject: Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
> Hello, Herbert Xu.
>
> On 31.03.2006 14:39 you said the following:
>
> > On Fri, Mar 31, 2006 at 02:16:38PM +0400, Boris B. Zhmurov wrote:
> >
> >>And xdelta tells, that e1000.ko was modified :)
> >
> >
> > Thanks for checking again.
> >
> > Anyway, it didn't take long to find another bug in the same area.
> > I'm afraid this driver does seem to be full of them :)
> >
> > It sets last_tx_tso in between computing the number of descriptors and
> > calling e1000_tx_map. This is bad because e1000_tx_map gets the wrong
> > value for last_tx_tso and therefore may corrupt memory for every TSO
> > packet when the ring is almost full.
> >
> > This bug exists on UP as well as SMP.
> >
> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> >
> > Please try this in conjunction with the previous patch.
> >
> > Cheers,
>
>
> David, Herbert - FYI. One of my colleagues confirmed that the idea "bug
> reproducible only if there is more than one e1000 adapter onboard" is
> true. He has 3 servers with dual Intel PRO/1000 adapters, and the
> bug occurs there. He also has 4 servers with dual Intel PRO/1000 adapters
> onboard, but with _only one_ of them up, and there are no such messages in
> dmesg at all! Interesting...
This is not unique!
Only _one_ of my 2 identical NICs gets this message:
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
with the old 2.6.15.* e1000 driver!
Not all e1000 chips with the same part number are really identical!
Could this be a hardware problem that needs a workaround?
Cheers,
>
> --
> Boris B. Zhmurov
> mailto: bb@kernelpanic.ru
> "wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 12:36 UTC
To: Herbert Xu
Cc: Mark Nipper, David S. Miller, jesse.brandeburg, jrlundgren, cat,
djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <20060331123514.GA13500@gondor.apana.org.au>
Hello, Herbert Xu.
On 31.03.2006 16:35 you said the following:
> On Fri, Mar 31, 2006 at 04:23:02PM +0400, Boris B. Zhmurov wrote:
>
>>I'm already using a kernel with Herbert's second patch. We'll see...
>
>
> If it still fails
Not yet. But give it time :)
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-31 12:35 UTC
To: Boris B. Zhmurov
Cc: Mark Nipper, David S. Miller, jesse.brandeburg, jrlundgren, cat,
djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <442D1F26.8050601@kernelpanic.ru>
On Fri, Mar 31, 2006 at 04:23:02PM +0400, Boris B. Zhmurov wrote:
>
> I'm already using a kernel with Herbert's second patch. We'll see...
If it still fails, here is a debugging patch which should tell us
whether we need to look elsewhere.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: e1000-debug.patch --]
[-- Type: text/plain, Size: 662 bytes --]
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 49cd096..64ac6f4 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2906,6 +2906,13 @@
 	               e1000_tx_map(adapter, tx_ring, skb, first,
 	                            max_per_txd, nr_frags, mss));
+	tso = tx_ring->next_to_use - first;
+	if (tso < 0)
+		tso += tx_ring->count;
+	if (unlikely(tso > count))
+		printk(KERN_ERR "e1000 bug: mss=%d, len=%d, frags=%d, est=%d, actual=%d\n",
+		       mss, skb->len, nr_frags, count, tso);
+
 	netdev->trans_start = jiffies;
 	/* Make sure there is space in the ring for the next send. */
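
The check in the patch above relies on wrap-around arithmetic over the transmit ring: the number of descriptors actually consumed is the distance from `first` to `next_to_use`, corrected when the index has wrapped past the end. A standalone sketch of that calculation outside the kernel; `ring_used` and `exceeds_estimate` are illustrative names, not driver functions:

```c
/* Descriptors consumed between `first` and `next_to_use` on a ring of
 * `ring_count` slots, allowing for the index wrapping past the end. */
static int ring_used(int first, int next_to_use, int ring_count)
{
	int used = next_to_use - first;

	if (used < 0)
		used += ring_count;
	return used;
}

/* Nonzero when more descriptors were used than were estimated up front,
 * which is the condition the debug patch reports. */
static int exceeds_estimate(int estimate, int used)
{
	return used > estimate;
}
```

For example, on a 256-entry ring with `first` = 254 and `next_to_use` = 2, the raw difference is -252, and adding the ring size gives 4 descriptors used.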
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 12:23 UTC
To: Mark Nipper
Cc: Herbert Xu, David S. Miller, jesse.brandeburg, jrlundgren, cat,
djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <20060331121007.GA2146@king.bitgnome.net>
Hello, Mark Nipper.
On 31.03.2006 16:10 you said the following:
> This unfortunately is not the case. I have two e1000
> interfaces but only eth1 is up and in use. And I still had
> assertions.
Can you switch to eth0? There is no problem with _eth0_, my friend says.
> And I still had
> assertions.
I'm already using a kernel with Herbert's second patch. We'll see...
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Ingo Oeser @ 2006-03-31 12:18 UTC
To: Herbert Xu
Cc: David S. Miller, jesse.brandeburg, nipsy, jrlundgren, cat,
djani22, yoseph.basri, bb, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel
In-Reply-To: <20060331094240.GA11040@gondor.apana.org.au>
Hi,
Herbert Xu wrote:
> On Fri, Mar 31, 2006 at 01:35:40AM -0800, David S. Miller wrote:
> > He does not have TSO enabled, e1000 disables TSO when on a link speed
> > slower than gigabit.
dmesg|grep eth0
[4294671.426000] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[4294679.125000] e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
So this theory doesn't seem to hold :-(
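The same check can be scripted rather than read by eye. A minimal sketch in C, assuming the `ethtool -k` output format shown in this message (other ethtool versions may label the line differently); `tso_state` is a hypothetical helper name:

```c
#include <string.h>

/* Returns 1 if the text reports TSO on, 0 if off, and -1 if the line is
 * missing or unrecognised. `text` is assumed to be `ethtool -k <iface>`
 * output in the format quoted in this thread. */
static int tso_state(const char *text)
{
	static const char key[] = "tcp segmentation offload:";
	const char *p = strstr(text, key);

	if (!p)
		return -1;
	p += sizeof(key) - 1;           /* step past the label */
	while (*p == ' ' || *p == '\t')
		p++;
	if (strncmp(p, "on", 2) == 0)
		return 1;
	if (strncmp(p, "off", 3) == 0)
		return 0;
	return -1;
}
```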
> Indeed. But I think that only happens on PCI Express and I don't think
> Ingo is using PCI Express.
Right. PCI-Express is not available in this machine.
Maybe the traffic is not enough to trigger it. The external connection is just 6 Mbit DSL.
Regards
Ingo Oeser
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Mark Nipper @ 2006-03-31 12:10 UTC
To: Boris B. Zhmurov
Cc: Herbert Xu, David S. Miller, jesse.brandeburg, nipsy, jrlundgren,
cat, djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <442D1B67.8000804@kernelpanic.ru>
On 31 Mar 2006, Boris B. Zhmurov wrote:
> David, Herbert - FYI. One of my colleagues confirmed that the idea "bug
> reproducible only if there is more than one e1000 adapter onboard" is
> true. He has 3 servers with dual Intel PRO/1000 adapters, and the
> bug occurs there. He also has 4 servers with dual Intel PRO/1000 adapters
> onboard, but with _only one_ of them up, and there are no such messages in
> dmesg at all! Interesting...
This unfortunately is not the case. I have two e1000
interfaces but only eth1 is up and in use. And I still had
assertions. Hopefully the two already discovered problems will
fix things up for everyone though.
--
Mark Nipper e-contacts:
832 Tanglewood Drive nipsy@bitgnome.net
Bryan, Texas 77802-4013 http://nipsy.bitgnome.net/
(979)575-3193 AIM/Yahoo: texasnipsy ICQ: 66971617
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GG/IT d- s++:+ a- C++$ UBL++++$ P--->+++ L+++$ !E---
W++(--) N+ o K++ w(---) O++ M V(--) PS+++(+) PE(--)
Y+ PGP t+ 5 X R tv b+++@ DI+(++) D+ G e h r++ y+(**)
------END GEEK CODE BLOCK------
---begin random quote of the moment---
Generalizations are usually flawed by exceptions.
-- seen at http://wunderland.com/
----end random quote of the moment----
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 12:07 UTC
To: Herbert Xu
Cc: David S. Miller, jesse.brandeburg, nipsy, jrlundgren, cat,
djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <20060331103956.GA12181@gondor.apana.org.au>
Hello, Herbert Xu.
On 31.03.2006 14:39 you said the following:
> On Fri, Mar 31, 2006 at 02:16:38PM +0400, Boris B. Zhmurov wrote:
>
>>And xdelta tells, that e1000.ko was modified :)
>
>
> Thanks for checking again.
>
> Anyway, it didn't take long to find another bug in the same area.
> I'm afraid this driver does seem to be full of them :)
>
> It sets last_tx_tso in between computing the number of descriptors and
> calling e1000_tx_map. This is bad because e1000_tx_map gets the wrong
> value for last_tx_tso and therefore may corrupt memory for every TSO
> packet when the ring is almost full.
>
> This bug exists on UP as well as SMP.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> Please try this in conjunction with the previous patch.
>
> Cheers,
David, Herbert - FYI. One of my colleagues confirmed that the idea "bug
reproducible only if there is more than one e1000 adapter onboard" is
true. He has 3 servers with dual Intel PRO/1000 adapters, and the
bug occurs there. He also has 4 servers with dual Intel PRO/1000 adapters
onboard, but with _only one_ of them up, and there are no such messages in
dmesg at all! Interesting...
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: JaniD++ @ 2006-03-31 12:02 UTC
To: Herbert Xu
Cc: netdev, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, bb, mykleb, olel, michal, chris, netdev,
Jesse Brandeburg, E1000-devel
In-Reply-To: <20060331094240.GA11040@gondor.apana.org.au>
----- Original Message -----
From: "Herbert Xu" <herbert@gondor.apana.org.au>
To: "David S. Miller" <davem@davemloft.net>
Cc: <netdev@axxeo.de>; <jesse.brandeburg@intel.com>; <nipsy@bitgnome.net>;
<jrlundgren@gmail.com>; <cat@zip.com.au>; <djani22@dynamicweb.hu>;
<yoseph.basri@gmail.com>; <bb@kernelpanic.ru>; <mykleb@no.ibm.com>;
<olel@ans.pl>; <michal@feix.cz>; <chris@scorpion.nl>;
<netdev@vger.kernel.org>; <jesse.brandeburg@gmail.com>;
<E1000-devel@lists.sourceforge.net>
Sent: Friday, March 31, 2006 11:42 AM
Subject: Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
> On Fri, Mar 31, 2006 at 01:35:40AM -0800, David S. Miller wrote:
> >
> > He does not have TSO enabled, e1000 disables TSO when on a link speed
> > slower than gigabit.
>
> Indeed. But I think that only happens on PCI Express and I don't think
> Ingo is using PCI Express.
No, my card has a "64-bit PCI-X Rev. 1.0 master interface" (from the
datasheet).
Number: "82546GB"
This is not a PCI Express issue!
Cheers,
>
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Andi Kleen @ 2006-03-31 11:15 UTC
To: Boris B. Zhmurov
Cc: Herbert Xu, David S. Miller, jesse.brandeburg, nipsy, jrlundgren,
cat, djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Jeff Garzik
In-Reply-To: <442D1B67.8000804@kernelpanic.ru>
On Friday 31 March 2006 14:07, Boris B. Zhmurov wrote:
> David, Herbert - FYI. One of my colleagues confirmed that the idea "bug
> reproducible only if there is more than one e1000 adapter onboard" is
> true. He has 3 servers with dual Intel PRO/1000 adapters, and the
> bug occurs there. He also has 4 servers with dual Intel PRO/1000 adapters
> onboard, but with _only one_ of them up, and there are no such messages in
> dmesg at all! Interesting...
At least all our systems with trouble seem to have more than one e1000,
though usually only one is active.
We're still not 100% sure it is actually the E1000; it is a bit hard to
reproduce the memory corruption :/
-Andi
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 11:02 UTC
To: Herbert Xu
Cc: davem, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <E1FPHFJ-0003Fq-00@gondolin.me.apana.org.au>
Hello, Herbert Xu.
On 31.03.2006 14:52 you said the following:
> BTW, if you kept the built tree it is possible to apply the patch and
> then do a make which should compile just the e1000 driver.
>
> Cheers,
Thanks for the tip; actually, I knew that :) First off, I've already
applied some other new patches from bk-commits-head, not for the e1000
driver. And second, I didn't keep the tree; rpmbuild cleaned it up :)
That's why I'm recompiling the entire kernel.
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-31 10:52 UTC
To: Boris B. Zhmurov
Cc: davem, herbert, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <442D09A3.3020700@kernelpanic.ru>
Boris B. Zhmurov <bb@kernelpanic.ru> wrote:
>
> Recompiling the kernel. I need about 2 hours to get the answer...
BTW, if you kept the built tree it is possible to apply the patch and
then do a make which should compile just the e1000 driver.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Mark Nipper @ 2006-03-31 10:51 UTC
To: David S. Miller
Cc: herbert, netdev, jesse.brandeburg, nipsy, jrlundgren, cat,
djani22, yoseph.basri, bb, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel
In-Reply-To: <20060331.013540.95485284.davem@davemloft.net>
On 31 Mar 2006, David S. Miller wrote:
> He does not have TSO enabled, e1000 disables TSO when on a link speed
> slower than gigabit.
>
> You'll see something like the following in your logs:
>
> e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
Um...
---
$ uname -a
Linux king 2.6.16.1 #1 SMP Thu Mar 30 06:11:33 CST 2006 i686 GNU/Linux
$ dmesg | grep -i task
e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
$ ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
---
I know for a fact the link is 100Mbps (other than the
output from the driver itself) and I have been bitten by the
assertion.
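The two observations above (link speed from the driver log, TSO state from ethtool) can be combined into a single test of the "TSO is disabled below gigabit" claim. A small sketch in C, assuming the watchdog message format shown in this thread; `link_speed_mbps` and `tso_on_below_gigabit` are illustrative names only:

```c
#include <stdio.h>
#include <string.h>

/* Link speed in Mbps from an e1000 watchdog line such as
 * "e1000: eth1: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex",
 * or -1 if the line does not match. */
static int link_speed_mbps(const char *line)
{
	const char *p = strstr(line, "NIC Link is Up ");
	int mbps;

	if (!p || sscanf(p, "NIC Link is Up %d Mbps", &mbps) != 1)
		return -1;
	return mbps;
}

/* Nonzero when the link is slower than gigabit yet TSO is reported on,
 * i.e. the case that contradicts the claim being tested. */
static int tso_on_below_gigabit(int mbps, int tso_on)
{
	return mbps > 0 && mbps < 1000 && tso_on;
}
```

With the values reported here (100 Mbps, TSO on), the second function returns nonzero.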
I've been running the first patch for about the last 24
hours and have not seen any assertions yet (although they don't
occur that frequently on this server). I'll be adding the
second, most recent patch in a bit and rebooting again.
Hopefully between the two of them, that will have fixed the
problem.
--
Mark Nipper e-contacts:
832 Tanglewood Drive nipsy@bitgnome.net
Bryan, Texas 77802-4013 http://nipsy.bitgnome.net/
(979)575-3193 AIM/Yahoo: texasnipsy ICQ: 66971617
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GG/IT d- s++:+ a- C++$ UBL++++$ P--->+++ L+++$ !E---
W++(--) N+ o K++ w(---) O++ M V(--) PS+++(+) PE(--)
Y+ PGP t+ 5 X R tv b+++@ DI+(++) D+ G e h r++ y+(**)
------END GEEK CODE BLOCK------
---begin random quote of the moment---
And if I close my mind in fear, please pry it open.
----end random quote of the moment----
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 10:51 UTC
To: David S. Miller
Cc: herbert, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <20060331.024544.96296223.davem@davemloft.net>
Hello, David S. Miller.
On 31.03.2006 14:45 you said the following:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Fri, 31 Mar 2006 21:39:56 +1100
>
>
>>Anyway, it didn't take long to find another bug in the same area.
>>I'm afraid this driver does seem to be full of them :)
>
>
> Indeed.
>
> Thanks for picking through this some more Herbert. I hope we got it
> this time.
Recompiling the kernel. I need about 2 hours to get the answer...
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: David S. Miller @ 2006-03-31 10:45 UTC (permalink / raw)
To: herbert
Cc: bb, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, ak, jgarzik
In-Reply-To: <20060331103956.GA12181@gondor.apana.org.au>
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 31 Mar 2006 21:39:56 +1100
> Anyway, it didn't take long to find another bug in the same area.
> I'm afraid this driver does seem to be full of them :)
Indeed.
Thanks for picking through this some more Herbert. I hope we got it
this time.
^ permalink raw reply
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-31 10:39 UTC (permalink / raw)
To: Boris B. Zhmurov
Cc: David S. Miller, jesse.brandeburg, nipsy, jrlundgren, cat,
djani22, yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel, Andi Kleen, Jeff Garzik
In-Reply-To: <442D0186.8090705@kernelpanic.ru>
[-- Attachment #1: Type: text/plain, Size: 890 bytes --]
On Fri, Mar 31, 2006 at 02:16:38PM +0400, Boris B. Zhmurov wrote:
>
> And xdelta tells, that e1000.ko was modified :)
Thanks for checking again.
Anyway, it didn't take long to find another bug in the same area.
I'm afraid this driver does seem to be full of them :)
It sets last_tx_tso in between computing the number of descriptors and
calling e1000_tx_map. This is bad because e1000_tx_map gets the wrong
value for last_tx_tso and therefore may corrupt memory for every TSO
packet when the ring is almost full.
This bug exists on UP as well as SMP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Please try this in conjunction with the previous patch.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: e1000-tso.patch --]
[-- Type: text/plain, Size: 645 bytes --]
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 49cd096..38aeff9 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2891,7 +2891,6 @@
 	}
 
 	if (likely(tso)) {
-		tx_ring->last_tx_tso = 1;
 		tx_flags |= E1000_TX_FLAGS_TSO;
 	} else if (likely(e1000_tx_csum(adapter, tx_ring, skb)))
 		tx_flags |= E1000_TX_FLAGS_CSUM;
@@ -2905,6 +2904,8 @@
 	e1000_tx_queue(adapter, tx_ring, tx_flags,
 	               e1000_tx_map(adapter, tx_ring, skb, first,
 	                            max_per_txd, nr_frags, mss));
+
+	tx_ring->last_tx_tso = tso;
 
 	netdev->trans_start = jiffies;
 
^ permalink raw reply related
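The ordering problem described in the message above can be illustrated with a toy model. This is only a sketch and not the driver code: `struct toy_ring`, `descs_needed` and `toy_xmit` are hypothetical names, and the rule that a set flag costs exactly one extra descriptor is an assumption made purely for illustration. It shows why updating a flag between the size estimate and the mapping step lets the two steps disagree.

```c
/* Toy transmit ring: NOT the e1000 data structures. */
struct toy_ring {
	int free_descs;   /* descriptors still available */
	int last_tx_tso;  /* flag remembered from the previous packet */
};

/* Assumed rule for this sketch: a set flag costs one extra descriptor. */
int descs_needed(const struct toy_ring *ring, int base)
{
	return base + (ring->last_tx_tso ? 1 : 0);
}

/*
 * Send one packet and return how many descriptors were consumed beyond
 * the estimate (0 means the estimate and the mapping step agreed).
 * update_flag_early = 1 models the order before the patch,
 * update_flag_early = 0 models the order after the patch.
 */
int toy_xmit(struct toy_ring *ring, int base, int tso, int update_flag_early)
{
	int estimated, used;

	/* Step 1: estimate the descriptor count from the current flag. */
	estimated = descs_needed(ring, base);

	/* Pre-patch order: the flag changes before the mapping step runs. */
	if (update_flag_early && tso)
		ring->last_tx_tso = 1;

	/* Step 2: the mapping step reads the flag again. */
	used = descs_needed(ring, base);
	ring->free_descs -= used;

	/* Post-patch order: the flag changes only after mapping is done. */
	if (!update_flag_early)
		ring->last_tx_tso = tso;

	return used - estimated;
}
```

In this model, calling `toy_xmit(&ring, 3, 1, 1)` on a ring whose flag is clear consumes one descriptor more than was estimated, while `toy_xmit(&ring, 3, 1, 0)` keeps the estimate and the mapping step consistent. On an almost full ring that one-descriptor disagreement is what would overrun the reserved space.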
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Boris B. Zhmurov @ 2006-03-31 10:16 UTC (permalink / raw)
To: David S. Miller
Cc: herbert, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel
In-Reply-To: <20060331.011245.26474207.davem@davemloft.net>
Hello, David S. Miller.
On 31.03.2006 13:12 you said the following:
> From: "Boris B. Zhmurov" <bb@kernelpanic.ru>
> Date: Thu, 30 Mar 2006 17:29:09 +0400
>
>
>>Hello, Herbert Xu.
>>
>>On 30.03.2006 14:12 you said the following:
>>
>>
>>>On Thu, Mar 30, 2006 at 10:02:01AM +0000, Boris B. Zhmurov wrote:
>>>
>>>
>>>>[zhmurov@builds linux-2.6.16]$ patch -p1 <
>>>>../../../SOURCES/linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch
>>>>
>>>>patching file drivers/net/e1000/e1000_main.c
>>>>Reversed (or previously applied) patch detected! Assume -R? [n]
>>>>
>>>>Herbert, is that patch already included in 2.6.16.1?
>>>
>>>
>>>Not really. It's just patch being silly (or too smart :)
>>>
>>>Here it is again rediffed against 2.6.16.
>>
>>
>>Nope, with this patch the problem still exists. After 25 min. uptime
>>with patched kernel 2.6.16.1, I have:
>
>
> Can you please double and triple check that you're really running a
> kernel with the fix from Herbert applied? I make this mistake all
> the time :-)
>
> Thanks.
>
[zhmurov@builds redhat]$ rpmbuild --sign --rebuild --target=i686
/usr/src/redhat/SRPMS/kernel-2.6.16-1.14.bbel4.src.rpm
Enter pass phrase:
Pass phrase is good.
..... SKIP.....
+ echo 'Patch #295
(linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch):'
Patch #295
(linux-2.6.16-e1000-try-to-fix-assertion_sk_forward_alloc_failed_by_Herbert_Xu.patch):
+ patch -p1 -s
...... SKIP .....
And xdelta tells, that e1000.ko was modified :)
P.S.
The src.rpm of my kernel for RHEL4 and my RHEL4 clone is here:
ftp://builds.kernelpanic.ru/pub/linux/BBEL/updates/kernel_of_the_day/4/SRPMS/
and there is a yum'able repo of 2.6.16 kernels _with_ Herbert's patch :), if
anybody is interested:
ftp://builds.kernelpanic.ru/pub/linux/BBEL/updates/kernel_of_the_day/4/
--
Boris B. Zhmurov
mailto: bb@kernelpanic.ru
"wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"
^ permalink raw reply
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-31 9:42 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, bb, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel
In-Reply-To: <20060331.013540.95485284.davem@davemloft.net>
On Fri, Mar 31, 2006 at 01:35:40AM -0800, David S. Miller wrote:
>
> He does not have TSO enabled, e1000 disables TSO when on a link speed
> slower than gigabit.
Indeed. But I think that only happens on PCI Express and I don't think
Ingo is using PCI Express.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: David S. Miller @ 2006-03-31 9:35 UTC (permalink / raw)
To: herbert
Cc: netdev, jesse.brandeburg, nipsy, jrlundgren, cat, djani22,
yoseph.basri, bb, mykleb, olel, michal, chris, netdev,
jesse.brandeburg, E1000-devel
In-Reply-To: <E1FPFkT-0002om-00@gondolin.me.apana.org.au>
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 31 Mar 2006 20:16:53 +1100
> Ingo Oeser <netdev@axxeo.de> wrote:
> >
> > More datapoints.
> >
> > First of all, I don't see the problem, so this is an exclusion data point.
>
> Great. I think so far all the configurations that have this problem
> are
>
> e1000 + SMP + TSO
>
> Since your machine is not SMP but has the other two things it would
> indicate that this is an SMP race.
He does not have TSO enabled, e1000 disables TSO when on a link speed
slower than gigabit.
You'll see something like the following in your logs:
e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
^ permalink raw reply
* Re: [e1000 debug] KERNEL: assertion (!sk_forward_alloc) failed...
From: Herbert Xu @ 2006-03-31 9:16 UTC (permalink / raw)
To: Ingo Oeser
Cc: jesse.brandeburg, nipsy, jrlundgren, cat, djani22, yoseph.basri,
bb, mykleb, olel, michal, chris, netdev, jesse.brandeburg, davem,
E1000-devel
In-Reply-To: <200603311057.07213.netdev@axxeo.de>
Ingo Oeser <netdev@axxeo.de> wrote:
>
> More datapoints.
>
> First of all, I don't see the problem, so this is an exclusion data point.
Great. I think so far all the configurations that have this problem
are
e1000 + SMP + TSO
Since your machine is not SMP but has the other two things it would
indicate that this is an SMP race.
If there is anyone else out there who does have this problem and is
either using something other than e1000, has disabled TSO, or is on UP,
please speak up now.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply