public inbox for linux-kernel@vger.kernel.org
* [bug] e100 bug: checksum mismatch on 82551ER rev10
@ 2006-07-09 11:34 Molle Bestefich
  2006-07-10 15:25 ` Auke Kok
  0 siblings, 1 reply; 7+ messages in thread
From: Molle Bestefich @ 2006-07-09 11:34 UTC (permalink / raw)
  To: linux-kernel, linux.nics, scott.feldman

Hello

I'm trying to get Linux running on a Nokia IP130 box.

The three Intel i82551ER NICs don't work.

These are the console messages (I've added printing of the checksum values):

===============================================================
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
PCI: Found IRQ 10 for device 0000:00:0e.0
IRQ routing conflict for 0000:00:0e.0, have irq 11, want irq 10
e100: 0000:00:0e.0: e100_eeprom_load: EEPROM corrupted (stored: e6bc, calc'ed: 54cf)
e100: eth0: e100_probe: addr 0x80100000, irq 11, MAC addr 00:A0:8E:22:58:58
PCI: Found IRQ 11 for device 0000:00:0f.0
IRQ routing conflict for 0000:00:0f.0, have irq 10, want irq 11
e100: 0000:00:0f.0: e100_eeprom_load: EEPROM corrupted (stored: d9cc, calc'ed: 53cf)
e100: eth1: e100_probe: addr 0x80300000, irq 10, MAC addr 00:A0:8E:22:58:59
PCI: Found IRQ 10 for device 0000:00:10.0
IRQ routing conflict for 0000:00:10.0, have irq 5, want irq 10
e100: 0000:00:10.0: e100_eeprom_load: EEPROM corrupted (stored: d95a, calc'ed: 52cf)
e100: eth2: e100_probe: addr 0x80500000, irq 5, MAC addr 00:A0:8E:22:58:5A
===============================================================

With vanilla 2.6.17, the e100 module dies with EAGAIN.

Apply this patch which just removes the error return path (and adds a
little debug info):

===============================================================
--- drivers/net/e100.c.orig     2006-07-09 12:03:14.000000000 +0200
+++ drivers/net/e100.c  2006-07-09 12:03:22.000000000 +0200
@@ -756,8 +756,7 @@
         * the sum of words should be 0xBABA */
        checksum = le16_to_cpu(0xBABA - checksum);
        if(checksum != nic->eeprom[nic->eeprom_wc - 1]) {
-               DPRINTK(PROBE, ERR, "EEPROM corrupted\n");
-               return -EAGAIN;
+               DPRINTK(PROBE, ERR, "EEPROM corrupted (stored: %4.4x, calc'ed: %4.4x)\n", nic->eeprom[nic->eeprom_wc - 1], checksum);
        }

        return 0;
===============================================================

And everything works!

I think I've heard about this bug before, but I don't know why it occurs.
So the best I can do is the above (ignore failed EEPROM checksum test).

My hardware is:
00:0e.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 10)
00:0f.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 10)
00:10.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 10)

I was wondering if we can just remove the -EAGAIN return from the
driver and in effect turn the EEPROM checksum stuff into a non-fatal
test?  Since I can't figure out the correct way to test the checksum
on this hardware, that seems like the best thing to do to me...
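
For readers unfamiliar with the check under discussion, here is a minimal
userspace sketch of it, based only on the comment in the patch context ("the
sum of words should be 0xBABA"). The function names are made up for
illustration, the words are assumed to already be in host byte order, and the
driver's le16 conversions are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value the last EEPROM word is expected to hold so that all wc words
 * add up to 0xBABA modulo 2^16. Hypothetical helper, not driver code. */
static uint16_t eeprom_expected_checksum(const uint16_t *words, size_t wc)
{
    uint16_t sum = 0;

    assert(wc >= 1);
    /* Sum every word except the last one, which stores the checksum. */
    for (size_t i = 0; i + 1 < wc; i++)
        sum = (uint16_t)(sum + words[i]);

    return (uint16_t)(0xBABA - sum);
}

/* Nonzero when the stored checksum matches the calculated one. */
static int eeprom_checksum_ok(const uint16_t *words, size_t wc)
{
    return words[wc - 1] == eeprom_expected_checksum(words, wc);
}
```

In these terms, the "stored" value in the console messages is words[wc - 1]
and the "calc'ed" value is what eeprom_expected_checksum() would return.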

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
  2006-07-09 11:34 [bug] e100 bug: checksum mismatch on 82551ER rev10 Molle Bestefich
@ 2006-07-10 15:25 ` Auke Kok
  2006-07-10 16:45   ` Molle Bestefich
  0 siblings, 1 reply; 7+ messages in thread
From: Auke Kok @ 2006-07-10 15:25 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: linux-kernel, linux.nics

Molle Bestefich wrote:
> I'm trying to get Linux running on a Nokia IP130 box.
> 
> The three Intel i82551ER NICs don't work.
> 
> Apply this patch which just removes the error return path (and adds a
> little debug info):
> 
> ===============================================================
> --- drivers/net/e100.c.orig     2006-07-09 12:03:14.000000000 +0200
> +++ drivers/net/e100.c  2006-07-09 12:03:22.000000000 +0200
> @@ -756,8 +756,7 @@
>         * the sum of words should be 0xBABA */
>        checksum = le16_to_cpu(0xBABA - checksum);
>        if(checksum != nic->eeprom[nic->eeprom_wc - 1]) {
> -               DPRINTK(PROBE, ERR, "EEPROM corrupted\n");
> -               return -EAGAIN;
> +               DPRINTK(PROBE, ERR, "EEPROM corrupted (stored: %4.4x, calc'ed: %4.4x)\n", nic->eeprom[nic->eeprom_wc - 1], checksum);
>        }
> 
>        return 0;
> ===============================================================
> 
> And everything works!
> 
> I think I've heard about this bug before, but I don't know why it occurs.
> So the best I can do is the above (ignore failed EEPROM checksum test).

[removed scott feldman since he's not maintained e100 for a long time now]

Hi,


If you have received a motherboard or card with a broken EEPROM then your card 
is in a limbo state - it might work but results are unreliable and may cause 
your entire system to break (and even data corruption).

You should contact the hardware vendor and have the board replaced or upgraded 
with a proper EEPROM. Continuing to work with the corrupted EEPROM image that 
you have now can seriously hurt you later on.

Cheers,

Auke

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
  2006-07-10 15:25 ` Auke Kok
@ 2006-07-10 16:45   ` Molle Bestefich
  2006-07-10 17:20     ` Auke Kok
  0 siblings, 1 reply; 7+ messages in thread
From: Molle Bestefich @ 2006-07-10 16:45 UTC (permalink / raw)
  To: Auke Kok; +Cc: linux-kernel, linux.nics

Auke Kok wrote:
> If you have received a motherboard or card with a broken EEPROM then your card
> is in a limbo state - it might work but results are unreliable and may cause
> your entire system to break (and even data corruption).
>
> You should contact the hardware vendor and have the board replaced or upgraded
> with a proper EEPROM. Continuing to work with the corrupted EEPROM image that
> you have now can seriously hurt you later on.

Every single IP130 I've had my hands on has had an EEPROM that the
Linux driver declared bad.

I'm afraid that it's not the board that's at fault, it's the driver.

The NICs are working perfectly.

(Also, it seems mighty odd to refuse to drive the hardware based on an
EEPROM checksum failure, when the e100 driver will happily load for a
device where for example IRQ routing is broken.  Just another
indication that erroring out in this situation is overkill.)

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
  2006-07-10 16:45   ` Molle Bestefich
@ 2006-07-10 17:20     ` Auke Kok
  2006-07-10 17:41       ` Molle Bestefich
  0 siblings, 1 reply; 7+ messages in thread
From: Auke Kok @ 2006-07-10 17:20 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: linux-kernel

Molle Bestefich wrote:
> Auke Kok wrote:
>> If you have received a motherboard or card with a broken EEPROM then your card
>> is in a limbo state - it might work but results are unreliable and may cause
>> your entire system to break (and even data corruption).
>>
>> You should contact the hardware vendor and have the board replaced or upgraded
>> with a proper EEPROM. Continuing to work with the corrupted EEPROM image that
>> you have now can seriously hurt you later on.
> 
> Every single IP130 I've had my hands on has had an EEPROM that the
> Linux driver declared bad.

that means that whoever is selling you the IP130's is consistently putting on 
bad EEPROMs, which is *very* bad. Which vendor is that? They can fix this 
problem for you and for *everyone* else they have sold and will sell IP130's 
to in the future.

> I'm afraid that it's not the board that's at fault, it's the driver.

No it is not. The NIC is supported (you can even call Intel for first line 
support) but if your vendor put a bad EEPROM image on it then all bets are 
off.  Intel provides the vendors with the proper tools to make valid EEPROMs, 
the driver checks them for a very good reason.

> The NICs are working perfectly.

How can you tell? Do you know if jumbo frames work correctly?  Is the device 
properly checksumming? is flow control working properly?  These and many, many 
more settings are determined by the EEPROM.  Seemingly it may work correctly, 
but there is no guarantee whatsoever that it will work correctly at all if the 
checksum is bad.  Again, you can lose data, or worse, you could corrupt memory 
in the system causing massive failure (DMA timings, etc). Unlikely? sure, but 
not impossible.

> (Also, it seems mighty odd to refuse to drive the hardware based on an
> EEPROM checksum failure, when the e100 driver will happily load for a
> device where for example IRQ routing is broken.  Just another
> indication that erroring out in this situation is overkill.)

That is another discussion.  All wifi drivers bail out if the firmware is 
corrupted, so why shouldn't e100 be allowed to do so too? Are you willing to 
risk your data?

Cheers,

Auke

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
  2006-07-10 17:20     ` Auke Kok
@ 2006-07-10 17:41       ` Molle Bestefich
  2006-07-10 17:58         ` Auke Kok
  0 siblings, 1 reply; 7+ messages in thread
From: Molle Bestefich @ 2006-07-10 17:41 UTC (permalink / raw)
  To: Auke Kok; +Cc: linux-kernel

Auke Kok wrote:
> > Every single IP130 I've had my hands on has had an EEPROM that the
> > Linux driver declared bad.
>
> that means that whoever is selling you the IP130's is consistently putting on
> bad EEPROMs, which is *very* bad. Which vendor is that? They can fix this
> problem for you and for *everyone* else they have sold and will sell IP130's
> to in the future.

Nokia.

Maybe they've changed the BABA magic, or the checksum logic entirely,
to prevent other software than their own OS from running.

> > I'm afraid that it's not the board that's at fault, it's the driver.
>
> No it is not. The NIC is supported (you can even call Intel for first line
> support) but if your vendor put a bad EEPROM image on it then all bets are
> off.  Intel provides the vendors with the proper tools to make valid EEPROMs,
> the driver checks them for a very good reason.

You're completely sure that the EEPROM check is correct for this
particular revision of this particular chip?

(Do you happen to know where the EEPROM is located, by the way?
 Just out of interest.
 I can spot the three Intel chips but not the EEPROM.
 http://chrisbuechler.com/m0n0wall/nokia/images/9.png
 http://chrisbuechler.com/m0n0wall/nokia/images/11.png
 http://chrisbuechler.com/m0n0wall/nokia/images/10.png
 )

> > The NICs are working perfectly.
>
> How can you tell? Do you know if jumbo frames work correctly?  Is the device
> properly checksumming? is flow control working properly?  These and many, many
> more settings are determined by the EEPROM.  Seemingly it may work correctly,
> but there is no guarantee whatsoever that it will work correctly at all if the
> checksum is bad.  Again, you can lose data, or worse, you could corrupt memory
> in the system causing massive failure (DMA timings, etc). Unlikely? sure, but
> not impossible.

They've been used in production environments for years.

> > (Also, it seems mighty odd to refuse to drive the hardware based on an
> > EEPROM checksum failure, when the e100 driver will happily load for a
> > device where for example IRQ routing is broken.  Just another
> > indication that erroring out in this situation is overkill.)
>
> That is another discussion.  All wifi drivers bail out if the firmware is
> corrupted, so why shouldn't e100 be allowed to do so too? Are you willing to
> risk your data?

Yes.
Perhaps an "ignorechecksum" switch would be appropriate.
I'd like to hear from anyone else who has IP130s and is experiencing
this problem (or isn't!).
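
As a rough illustration of what such a switch could look like, here is a
userspace sketch. The name ignore_checksum and the function eeprom_validate()
are hypothetical; in a driver the flag would presumably be a module parameter,
and the message would go through the driver's own logging rather than stderr.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical switch: when nonzero, a bad checksum is reported but
 * no longer treated as fatal. */
static int ignore_checksum = 0;

/* Returns 0 if the EEPROM image may be used, -EAGAIN otherwise. */
static int eeprom_validate(const uint16_t *words, size_t wc)
{
    uint16_t sum = 0;
    uint16_t expected;

    assert(wc >= 1);
    for (size_t i = 0; i + 1 < wc; i++)
        sum = (uint16_t)(sum + words[i]);
    expected = (uint16_t)(0xBABA - sum);

    if (words[wc - 1] != expected) {
        /* Always report the mismatch, even when it is being ignored. */
        fprintf(stderr, "EEPROM corrupted (stored: %4.4x, calc'ed: %4.4x)\n",
                (unsigned)words[wc - 1], (unsigned)expected);
        if (!ignore_checksum)
            return -EAGAIN;
    }
    return 0;
}
```

The default keeps the current behaviour (fail on mismatch); only someone who
explicitly sets the switch takes on the risk described above.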

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
  2006-07-10 17:41       ` Molle Bestefich
@ 2006-07-10 17:58         ` Auke Kok
  0 siblings, 0 replies; 7+ messages in thread
From: Auke Kok @ 2006-07-10 17:58 UTC (permalink / raw)
  To: Molle Bestefich; +Cc: linux-kernel

Molle Bestefich wrote:
> Auke Kok wrote:
>> > Every single IP130 I've had my hands on has had an EEPROM that the
>> > Linux driver declared bad.
>>
>> that means that whoever is selling you the IP130's is consistently putting on
>> bad EEPROMs, which is *very* bad. Which vendor is that? They can fix this
>> problem for you and for *everyone* else they have sold and will sell IP130's
>> to in the future.
> 
> Nokia.
> 
> Maybe they've changed the BABA magic, or the checksum logic entirely,
> to prevent other software than their own OS from running.

in almost all cases where a bad EEPROM checksum is found on a board the vendor 
has changed settings in the EEPROM image without recalculating the checksum.
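
A small arithmetic check on the values logged in the original report, offered
as an observation rather than a diagnosis. It assumes the MAC address occupies
the first three EEPROM words with the low byte first, and that the three
images differ only in the last MAC byte; neither assumption is confirmed in
this thread, and the helper names are made up. Under those assumptions, each
increment of the last MAC byte raises one word by 0x0100 and so lowers the
calculated checksum by 0x0100, which matches the calc'ed values 54cf, 53cf,
52cf. The stored values (e6bc, d9cc, d95a) do not follow that pattern.

```c
#include <stdint.h>

/* Pack two consecutive MAC bytes into one 16-bit EEPROM word,
 * low byte first (assumed layout). */
static uint16_t mac_word(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(lo | (hi << 8));
}

/* Calculated checksum after one word changes while every other word
 * stays the same: the checksum moves by the opposite of the word delta. */
static uint16_t checksum_after_word_change(uint16_t old_checksum,
                                           uint16_t old_word,
                                           uint16_t new_word)
{
    uint16_t delta = (uint16_t)(new_word - old_word);

    return (uint16_t)(old_checksum - delta);
}
```

For example, going from MAC ...:58:58 to ...:58:59 changes the word from
0x5858 to 0x5958, and 0x54cf - 0x0100 gives 0x53cf.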

>> > I'm afraid that it's not the board that's at fault, it's the driver.
>>
>> No it is not. The NIC is supported (you can even call Intel for first line
>> support) but if your vendor put a bad EEPROM image on it then all bets are
>> off.  Intel provides the vendors with the proper tools to make valid EEPROMs,
>> the driver checks them for a very good reason.
> 
> You're completely sure that the EEPROM check is correct for this
> particular revision of this particular chip?

It's valid for every piece of network silicon that has an EEPROM ever made.

> (Do you happen to know where the EEPROM is located, by the way?

it's in the NIC itself. In your case, where you have 3 separate chips, there 
will be 3 different EEPROM images total.

>> How can you tell? Do you know if jumbo frames work correctly?  Is the device
>> properly checksumming? is flow control working properly?  These and many, many
>> more settings are determined by the EEPROM.  Seemingly it may work correctly,
>> but there is no guarantee whatsoever that it will work correctly at all if the
>> checksum is bad.  Again, you can lose data, or worse, you could corrupt memory
>> in the system causing massive failure (DMA timings, etc). Unlikely? sure, but
>> not impossible.
> 
> They've been used in production environments for years.

all the more reason to suggest that Nokia is forgetting to update the checksums :)

Cheers,

Auke

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
@ 2006-07-31 21:07 Charlie Brady
  0 siblings, 0 replies; 7+ messages in thread
From: Charlie Brady @ 2006-07-31 21:07 UTC (permalink / raw)
  To: linux-kernel


> Molle Bestefich wrote:
>
>> Auke Kok wrote:
>>
>> If you have received a motherboard or card with a broken EEPROM 
>> then your card is in a limbo state - it might work but results are 
>> unreliable and may cause your entire system to break (and even data
>> corruption).

Sure, and on the other hand, it might work (seemingly) perfectly, as it 
has done in the past, and will continue to do so as long as the owner 
wishes it to.

>> You should contact the hardware vendor and have the board replaced or
>> upgraded with a proper EEPROM. Continuing to work with the corrupted
>> EEPROM image that you have now can seriously hurt you later on.

Or a driver change can hurt me *right now*, by leaving my system without 
connectivity.

> Every single IP130 I've had my hands on has had an EEPROM that the
> Linux driver declared bad.

I'm now seeing this problem with a Thinkpad T23. I have a second T23 I can 
test, and will try to do so tonight.

I second the request to at least have a driver option to ignore checksum 
failures.

Auke said earlier:

>> The NICs are working perfectly.
>
> How can you tell? Do you know if jumbo frames work correctly? Is the
> device properly checksumming? is flow control working properly? These
> and many, many more settings are determined by the EEPROM. Seemingly it
> may work correctly, but there is no guarantee whatsoever that it will work
> correctly at all if the checksum is bad. Again, you can lose data, or
> worse, you could corrupt memory in the system causing massive failure (DMA
> timings, etc). Unlikely? sure, but not impossible.

Let's assume that these things are all true, and the NIC currently does 
not work perfectly, just imperfectly, but acceptably. With the recent 
driver change, it now does not work at all. That's surely a bug in the 
driver.

---
Charlie
