public inbox for linux-kernel@vger.kernel.org
* [bug] e100 bug: checksum mismatch on 82551ER rev10
@ 2006-07-09 11:34 Molle Bestefich
  2006-07-10 15:25 ` Auke Kok
  0 siblings, 1 reply; 7+ messages in thread
From: Molle Bestefich @ 2006-07-09 11:34 UTC
  To: linux-kernel, linux.nics, scott.feldman

Hello

I'm trying to get Linux running on a Nokia IP130 box.

The 3x Intel i82551ER NICs don't work.

These are the console messages (I've added printing of the checksum values):

===============================================================
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
PCI: Found IRQ 10 for device 0000:00:0e.0
IRQ routing conflict for 0000:00:0e.0, have irq 11, want irq 10
e100: 0000:00:0e.0: e100_eeprom_load: EEPROM corrupted (stored: e6bc, calc'ed: 54cf)
e100: eth0: e100_probe: addr 0x80100000, irq 11, MAC addr 00:A0:8E:22:58:58
PCI: Found IRQ 11 for device 0000:00:0f.0
IRQ routing conflict for 0000:00:0f.0, have irq 10, want irq 11
e100: 0000:00:0f.0: e100_eeprom_load: EEPROM corrupted (stored: d9cc, calc'ed: 53cf)
e100: eth1: e100_probe: addr 0x80300000, irq 10, MAC addr 00:A0:8E:22:58:59
PCI: Found IRQ 10 for device 0000:00:10.0
IRQ routing conflict for 0000:00:10.0, have irq 5, want irq 10
e100: 0000:00:10.0: e100_eeprom_load: EEPROM corrupted (stored: d95a, calc'ed: 52cf)
e100: eth2: e100_probe: addr 0x80500000, irq 5, MAC addr 00:A0:8E:22:58:5A
===============================================================

With vanilla 2.6.17, the e100 module dies with EAGAIN.

Apply this patch which just removes the error return path (and adds a
little debug info):

===============================================================
--- drivers/net/e100.c.orig     2006-07-09 12:03:14.000000000 +0200
+++ drivers/net/e100.c  2006-07-09 12:03:22.000000000 +0200
@@ -756,8 +756,7 @@
         * the sum of words should be 0xBABA */
        checksum = le16_to_cpu(0xBABA - checksum);
        if(checksum != nic->eeprom[nic->eeprom_wc - 1]) {
-               DPRINTK(PROBE, ERR, "EEPROM corrupted\n");
-               return -EAGAIN;
+               DPRINTK(PROBE, ERR, "EEPROM corrupted (stored: %4.4x, calc'ed: %4.4x)\n", nic->eeprom[nic->eeprom_wc - 1], checksum);
        }

        return 0;
===============================================================

And everything works!

I think I've heard about this bug before, but I don't know why it occurs.
So the best I can do is the above (ignore failed EEPROM checksum test).
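
For reference, the rule behind the failing test (the stored last word is
chosen so that all EEPROM words sum to 0xBABA) can be sketched in plain
user-space C. This is only an illustration, not the driver's code: the
names eeprom_expected_checksum and eeprom_checksum_ok are hypothetical,
and the words are assumed to already be in host byte order, whereas the
driver does its own little-endian conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Value that all EEPROM words, including the checksum word, should sum
 * to, per the driver comment quoted in the patch context. */
#define EEPROM_SUM 0xBABAu

/* Hypothetical helper: checksum expected in the last word, i.e. 0xBABA
 * minus the 16-bit wrapping sum of every word before it. */
uint16_t eeprom_expected_checksum(const uint16_t *words, size_t wc)
{
    uint16_t sum = 0;
    size_t i;

    /* Sum every word except the last one, which holds the checksum. */
    for (i = 0; i + 1 < wc; i++)
        sum = (uint16_t)(sum + words[i]);

    return (uint16_t)(EEPROM_SUM - sum);
}

/* Hypothetical helper: 1 if the stored checksum matches, 0 otherwise. */
int eeprom_checksum_ok(const uint16_t *words, size_t wc)
{
    if (wc == 0)
        return 0;
    return words[wc - 1] == eeprom_expected_checksum(words, wc);
}
```

The "stored" and "calc'ed" values in the console output above correspond
to words[wc - 1] and the result of eeprom_expected_checksum()
respectively.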

My hardware is:
00:0e.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 10)
00:0f.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 10)
00:10.0 Ethernet controller: Intel Corporation 8255xER/82551IT Fast Ethernet Controller (rev 10)

I was wondering if we can just remove the -EAGAIN return from the
driver and in effect turn the EEPROM checksum stuff into a non-fatal
test?  Since I can't figure out the correct way to test the checksum
on this hardware, that seems like the best thing to do to me...

* Re: [bug] e100 bug: checksum mismatch on 82551ER rev10
@ 2006-07-31 21:07 Charlie Brady
  0 siblings, 0 replies; 7+ messages in thread
From: Charlie Brady @ 2006-07-31 21:07 UTC
  To: linux-kernel


> Molle Bestefich wrote:
>
>> Auke Kok wrote:
>>
>> If you have received a motherboard or card with a broken EEPROM 
>> then your card is in a limbo state - it might work but results are 
>> unreliable and may cause your entire system to break (and even data
>> corruption).

Sure, and on the other hand, it might work (seemingly) perfectly, as it 
has done in the past, and will continue to do so as long as the owner 
wishes it to.

>> You should contact the hardware vendor and have the board replaced or
>> upgraded with a proper EEPROM. Continuing to work with the corrupted
>> EEPROM image that you have now can seriously hurt you later on.

Or a driver change can hurt me *right now*, by leaving my system without 
connectivity.

> Every single IP130 I've had my hands on has had an EEPROM that the
> Linux driver declared bad.

I'm now seeing this problem with a ThinkPad T23. I have a second T23 I can
test, and will try to do so tonight.

I second the request to at least have a driver option to ignore checksum 
failures.
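
As a rough sketch of what such an option could look like, here is a
user-space C illustration. The function eeprom_check and the flag
allow_bad_csum are hypothetical names, not anything in the driver; in a
real driver the flag would presumably come from a module parameter, and
the mismatch would still be logged. The words are assumed to be in host
byte order.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 0 when the EEPROM image is accepted,
 * -EAGAIN when the checksum is bad and the override is off, and
 * -EINVAL for an empty image. */
int eeprom_check(const uint16_t *words, size_t wc, int allow_bad_csum)
{
    uint16_t sum = 0;
    size_t i;

    if (wc == 0)
        return -EINVAL;

    /* All words, including the stored checksum, should sum to 0xBABA. */
    for (i = 0; i < wc; i++)
        sum = (uint16_t)(sum + words[i]);

    if (sum == 0xBABA)
        return 0;

    /* Checksum mismatch: fatal by default, tolerated only when the
     * owner has explicitly asked for it. */
    return allow_bad_csum ? 0 : -EAGAIN;
}
```

With the flag off the behaviour matches the current fatal check; with it
on, a bad checksum is reported but no longer prevents the device from
being used.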

Auke said earlier:

>> The NICs are working perfectly.
>
> How can you tell? Do you know if jumbo frames work correctly? Is the
> device properly checksumming? Is flow control working properly? These
> and many, many more settings are determined by the EEPROM. Seemingly it
> may work correctly, but there is no guarantee whatsoever that it will work
> correctly at all if the checksum is bad. Again, you can lose data, or
> worse, you could corrupt memory in the system causing massive failure (DMA
> timings, etc). Unlikely? Sure, but not impossible.

Let's assume that these things are all true, and the NIC currently does 
not work perfectly, just imperfectly, but acceptably. With the recent 
driver change, it now does not work at all. That's surely a bug in the 
driver.

---
Charlie


end of thread, other threads:[~2006-07-31 21:07 UTC | newest]

Thread overview: 7+ messages
2006-07-09 11:34 [bug] e100 bug: checksum mismatch on 82551ER rev10 Molle Bestefich
2006-07-10 15:25 ` Auke Kok
2006-07-10 16:45   ` Molle Bestefich
2006-07-10 17:20     ` Auke Kok
2006-07-10 17:41       ` Molle Bestefich
2006-07-10 17:58         ` Auke Kok
  -- strict thread matches above, loose matches on Subject: below --
2006-07-31 21:07 Charlie Brady
