netdev.vger.kernel.org archive mirror
* Realtek 8169 problems with net booting
@ 2008-11-24 18:14 Alan Cox
  2008-11-24 21:57 ` David Miller
  2008-11-29 20:44 ` Francois Romieu
  0 siblings, 2 replies; 12+ messages in thread
From: Alan Cox @ 2008-11-24 18:14 UTC (permalink / raw)
  To: netdev

On one box here it has always been the case that now and then the boot
will crash just after r8169 is loaded iff the BIOS network ROM (i.e. the
DHCP stuff) is enabled. It's erratic and hard to reproduce, but I finally got
around to looking at the driver and have a question that seems to apply
to several network drivers.

8169 does this

	pci_set_master
	twiddle a few bits
	soft reset chip

The master bit is off when the driver is loaded, it appears, but surely the
driver should do

	twiddle a few bits
	soft reset chip
	pci_set_master

otherwise it has no idea whether a warm boot from Linux without a clean shutdown,
or a BIOS TFTP boot, has left the chip trying to spew into main memory?

Alan



* Re: Realtek 8169 problems with net booting
  2008-11-24 18:14 Realtek 8169 problems with net booting Alan Cox
@ 2008-11-24 21:57 ` David Miller
  2008-12-04 17:22   ` Michael Brown
  2008-11-29 20:44 ` Francois Romieu
  1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2008-11-24 21:57 UTC (permalink / raw)
  To: alan; +Cc: netdev

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Mon, 24 Nov 2008 18:14:56 +0000

> 8169 does this
> 
> 	pci_set_master
> 	twiddle a few bits
> 	soft reset chip
> 
> The master bit is off when the driver is loaded, it appears, but surely the
> driver should do
> 
> 	twiddle a few bits
> 	soft reset chip
> 	pci_set_master
> 
> otherwise it has no idea whether a warm boot from Linux without a clean shutdown,
> or a BIOS TFTP boot, has left the chip trying to spew into main memory?

Yes, a lot of drivers will enable bus mastering before resetting
the chip.

The basic assumption is that the chip is quiescent at driver load
time.

Since switching around this order across the board is too
gigantic a project, I would suggest just handling things on
a case-by-case basis where we know the BIOS or firmware leave
the chip in a crud state like this.

* Re: Realtek 8169 problems with net booting
  2008-11-24 18:14 Realtek 8169 problems with net booting Alan Cox
  2008-11-24 21:57 ` David Miller
@ 2008-11-29 20:44 ` Francois Romieu
  2008-11-29 21:06   ` Al Viro
  1 sibling, 1 reply; 12+ messages in thread
From: Francois Romieu @ 2008-11-29 20:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

Alan Cox <alan@lxorguk.ukuu.org.uk> :
> On one box here it has always been the case that now and then the boot
> will crash just after r8169 is loaded iff the BIOS network ROM (i.e. the
> DHCP stuff) is enabled. It's erratic and hard to reproduce, but I finally got
> around to looking at the driver and have a question that seems to apply
> to several network drivers.
[...]
> The master bit is off when the driver is loaded, it appears, but surely the
> driver should do
>
> 	twiddle a few bits
> 	soft reset chip
> 	pci_set_master

You are right.

Can you try the attached patch against 2.6.28-rc6 and tell me whether it makes a
difference or not?

While I did not test it in a BIOS network boot configuration, it did not
crash trivially with these devices:
RTL8168b/8111b / XID 38000000
RTL8110s       / XID 04000000

Note to others: this patch needs testing with different chipsets (XID)
before being included upstream.

--
Ueimor

[-- Attachment #2: 0001-r8169-enable-bus-mastering-after-the-chipset-is-res.patch --]
[-- Type: text/plain, Size: 1038 bytes --]

From 67a7da6ddf8d2f8ca7f0be04a8d70e77e2dc7285 Mon Sep 17 00:00:00 2001
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sat, 29 Nov 2008 20:54:18 +0100
Subject: [PATCH] r8169: enable bus mastering after the chipset is reset

Based on a suggestion by Alan Cox.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
---
 drivers/net/r8169.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 4b7cb38..b5a7358 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2011,8 +2011,6 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		}
 	}
 
-	pci_set_master(pdev);
-
 	/* ioremap MMIO region */
 	ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
 	if (!ioaddr) {
@@ -2039,6 +2037,8 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		msleep_interruptible(1);
 	}
 
+	pci_set_master(pdev);
+
 	/* Identify chip attached to board */
 	rtl8169_get_mac_version(tp, ioaddr);
 
-- 
1.5.6.5


* Re: Realtek 8169 problems with net booting
  2008-11-29 20:44 ` Francois Romieu
@ 2008-11-29 21:06   ` Al Viro
  2008-11-29 21:28     ` Francois Romieu
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2008-11-29 21:06 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Alan Cox, netdev

On Sat, Nov 29, 2008 at 09:44:17PM +0100, Francois Romieu wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> :
> > On one box here it has always been the case that now and then the boot
> > will crash just after r8169 is loaded iff the BIOS network ROM (i.e. the
> > DHCP stuff) is enabled. It's erratic and hard to reproduce, but I finally got
> > around to looking at the driver and have a question that seems to apply
> > to several network drivers.
> [...]
> > The master bit is off when the driver is loaded, it appears, but surely the
> > driver should do
> >
> > 	twiddle a few bits
> > 	soft reset chip
> > 	pci_set_master
> 
> You are right.
> 
> Can you try the attached patch against 2.6.28-rc6 and tell me whether it makes a
> difference or not?
> 
> While I did not test it in a BIOS network boot configuration, it did not
> crash trivially with these devices:
> RTL8168b/8111b / XID 38000000
> RTL8110s       / XID 04000000
> 
> Note to others: this patch needs testing with different chipsets (XID)
> before being included upstream.

FWIW, on one of two very similar motherboards I'm seeing the hard hangs
from 8169 with netboot enabled, but that smells like a hardware problem;
that crap got more and more frequent until I had to disconnect the
interface.  Other symptoms: it kept trying to renegotiate the link every
few seconds.  Hang used to happen on the first incoming packet after
boot, _but_ that didn't happen on each boot.  IIRC, what finally got me
to call it quits was near 100% frequency of buggered boots *and* a hang
during downloading the kernel.  I'm not entirely sure about the last
part, though - will retest once I get that box free for experiments.

The motherboards might actually be identical; the r8169 chips *are*, according to lspci.
Device in question is
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
        Subsystem: AOPEN Inc. AK86-L motherboard
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 23
        I/O ports at b000 [size=256]
        Memory at ed000000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at 70100000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

* Re: Realtek 8169 problems with net booting
  2008-11-29 21:06   ` Al Viro
@ 2008-11-29 21:28     ` Francois Romieu
  2008-11-29 22:47       ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Francois Romieu @ 2008-11-29 21:28 UTC (permalink / raw)
  To: Al Viro; +Cc: Alan Cox, netdev

Al Viro <viro@ZenIV.linux.org.uk> :
[...]
> Other symptoms: it kept trying to renegotiate the link every few seconds.

[...]
> Hang used to happen on the first incoming packet after boot, _but_ that
> didn't happen on each boot.

OK, that is consistent with DMA to a (more or less) random location.

Thanks for the report, Al.

Can you send the "XID" line printed by the driver on recent kernels so
that I can figure out the specific version of the chipset / PHY?

No hurry, I am out for Saturday night.

-- 
Ueimor

* Re: Realtek 8169 problems with net booting
  2008-11-29 21:28     ` Francois Romieu
@ 2008-11-29 22:47       ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2008-11-29 22:47 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Alan Cox, netdev

On Sat, Nov 29, 2008 at 10:28:30PM +0100, Francois Romieu wrote:
> Al Viro <viro@ZenIV.linux.org.uk> :
> [...]
> > Other symptoms: it kept trying to renegotiate the link every few seconds.
> 
> [...]
> > Hang used to happen on the first incoming packet after boot, _but_ that
> > didn't happen on each boot.
> 
> OK, that is consistent with DMA to a (more or less) random location.
> 
> Thanks for the report, Al.
> 
> Can you send the "XID" line printed by the driver on recent kernels so
> that I can figure out the specific version of the chipset / PHY?

Interesting...

Working one -
eth0: RTL8110s at 0xffffc200008b6000, 00:01:80:50:a2:f1, XID 04000000 IRQ 23
B0rken -
eth0: RTL8169s at 0xffffc2000017c000, 00:01:80:4d:0b:3b, XID 00800000 IRQ 23

Note: that's from the logs - I won't get around to testing that stuff until
tomorrow afternoon.

* Re: Realtek 8169 problems with net booting
  2008-11-24 21:57 ` David Miller
@ 2008-12-04 17:22   ` Michael Brown
  2008-12-04 18:15     ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Brown @ 2008-12-04 17:22 UTC (permalink / raw)
  To: David Miller; +Cc: alan, netdev

On Monday 24 November 2008 21:57:10 David Miller wrote:
> Yes, a lot of drivers will enable bus mastering before resetting
> the chip.
>
> The basic assumption is that the chip is quiescent at driver load
> time.
>
> Since switching around this order across the board is too
> gigantic a project, I would suggest just handling things on
> a case-by-case basis where we know the BIOS or firmware leave
> the chip in a crud state like this.

The assumption that the chip is quiescent is invalid in the case of any kind 
of boot from SAN (e.g. iSCSI, AoE) via the net device.  The INT13-based 
bootloader has no way to signal to the boot firmware that it is finished 
using the INT13 interface, so the card will always be left in an active 
state.

In gPXE, we do what we can to ensure that the card is safe to use when the OS 
loads; we edit the RX buffers, ISR, etc. out of the system memory map prior 
to starting an iSCSI boot.  We don't, however, get a chance to actually 
quiesce the chip before the OS driver loads up, so the OS driver must be 
prepared to discover the chip in an active state.

Michael

* Re: Realtek 8169 problems with net booting
  2008-12-04 17:22   ` Michael Brown
@ 2008-12-04 18:15     ` David Miller
  2008-12-04 20:45       ` Michael Brown
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-12-04 18:15 UTC (permalink / raw)
  To: mbrown; +Cc: alan, netdev

From: Michael Brown <mbrown@fensystems.co.uk>
Date: Thu, 4 Dec 2008 17:22:26 +0000

> On Monday 24 November 2008 21:57:10 David Miller wrote:
> > Yes, a lot of drivers will enable bus mastering before resetting
> > the chip.
> >
> > The basic assumption is that the chip is quiescent at driver load
> > time.
> >
> > Since switching around this order across the board is too
> > gigantic a project, I would suggest just handling things on
> > a case-by-case basis where we know the BIOS or firmware leave
> > the chip in a crud state like this.
> 
> The assumption that the chip is quiescent is invalid in the case of any kind 
> of boot from SAN (e.g. iSCSI, AoE) via the net device.  The INT13-based 
> bootloader has no way to signal to the boot firmware that it is finished 
> using the INT13 interface, so the card will always be left in an active 
> state.

So there is no "close" method for the boot loader to call?
Who designs this crud? :-(

> In gPXE, we do what we can to ensure that the card is safe to use when the OS 
> loads; we edit the RX buffers, ISR, etc. out of the system memory map prior 
> to starting an iSCSI boot.  We don't, however, get a chance to actually 
> quiesce the chip before the OS driver loads up, so the OS driver must be 
> prepared to discover the chip in an active state.

It's really unfortunate that things have been setup so poorly.

So OK, we have to handle this.

* Re: Realtek 8169 problems with net booting
  2008-12-04 18:15     ` David Miller
@ 2008-12-04 20:45       ` Michael Brown
  2008-12-04 20:59         ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Brown @ 2008-12-04 20:45 UTC (permalink / raw)
  To: David Miller; +Cc: alan, netdev

On Thursday 04 December 2008 18:15:03 David Miller wrote:
> > The assumption that the chip is quiescent is invalid in the case of any
> > kind of boot from SAN (e.g. iSCSI, AoE) via the net device.  The
> > INT13-based bootloader has no way to signal to the boot firmware that it
> > is finished using the INT13 interface, so the card will always be left in
> > an active state.
>
> So there is no "close" method for the boot loader to call?
> Who designs this crud? :-(

I believe that would be IBM, circa 1980.  Pity they didn't consider the needs 
of iSCSI boot in a protected-mode OS.

For SAN boot, the network boot loader (e.g. gPXE) emulates a BIOS disk using 
INT 13, and the next-stage boot loader (e.g. lilo/grub) believes that it is 
operating a physical disk; it doesn't even know that there's a NIC involved 
that may need to be shut down.

> > In gPXE, we do what we can to ensure that the card is safe to use when
> > the OS loads; we edit the RX buffers, ISR, etc. out of the system memory
> > map prior to starting an iSCSI boot.  We don't, however, get a chance to
> > actually quiesce the chip before the OS driver loads up, so the OS driver
> > must be prepared to discover the chip in an active state.
>
> It's really unfortunate that things have been setup so poorly.
>
> So OK, we have to handle this.

Agreed.  From our point of view, we will guarantee that the card is left in a 
state that is "active but harmless"; if the OS never touches the card then 
nothing bad will happen.  The driver should, as its first action, reset 
everything except the chip's PCI core.  (Some chips have only the facility to 
reset everything including the PCI core; I've seen drivers that back up PCI 
config space prior to reset and restore it afterwards, which seems to work.)

Michael

* Re: Realtek 8169 problems with net booting
  2008-12-04 20:45       ` Michael Brown
@ 2008-12-04 20:59         ` David Miller
  2008-12-04 21:44           ` Alan Cox
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-12-04 20:59 UTC (permalink / raw)
  To: mbrown; +Cc: alan, netdev

From: Michael Brown <mbrown@fensystems.co.uk>
Date: Thu, 4 Dec 2008 20:45:00 +0000

> On Thursday 04 December 2008 18:15:03 David Miller wrote:
> > > The assumption that the chip is quiescent is invalid in the case of any
> > > kind of boot from SAN (e.g. iSCSI, AoE) via the net device.  The
> > > INT13-based bootloader has no way to signal to the boot firmware that it
> > > is finished using the INT13 interface, so the card will always be left in
> > > an active state.
> >
> > So there is no "close" method for the boot loader to call?
> > Who designs this crud? :-(
> 
> I believe that would be IBM, circa 1980.  Pity they didn't consider the needs 
> of iSCSI boot in a protected-mode OS.

So we started with crap....

> For SAN boot, the network boot loader (e.g. gPXE) emulates a BIOS disk using 
> INT 13, and the next-stage boot loader (e.g. lilo/grub) believes that it is 
> operating a physical disk; it doesn't even know that there's a NIC involved 
> that may need to be shut down.

...and instead of adding the necessary facilities, things got built on
top of that crap.

So instead of having a real usable solution propagated widely within
a few years, we'll instead still be stuck with this stuff.

It's ridiculous to blame IBM for this, don't you think? :)  When
interfaces become outdated by technology, you make new ones.

> From our point of view, we will guarantee that the card is left in a
> state that is "active but harmless"; if the OS never touches the
> card then nothing bad will happen.  The driver should, as its first
> action, reset everything except the chip's PCI core.  (Some chips
> have only the facility to reset everything including the PCI core;
> I've seen drivers that back up PCI config space prior to reset and
> restore it afterwards, which seems to work.)

Yes, but don't expect this to be handled properly across the board in
any significant set of drivers any time soon.  Just about every one
I checked turns on bus mastering before doing anything else.

* Re: Realtek 8169 problems with net booting
  2008-12-04 20:59         ` David Miller
@ 2008-12-04 21:44           ` Alan Cox
  2008-12-04 21:55             ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Cox @ 2008-12-04 21:44 UTC (permalink / raw)
  To: David Miller; +Cc: mbrown, netdev

On Thu, 04 Dec 2008 12:59:43 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> So we started with crap....
> ...and instead of adding the necessary facilities, things got built on
> top of that crap.
> So instead of having a real usable solution propagated widely within
> a few years, we'll instead still be stuck with this stuff.

You have achieved enlightenment in all things PC compatible.

> It's ridiculous to blame IBM for this, don't you think? :)  When
> interfaces become outdated by technology, you make new ones.

Actually I believe PXE was Intel ;) (it's pronounced 'poxy' for a reason,
but it is better than what went before)

Alan

* Re: Realtek 8169 problems with net booting
  2008-12-04 21:44           ` Alan Cox
@ 2008-12-04 21:55             ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2008-12-04 21:55 UTC (permalink / raw)
  To: alan; +Cc: mbrown, netdev

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Thu, 4 Dec 2008 21:44:20 +0000

> On Thu, 04 Dec 2008 12:59:43 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
> > So we started with crap....
> > ...and instead of adding the necessary facilities, things got built on
> > top of that crap.
> > So instead of having a real usable solution propagated widely within
> > a few years, we'll instead still be stuck with this stuff.
> 
> You have achieved enlightenment in all things PC compatible.

I understand PC compatibility (I hope). :)

But often there is a reasonable migration path to something
saner created, when necessary.  That's not what happened
here.

end of thread

Thread overview: 12+ messages
2008-11-24 18:14 Realtek 8169 problems with net booting Alan Cox
2008-11-24 21:57 ` David Miller
2008-12-04 17:22   ` Michael Brown
2008-12-04 18:15     ` David Miller
2008-12-04 20:45       ` Michael Brown
2008-12-04 20:59         ` David Miller
2008-12-04 21:44           ` Alan Cox
2008-12-04 21:55             ` David Miller
2008-11-29 20:44 ` Francois Romieu
2008-11-29 21:06   ` Al Viro
2008-11-29 21:28     ` Francois Romieu
2008-11-29 22:47       ` Al Viro
