public inbox for linux-kernel@vger.kernel.org
* sk98lin for 2.6.23-rc1
@ 2007-07-26 15:16 Kyle Rose
  2007-07-26 16:28 ` Jan Engelhardt
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Kyle Rose @ 2007-07-26 15:16 UTC (permalink / raw)
  To: linux-kernel

From http://www.krose.org/~krose/computing.html:

Since the sky2 driver continues to suck ass (which is a technical
description for "it hangs all the time under load, at least on my
hardware" :-) ), I've fixed the sk98lin driver to compile for
linux-2.6.23-rc1. Those who continue to have problems with sky2 can
still use 2.6.23-rc1, simply by doing the following:

   1. Make sure you have the headers for your kernel properly installed
      and linked to /usr/src/linux-$KVER.

   2. Download the sk98lin source from Marvell's site
      <http://www.marvell.com/drivers/search.do>.

   3. Untar the driver and run the install.sh according to the
      directions. It will fail.

   4. Look in /tmp for a directory called Sk98something. Go to
      http://www.krose.org/~krose/projects/sk98lin/ and copy the
      Makefile <http://www.krose.org/%7Ekrose/projects/sk98lin/Makefile>
      and sky2.c <http://www.krose.org/%7Ekrose/projects/sk98lin/sky2.c>
      into /tmp/Sk98something/all.

   5. Change into /tmp/Sk98something/all and execute:

          sudo -H make -C /usr/src/linux-$KVER M=`pwd` modules
          sudo -H make -C /usr/src/linux-$KVER M=`pwd` modules_install

   6. Blacklist sky2 in /etc/modprobe.d/blacklist, and (maybe not
      necessary) manually load sk98lin in /etc/modules.

There. You're done. Stable networking at last... er, again.
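
For anyone who would rather script the build and blacklist steps, here is a
rough sketch. It is only an illustration under assumptions: the variable
names KVER, SRC_DIR and DRY_RUN and the helper functions are made up for
this example, and /tmp/Sk98something/all is the same placeholder path as
above, to be replaced with the real directory name. By default it only
prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the build and blacklist steps. All names here are illustrative.
KVER="${KVER:-$(uname -r)}"                    # kernel release to build for
SRC_DIR="${SRC_DIR:-/tmp/Sk98something/all}"   # placeholder path, adjust it
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install the out-of-tree module against /usr/src/linux-$KVER.
build_and_install() {
    kdir="/usr/src/linux-$KVER"
    run sudo -H make -C "$kdir" M="$SRC_DIR" modules
    run sudo -H make -C "$kdir" M="$SRC_DIR" modules_install
}

# The line to append to /etc/modprobe.d/blacklist so sky2 is not loaded.
blacklist_line() {
    echo "blacklist sky2"
}
```

Calling build_and_install with DRY_RUN=1 shows what would be executed;
setting DRY_RUN=0 runs the same two make invocations for real.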

Unfortunately, you lose the nicest differentiating feature of
sky2---WOL---but that's a small price to pay for networking stability on
a desktop machine. It's nice to be able to watch MythTV again without
having to sudo bash -c 'ifdown eth0; rmmod sky2; modprobe sky2; ifup
eth0' every few minutes.
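
For those who stay on sky2, that manual reset could in principle be
wrapped in a small helper that is run periodically. This is only a sketch
under assumptions: the names IFACE, MODULE, DRY_RUN, reload_driver and
check_and_reload are invented here, and a failed ping to some address you
choose (for example the default gateway) is just a crude stand-in for
"the NIC has hung". It prints the commands by default rather than running
them.

```shell
#!/bin/sh
# Sketch of a driver-reload helper. All names here are illustrative.
IFACE="${IFACE:-eth0}"
MODULE="${MODULE:-sky2}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same sequence as the one-liner above: down, unload, reload, up.
reload_driver() {
    run ifdown "$IFACE"
    run rmmod "$MODULE"
    run modprobe "$MODULE"
    run ifup "$IFACE"
}

# $1: address to ping. Reload the driver only if the ping fails.
check_and_reload() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        return 0
    fi
    reload_driver
}
```

With DRY_RUN=0 the helper has to run as root, since rmmod and modprobe
need it.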


Personally, I'd like to see sk98lin remain in the kernel proper until
sky2 goes at least 6 months without reported problems.  The fact that I
am not the only one still seeing issues is a clear indication that sky2
(even with the recent patches in 2.6.23-rc1) is not yet ready to replace
sk98lin.

I'm happy to help debug the remaining issues with sky2, Stephen; just
let me know what information you need.

Kyle



* Re: sk98lin for 2.6.23-rc1
  2007-07-26 15:16 sk98lin for 2.6.23-rc1 Kyle Rose
@ 2007-07-26 16:28 ` Jan Engelhardt
  2007-07-26 16:30   ` Kyle Rose
  2007-07-26 16:57 ` Adrian Bunk
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Jan Engelhardt @ 2007-07-26 16:28 UTC (permalink / raw)
  To: Kyle Rose; +Cc: linux-kernel


On Jul 26 2007 11:16, Kyle Rose wrote:
>
>   1.
>
>      Make sure you have the headers for your kernel properly installed
>      and linked to /usr/src/linux-$KVER.

Why is this a requirement? Makefile not properly done?

>   4.
>
>      Look in /tmp for a directory called Sk98something. Go to

Why /tmp? If untarred (with default options) in ~, it's in ~/tmp.

>   5.
>
>      Change into /tmp/Sk98something/all and execute:
>
>          sudo -H make -C /usr/src/linux-$KVER M=`pwd` modules
>          sudo -H make -C /usr/src/linux-$KVER M=`pwd` modules_install

This breaks with O= builds. See (1).


Sorry for the nitpick, it can be done easier :)
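
To illustrate the point about (1) and O= builds: a common convention is
to build against the "build" symlink that modules_install leaves under
/lib/modules/<release>/, which normally points at the kernel's build
directory, so no /usr/src/linux-$KVER link is needed. The sketch below
only shows that idea; the function name kernel_build_dir is made up for
this example.

```shell
#!/bin/sh
# Sketch: locate the kernel build directory without a /usr/src symlink.
# kernel_build_dir is an illustrative name, not an existing tool.
kernel_build_dir() {
    # $1: kernel release; defaults to the running kernel.
    release="${1:-$(uname -r)}"
    echo "/lib/modules/$release/build"
}
```

From the driver directory one would then run something like
make -C "$(kernel_build_dir)" M="$PWD" modules.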




	Jan
-- 


* Re: sk98lin for 2.6.23-rc1
  2007-07-26 16:28 ` Jan Engelhardt
@ 2007-07-26 16:30   ` Kyle Rose
  2007-07-26 16:41     ` Jan Engelhardt
  0 siblings, 1 reply; 35+ messages in thread
From: Kyle Rose @ 2007-07-26 16:30 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Kyle Rose, linux-kernel


> Sorry for the nitpick, it can be done easier :)

I'm sure it can.  I didn't want to have to figure out the kernel build
system just to get this one driver working.  Hence my desire for it to
remain in the kernel proper until sky2 utterly works. ;-)

Kyle



* Re: sk98lin for 2.6.23-rc1
  2007-07-26 16:30   ` Kyle Rose
@ 2007-07-26 16:41     ` Jan Engelhardt
  2007-07-27  1:07       ` Kyle Rose
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Engelhardt @ 2007-07-26 16:41 UTC (permalink / raw)
  To: Kyle Rose; +Cc: Kyle Rose, linux-kernel


On Jul 26 2007 12:30, Kyle Rose wrote:
>> Sorry for the nitpick, it can be done easier :)
>
>I'm sure it can.  I didn't want to have to figure out the kernel build
>system just to get this one driver working.  Hence my desire for it to
>remain in the kernel proper until sky2 utterly works. ;-)

Oh it's really easy, have a look at
https://dev.computergmbh.de/svn/misc_kernel/oopser/trunk/Makefile


	Jan
-- 


* Re: sk98lin for 2.6.23-rc1
  2007-07-26 15:16 sk98lin for 2.6.23-rc1 Kyle Rose
  2007-07-26 16:28 ` Jan Engelhardt
@ 2007-07-26 16:57 ` Adrian Bunk
  2007-07-26 22:58   ` Chris Stromsoe
                     ` (2 more replies)
  2007-07-26 19:17 ` Stephen Hemminger
  2007-07-26 23:52 ` Bill Davidsen
  3 siblings, 3 replies; 35+ messages in thread
From: Adrian Bunk @ 2007-07-26 16:57 UTC (permalink / raw)
  To: Kyle Rose; +Cc: linux-kernel

On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
> From http://www.krose.org/~krose/computing.html:
> 
> Since the sky2 driver continues to suck ass (which is a technical
> description for "it hangs all the time under load, at least on my
> hardware" :-) ), I've fixed the sk98lin driver to compile for
> linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> still use 2.6.23-rc1, simply by doing the following:
>...
> Personally, I'd like to see sk98lin remain in the kernel proper until
> sky2 goes at least 6 months without reported problems.  The fact that I
> am not the only one still seeing issues is a clear indication that sky2
> (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
> sk98lin.
>...

This sounds good in theory.

The practical problem with this approach is that there are always many 
people who use the old driver when the new driver doesn't work for them 
instead of reporting their problems with the new driver.

For these people a new driver will often suck when the old driver gets 
removed, but after the removal of the old driver they are finally forced 
to report their bugs resulting in a better new driver for everyone.

The sky2 driver is since nearly 2 years in the kernel and Stephen is 
usually quite good at handling bugs.

> Kyle

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: sk98lin for 2.6.23-rc1
  2007-07-26 15:16 sk98lin for 2.6.23-rc1 Kyle Rose
  2007-07-26 16:28 ` Jan Engelhardt
  2007-07-26 16:57 ` Adrian Bunk
@ 2007-07-26 19:17 ` Stephen Hemminger
  2007-07-26 23:52 ` Bill Davidsen
  3 siblings, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2007-07-26 19:17 UTC (permalink / raw)
  To: linux-kernel

On Thu, 26 Jul 2007 11:16:36 -0400
Kyle Rose <krose@akamai.com> wrote:

> From http://www.krose.org/~krose/computing.html:
> 
> Since the sky2 driver continues to suck ass (which is a technical
> description for "it hangs all the time under load, at least on my
> hardware" :-) ), I've fixed the sk98lin driver to compile for
> linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> still use 2.6.23-rc1, simply by doing the following:
>

Just don't build it with lock debugging enabled or you will see all the
deadlocks lying below the surface.  Worse yet, read the macro hell
of sky2le.h



* Re: sk98lin for 2.6.23-rc1
  2007-07-26 16:57 ` Adrian Bunk
@ 2007-07-26 22:58   ` Chris Stromsoe
  2007-07-26 23:38   ` Bill Davidsen
  2007-07-30  3:01   ` Rob Sims
  2 siblings, 0 replies; 35+ messages in thread
From: Chris Stromsoe @ 2007-07-26 22:58 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Kyle Rose, linux-kernel

On Thu, 26 Jul 2007, Adrian Bunk wrote:
> On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
>>> From http://www.krose.org/~krose/computing.html:
>>
>> Since the sky2 driver continues to suck ass (which is a technical 
>> description for "it hangs all the time under load, at least on my 
>> hardware" :-) ), I've fixed the sk98lin driver to compile for 
>> linux-2.6.23-rc1. Those who continue to have problems with sky2 can 
>> still use 2.6.23-rc1, simply by doing the following: ... Personally, 
>> I'd like to see sk98lin remain in the kernel proper until sky2 goes at 
>> least 6 months without reported problems.  The fact that I am not the 
>> only one still seeing issues is a clear indication that sky2 (even with 
>> the recent patches in 2.6.23-rc1) is not yet ready to replace sk98lin. 
>> ...
>
> This sounds good in theory.
>
> The practical problem with this approach is that there are always many 
> people who use the old driver when the new driver doesn't work for them 
> instead of reporting their problems with the new driver.

I have a number of SK-9844 "SK-NET GE-SX dual link" cards.  skge has never 
worked with the cards.

The following sequence locks up the machine completely (power cycle to get 
it back) with 2.6.22.1:

fresno:~# modprobe skge
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up

This works just fine:

fresno:~# rmmod skge
fresno:~# modprobe sk98lin RlmtMode=DualNet
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up
fresno:~# ip li set eth3 down


eth2 and eth3 are ports off the sk-9844.

I've been reporting the problem since March.  If sk98lin is removed, I 
won't have networking.



-Chris


* Re: sk98lin for 2.6.23-rc1
  2007-07-26 16:57 ` Adrian Bunk
  2007-07-26 22:58   ` Chris Stromsoe
@ 2007-07-26 23:38   ` Bill Davidsen
  2007-07-26 23:41     ` Jeff Garzik
  2007-07-30  3:01   ` Rob Sims
  2 siblings, 1 reply; 35+ messages in thread
From: Bill Davidsen @ 2007-07-26 23:38 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Kyle Rose, linux-kernel

Adrian Bunk wrote:
> On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
>> From http://www.krose.org/~krose/computing.html:
>>
>> Since the sky2 driver continues to suck ass (which is a technical
>> description for "it hangs all the time under load, at least on my
>> hardware" :-) ), I've fixed the sk98lin driver to compile for
>> linux-2.6.23-rc1. Those who continue to have problems with sky2 can
>> still use 2.6.23-rc1, simply by doing the following:
>> ...
>> Personally, I'd like to see sk98lin remain in the kernel proper until
>> sky2 goes at least 6 months without reported problems.  The fact that I
>> am not the only one still seeing issues is a clear indication that sky2
>> (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
>> sk98lin.
>> ...
> 
> This sounds good in theory.
> 
> The practical problem with this approach is that there are always many 
> people who use the old driver when the new driver doesn't work for them 
> instead of reporting their problems with the new driver.
> 
Yes, you've grasped the reason for leaving the old driver in, so people 
can use their computers. Because when there is a new driver for 
previously unsupported hardware people will be glad to put time into 
debugging it to make the hardware useful. But when you take out a 
working driver because you (ie. the responsible developer) have a new 
idea which interests you, users don't want to use it because they have 
something which works, so you take out the working driver to make work 
for the users and create what you call a "better new driver" below.

The old driver wasn't requiring any resources to maintain, the old 
hardware wasn't changing, there was no particular benefit to users in 
breaking their configuration. This disregard for the users just gives 
Linux critics an arguing point, "the next new kernel may withdraw 
support for your hardware." Isn't that why 2.6.16 is still being 
maintained? Nobody (sane) expects new drivers to be perfect, they just 
don't expect the working drivers to be disabled.

> For these people a new driver will often suck when the old driver gets 
> removed, but after the removal of the old driver they are finally forced 
> to report their bugs resulting in a better new driver for everyone.
> 
"Better" is a very subjective thing, you see elegance of design perhaps, 
I see works or not, and when I have to use statistical methods to see 
latency or CPU overhead benefits, I frankly don't care.

Removing a working driver without a fully functional replacement forces 
people to stop upgrading their kernel, or start maintaining old drivers 
out of line. Problems of the "just occasionally goes away" type can take 
months to debug, the load can't be duplicated in most cases, and there's 
no log or oops data to help.


> The sky2 driver is since nearly 2 years in the kernel and Stephen is 
> usually quite good at handling bugs.
> 
Where does sky2 come in? Does this mean the recent suggestion to 
"just change to skge and stop complaining" is also wrong?

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


* Re: sk98lin for 2.6.23-rc1
  2007-07-26 23:38   ` Bill Davidsen
@ 2007-07-26 23:41     ` Jeff Garzik
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2007-07-26 23:41 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Adrian Bunk, Kyle Rose, linux-kernel

Bill Davidsen wrote:
> The old driver wasn't requiring any resources to maintain, the old 

This statement proves you don't know anything at all about the situation.

	Jeff




* Re: sk98lin for 2.6.23-rc1
  2007-07-26 15:16 sk98lin for 2.6.23-rc1 Kyle Rose
                   ` (2 preceding siblings ...)
  2007-07-26 19:17 ` Stephen Hemminger
@ 2007-07-26 23:52 ` Bill Davidsen
  2007-07-27  1:13   ` Kyle Rose
  3 siblings, 1 reply; 35+ messages in thread
From: Bill Davidsen @ 2007-07-26 23:52 UTC (permalink / raw)
  To: Kyle Rose; +Cc: linux-kernel

Kyle Rose wrote:
> From http://www.krose.org/~krose/computing.html:
> 
> Since the sky2 driver continues to suck ass (which is a technical
> description for "it hangs all the time under load, at least on my
> hardware" :-) ), I've fixed the sk98lin driver to compile for
> linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> still use 2.6.23-rc1, simply by doing the following:
> 
Bless you, extends my update capability for another version. ;-)

However, Ingo posted a patch for the thread "network dies after random 
time" which probably didn't make it into rc1. In all fairness applying 
that might fix the problem, it's possible if unlikely that the new 
driver tickles a bug the stable sk98lin driver didn't.

Does skge work for your hardware? Based on a sample size of one (four to 
go) everything worked for me except NFS, jumbo packets work with tcp, 
not with udp. I don't have everything nailed down enough for a proper 
bug report, it's just something to note. In truth there's little to 
choose between tcp and udp for machines in the same room, I could live 
with skge.

Haven't tried sky2, there was a build failure on my last server build, 
won't look at it until Monday.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


* Re: sk98lin for 2.6.23-rc1
  2007-07-26 16:41     ` Jan Engelhardt
@ 2007-07-27  1:07       ` Kyle Rose
  0 siblings, 0 replies; 35+ messages in thread
From: Kyle Rose @ 2007-07-27  1:07 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Kyle Rose, linux-kernel

Thanks for the pointer.  I've done this, and created an actual kernel
module tarball that is now available at
http://www.krose.org/~krose/projects/sk98lin/sk98lin.tar.gz.

Thanks,
Kyle


Jan Engelhardt wrote:
> On Jul 26 2007 12:30, Kyle Rose wrote:
>>> Sorry for the nitpick, it can be done easier :)
>> I'm sure it can.  I didn't want to have to figure out the kernel build
>> system just to get this one driver working.  Hence my desire for it to
>> remain in the kernel proper until sky2 utterly works. ;-)
> 
> Oh it's really easy, have a look at
> https://dev.computergmbh.de/svn/misc_kernel/oopser/trunk/Makefile
> 
> 
> 	Jan



* Re: sk98lin for 2.6.23-rc1
  2007-07-26 23:52 ` Bill Davidsen
@ 2007-07-27  1:13   ` Kyle Rose
  0 siblings, 0 replies; 35+ messages in thread
From: Kyle Rose @ 2007-07-27  1:13 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

> Does skge work for your hardware?

I unloaded sky2 and loaded skge at one point, but it didn't recognize my
hardware.  Perhaps it doesn't work with the 88E8053?

Kyle


* Re: sk98lin for 2.6.23-rc1
  2007-07-26 16:57 ` Adrian Bunk
  2007-07-26 22:58   ` Chris Stromsoe
  2007-07-26 23:38   ` Bill Davidsen
@ 2007-07-30  3:01   ` Rob Sims
  2007-09-05  9:22     ` Stephen Hemminger
  2 siblings, 1 reply; 35+ messages in thread
From: Rob Sims @ 2007-07-30  3:01 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Kyle Rose, linux-kernel


On Thu, Jul 26, 2007 at 06:57:01PM +0200, Adrian Bunk wrote:
> On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
> > From http://www.krose.org/~krose/computing.html:
> > 
> > Since the sky2 driver continues to suck ass (which is a technical
> > description for "it hangs all the time under load, at least on my
> > hardware" :-) ), I've fixed the sk98lin driver to compile for
> > linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> > still use 2.6.23-rc1, simply by doing the following:
> >...
> > Personally, I'd like to see sk98lin remain in the kernel proper until
> > sky2 goes at least 6 months without reported problems.  The fact that I
> > am not the only one still seeing issues is a clear indication that sky2
> > (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
> > sk98lin.
> >...
> 
> This sounds good in theory.
> 
> The practical problem with this approach is that there are always many 
> people who use the old driver when the new driver doesn't work for them 
> instead of reporting their problems with the new driver.
> 
> For these people a new driver will often suck when the old driver gets 
> removed, but after the removal of the old driver they are finally forced 
> to report their bugs resulting in a better new driver for everyone.
> 
> The sky2 driver is since nearly 2 years in the kernel and Stephen is 
> usually quite good at handling bugs.

The driver still (2.6.20/sky2 1.13) hangs for me (more rarely than in
the past), and cycling the module generally fixes the issues.  I have
supplied all the information that Stephen has asked for, but still no
resolution.  I am not complaining about the lack of a fix, but don't
assume that all it takes to get sky2 working is adequate bug reports.  I
have been and remain willing to test and assist debug, but after several
dropped threads, I feel like the desire or ability to fix this issue
isn't there (and remote debug of an intermittent hardware issue IS
hard), and I didn't want to be a nuisance to someone that has no
obligation to me to address the issue in the first place.

Stability has improved, it's just not there yet.

I'll switch to 1.16 soon, and respond to Stephen's request on netdev for
current issues.
-- 
Rob



* Re: sk98lin for 2.6.23-rc1
  2007-07-30  3:01   ` Rob Sims
@ 2007-09-05  9:22     ` Stephen Hemminger
  2007-09-05 19:42       ` James Corey
  2007-09-12 16:46       ` Torsten Kaiser
  0 siblings, 2 replies; 35+ messages in thread
From: Stephen Hemminger @ 2007-09-05  9:22 UTC (permalink / raw)
  To: Rob Sims; +Cc: Adrian Bunk, Kyle Rose, linux-kernel

On Sun, 29 Jul 2007 21:01:30 -0600
Rob Sims <lkml-z@robsims.com> wrote:

> On Thu, Jul 26, 2007 at 06:57:01PM +0200, Adrian Bunk wrote:
> > On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
> > > From http://www.krose.org/~krose/computing.html:
> > > 
> > > Since the sky2 driver continues to suck ass (which is a technical
> > > description for "it hangs all the time under load, at least on my
> > > hardware" :-) ), I've fixed the sk98lin driver to compile for
> > > linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> > > still use 2.6.23-rc1, simply by doing the following:
> > >...
> > > Personally, I'd like to see sk98lin remain in the kernel proper until
> > > sky2 goes at least 6 months without reported problems.  The fact that I
> > > am not the only one still seeing issues is a clear indication that sky2
> > > (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
> > > sk98lin.
> > >...
> > 
> > This sounds good in theory.
> > 
> > The practical problem with this approach is that there are always many 
> > people who use the old driver when the new driver doesn't work for them 
> > instead of reporting their problems with the new driver.
> > 
> > For these people a new driver will often suck when the old driver gets 
> > removed, but after the removal of the old driver they are finally forced 
> > to report their bugs resulting in a better new driver for everyone.
> > 
> > The sky2 driver is since nearly 2 years in the kernel and Stephen is 
> > usually quite good at handling bugs.
> 
> The driver still (2.6.20/sky2 1.13) hangs for me (more rarely than in
> the past), and cycling the module generally fixes the issues.  I have
> supplied all the information that Stephen has asked for, but still no
> resolution.  I am not complaining about the lack of a fix, but don't
> assume that all it takes to get sky2 working is adequate bug reports.  I
> have been and remain willing to test and assist debug, but after several
> dropped threads, I feel like the desire or ability to fix this issue
> isn't there (and remote debug of an intermittent hardware issue IS
> hard), and I didn't want to be a nuisance to someone that has no
> obligation to me to address the issue in the first place.
> 
> Stability has improved, it's just not there yet.
> 
> I'll switch to 1.16 soon, and respond to Stephen's request on netdev for
> current issues.
> -- 
> Rob

The only known outstanding problems on 2.6.22.6 of sky2 are:
 * problems with fibre PHY based systems
 * suspend/resume issues, missing multicast reinitialization, etc.
The previous stability problems have been addressed.


* Re: sk98lin for 2.6.23-rc1
  2007-09-05  9:22     ` Stephen Hemminger
@ 2007-09-05 19:42       ` James Corey
  2007-09-05 21:04         ` Kyle Rose
  2007-09-08 17:44         ` Bill Davidsen
  2007-09-12 16:46       ` Torsten Kaiser
  1 sibling, 2 replies; 35+ messages in thread
From: James Corey @ 2007-09-05 19:42 UTC (permalink / raw)
  To: Stephen Hemminger, Rob Sims; +Cc: Adrian Bunk, Kyle Rose, linux-kernel


--- Stephen Hemminger <shemminger@linux-foundation.org> wrote:

> On Sun, 29 Jul 2007 21:01:30 -0600
> Rob Sims <lkml-z@robsims.com> wrote:
>
> > On Thu, Jul 26, 2007 at 06:57:01PM +0200, Adrian Bunk wrote:
> > > On Thu, Jul 26, 2007 at 11:16:36AM -0400, Kyle Rose wrote:
> > > > From http://www.krose.org/~krose/computing.html:
> > > >
> > > > Since the sky2 driver continues to suck ass (which is a technical
> > > > description for "it hangs all the time under load, at least on my
> > > > hardware" :-) ), I've fixed the sk98lin driver to compile for
> > > > linux-2.6.23-rc1. Those who continue to have problems with sky2 can
> > > > still use 2.6.23-rc1, simply by doing the following:
> > > >...
> > > > Personally, I'd like to see sk98lin remain in the kernel proper until
> > > > sky2 goes at least 6 months without reported problems.  The fact that I
> > > > am not the only one still seeing issues is a clear indication that sky2
> > > > (even with the recent patches in 2.6.23-rc1) is not yet ready to replace
> > > > sk98lin.
> > > >...
> > >
> > > This sounds good in theory.
> > >
> > > The practical problem with this approach is that there are always many
> > > people who use the old driver when the new driver doesn't work for them
> > > instead of reporting their problems with the new driver.
> > >
> > > For these people a new driver will often suck when the old driver gets
> > > removed, but after the removal of the old driver they are finally forced
> > > to report their bugs resulting in a better new driver for everyone.
> > >
> > > The sky2 driver is since nearly 2 years in the kernel and Stephen is
> > > usually quite good at handling bugs.
> >
> > The driver still (2.6.20/sky2 1.13) hangs for me (more rarely than in
> > the past), and cycling the module generally fixes the issues.  I have
> > supplied all the information that Stephen has asked for, but still no
> > resolution.  I am not complaining about the lack of a fix, but don't
> > assume that all it takes to get sky2 working is adequate bug reports.  I
> > have been and remain willing to test and assist debug, but after several
> > dropped threads, I feel like the desire or ability to fix this issue
> > isn't there (and remote debug of an intermittent hardware issue IS
> > hard), and I didn't want to be a nuisance to someone that has no
> > obligation to me to address the issue in the first place.
> >
> > Stability has improved, it's just not there yet.
> >
> > I'll switch to 1.16 soon, and respond to Stephen's request on netdev for
> > current issues.
> > --
> > Rob
>
> The only known outstanding problems on 2.6.22.6 of sky2 are:
>  * problems with fibre PHY based systems
>  * suspend/resume issues, missing multicast reinitialization, etc.
> The previous stability problems have been addressed.

I pretty much agree with everything said, including 
the part about the sky2 people working hard on it. I
have noticed several bugs fixed recently in the driver
source.

However, it really DOES lock up under load. I even
tried 2.6.23-rc4 and the absolute latest version of the
driver and it still locks up, as in

eth1: hw csum failure.

Call Trace:
 <IRQ>  [<ffffffff804779b6>] __skb_checksum_complete_head+0x43/0x56
 [<ffffffff804779d5>] __skb_checksum_complete+0xc/0x11
 [<ffffffff804a989d>] tcp_v4_rcv+0x14e/0x801
 [<ffffffff8048ff84>] ip_local_deliver+0xca/0x14c
 [<ffffffff80490472>] ip_rcv+0x46c/0x4ae
 [<ffffffff88006138>] :sky2:sky2_poll+0x72b/0x9c7
 [<ffffffff80245979>] update_wall_time+0x28c/0x39b
 [<ffffffff8047c934>] net_rx_action+0xa8/0x166
 [<ffffffff8023901c>] do_timer+0x10/0xab
 [<ffffffff80235ced>] __do_softirq+0x55/0xc4
 [<ffffffff8020c5cc>] call_softirq+0x1c/0x28
 [<ffffffff8020d6fd>] do_softirq+0x2c/0x7d
 [<ffffffff8020d9bb>] do_IRQ+0x13e/0x15f
 [<ffffffff8020a780>] mwait_idle+0x0/0x48
 [<ffffffff8020b951>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff804acdb9>] udp_poll+0x0/0xfb
 [<ffffffff8020a7c2>] mwait_idle+0x42/0x48
 [<ffffffff8020a718>] cpu_idle+0xbd/0xe0
 [<ffffffff80704a5a>] start_kernel+0x2ac/0x2b8
 [<ffffffff80704140>] _sinittext+0x140/0x144

As far as I can tell, this bug has been with the
sky2 driver all the way back to the Beforetime.
That is based on it happening with various versions of the
driver back to 2.6.18 that I have tried, plus some
googling on it.

So while the bug reporting point is a good one, it would
be nice to have a reliable driver in the kernel until
the sky2 one is better. The alternative is to use
the vendor driver, which is less than optimal.

-J






* Re: sk98lin for 2.6.23-rc1
  2007-09-05 19:42       ` James Corey
@ 2007-09-05 21:04         ` Kyle Rose
  2007-09-05 23:00           ` Stephen Hemminger
  2007-09-08 17:44         ` Bill Davidsen
  1 sibling, 1 reply; 35+ messages in thread
From: Kyle Rose @ 2007-09-05 21:04 UTC (permalink / raw)
  To: James Corey; +Cc: Stephen Hemminger, Rob Sims, Adrian Bunk, linux-kernel


> However, it really DOES lock up under load. I even 
> tried 2.6.23-rc4 and the absolute latest version of
> the
> driver and it still locks up, as in
>   
Yich.  I'm glad I'm still using sk98lin on my unmanned colo box.

Kyle



* Re: sk98lin for 2.6.23-rc1
  2007-09-05 21:04         ` Kyle Rose
@ 2007-09-05 23:00           ` Stephen Hemminger
  0 siblings, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2007-09-05 23:00 UTC (permalink / raw)
  To: Kyle Rose; +Cc: James Corey, Rob Sims, Adrian Bunk, linux-kernel

On Wed, 05 Sep 2007 17:04:59 -0400
Kyle Rose <krose@krose.org> wrote:

> 
> > However, it really DOES lock up under load. I even 
> > tried 2.6.23-rc4 and the absolute latest version of
> > the
> > driver and it still locks up, as in
> >   
> Yich.  I'm glad I'm still using sk98lin on my unmanned colo box.
> 
> Kyle
> 

Great for you; when I was testing, sk98lin crashed my machine on an
overnight stress run. My intuition is that there is a bug in sk98lin
on Yukon EC-U chips (those without ram buffer) and a hardware
problem on Yukon XL chips (those with ram buffer), and the sky2
driver doesn't have a workaround for the ram buffer getting stuck (yet).

I don't like putting workarounds in for problems I can't reproduce.
After KS, I'll rerun more stress tests on all the chip flavors
and see if the hang is reproducible.


* Re: sk98lin for 2.6.23-rc1
  2007-09-05 19:42       ` James Corey
  2007-09-05 21:04         ` Kyle Rose
@ 2007-09-08 17:44         ` Bill Davidsen
  2007-09-08 19:11           ` Adrian Bunk
  1 sibling, 1 reply; 35+ messages in thread
From: Bill Davidsen @ 2007-09-08 17:44 UTC (permalink / raw)
  To: James Corey
  Cc: Stephen Hemminger, Rob Sims, Adrian Bunk, Kyle Rose, linux-kernel

James Corey wrote:
> --- Stephen Hemminger
> <shemminger@linux-foundation.org> wrote:
> 
>> On Sun, 29 Jul 2007 21:01:30 -0600
>> Rob Sims <lkml-z@robsims.com> wrote:
>>
>>> On Thu, Jul 26, 2007 at 06:57:01PM +0200, Adrian Bunk wrote:

>> The only known outstanding problems on 2.6.22.6 of sky2 are:
>>  * problems with fibre PHY based systems
>>  * suspend/resume issues, missing multicast reinitialization, etc.
>> The previous stability problems have been addressed.
> 
> I pretty much agree with everything said, including 
> the part about the sky2 people working hard on it. I
> have noticed several bugs fixed recently in the driver
> source.
> 
> However, it really DOES lock up under load. I even 
> tried 2.6.23-rc4 and the absolute latest version of
> the
> driver and it still locks up, as in
> 
> eth1: hw csum failure.
> 
I changed from the sk98lin to the previous driver Adrian said was the 
"right one," skge IIRC. Then he started pushing sky2, and I tried that. 
Like you I get hangs, but unlike you the system doesn't hang, just the 
NIC. No errors, warnings, and reboot fixes it. Acts as if the cable were 
pulled.

That was with 2.6.22.5 (or so), dropped back to an old kernel with 
sk98lin, previously had uptimes in three digit days. Up for a week or so 
now.

Haven't tried later kernels, don't intend to; while no network is really 
secure, it's not really useful.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


* Re: sk98lin for 2.6.23-rc1
  2007-09-08 17:44         ` Bill Davidsen
@ 2007-09-08 19:11           ` Adrian Bunk
  2007-09-09  2:42             ` Kyle Rose
                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Adrian Bunk @ 2007-09-08 19:11 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: James Corey, Stephen Hemminger, Rob Sims, Kyle Rose, linux-kernel

On Sat, Sep 08, 2007 at 01:44:20PM -0400, Bill Davidsen wrote:
>...
> That was with 2.6.22.5 (or so), dropped back to an old kernel with sk98lin, 
> previously had uptimes in three digit days. Up for a week or so now.

There is a real long-term advantage of removing drivers like sk98lin 
because it forces people to report bugs if the new driver doesn't work  
instead of giving them the workaround of using the obsolete driver.    
And this has the (at first sight surprising) effect that removing code  
results in an improvement of the kernel.

> Haven't tried later kernels, don't intend to; while no network is really 
> secure, it's not really useful.

You are a regular reader of linux-kernel, and therefore the sk98lin 
removal can hardly be a surprise for you. If you prefer whining over 
helping to improve the kernel that's your choice...

> Bill Davidsen

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: sk98lin for 2.6.23-rc1
  2007-09-08 19:11           ` Adrian Bunk
@ 2007-09-09  2:42             ` Kyle Rose
  2007-09-09  4:48               ` Willy Tarreau
  2007-09-09 11:13               ` Adrian Bunk
  2007-09-09 12:54             ` Chris Stromsoe
  2007-09-10 14:32             ` Bill Davidsen
  2 siblings, 2 replies; 35+ messages in thread
From: Kyle Rose @ 2007-09-09  2:42 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Bill Davidsen, James Corey, Stephen Hemminger, Rob Sims,
	linux-kernel


> You are a regular reader of linux-kernel, and therefore the sk98lin 
> removal can hardly be a surprise for you. If you prefer whining over 
> helping to improve the kernel that's your choice...
>   
In my case the issue is simply one of practicality: I cannot go to the
data center 5 times per day to reboot my colo box.  Therefore, I run
sk98lin.  It's really that simple.

Kyle



* Re: sk98lin for 2.6.23-rc1
  2007-09-09  2:42             ` Kyle Rose
@ 2007-09-09  4:48               ` Willy Tarreau
  2007-09-09 11:13               ` Adrian Bunk
  1 sibling, 0 replies; 35+ messages in thread
From: Willy Tarreau @ 2007-09-09  4:48 UTC (permalink / raw)
  To: Kyle Rose
  Cc: Adrian Bunk, Bill Davidsen, James Corey, Stephen Hemminger,
	Rob Sims, linux-kernel

On Sat, Sep 08, 2007 at 10:42:20PM -0400, Kyle Rose wrote:
> 
> > You are a regular reader of linux-kernel, and therefore the sk98lin 
> > removal can hardly be a surprise for you. If you prefer whining over 
> > helping to improve the kernel that's your choice...
> >   
> In my case the issue is simply one of practicality: I cannot go to the
> data center 5 times per day to reboot my colo box.  Therefore, I run
> sk98lin.  It's really that simple.

Adrian generally wants to force "normal" users to test new drivers in order
to quickly find bugs and fade out older ones. While this is often possible
on the desktop, it's not possible for production servers. And not everyone
can run 2.6.16.x to get a long-term stable kernel.

I think that what is really needed is to add the opposite of "experimental"
in the config options. Something like "deprecated drivers" which would be
disabled by default. Desktop users would normally not care about that and
rely only on newer drivers. Server users would have to enable the option if
they want their old driver to be present because they have no other choice.

With each driver's help text, it would be wise to add some text indicating
what will replace the driver in question, so that their users know how to
test it on non-production machines.
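
As a rough sketch of what I mean (not a tested patch): the gate symbol
NET_DEPRECATED_DRIVERS and all of the prompt and help wording below are
made up for illustration, and SK98LIN only stands in for the existing
sk98lin entry.

```kconfig
# Hypothetical gate option; the symbol name is invented for this sketch.
config NET_DEPRECATED_DRIVERS
	bool "Deprecated network drivers"
	default n
	help
	  Say Y only if you depend on an old network driver whose
	  replacement does not work for you yet.  Desktop users should
	  say N and rely on the newer drivers.

# Stand-in for the existing driver entry, hidden unless the gate is on.
config SK98LIN
	tristate "Marvell Yukon / SysKonnect SK-98xx support (DEPRECATED)"
	depends on PCI && NET_DEPRECATED_DRIVERS
	help
	  This driver is deprecated.  Its replacements are skge and sky2;
	  please test whichever one handles your board on a
	  non-production machine and report any problems.
```

The point of hanging the old driver off a separate option is that the
default configuration only offers the new driver, while a server admin
can still get the old one back with one deliberate choice.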

But I agree with Kyle that on production systems, it is not acceptable to
have a driver hang even once a month. This generally implies loss of service
and customers going away. Ideology has no place in this area; it is quickly
replaced by pragmatism.

It was the same reason I spent time trying to get sky2 to reliably work in
2.4; sk98lin v8 was horribly unstable. Sky2 was rather better but did not
support some basic operations such as ifdown/ifup. sk98lin v10 finally worked
fine, and I upgraded my customer's system with it because I needed anything
which would reliably work. It was not acceptable anymore to have the customer
phone twice a week complaining that their server had crashed again.

In the long term, I would really like to get sky2 to work well in 2.4
because I'm more confident in it; it's cleaner, less obscure and less
bloated. Having passed terabytes of data through both drivers I have
not observed any glitch with sky2 as I had with sk98lin v8.

Fortunately, sky2 chips are mostly found on desktop motherboards, so that
helps the driver stabilize very quickly. It should not take as long as
the transition from eepro100 to e100.

Willy



* Re: sk98lin for 2.6.23-rc1
  2007-09-09  2:42             ` Kyle Rose
  2007-09-09  4:48               ` Willy Tarreau
@ 2007-09-09 11:13               ` Adrian Bunk
  2007-09-11  8:05                 ` Stephen Hemminger
  1 sibling, 1 reply; 35+ messages in thread
From: Adrian Bunk @ 2007-09-09 11:13 UTC (permalink / raw)
  To: Kyle Rose
  Cc: Bill Davidsen, James Corey, Stephen Hemminger, Rob Sims,
	linux-kernel

On Sat, Sep 08, 2007 at 10:42:20PM -0400, Kyle Rose wrote:
> 
> > You are a regular reader of linux-kernel, and therefore the sk98lin 
> > removal can hardly be a surprise for you. If you prefer whining over 
> > helping to improve the kernel that's your choice...
> >   
> In my case the issue is simply one of practicality: I cannot go to the
> data center 5 times per day to reboot my colo box.  Therefore, I run
> sk98lin.  It's really that simple.

When did you report this bug the first time?

What we need is for people who are testing a new kernel they plan to use 
to test the new drivers *and report the bugs if they run into any*.

What could we have done so that you reported your bug without removing 
the sk98lin driver?

> Kyle

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: sk98lin for 2.6.23-rc1
  2007-09-08 19:11           ` Adrian Bunk
  2007-09-09  2:42             ` Kyle Rose
@ 2007-09-09 12:54             ` Chris Stromsoe
  2007-11-06 22:23               ` Stephen Hemminger
  2007-09-10 14:32             ` Bill Davidsen
  2 siblings, 1 reply; 35+ messages in thread
From: Chris Stromsoe @ 2007-09-09 12:54 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Bill Davidsen, James Corey, Stephen Hemminger, Rob Sims,
	Kyle Rose, linux-kernel

On Sat, 8 Sep 2007, Adrian Bunk wrote:
> On Sat, Sep 08, 2007 at 01:44:20PM -0400, Bill Davidsen wrote:
>
>> Haven't tried later kernels, don't intend to; while no network is 
>> really secure, it's not really useful.
>
> You are a regular reader of linux-kernel, and therefore the sk98lin 
> removal can hardly be a surprise for you. If you prefer whining over 
> helping to improve the kernel that's your choice...

I've been trying to migrate off sk98lin to skge since earlier this year, 
without success, starting with 2.6.18 or .19.

I have several of these cards in production using the sk98lin driver:

fresno:~# lspci -vv -s 02:01
02:01.0 Ethernet controller: SysKonnect SK-9872 Gigabit Ethernet Server Adapter (SK-NET GE-ZX dual link) (rev 11)
         Subsystem: SysKonnect SK-9844 Gigabit Ethernet Server Adapter (SK-NET GE-SX dual link)
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 32 bytes
         Interrupt: pin A routed to IRQ 22
         Region 0: Memory at febfc000 (32-bit, non-prefetchable) [size=16K]
         Region 1: I/O ports at e800 [size=256]
         Expansion ROM at febc0000 [disabled] [size=128K]
         Capabilities: [48] Power Management version 1
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=1 PME-
         Capabilities: [50] Vital Product Data

They are dual port SX fiber.  Both ports are connected.  If I do this:

fresno:~# modprobe skge
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up

the system locks up and I have to power cycle it.  The order doesn't 
matter (if I do eth3 up/down, then eth2 up kills it).

I don't have any problems with sk98lin.  This works fine:

fresno:~# modprobe sk98lin RlmtMode=DualNet
fresno:~# ip li set eth2 up
fresno:~# ip li set eth2 down
fresno:~# ip li set eth3 up
fresno:~# ip li set eth3 down


I am more than happy to test various driver changes, and have tried a few 
suggested patches but nothing has worked so far.  I would like to be using 
skge instead of sk98lin, but so far haven't had any success.




-Chris


* Re: sk98lin for 2.6.23-rc1
  2007-09-08 19:11           ` Adrian Bunk
  2007-09-09  2:42             ` Kyle Rose
  2007-09-09 12:54             ` Chris Stromsoe
@ 2007-09-10 14:32             ` Bill Davidsen
  2007-09-10 15:39               ` Adrian Bunk
  2 siblings, 1 reply; 35+ messages in thread
From: Bill Davidsen @ 2007-09-10 14:32 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: James Corey, Stephen Hemminger, Rob Sims, Kyle Rose, linux-kernel

Adrian Bunk wrote:
> On Sat, Sep 08, 2007 at 01:44:20PM -0400, Bill Davidsen wrote:
>   
>> ...
>> That was with 2.6.22.5 (or so), dropped back to an old kernel with sk98lin, 
>> previously had uptimes in three digit days. Up for a week or so now.
>>     
>
> There is a real long-term advantage of removing drivers like sk98lin 
> because it forces people to report bugs if the new driver doesn't work  
> instead of giving them the workaround of using the obsolete driver. 

The issue is that sk98lin is only obsolete because you say so! skge 
crashes the system, as Chris reports. sky2 just stops passing bits and 
behaves as if the network cable were idle, with no error messages of any 
nature: ping claims it's sending packets, tcpdump claims packets are 
being sent, yet the switch never blinks and systems on the switch see no 
packets. Again, no error messages, no dumps, nothing which would help 
you debug it, and it happens after some undefined time.

skge and sky2 are up to eight or ten versions now, and they still don't 
work. Just because a driver works doesn't mean it's obsolete.
>    
> And this has the (at first sight surprising) effect that removing code  
> results in an improvement of the kernel.
>
>   
>> Haven't tried later kernels, don't intend to; while no network is really 
>> secure, it's not really useful.
>>     
>
> You are a regular reader of linux-kernel, and therefore the sk98lin 
> removal can hardly be a surprise for you. If you prefer whining over 
> helping to improve the kernel that's your choice...
>   

I am trying to "improve the kernel" by advocating not removing reliable 
drivers in favor of unreliable drivers. Saying a driver is better 
because it has a clean design and good code is something I would expect 
from someone who hadn't written or used code. If skge and sky2 were so 
clean you wouldn't still be chasing obscure bugs after the driver had 
been in the kernel for six+ versions, you wouldn't have me wasting time 
trying to get a more secure kernel which is still reliable, wouldn't 
have Willy Tarreau suggesting you should be marking sk98lin as obsolete 
and leaving it in, wouldn't have someone maintaining sk98lin as a patch, 
wouldn't have Chris Stromsoe getting hard lock-ups. No matter how ugly 
sk98lin looks, and how well designed skge and sky2 may be, reliability 
is not a beauty contest.

The volume of complaint should give you a hint that in this case the new 
drivers aren't usefully stable for many people, and that you are 
advocating a removal which is at least premature. If you can't admit 
you're wrong on this one, you can say you have reconsidered the timing 
of removal in light of new information.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: sk98lin for 2.6.23-rc1
  2007-09-10 14:32             ` Bill Davidsen
@ 2007-09-10 15:39               ` Adrian Bunk
  2007-09-11  4:23                 ` Kyle Moffett
  0 siblings, 1 reply; 35+ messages in thread
From: Adrian Bunk @ 2007-09-10 15:39 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: James Corey, Stephen Hemminger, Rob Sims, Kyle Rose, linux-kernel

On Mon, Sep 10, 2007 at 10:32:45AM -0400, Bill Davidsen wrote:
> Adrian Bunk wrote:
>> On Sat, Sep 08, 2007 at 01:44:20PM -0400, Bill Davidsen wrote:
>>   
>>> ...
>>> That was with 2.6.22.5 (or so), dropped back to an old kernel with 
>>> sk98lin, previously had uptimes in three digit days. Up for a week or so 
>>> now.
>>>     
>>
>> There is a real long-term advantage of removing drivers like sk98lin 
>> because it forces people to report bugs if the new driver doesn't work  
>> instead of giving them the workaround of using the obsolete driver. 
>
> The issue is that sk98lin is only obsolete because you say so!

No, it is obsolete because we have more than one driver for this 
hardware, and the people responsible for network drivers in the kernel 
decided some time ago that sk98lin is the one that is obsolete.

>...
>>    And this has the (at first sight surprising) effect that removing code  
>> results in an improvement of the kernel.
>>
>>   
>>> Haven't tried later kernels, don't intend to; while no network is really 
>>> secure, it's not really useful.
>>>     
>>
>> You are a regular reader of linux-kernel, and therefore the sk98lin 
>> removal can hardly be a surprise for you. If you prefer whining over 
>> helping to improve the kernel that's your choice...
>>   
>
> I am trying to "improve the kernel" by advocating not removing reliable 
> drivers in favor of unreliable drivers. Saying a driver is better because 
> it has a clean design and good code is something I would expect from 
> someone who hadn't written or used code. If skge and sky2 were so clean you 
> wouldn't still be chasing obscure bugs after the driver had been in the 
> kernel for six+ versions, you wouldn't have me wasting time trying to get a 
> more secure kernel which is still reliable, wouldn't have Willy Tarreau 
> suggesting you should be marking sk98lin as obsolete and leaving it in, 
> wouldn't have someone maintaining sk98lin as a patch, wouldn't have Chris 
> Stromsoe getting hard lock-ups. No matter how ugly sk98lin looks, and how 
> well designed skge and sky2 may be, reliability is not a beauty contest.

A better written driver might still lack some workarounds for broken 
hardware or similar problems. Or simply contain some bugs like all 
software does.

The important word is not "reliability", it's "maintainability".
And that's something that pays off in the long term.

> The volume of complaint should give you a hint that in this case the new 
> drivers aren't usefully stable for many people, and that you are advocating 
> a removal which is at least premature. If you can't admit you're wrong on 
> this one, you can say you have reconsidered the timing of removal in light 
> of new information.

It was clear that sk98lin would go in the long term, and the only thing 
that could be discussed is the when and how of removal.

When you talk about "new information", why did this information not 
surface until after the sk98lin driver was removed?

Is there really a problem with "the timing of removal" or would we have 
faced exactly the same problems if the removal was timed a year later?

And this is really the essence when I'm saying "removing code improves 
the kernel": The goal is to get people to report if the new drivers 
aren't usefully stable for them, not to use sk98lin instead without 
sending a bug report.

Having different drivers with different sets of bugs and features is 
not a situation that should be retained for a longer time.

The underlying question is:
Is there anything better than a quick removal of the obsolete driver to 
get people to both test and report bugs with the new driver?
Keeping obsolete drivers longer only for running into exactly the same 
problem later isn't an improvement.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: sk98lin for 2.6.23-rc1
  2007-09-10 15:39               ` Adrian Bunk
@ 2007-09-11  4:23                 ` Kyle Moffett
  0 siblings, 0 replies; 35+ messages in thread
From: Kyle Moffett @ 2007-09-11  4:23 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Bill Davidsen, James Corey, Stephen Hemminger, Rob Sims,
	Kyle Rose, linux-kernel

On Sep 10, 2007, at 11:39:53, Adrian Bunk wrote:
> No, it is obsolete because we have more than one driver for this  
> hardware, and the people responsible for network drivers in the  
> kernel decided some time ago that sk98lin is the one that is obsolete.

I would like to happily report that the sky2 driver works great in  
the NIC on my tablet where the sk98lin and skge drivers both fail  
utterly and hang the kernel.  On another system the sk98lin and skge  
drivers don't recognize the chipset at all (missing PCI ID?) while  
the sky2 driver works perfectly for large quantities of data  
transferred.

Cheers,
Kyle Moffett



* Re: sk98lin for 2.6.23-rc1
  2007-09-09 11:13               ` Adrian Bunk
@ 2007-09-11  8:05                 ` Stephen Hemminger
  2007-09-11 11:54                   ` Adrian Bunk
  2007-09-11 22:20                   ` James Corey
  0 siblings, 2 replies; 35+ messages in thread
From: Stephen Hemminger @ 2007-09-11  8:05 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Kyle Rose, Bill Davidsen, James Corey, Rob Sims, linux-kernel

On Sun, 9 Sep 2007 13:13:26 +0200
Adrian Bunk <bunk@kernel.org> wrote:

> On Sat, Sep 08, 2007 at 10:42:20PM -0400, Kyle Rose wrote:
> > 
> > > You are a regular reader of linux-kernel, and therefore the sk98lin 
> > > removal can hardly be a surprise for you. If you prefer whining over 
> > > helping to improve the kernel that's your choice...
> > >   
> > In my case the issue is simply one of practicality: I cannot go to the
> > data center 5 times per day to reboot my colo box.  Therefore, I run
> > sk98lin.  It's really that simple.
> 
> When did you report this bug the first time?
> 
> What we need is that people when testing a new kernel they plan to use 
> test the new drivers *and report the bugs if they run into any*.
> 
> What could we have done so that you reported your bug without removing 
> the sk98lin driver?
> 
> > Kyle
> 
> cu
> Adrian


There are several different problems in this thread:
1. The removal of old sk98lin driver caused some users to be forced to use
    skge. These users have uncovered issues with the dual port fiber based versions
    of the board.  
    Short term: The sk98lin driver should be restored to previous state, 
       and the PCI table should be used to limit the usage to only fiber systems.
       If Adrian doesn't do it, I'll do it when I return from Germany.
    Long term: I have a fiber based board (thanks ebay) on the way to resolve
       the skge bug.

2. Sky2 driver has its own fiber based problems.  Solve these after skge fiber.

3. Sky2 doesn't have as many workarounds for hardware problems as vendor sk98lin
    driver.


* Re: sk98lin for 2.6.23-rc1
  2007-09-11  8:05                 ` Stephen Hemminger
@ 2007-09-11 11:54                   ` Adrian Bunk
  2007-09-11 14:29                     ` Bill Davidsen
  2007-09-11 22:20                   ` James Corey
  1 sibling, 1 reply; 35+ messages in thread
From: Adrian Bunk @ 2007-09-11 11:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Kyle Rose, Bill Davidsen, James Corey, Rob Sims, linux-kernel,
	Jeff Garzik, netdev

On Tue, Sep 11, 2007 at 10:05:26AM +0200, Stephen Hemminger wrote:
> 
> There are several different problems in this thread:
> 1. The removal of old sk98lin driver caused some users to be forced to use
>     skge. These users have uncovered issues with the dual port fiber based versions
>     of the board.  
>     Short term: The sk98lin driver should be restored to previous state, 
>        and the PCI table should be used to limit the usage to only fiber systems.
>        If Adrian doesn't do it, I'll do it when I return from Germany.
>...

No problem with this, but since it was Jeff's patch it would be better for 
him to revert it (and he's one step nearer to Linus anyway).

But the underlying general problem still remains:

How can we get people to test and report bugs with the new drivers 
before removing the old driver?

That's a question especially for the people who now had problems after 
sk98lin was removed.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: sk98lin for 2.6.23-rc1
  2007-09-11 11:54                   ` Adrian Bunk
@ 2007-09-11 14:29                     ` Bill Davidsen
  2007-09-11 15:03                       ` Adrian Bunk
  0 siblings, 1 reply; 35+ messages in thread
From: Bill Davidsen @ 2007-09-11 14:29 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Stephen Hemminger, Kyle Rose, James Corey, Rob Sims, linux-kernel,
	Jeff Garzik, netdev

Adrian Bunk wrote:
> On Tue, Sep 11, 2007 at 10:05:26AM +0200, Stephen Hemminger wrote:
>   
>> There are several different problems in this thread:
>> 1. The removal of old sk98lin driver caused some users to be forced to use
>>     skge. These users have uncovered issues with the dual port fiber based versions
>>     of the board.  
>>     Short term: The sk98lin driver should be restored to previous state, 
>>        and the PCI table should be used to limit the usage to only fiber systems.
>>        If Adrian doesn't do it, I'll do it when I return from Germany.
>> ...
>>     
>
> No problem with this, but since it was Jeff's patch it should better be 
> him who reverts it (and he's anyway one step nearer to Linus).
>
> But the underlying general problem still remains:
>
> How can we get people to test and report bugs with the new drivers 
> before removing the old driver?
>
>   
Sorry for a long answer, I'm trying to provide insight on two recent cases.

Thinking back to several drivers, when e100 was new I tried it because I 
had problems with eepro100 in the area of multiple cards, multiple 
cables on a single card, and jumbo packets. For a while I used both, 
until e100 worked where I needed it. So I initially tried it because it 
had features I needed, and then dropped the older driver just to avoid 
having to decide.

With sk98lin, the driver worked flawlessly with all (3-4) systems, so I 
had no reason to try any other. When removing sk98lin was first 
proposed, I tried skge; first measurements showed it was 5-8% slower, 
NOT what I want, so I went back. For me there was no reliability issue, 
but I never tried it in a system with more than one NIC on the driver. 
Would "it's a little slower" be a valid bug report? Or would I have 
gotten "works fine for me" from people not beating it over Gbit? I 
didn't try sky2 until you suggested it, and I have reported my results 
previously: it just stops working. Could it be my hardware? I tried it on 
one system, so yes, but sk98lin works for months.

> That's a question especially for the people who now had problems after 
> sk98lin was removed.

So if you want people to try a new driver, I think it really has to have 
some benefits to the users, in terms of performance, reliability, or 
features. "Cleaner design" doesn't motivate, and it does raise the 
question of why the old driver wasn't just cleaned up. I've been doing 
software for decades, I appreciate why, but users in general just want 
to use their system. Which raises the question of why to delete drivers 
which work for many or even most users? Testing a new kernel is no 
longer a drop-in-and-boot operation if modprobe.conf must be edited to get 
the network up, and the typical user isn't going to write that shell 
script to try one or the other driver.

Honestly, new drivers which offer little benefit to most users are the 
exception rather than the rule, so this may be a corner case. I would like 
to see sk98lin back in the kernel. For a while I can build my own 
kernels and patch it in, but until other drivers are drop-in, I probably 
won't change.

Separate but related: why keep skge and sky2? Are we going through this 
again in a year? Is the benefit worth the effort?

Hope some of this is helpful.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: sk98lin for 2.6.23-rc1
  2007-09-11 14:29                     ` Bill Davidsen
@ 2007-09-11 15:03                       ` Adrian Bunk
  2007-09-11 22:37                         ` Willy Tarreau
  0 siblings, 1 reply; 35+ messages in thread
From: Adrian Bunk @ 2007-09-11 15:03 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Stephen Hemminger, Kyle Rose, James Corey, Rob Sims, linux-kernel,
	Jeff Garzik, netdev

On Tue, Sep 11, 2007 at 10:29:47AM -0400, Bill Davidsen wrote:
> Adrian Bunk wrote:
>> On Tue, Sep 11, 2007 at 10:05:26AM +0200, Stephen Hemminger wrote:
>>   
>>> There are several different problems in this thread:
>>> 1. The removal of old sk98lin driver caused some users to be forced to use
>>>     skge. These users have uncovered issues with the dual port fiber based
>>>     versions of the board.
>>>     Short term: The sk98lin driver should be restored to previous state,
>>>        and the PCI table should be used to limit the usage to only fiber systems.
>>>        If Adrian doesn't do it, I'll do it when I return from Germany.
>>> ...
>>>     
>>
>> No problem with this, but since it was Jeff's patch it should better be 
>> him who reverts it (and he's anyway one step nearer to Linus).
>>
>> But the underlying general problem still remains:
>>
>> How can we get people to test and report bugs with the new drivers before 
>> removing the old driver?
>>
>>   
> Sorry for a long answer, I'm trying to provide insight on two recent cases.
>
> Thinking back to several drivers, when e100 was new I tried it because I 
> had problems with eepro100 in the area of multiple cards, multiple cables 
> on a single card, and jumbo packets. For a while I used both, until e100 
> worked where I need it. So I initially tried it because it had features I 
> needed, and then dropped to older driver just to avoid having to decide.
>
> With sk98lin, the driver worked flawlessly with all (3-4) systems, so I had 
> no reason to try any other. When removing sk98lin was first proposed, I 
> tried skge, first measurements showed it was 5-8% slower, NOT what I want, 
> so I went back. For me there was no reliability issue, but I never tried it 
> in a system with more than one NIC on the driver. Would "it's a little 
> slower" be a valid bug report? Or would I have gotten "works fine for me" 
> from people not beating it over Gbit?
>...

If you get less throughput that is a regression, and it should be 
reported and fixed.

I doubt anybody would have told you otherwise.

Is this bug still present as of 2.6.23-rc6?

>> That's a question especially for the people who now had problems after 
>> sk98lin was removed.
>
> So if you want people to try a new driver, I think it really has to have 
> some benefits to the users, in terms of performance, reliability, or 
> features. "Cleaner design" doesn't motivate, and it does raise the question 
> of why the old driver wasn't just cleaned up. I've been doing software for 
> decades, I appreciate why, but users in general just want to use their 
> system. Which raises the question of why to delete drivers which work for 
> many or even most users?

As I already explained, there is a long term advantage for all users if 
there is only one driver in the kernel. Therefore all users should 
switch away from obsolete drivers to the replacement drivers, and the 
obsolete driver will be removed at some point in time. The only question 
is how to do it.

> Testing a new kernel is no longer a drop in a boot 
> operation if modprobe.conf must be edited to get the network up, and the 
> typical user isn't going to write that shell script to try one or the other 
> driver.

The typical user will let his distribution handle this.

And MODULE_ALIAS can also handle this.

> Honestly, new drivers which offer little benefit to most users are the 
> exception rather than the rule, so this may be a corner case. I would like to 
> see sk98lin back in the kernel, for a while I can build my own kernels and 
> patch it in, but until other drivers are drop-in, I probably won't change.

That a new driver offers benefits that cause most users to switch isn't 
realistic.

You mention e100 as an example - well, I'm using this driver in my 
computer, but I doubt anything would be worse for me if I'd use the 
obsolete eepro100 driver instead since I'm not using any of the fancy 
e100 features you mentioned as advantages.

There is a long term advantage for all users if there is only one driver 
in the kernel. Therefore all users should switch away from obsolete 
drivers to the replacement drivers, and the obsolete driver will be 
removed at some point in time. The only question is how to do it.

> Separate but related: why keep skge and sky2? Are we going through this 
> again in a year? Is the benefit worth the effort?
>...

skge and sky2 support distinct hardware.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sk98lin for 2.6.23-rc1
  2007-09-11  8:05                 ` Stephen Hemminger
  2007-09-11 11:54                   ` Adrian Bunk
@ 2007-09-11 22:20                   ` James Corey
  1 sibling, 0 replies; 35+ messages in thread
From: James Corey @ 2007-09-11 22:20 UTC (permalink / raw)
  To: Stephen Hemminger, Adrian Bunk
  Cc: Kyle Rose, Bill Davidsen, James Corey, Rob Sims, linux-kernel


--- Stephen Hemminger
<shemminger@linux-foundation.org> wrote:

> On Sun, 9 Sep 2007 13:13:26 +0200
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Sat, Sep 08, 2007 at 10:42:20PM -0400, Kyle Rose wrote:
> > > 
> > > > You are a regular reader of linux-kernel, and therefore the sk98lin
> > > > removal can hardly be a surprise for you. If you prefer whining over
> > > > helping to improve the kernel that's your choice...
> > > >
> > > In my case the issue is simply one of practicality: I cannot go to the
> > > data center 5 times per day to reboot my colo box.  Therefore, I run
> > > sk98lin.  It's really that simple.
> > 
> > When did you report this bug the first time?
> > 
> > What we need is that people when testing a new kernel they plan to use
> > test the new drivers *and report the bugs if they run into any*.
> > 
> > What could we have done so that you reported your bug without removing
> > the sk98lin driver?
> > 
> > > Kyle
> > 
> > cu
> > Adrian
> 
> 
> There are several different problems in this thread:
> 1. The removal of old sk98lin driver caused some users to be forced to use
>     skge. These users have uncovered issues with the dual port fiber based versions
>     of the board.
>     Short term: The sk98lin driver should be restored to previous state,
>        and the PCI table should be used to limit the usage to only fiber systems.
>        If Adrian doesn't do it, I'll do it when I return from Germany.
>     Long term: I have fiber based board (thanks ebay) on the way to resolve
>        skge bug.
> 
> 2. Sky2 driver has its own fiber based problems.
> Solve these after skge fiber.
> 
> 3. Sky2 doesn't have as many workarounds for hardware problems as vendor sk98lin
>     driver.
> -


Hm, hope I didn't trigger a religious debate. When
you get to the point of working on the SKY2 driver
problem with DGE-550SX (Syskonnect SK-9S81) also
known as the "hw csum failure" issue, I'll be 
glad to test a patch or take debug data. Til then,
I'll stay out of the way.

-J








^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sk98lin for 2.6.23-rc1
  2007-09-11 15:03                       ` Adrian Bunk
@ 2007-09-11 22:37                         ` Willy Tarreau
  0 siblings, 0 replies; 35+ messages in thread
From: Willy Tarreau @ 2007-09-11 22:37 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Bill Davidsen, Stephen Hemminger, Kyle Rose, James Corey,
	Rob Sims, linux-kernel, Jeff Garzik, netdev

On Tue, Sep 11, 2007 at 05:03:57PM +0200, Adrian Bunk wrote:
> On Tue, Sep 11, 2007 at 10:29:47AM -0400, Bill Davidsen wrote:
> > So if you want people to try a new driver, I think it really has to have 
> > some benefits to the users, in terms of performance, reliability, or 
> > features. "Cleaner design" doesn't motivate, and it does raise the question 
> > of why the old driver wasn't just cleaned up. I've been doing software for 
> > decades, I appreciate why, but users in general just want to use their 
> > system. Which raises the question of why to delete drivers which work for 
> > many or even most users?
> 
> As I already explained, there is a long term advantage for all users if 
> there is only one driver in the kernel.

Not only that. You have to place the switch in its context with history.
Stephen, please correct me if I'm wrong, but sk98lin has been randomly
working for a very long time. Not 100% the driver's fault, because it
has had to work around a lot of chip bugs. The fact that this driver
supports *all* chips in the family makes it harder to identify whether
problems are caused by the hardware or by the driver because it is
bloated with tons of if/else.

I've personally encountered random data corruption on the receive path
with PCI-E hardware with sk98lin, as well as random TX stops. Sometimes
it would require one terabyte of data, sometimes just a few hundred
megs. On other hardware (skge now), UDP would simply stop being sent
and some TCP traffic was necessary to restart UDP! One guy at Marvell
once asked me for more information, but it was not easy to provide
much more, given the randomness of the problems!

Stephen has done an excellent (and thankless) job at restarting from
scratch, and the idea to separate the two chips was a good one IMHO.
The problem is that he might have thought that most of the bugs were
in the driver, while most of them are in the hardware, and this requires
a lot of workarounds, which do not always work the same for everybody
(I remember having tried to disable flow control with sk98lin because
it helped with sky2).

In parallel, sk98lin has improved on the vendor's site. v8 exhibited
all the problems I explained above, but v10 has fixed a lot of them,
making the new sk98lin more reliable. In parallel, sky2 and skge have
gotten wider acceptance and testing. The nastiest hardware bugs will
slowly surface, a good deal of driver bugs have been detected too
(and that's expected from any new driver).

It is possible that after 2 or 3 patches, a lot of the remaining
problems will suddenly vanish. But it's also possible that the driver
will still not work for 1% of people for 1 or 2 years because of some
obscure hardware combinations which trigger some obscure hardware bugs.

> Therefore all users should 
> switch away from obsolete drivers to the replacement drivers, and the 
> obsolete driver will be removed at some point in time. The only question 
> is how to do it.

Desktop users generally have no problem experimenting with multiple kernels
or drivers. They can report feedback too, but generally, they're not very
good at downloading alternative drivers and patching their kernel with those.

Server users cannot experiment for a long time. After 2 or 3 losses of
service, they *have* to provide a definitive solution. For some of them
when sky2 fails, it may very well be to switch over to sk98lin. Downloading
from the vendor's site and patching is not a problem for those users, but
it causes them the trouble of updating the kernel for security fixes, so
the old driver must be shipped with the kernel.

However, I remember something which might constitute a solution. In 2.4,
there's a small bug in the kbuild process on alpha. One question is always
asked during make oldconfig. Its saved value is ignored because of the way
it is computed. I don't know if we could do this with 2.6 kbuild. It would
then be nice to always set sk98lin to unset if it was set to "Y" or "M",
so that at each build, the user has to explicitly state he wants it. It's
annoying enough to give the other one a try once in a while, without causing
too much trouble to people who really have no other choice right now.

What we need with this driver is people being fed up with it, not them
being unable to use it as a last resort. Also, given that it has improved
over the last years (probably due to competition pressure from sky2/skge),
users will even less understand why there is such incentive to remove it.

Another trick for obsolete drivers would be to simply remove them from
the usual build system, but have them being available for explicit build.
Eg: make modules will not build them, but make obsolete-modules would do.

> > Testing a new kernel is no longer a drop in a boot 
> > operation if modprobe.conf must be edited to get the network up, and the 
> > typical user isn't going to write that shell script to try one or the other 
> > driver.
> 
> The typical user will let his distribution handle this.
> 
> And MODULE_ALIAS can also handle this.

No system config should be edited to switch back to the alternative,
otherwise it remains in its working state.

> > Honestly, new drivers which offer little benefit to most users are the 
> > exception rather than the rule, so this may be a corner case. I would like to 
> > see sk98lin back in the kernel; for a while I can build my own kernels and 
> > patch it in, but until other drivers are drop-in, I probably won't change.
> 
> That a new driver offers benefits that cause most users to switch isn't 
> realistic.

Desktop users are curious and have plenty of time to kill. Server users
are frightened and lazy. So I think that annoying the user slightly is
a good solution (eg: make obsolete-modules).

> You mention e100 as an example - well, I'm using this driver in my 
> computer, but I doubt anything would be worse for me if I'd use the 
> obsolete eepro100 driver instead since I'm not using any of the fancy 
> e100 features you mentioned as advantages.

After having been happy with eepro100 for years, I discovered many problems
with its VLAN support in 2.4 (MTU, ...) for which e100 was a solution. It
was a good reason to switch. But the old e100 driver took ages to load (half
of the machine boot time), which was not satisfying. So having a new driver
load faster is another good reason to switch.

> There is a long term advantage for all users if there is only one driver 
> in the kernel. Therefore all users should switch away from obsolete 
> drivers to the replacement drivers, and the obsolete driver will be 
> removed at some point in time. The only question is how to do it.

Hmmm we already read this paragraph above :-)

> > Separate but related: why keep skge and sky2? Are we going through this 
> > again in a year? Is the benefit worth the effort?
> >...
> 
> skge and sky2 support distinct hardware.

... and as such are both smaller than sk98lin which supports both.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sk98lin for 2.6.23-rc1
  2007-09-05  9:22     ` Stephen Hemminger
  2007-09-05 19:42       ` James Corey
@ 2007-09-12 16:46       ` Torsten Kaiser
  1 sibling, 0 replies; 35+ messages in thread
From: Torsten Kaiser @ 2007-09-12 16:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Rob Sims, Adrian Bunk, Kyle Rose, linux-kernel

On 9/5/07, Stephen Hemminger <shemminger@linux-foundation.org> wrote:
>
> The only known outstanding problems on 2.6.22.6 of sky2 are:
>  * problems with fibre PHY based systems
>  * suspend/resume issues, missing multicast reinitialization, etc.
> The previous stability problems have been addressed.

Sorry to disappoint you, but it just hung for me again.
After seeing the backport of commit  c59697e06058fc2361da8cefcfa3de85ac107582 as
"sky2: restore workarounds for lost interrupts" going into 2.6.22.5 I
decided to give it another try.

First tests worked and for two days I had no trouble, but today the
network hung again, until I removed and reinserted the sky2 module.

I'm using the Gentoo kernel 2.6.22-gentoo-r6 which is based on
2.6.22.6. (All patches at
http://dev.gentoo.org/~dsd/genpatches/patches-2.6.22-7.htm )
This is an x86_64 kernel but with a 32bit userland.

My hardware:
00:00.0 Host bridge: Intel Corporation 82915G/P/GV/GL/PL/910GL Memory Controller Hub (rev 04)
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04)
00:02.1 Display controller: Intel Corporation 82915G Integrated Graphics Controller (rev 04)
00:1b.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 (rev 03)
00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
01:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
01:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)

The Marvell controller is onboard, more info:
linux ~ # lspci -vxxx -s 02:00.0
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
        Flags: bus master, fast devsel, latency 0, IRQ 318
        Memory at cfffc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at e800 [size=256]
        Expansion ROM at cffc0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
        Capabilities: [e0] Express Legacy Endpoint IRQ 0
00: ab 11 62 43 07 04 10 00 19 00 00 02 04 00 00 00
10: 04 c0 ff cf 00 00 00 00 01 e8 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 42 81
30: 00 00 fc cf 48 00 00 00 00 00 00 00 0a 01 00 00
40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 13
50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00
60: 0c 30 e0 fe 00 00 00 00 89 41 00 00 00 00 00 00
70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 10 00 11 00 c0 0f 00 00 00 20 1b 00 11 a4 03 00
f0: 08 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
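
The raw dump can be cross-checked against the decoded lspci lines by hand. What follows is a minimal sketch, assuming the standard PCI type-0 header layout (vendor ID at 0x00, device ID at 0x02, subsystem IDs at 0x2c/0x2e, capability pointer at 0x34). The helper names are made up for illustration, and only the rows the decode actually touches are reproduced; every other byte is treated as zero.

```python
# Decode a few standard PCI config-space fields from an `lspci -xxx` dump.
# Offsets assume the standard type-0 header; this is an illustration only.

DUMP = """\
00: ab 11 62 43 07 04 10 00 19 00 00 02 04 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 42 81
30: 00 00 fc cf 48 00 00 00 00 00 00 00 0a 01 00 00
40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 13
50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00
e0: 10 00 11 00 c0 0f 00 00 00 20 1b 00 11 a4 03 00
"""


def parse_dump(text):
    """Turn 'offset: b0 b1 ...' lines into a 256-byte config-space image."""
    cfg = bytearray(256)  # rows not present in the dump stay zero
    for line in text.splitlines():
        offset, sep, rest = line.partition(":")
        if not sep:
            continue
        base = int(offset, 16)
        for i, tok in enumerate(rest.split()):
            cfg[base + i] = int(tok, 16)
    return bytes(cfg)


def le16(cfg, off):
    """Read a little-endian 16-bit value at the given offset."""
    return cfg[off] | (cfg[off + 1] << 8)


def capabilities(cfg):
    """Walk the capability list; returns (offset, capability id) pairs."""
    caps, seen = [], set()
    ptr = cfg[0x34] & 0xFC  # low two bits of the pointer are reserved
    while ptr and ptr not in seen:
        seen.add(ptr)  # guard against a looping list
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps


cfg = parse_dump(DUMP)
print("device    %04x:%04x" % (le16(cfg, 0x00), le16(cfg, 0x02)))
print("subsystem %04x:%04x" % (le16(cfg, 0x2C), le16(cfg, 0x2E)))
for off, cap_id in capabilities(cfg):
    print("capability at %02x: id %02x" % (off, cap_id))
```

With this dump the IDs come out as 11ab:4362 with subsystem 1043:8142, and the capability chain sits at 0x48, 0x50, 0x5c and 0xe0, which matches the Capabilities lines printed by lspci above.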

From /proc/interrupts
318:     230462          0   PCI-MSI-edge      eth2
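
One way to narrow down a hang like this is to sample that counter twice and see whether it still moves. This is a rough sketch, assuming the usual /proc/interrupts layout (IRQ label, one counter per CPU, then controller type and device names). irq_count and looks_stalled are hypothetical helper names. A counter that stands still is only a hint, since an idle link raises no interrupts either.

```python
def irq_count(proc_interrupts_text, device):
    """Return the summed per-CPU interrupt count for `device`, or None if absent."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # Data lines start with a label such as "318:"; the CPU header does not.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        # Shared IRQs list several device names separated by ", ".
        names = [f.rstrip(",") for f in fields[1:]]
        if device not in names:
            continue
        total = 0
        for tok in fields[1:]:
            if not tok.isdigit():
                break  # first non-numeric field ends the per-CPU counters
            total += int(tok)
        return total
    return None


def looks_stalled(before_text, after_text, device):
    """True if the device's interrupt count did not change between two samples."""
    before = irq_count(before_text, device)
    after = irq_count(after_text, device)
    return before is not None and before == after
```

In practice the two samples would be the contents of /proc/interrupts read a few seconds apart while something (a ping, for instance) is trying to use the interface.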

From syslog:
Sep 12 11:01:27 linux [ 9580.538373]  CIFS VFS: server not responding
Sep 12 11:01:27 linux [ 9580.538385]  CIFS VFS: No response for cmd 50 mid 34863

Now the network was dead, I tried to restart it with ifconfig down &&
ifconfig up

Sep 12 11:03:54 linux [ 9727.917997] sky2 eth2: disabling interface
Sep 12 11:03:55 linux [ 9728.270436] sky2 eth2: enabling interface
Sep 12 11:03:55 linux [ 9728.272401] sky2 eth2: ram buffer 48K
Sep 12 11:03:56 linux [ 9730.016797] sky2 eth2: Link is up at 100 Mbps, full duplex, flow control both

As that did not help, I removed the sky2 module and reinserted it:

Sep 12 11:04:12 linux [ 9745.832197] sky2 eth2: disabling interface
Sep 12 11:04:18 linux [ 9751.197733] ACPI: PCI interrupt for device 0000:02:00.0 disabled
Sep 12 11:04:25 linux [ 9758.264714] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
Sep 12 11:04:25 linux [ 9758.264736] PCI: Setting latency timer of device 0000:02:00.0 to 64
Sep 12 11:04:25 linux [ 9758.265409] sky2 0000:02:00.0: v1.14 addr 0xcfffc000 irq 16 Yukon-EC (0xb6) rev 2
Sep 12 11:04:25 linux [ 9758.265910] sky2 eth0: addr 00:15:f2:55:ce:f9
Sep 12 11:04:25 linux [ 9758.267754] udev: renamed network interface eth0 to eth2
Sep 12 11:04:25 linux [ 9758.705240] sky2 eth2: enabling interface
Sep 12 11:04:25 linux [ 9758.707076] sky2 eth2: ram buffer 48K
Sep 12 11:04:27 linux [ 9760.592061] sky2 eth2: Link is up at 100 Mbps, full duplex, flow control both

Now the network was up again, but around one hour later it hung again.
Again after removing and reinserting the module it started to work
again, this time until I went home.

I switched back to the Realtek 8139, as that card works.

I can provide more info about the hardware, but I can't test any
patches, as this server is needed for work and random hangs after
hours of working are not really the nicest things to debug.

Torsten

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sk98lin for 2.6.23-rc1
  2007-09-09 12:54             ` Chris Stromsoe
@ 2007-11-06 22:23               ` Stephen Hemminger
  2007-11-07  1:42                 ` Chris Stromsoe
  0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2007-11-06 22:23 UTC (permalink / raw)
  To: Chris Stromsoe
  Cc: Adrian Bunk, Bill Davidsen, James Corey, Rob Sims, Kyle Rose,
	linux-kernel

On Sun, 9 Sep 2007 05:54:45 -0700 (PDT)
Chris Stromsoe <cbs@cts.ucla.edu> wrote:

> On Sat, 8 Sep 2007, Adrian Bunk wrote:
> > On Sat, Sep 08, 2007 at 01:44:20PM -0400, Bill Davidsen wrote:
> >
> >> Haven't tried later kernels, don't intend to, while no network is 
> >> really secure, it's not really useful.
> >
> > You are a regular reader of linux-kernel, and therefore the sk98lin 
> > removal can hardly be a surprise for you. If you prefer whining over 
> > helping to improve the kernel that's your choice...
> 
> I've been trying to migrate off sk98lin to skge since earlier this year, 
> without success, starting with 2.6.18 or .19.
> 
> I have several of these cards in production using the sk98lin driver:
> 
> fresno:~# lspci -vv -s 02:01
> 02:01.0 Ethernet controller: SysKonnect SK-9872 Gigabit Ethernet Server Adapter (SK-NET GE-ZX dual link) (rev 11)
>          Subsystem: SysKonnect SK-9844 Gigabit Ethernet Server Adapter (SK-NET GE-SX dual link)
>          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>          Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>          Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 32 bytes
>          Interrupt: pin A routed to IRQ 22
>          Region 0: Memory at febfc000 (32-bit, non-prefetchable) [size=16K]
>          Region 1: I/O ports at e800 [size=256]
>          Expansion ROM at febc0000 [disabled] [size=128K]
>          Capabilities: [48] Power Management version 1
>                  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                  Status: D0 PME-Enable- DSel=0 DScale=1 PME-
>          Capabilities: [50] Vital Product Data
> 
> They are dual port SX fiber.  Both ports are connected.  If I do this:
> 
> fresno:~# modprobe skge
> fresno:~# ip li set eth2 up
> fresno:~# ip li set eth2 down
> fresno:~# ip li set eth3 up
> 
> the system locks up and I have to power cycle it.  The order doesn't 
> matter (if I do eth3 up/down, then eth2 up kills it).
> 
> I don't have any problems with sk98lin.  This works fine:
> 
> fresno:~# modprobe sk98lin RlmtMode=DualNet
> fresno:~# ip li set eth2 up
> fresno:~# ip li set eth2 down
> fresno:~# ip li set eth3 up
> fresno:~# ip li set eth3 down
> 
> 
> I am more than happy to test various driver changes, and have tried a few 
> suggested patches but nothing has worked so far.  I would like to be using 
> skge instead of sk98lin, but so far haven't had any success.

Please test 2.6.24-rc1 (or -rc2) because there were several fixes for skge
that made it work correctly for dual port fiber board. The worst bug in skge
was that it configured the ram buffer incorrectly.

I just submitted these for next 2.6.23.X stable release as well

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sk98lin for 2.6.23-rc1
  2007-11-06 22:23               ` Stephen Hemminger
@ 2007-11-07  1:42                 ` Chris Stromsoe
  0 siblings, 0 replies; 35+ messages in thread
From: Chris Stromsoe @ 2007-11-07  1:42 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Adrian Bunk, Bill Davidsen, James Corey, Rob Sims, Kyle Rose,
	linux-kernel

On Tue, 6 Nov 2007, Stephen Hemminger wrote:
> On Sun, 9 Sep 2007 05:54:45 -0700 (PDT)
> Chris Stromsoe <cbs@cts.ucla.edu> wrote:
>
>> On Sat, 8 Sep 2007, Adrian Bunk wrote:
>>> On Sat, Sep 08, 2007 at 01:44:20PM -0400, Bill Davidsen wrote:
>>>
>>>> Haven't tried later kernels, don't intend to, while no network is
>>>> really secure, it's not really useful.
>>>
>>> You are a regular reader of linux-kernel, and therefore the sk98lin
>>> removal can hardly be a surprise for you. If you prefer whining over
>>> helping to improve the kernel that's your choice...
>>
>> I've been trying to migrate off sk98lin to skge since earlier this year,
>> without success, starting with 2.6.18 or .19.
>>
>> I have several of these cards in production using the sk98lin driver:
>>
>> fresno:~# lspci -vv -s 02:01
>> 02:01.0 Ethernet controller: SysKonnect SK-9872 Gigabit Ethernet Server Adapter (SK-NET GE-ZX dual link) (rev 11)
>>          Subsystem: SysKonnect SK-9844 Gigabit Ethernet Server Adapter (SK-NET GE-SX dual link)
>>          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>>          Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>          Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 32 bytes
>>          Interrupt: pin A routed to IRQ 22
>>          Region 0: Memory at febfc000 (32-bit, non-prefetchable) [size=16K]
>>          Region 1: I/O ports at e800 [size=256]
>>          Expansion ROM at febc0000 [disabled] [size=128K]
>>          Capabilities: [48] Power Management version 1
>>                  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>                  Status: D0 PME-Enable- DSel=0 DScale=1 PME-
>>          Capabilities: [50] Vital Product Data
>>
>> They are dual port SX fiber.  Both ports are connected.  If I do this:
>>
>> fresno:~# modprobe skge
>> fresno:~# ip li set eth2 up
>> fresno:~# ip li set eth2 down
>> fresno:~# ip li set eth3 up
>>
>> the system locks up and I have to power cycle it.  The order doesn't
>> matter (if I do eth3 up/down, then eth2 up kills it).
>>
>> I don't have any problems with sk98lin.  This works fine:
>>
>> fresno:~# modprobe sk98lin RlmtMode=DualNet
>> fresno:~# ip li set eth2 up
>> fresno:~# ip li set eth2 down
>> fresno:~# ip li set eth3 up
>> fresno:~# ip li set eth3 down
>>
>>
>> I am more than happy to test various driver changes, and have tried a few
>> suggested patches but nothing has worked so far.  I would like to be using
>> skge instead of sk98lin, but so far haven't had any success.
>
> Please test 2.6.24-rc1 (or -rc2) because there were several fixes for skge
> that made it work correctly for dual port fiber board. The worst bug in skge
> was that it configured the ram buffer incorrectly.
>
> I just submitted these for next 2.6.23.X stable release as well


I tested 2.6.24-rc1.  This series of commands

   fresno:~# modprobe skge
   fresno:~# ip li set eth2 up
   fresno:~# ip li set eth2 down
   fresno:~# ip li set eth3 up

still hard-locks the box in the same place.  Was there anything in the 
-rc2 patch for skge?



-Chris

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2007-11-07  2:06 UTC | newest]

Thread overview: 35+ messages
2007-07-26 15:16 sk98lin for 2.6.23-rc1 Kyle Rose
2007-07-26 16:28 ` Jan Engelhardt
2007-07-26 16:30   ` Kyle Rose
2007-07-26 16:41     ` Jan Engelhardt
2007-07-27  1:07       ` Kyle Rose
2007-07-26 16:57 ` Adrian Bunk
2007-07-26 22:58   ` Chris Stromsoe
2007-07-26 23:38   ` Bill Davidsen
2007-07-26 23:41     ` Jeff Garzik
2007-07-30  3:01   ` Rob Sims
2007-09-05  9:22     ` Stephen Hemminger
2007-09-05 19:42       ` James Corey
2007-09-05 21:04         ` Kyle Rose
2007-09-05 23:00           ` Stephen Hemminger
2007-09-08 17:44         ` Bill Davidsen
2007-09-08 19:11           ` Adrian Bunk
2007-09-09  2:42             ` Kyle Rose
2007-09-09  4:48               ` Willy Tarreau
2007-09-09 11:13               ` Adrian Bunk
2007-09-11  8:05                 ` Stephen Hemminger
2007-09-11 11:54                   ` Adrian Bunk
2007-09-11 14:29                     ` Bill Davidsen
2007-09-11 15:03                       ` Adrian Bunk
2007-09-11 22:37                         ` Willy Tarreau
2007-09-11 22:20                   ` James Corey
2007-09-09 12:54             ` Chris Stromsoe
2007-11-06 22:23               ` Stephen Hemminger
2007-11-07  1:42                 ` Chris Stromsoe
2007-09-10 14:32             ` Bill Davidsen
2007-09-10 15:39               ` Adrian Bunk
2007-09-11  4:23                 ` Kyle Moffett
2007-09-12 16:46       ` Torsten Kaiser
2007-07-26 19:17 ` Stephen Hemminger
2007-07-26 23:52 ` Bill Davidsen
2007-07-27  1:13   ` Kyle Rose
