netdev.vger.kernel.org archive mirror
* tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
@ 2012-09-28 20:45 Ferenc Wagner
  2012-10-01  9:33 ` Michael Chan
  0 siblings, 1 reply; 11+ messages in thread
From: Ferenc Wagner @ 2012-09-28 20:45 UTC (permalink / raw)
  To: netdev
  Cc: Matt Carlson, Michael Chan, Grant Likely, Rob Herring,
	linux-kernel, wferi

Hi,

Upgrading the kernel on our HS20 blades resulted in their SoL (serial
over LAN) connection being broken.  The disconnection happens when eth0
(the interface involved in SoL) is brought up during the boot sequence.
If I later "ip link set eth0 down", then the connection is restored, but
"ip link set eth0 up" breaks it again on 3.2.  ethtool -a, -c, -g, -k
and -u show no difference; ethtool -i on the 2.6.32 kernel reports:

driver: tg3
version: 3.116
firmware-version: 5704s-v3.38, ASFIPMIs v2.47
bus-info: 0000:05:01.0

In the 3.2 kernel the driver version is 3.121.  In the output of lspci
-vv only the Interrupt line shows variability.  I'd be grateful for any
workaround suggestion; I understand that it's not easy given the lack of
information about the SoL implementation, but maybe somebody can guess
what changed in the driver which could affect it this way.
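
For anyone repeating this comparison on other blades, here is a minimal
sketch of how the ethtool -i output could be compared field by field.
The helper names are made up for illustration.  The first sample is the
2.6.32 output quoted above; for 3.2 only the driver version is stated in
this message, so the other fields are left out rather than guessed.

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines, as printed by ethtool -i, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the PCI
        # bus address (which contains colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def changed_fields(old, new):
    """Return {key: (old, new)} for keys present in both dicts that differ."""
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}


# Output reported for the 2.6.32 kernel.
OLD = """\
driver: tg3
version: 3.116
firmware-version: 5704s-v3.38, ASFIPMIs v2.47
bus-info: 0000:05:01.0
"""

# Only these two fields are known for 3.2 from this message.
NEW = """\
driver: tg3
version: 3.121
"""

if __name__ == "__main__":
    print(changed_fields(parse_ethtool_info(OLD), parse_ethtool_info(NEW)))
```

With these inputs the only differing field is the driver version, which
matches the observation that ethtool itself gives no hint about the
cause.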
-- 
Regards,
Feri.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-09-28 20:45 tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL Ferenc Wagner
@ 2012-10-01  9:33 ` Michael Chan
  2012-10-02  9:31   ` Ferenc Wagner
  2012-10-02 12:07   ` Ferenc Wagner
  0 siblings, 2 replies; 11+ messages in thread
From: Michael Chan @ 2012-10-01  9:33 UTC (permalink / raw)
  To: Ferenc Wagner
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel

On Fri, 2012-09-28 at 22:45 +0200, Ferenc Wagner wrote: 
> Hi,
> 
> Upgrading the kernel on our HS20 blades resulted in their SoL (serial
> over LAN) connection being broken.  The disconnection happens when eth0
> (the interface involved in SoL) is brought up during the boot sequence.
> If I later "ip link set eth0 down", then the connection is restored, but
> "ip link set eth0 up" breaks it again on 3.2.  ethtool -a, -c, -g, -k
> and -u show no difference; ethtool -i on the 2.6.32 kernel reports:
> 
> driver: tg3
> version: 3.116
> firmware-version: 5704s-v3.38, ASFIPMIs v2.47
> bus-info: 0000:05:01.0
> 
> In the 3.2 kernel the driver version is 3.121.

2.6.32 to 3.2 is a big jump.  Can you narrow this down further?  It will
be hard for us to find a HS20 with 5704 to test this.  Thanks.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-01  9:33 ` Michael Chan
@ 2012-10-02  9:31   ` Ferenc Wagner
  2012-10-02 12:07   ` Ferenc Wagner
  1 sibling, 0 replies; 11+ messages in thread
From: Ferenc Wagner @ 2012-10-02  9:31 UTC (permalink / raw)
  To: Michael Chan
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel

"Michael Chan" <mchan@broadcom.com> writes:

> On Fri, 2012-09-28 at 22:45 +0200, Ferenc Wagner wrote: 
> 
>> Upgrading the kernel on our HS20 blades resulted in their SoL (serial
>> over LAN) connection being broken.  The disconnection happens when eth0
>> (the interface involved in SoL) is brought up during the boot sequence.
>> If I later "ip link set eth0 down", then the connection is restored, but
>> "ip link set eth0 up" breaks it again on 3.2.  ethtool -a, -c, -g, -k
>> and -u show no difference; ethtool -i on the 2.6.32 kernel reports:
>> 
>> driver: tg3
>> version: 3.116
>> firmware-version: 5704s-v3.38, ASFIPMIs v2.47
>> bus-info: 0000:05:01.0
>> 
>> In the 3.2 kernel the driver version is 3.121.
>
> 2.6.32 to 3.2 is a big jump.  Can you narrow this down further?  It will
> be hard for us to find a HS20 with 5704 to test this.  Thanks.

Certainly, I'm bisecting it now, but I thought I would drop in the
question in case it rings some bells somewhere.  Given the nature of the
problem, it isn't much fun to bisect, and the stripped down kernel I'm
testing with breaks the SoL connection for a couple of seconds even in
the "good" cases.  I'm already down to 13 steps...
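
To give a feel for why the step count stays small, here is a rough
sketch of bisection over a linear history.  This is a simplification:
real kernel history is a graph with merges, and the commit count and
culprit index below are invented for illustration, not the real distance
between 2.6.32 and 3.2.

```python
def first_bad(n_commits, is_bad):
    """Binary-search a linear history of n_commits (index 0 is the oldest).

    Assumes commit 0 is good, commit n_commits - 1 is bad, and every
    commit after the first bad one is also bad.
    Returns (index_of_first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, n_commits - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # one build, boot and SoL check per step
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad, tests


if __name__ == "__main__":
    # Hypothetical numbers: 2**13 commits between the known-good and
    # known-bad ends, culprit at index 5000.
    print(first_bad(2**13 + 1, lambda i: i >= 5000))
```

Each test halves the remaining range, so the number of boots grows only
with the logarithm of the number of commits.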
-- 
Thanks,
Feri.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-01  9:33 ` Michael Chan
  2012-10-02  9:31   ` Ferenc Wagner
@ 2012-10-02 12:07   ` Ferenc Wagner
  2012-10-02 15:03     ` Michael Chan
  1 sibling, 1 reply; 11+ messages in thread
From: Ferenc Wagner @ 2012-10-02 12:07 UTC (permalink / raw)
  To: Michael Chan
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel,
	wferi

"Michael Chan" <mchan@broadcom.com> writes:

> On Fri, 2012-09-28 at 22:45 +0200, Ferenc Wagner wrote: 
> 
>> Upgrading the kernel on our HS20 blades resulted in their SoL (serial
>> over LAN) connection being broken.  The disconnection happens when eth0
>> (the interface involved in SoL) is brought up during the boot sequence.
>> If I later "ip link set eth0 down", then the connection is restored, but
>> "ip link set eth0 up" breaks it again on 3.2.  ethtool -a, -c, -g, -k
>> and -u show no difference; ethtool -i on the 2.6.32 kernel reports:
>> 
>> driver: tg3
>> version: 3.116
>> firmware-version: 5704s-v3.38, ASFIPMIs v2.47
>> bus-info: 0000:05:01.0
>> 
>> In the 3.2 kernel the driver version is 3.121.
>
> 2.6.32 to 3.2 is a big jump.  Can you narrow this down further?  It will
> be hard for us to find a HS20 with 5704 to test this.  Thanks.

I'm done with bisecting it: the first bad commit is:

commit dabc5c670d3f86d15ee4f42ab38ec5bd2682487d
Author: Matt Carlson <mcarlson@broadcom.com>
Date:   Thu May 19 12:12:52 2011 +0000

    tg3: Move TSO_CAPABLE assignment
    
    This patch moves the code that asserts the TSO_CAPABLE flag closer to
    where the TSO capabilities flags are set.  There isn't a good enough
    reason for the code to be separated.
    
    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Reviewed-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

On the other hand, losing the SoL console even temporarily during boot
(as it happens with a minimal kernel before this commit) isn't nice
either.  I'll try to look after that, too, just mentioning it here...
-- 
Regards,
Feri.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-02 12:07   ` Ferenc Wagner
@ 2012-10-02 15:03     ` Michael Chan
  2012-10-02 16:49       ` Ferenc Wagner
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Chan @ 2012-10-02 15:03 UTC (permalink / raw)
  To: Ferenc Wagner
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel

On Tue, 2012-10-02 at 14:07 +0200, Ferenc Wagner wrote:
> I'm done with bisecting it: the first bad commit is:
> 
> commit dabc5c670d3f86d15ee4f42ab38ec5bd2682487d
> Author: Matt Carlson <mcarlson@broadcom.com>
> Date:   Thu May 19 12:12:52 2011 +0000
> 
>     tg3: Move TSO_CAPABLE assignment
>     
>     This patch moves the code that asserts the TSO_CAPABLE flag closer to
>     where the TSO capabilities flags are set.  There isn't a good enough
>     reason for the code to be separated.
>     
>     Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
>     Reviewed-by: Michael Chan <mchan@broadcom.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Thanks, I'll look into this.
> 
> On the other hand, losing the SoL console even temporarily during boot
> (as it happens with a minimal kernel before this commit) isn't nice
> either.  I'll try to look after that, too, just mentioning it here... 

This is expected as the driver has to reset the link and you'll lose SoL
for a few seconds until link comes back up.  We can look into an
enhancement to not touch the link if it is already in a good state when
the driver comes up.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-02 15:03     ` Michael Chan
@ 2012-10-02 16:49       ` Ferenc Wagner
  2012-10-02 17:06         ` Michael Chan
  0 siblings, 1 reply; 11+ messages in thread
From: Ferenc Wagner @ 2012-10-02 16:49 UTC (permalink / raw)
  To: Michael Chan
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel,
	wferi

"Michael Chan" <mchan@broadcom.com> writes:

> On Tue, 2012-10-02 at 14:07 +0200, Ferenc Wagner wrote:
>
>> I'm done with bisecting it: the first bad commit is:
>> 
>> commit dabc5c670d3f86d15ee4f42ab38ec5bd2682487d
>> Author: Matt Carlson <mcarlson@broadcom.com>
>> Date:   Thu May 19 12:12:52 2011 +0000
>> 
>>     tg3: Move TSO_CAPABLE assignment
>>     
>>     This patch moves the code that asserts the TSO_CAPABLE flag closer to
>>     where the TSO capabilities flags are set.  There isn't a good enough
>>     reason for the code to be separated.
>>     
>>     Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
>>     Reviewed-by: Michael Chan <mchan@broadcom.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Thanks, I'll look into this.

Going into the opposite direction: I found that Linux 3.6 does not
permanently break the SoL console on upping eth0!  I'll try to find the
commit which (sort of) fixed it.

>> On the other hand, losing the SoL console even temporarily during boot
>> (as it happens with a minimal kernel before this commit) isn't nice
>> either.  I'll try to look after that, too, just mentioning it here... 
>
> This is expected as the driver has to reset the link and you'll lose SoL
> for a few seconds until link comes back up.  We can look into an
> enhancement to not touch the link if it is already in a good state when
> the driver comes up.

This looks more complicated here.  In our production setup under 2.6.32
(stock Debian squeeze system) the SoL console is not broken during boot
at all.  I don't say there are no dropouts at all, but the management
system does not detach the console, like it promptly did during the
bisection in every case.  I could not reproduce this (preferred)
behavior with self-built kernels yet (not even with 2.6.18, which also
worked fine when built by Debian, if I remember correctly).  I'll
continue investigating this issue.
-- 
Thanks,
Feri.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-02 16:49       ` Ferenc Wagner
@ 2012-10-02 17:06         ` Michael Chan
  2012-10-02 18:49           ` Ferenc Wagner
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Chan @ 2012-10-02 17:06 UTC (permalink / raw)
  To: Ferenc Wagner
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel

On Tue, 2012-10-02 at 18:49 +0200, Ferenc Wagner wrote:
> Going into the opposite direction: I found that Linux 3.6 does not
> permanently break the SoL console on upping eth0!  I'll try to find the
> commit which (sort of) fixed it.

These are the likely fixes:

commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
Author: Matt Carlson <mcarlson@broadcom.com>
Date: Mon Nov 28 09:41:03 2011 +0000

tg3: Fix TSO CAP for 5704 devs w / ASF enabled

commit 7196cd6c3d4863000ef88b09f34d6dd75610ec3e
Author: Matt Carlson <mcarlson@broadcom.com>
Date: Thu May 19 16:02:44 2011 +0000

tg3: Add braces around 5906 workaround.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-02 17:06         ` Michael Chan
@ 2012-10-02 18:49           ` Ferenc Wagner
  2012-10-02 19:06             ` Michael Tokarev
  0 siblings, 1 reply; 11+ messages in thread
From: Ferenc Wagner @ 2012-10-02 18:49 UTC (permalink / raw)
  To: Michael Chan
  Cc: netdev, Matt Carlson, Grant Likely, Rob Herring, linux-kernel,
	wferi

"Michael Chan" <mchan@broadcom.com> writes:

> On Tue, 2012-10-02 at 18:49 +0200, Ferenc Wagner wrote:
>
>> Going into the opposite direction: I found that Linux 3.6 does not
>> permanently break the SoL console on upping eth0!  I'll try to find
>> the commit which (sort of) fixed it.
>
> These are the likely fixes:
>
> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
> Author: Matt Carlson <mcarlson@broadcom.com>
> Date: Mon Nov 28 09:41:03 2011 +0000
>
> tg3: Fix TSO CAP for 5704 devs w / ASF enabled

You are exactly right: cf9ecf4b fixed the permanent SoL breakage
introduced by dabc5c67.  Looks like ASF utilizes similar technology to
that of the HS20 BMC.  Thanks for the tip, it greatly reduced our CPU
wear. :)  It's a pity ethtool -k did not give a hint.  Do you think it's
possible to work around it in 3.2, e.g. by fiddling with some ethtool
setting?
-- 
Thanks,
Feri.

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-02 18:49           ` Ferenc Wagner
@ 2012-10-02 19:06             ` Michael Tokarev
  2012-10-03  0:17               ` Ben Hutchings
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Tokarev @ 2012-10-02 19:06 UTC (permalink / raw)
  To: Ferenc Wagner
  Cc: Michael Chan, netdev, Matt Carlson, Grant Likely, Rob Herring,
	linux-kernel

On 02.10.2012 22:49, Ferenc Wagner wrote:
> "Michael Chan" <mchan@broadcom.com> writes:
>> These are the likely fixes:
>>
>> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
>> Author: Matt Carlson <mcarlson@broadcom.com>
>> Date: Mon Nov 28 09:41:03 2011 +0000
>>
>> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
> 
> You are exactly right: cf9ecf4b fixed the permanent SoL breakage
> introduced by dabc5c67.  Looks like ASF utilizes similar technology to
> that of the HS20 BMC.  Thanks for the tip, it greatly reduced our CPU
> wear. :)  It's a pity ethtool -k did not give a hint.  Do you think it's
> possible to work around it in 3.2, e.g. by fiddling with some ethtool
> setting?

Maybe it's better to push this commit to -stable instead? (the commit
that broke things is part of 3.0 kernel so all current 3.x -stable
kernels are affected)

(Besides, that commit "This patch fixes the problem by revisiting and
reevaluating the decision after tg3_get_eeprom_hw_cfg() is called." -
merely copies a somewhat "twisted" chunk of code into another place,
which does not look optimal)
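
To illustrate the kind of ordering problem that commit description hints
at, here is a toy model.  It is not the tg3 code: the class, the probe
functions, and the rule "TSO is only usable without ASF" are assumptions
made for the sketch, guessed from the fix's title.  Only the idea of
re-evaluating the decision after tg3_get_eeprom_hw_cfg() comes from the
commit text quoted above.

```python
class ToyNic:
    """Toy stand-in for a device; not the tg3 data structures."""

    def __init__(self, asf_in_hw_cfg):
        self._asf_in_hw_cfg = asf_in_hw_cfg  # what the hardware config reports
        self.asf_enabled = False             # unknown until read_hw_cfg() runs
        self.tso_capable = False

    def read_hw_cfg(self):
        # Plays the role of tg3_get_eeprom_hw_cfg() in this model.
        self.asf_enabled = self._asf_in_hw_cfg

    def decide_tso(self):
        # Assumed rule for the model only: TSO is usable only without ASF.
        self.tso_capable = not self.asf_enabled


def probe_decide_too_early(nic):
    nic.decide_tso()   # runs before the ASF state is known
    nic.read_hw_cfg()
    return nic.tso_capable


def probe_reevaluate(nic):
    nic.decide_tso()
    nic.read_hw_cfg()
    nic.decide_tso()   # revisit the decision once the ASF state is known
    return nic.tso_capable


if __name__ == "__main__":
    print(probe_decide_too_early(ToyNic(asf_in_hw_cfg=True)))
    print(probe_reevaluate(ToyNic(asf_in_hw_cfg=True)))
```

In the model, deciding too early leaves TSO marked as available on a
device with ASF, while re-running the same decision after the config has
been read corrects it, at the cost of having the decision logic invoked
in two places.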

Thanks,

/mjt

* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-02 19:06             ` Michael Tokarev
@ 2012-10-03  0:17               ` Ben Hutchings
  2012-10-03  0:47                 ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Hutchings @ 2012-10-03  0:17 UTC (permalink / raw)
  To: Michael Tokarev, David Miller
  Cc: Ferenc Wagner, Michael Chan, netdev, Matt Carlson, Grant Likely,
	Rob Herring, linux-kernel, stable

On Tue, 2012-10-02 at 23:06 +0400, Michael Tokarev wrote:
> On 02.10.2012 22:49, Ferenc Wagner wrote:
> > "Michael Chan" <mchan@broadcom.com> writes:
> >> These are the likely fixes:
> >>
> >> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
> >> Author: Matt Carlson <mcarlson@broadcom.com>
> >> Date: Mon Nov 28 09:41:03 2011 +0000
> >>
> >> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
> > 
> > You are exactly right: cf9ecf4b fixed the permanent SoL breakage
> > introduced by dabc5c67.  Looks like ASF utilizes similar technology to
> > that of the HS20 BMC.  Thanks for the tip, it greatly reduced our CPU
> > wear. :)  It's a pity ethtool -k did not give a hint.  Do you think it's
> > possible to work around it in 3.2, e.g. by fiddling with some ethtool
> > setting?
> 
> Maybe it's better to push this commit to -stable instead?

But that will take time, so I imagine a temporary workaround would be
useful to Ferenc.

> (the commit
> that broke things is part of 3.0 kernel so all current 3.x -stable
> kernels are affected)
[...]

The fix went into 3.3, so only 3.0 and 3.2 need it.
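
As a quick way to check a given machine against that range, something
like the following sketch could be used.  The function names are made up
for illustration, and it only looks at the upstream release series: it
knows nothing about stable or distribution backports, so a "True" means
"check whether your kernel carries the fix", not "definitely broken".

```python
BROKEN_IN = (3, 0)  # series containing the commit that broke things
FIXED_IN = (3, 3)   # series in which the fix went upstream


def parse_series(version):
    """Turn '3.2' or '2.6.32' into a (major, minor) tuple for comparison."""
    return tuple(int(part) for part in version.split(".")[:2])


def lacks_fix_upstream(version):
    """True if the upstream series has the breakage but not the fix."""
    return BROKEN_IN <= parse_series(version) < FIXED_IN


if __name__ == "__main__":
    for v in ("2.6.32", "3.0", "3.2", "3.3", "3.6"):
        print(v, lacks_fix_upstream(v))
```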

David, please can you include the above commit in your next batches for
these stable series?

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.


* Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL
  2012-10-03  0:17               ` Ben Hutchings
@ 2012-10-03  0:47                 ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2012-10-03  0:47 UTC (permalink / raw)
  To: ben
  Cc: mjt, wferi, mchan, netdev, mcarlson, grant.likely, rob.herring,
	linux-kernel, stable

From: Ben Hutchings <ben@decadent.org.uk>
Date: Wed, 03 Oct 2012 01:17:12 +0100

> On Tue, 2012-10-02 at 23:06 +0400, Michael Tokarev wrote:
>> On 02.10.2012 22:49, Ferenc Wagner wrote:
>> > "Michael Chan" <mchan@broadcom.com> writes:
>> >> These are the likely fixes:
>> >>
>> >> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
>> >> Author: Matt Carlson <mcarlson@broadcom.com>
>> >> Date: Mon Nov 28 09:41:03 2011 +0000
>> >>
>> >> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
 ...
> The fix went into 3.3, so only 3.0 and 3.2 need it.
> 
> David, please can you include the above commit in your next batches for
> these stable series?

Done.
