* Strange tg3 regression with UMP fw. link reporting
@ 2008-08-08 7:35 Benjamin Herrenschmidt
2008-08-08 8:58 ` Segher Boessenkool
0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2008-08-08 7:35 UTC (permalink / raw)
To: mcarlson; +Cc: Michael Chan, netdev, Nathan Lynch, linuxppc-dev list
Hi Matt !
The IBM PowerStation is a machine similar in design to our JS21 blades,
which uses an HT2000 bridge with it's dual 5780 TG3's.
I started investigating recently a problem where with recent kernels,
the machine will appear to "freeze" every second or two for a second
or two. The "freeze" would affect pretty much everything.
We noticed that it disappears when downing eth0, and finally bisected
it down to commit 7c5026aa9b81dd45df8d3f4e0be73e485976a8b6 "Add link
state reporting to UMP firmware".
I don't know yet for sure what happens, but a quick look at the commit
seems to show that the driver synchronously spin-waits for up to 2.5ms
with a lock held multiple times from a timer interrupt. I don't know
yet if that's where the problem comes from, or if it's an issue with
the FW going nuts and the chip hogging the machine's bus or whatever
else, I'll have to do some more experiments on monday, but in any case,
that spin is really not nice.
The relevant pieces of lspci and dmesg are:
0001:00:01.0 PCI bridge: Broadcom HT2000 PCI-X bridge (rev b0)
0001:00:02.0 PCI bridge: Broadcom HT2000 PCI-X bridge (rev b0)
0001:00:03.0 PCI bridge: Broadcom HT2000 PCI-Express bridge (rev b0)
0001:00:04.0 PCI bridge: Broadcom HT2000 PCI-Express bridge (rev b0)
0001:00:05.0 PCI bridge: Broadcom HT2000 PCI-Express bridge (rev b0)
0001:00:06.0 PCI bridge: Broadcom HT2000 PCI-Express bridge (rev b0)
0001:02:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5780 Gigabit Ethernet (rev 10)
0001:02:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5780 Gigabit Ethernet (rev 10)
tg3.c:v3.91 (April 18, 2008)
tg3 0001:02:04.0: enabling device (0140 -> 0142)
eth0: Tigon3 [partno(BCM95780) rev 8100 PHY(5780)] (PCIX:133MHz:64-bit) 10/100/1000Base-T Ethernet 00:14:5e:9e:01:82
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[76144000] dma_mask[40-bit]
tg3 0001:02:04.1: enabling device (0140 -> 0142)
eth1: Tigon3 [partno(BCM95780) rev 8100 PHY(5780)] (PCIX:133MHz:64-bit) 10/100/1000Base-T Ethernet 00:14:5e:9e:01:83
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth1: dma_rwctrl[76144000] dma_mask[40-bit]
Any help sorting that out would be much appreciated !
Cheers,
Ben.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 7:35 Strange tg3 regression with UMP fw. link reporting Benjamin Herrenschmidt
@ 2008-08-08 8:58 ` Segher Boessenkool
2008-08-08 9:18 ` Benjamin Herrenschmidt
2008-08-08 9:21 ` Arnd Bergmann
0 siblings, 2 replies; 9+ messages in thread
From: Segher Boessenkool @ 2008-08-08 8:58 UTC (permalink / raw)
To: benh; +Cc: mcarlson, linuxppc-dev list, netdev, Nathan Lynch, Michael Chan
> I don't know yet for sure what happens, but a quick look at the commit
> seems to show that the driver synchronously spin-waits for up to 2.5ms
That's what the comment says, but the code says 2.5 _seconds_:
+ /* Wait for up to 2.5 milliseconds */
+ for (i = 0; i < 250000; i++) {
+ if (!(tr32(GRC_RX_CPU_EVENT) & GRC_RX_CPU_DRIVER_EVENT))
+ break;
+ udelay(10);
+ }
(not that milliseconds wouldn't be bad already...)
Segher
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 8:58 ` Segher Boessenkool
@ 2008-08-08 9:18 ` Benjamin Herrenschmidt
2008-08-08 18:43 ` Matt Carlson
2008-08-08 21:25 ` Nathan Lynch
2008-08-08 9:21 ` Arnd Bergmann
1 sibling, 2 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2008-08-08 9:18 UTC (permalink / raw)
To: Segher Boessenkool
Cc: linuxppc-dev list, Nathan Lynch, mcarlson, Michael Chan, netdev
On Fri, 2008-08-08 at 10:58 +0200, Segher Boessenkool wrote:
> > I don't know yet for sure what happens, but a quick look at the commit
> > seems to show that the driver synchronously spin-waits for up to 2.5ms
>
> That's what the comment says, but the code says 2.5 _seconds_:
>
> + /* Wait for up to 2.5 milliseconds */
> + for (i = 0; i < 250000; i++) {
> + if (!(tr32(GRC_RX_CPU_EVENT) & GRC_RX_CPU_DRIVER_EVENT))
> + break;
> + udelay(10);
> + }
>
> (not that milliseconds wouldn't be bad already...)
Right, indeed. I think we have a good candidate for the problem :-) I'll
verify that on monday. Now, that leads to two questions:
- What such a synchronous and potentially horribly slow code is
going in a locked section or a timer interrupts ? Ie, the link
watch should probably move to a workqueue if that is to remain,
or the code turned into a state machine that periodically check for
events, or whatever is more sane than the above.
- The code should at least display some error and do something sane in
case of timeout such as disabling the new UMP feature instead of
repeatedly looping ...
- If this is indeed our problem (timing out in the code above), why is
our firmware not emitting the requested event -> maybe the PowerStations
need a tg3 firmware update.
Matt, what's your take on this ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 8:58 ` Segher Boessenkool
2008-08-08 9:18 ` Benjamin Herrenschmidt
@ 2008-08-08 9:21 ` Arnd Bergmann
1 sibling, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2008-08-08 9:21 UTC (permalink / raw)
To: linuxppc-dev
Cc: Segher Boessenkool, benh, Nathan Lynch, mcarlson, Michael Chan,
netdev
On Friday 08 August 2008, Segher Boessenkool wrote:
> > I don't know yet for sure what happens, but a quick look at the commit
> > seems to show that the driver synchronously spin-waits for up to 2.5ms
>
> That's what the comment says, but the code says 2.5 _seconds_:
>
> + /* Wait for up to 2.5 milliseconds */
> + for (i = 0; i < 250000; i++) {
> + if (!(tr32(GRC_RX_CPU_EVENT) & GRC_RX_CPU_DRIVER_EVENT))
> + break;
> + udelay(10);
> + }
>
> (not that milliseconds wouldn't be bad already...)
It can potentially be even much longer, because each udelay will wait
for *more* than 10 microseconds, and tr32() is an mmio read that takes
additional time, possibly in the order of microseconds as well.
The correct way to implement a timeout like this would be to use
time_before() in the condition, aside from better not doing a busy-loop
in the first place.
Arnd <><
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 18:43 ` Matt Carlson
@ 2008-08-08 15:20 ` Michael Chan
2008-08-08 22:05 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 9+ messages in thread
From: Michael Chan @ 2008-08-08 15:20 UTC (permalink / raw)
To: Matthew Carlson
Cc: Benjamin Herrenschmidt, Segher Boessenkool, linuxppc-dev list,
netdev, Nathan Lynch
On Fri, 2008-08-08 at 11:43 -0700, Matthew Carlson wrote:
> Segher is right. The code should be 2.5 milliseconds but is actually
> much longer. This fix is actually already in my patch queue and needs
> to be sent upstream.
>
Matt, I think we can optimize this a little more. The heart beat event
is sent every second and the link event is sent whenever the link
changes. the spin wait is only needed in rare cases when the link event
is closer than 2.5 msec from the heart beat event.
We can save the jiffies time stamp right after sending each event. Next
time we need to send a heart beat or a link event, we check the jiffies
with the stored time stamp. If they are less than 2.5 msec apart and we
haven't received the ACK from the previous event yet, we just wait up to
the remainder. It should be very rare that we have to wait.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 9:18 ` Benjamin Herrenschmidt
@ 2008-08-08 18:43 ` Matt Carlson
2008-08-08 15:20 ` Michael Chan
2008-08-08 22:05 ` Benjamin Herrenschmidt
2008-08-08 21:25 ` Nathan Lynch
1 sibling, 2 replies; 9+ messages in thread
From: Matt Carlson @ 2008-08-08 18:43 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Segher Boessenkool, mcarlson, linuxppc-dev list, netdev,
Nathan Lynch, Michael Chan
On Fri, Aug 08, 2008 at 07:18:31PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2008-08-08 at 10:58 +0200, Segher Boessenkool wrote:
> > > I don't know yet for sure what happens, but a quick look at the commit
> > > seems to show that the driver synchronously spin-waits for up to 2.5ms
> >
> > That's what the comment says, but the code says 2.5 _seconds_:
> >
> > + /* Wait for up to 2.5 milliseconds */
> > + for (i = 0; i < 250000; i++) {
> > + if (!(tr32(GRC_RX_CPU_EVENT) & GRC_RX_CPU_DRIVER_EVENT))
> > + break;
> > + udelay(10);
> > + }
> >
> > (not that milliseconds wouldn't be bad already...)
>
> Right, indeed. I think we have a good candidate for the problem :-) I'll
> verify that on monday. Now, that leads to two questions:
>
> - What such a synchronous and potentially horribly slow code is
> going in a locked section or a timer interrupts ? Ie, the link
> watch should probably move to a workqueue if that is to remain,
> or the code turned into a state machine that periodically check for
> events, or whatever is more sane than the above.
>
> - The code should at least display some error and do something sane in
> case of timeout such as disabling the new UMP feature instead of
> repeatedly looping ...
>
> - If this is indeed our problem (timing out in the code above), why is
> our firmware not emitting the requested event -> maybe the PowerStations
> need a tg3 firmware update.
>
> Matt, what's your take on this ?
Segher is right. The code should be 2.5 milliseconds but is actually
much longer. This fix is actually already in my patch queue and needs
to be sent upstream.
We really shouldn't be displaying any error messages in the event of a
timeout though. Earlier versions of the UMP firmware did not support
the link update interface. The best thing the driver can do for all
cases is give the firmware a chance to service the event but continue
as if the event were serviced if it did not get an explicit ACK.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 22:05 ` Benjamin Herrenschmidt
@ 2008-08-08 20:20 ` Michael Chan
0 siblings, 0 replies; 9+ messages in thread
From: Michael Chan @ 2008-08-08 20:20 UTC (permalink / raw)
To: benh@kernel.crashing.org
Cc: Matthew Carlson, Segher Boessenkool, linuxppc-dev list, netdev,
Nathan Lynch
On Fri, 2008-08-08 at 15:05 -0700, Benjamin Herrenschmidt wrote:
> On Fri, 2008-08-08 at 11:43 -0700, Matt Carlson wrote:
> > We really shouldn't be displaying any error messages in the event of a
> > timeout though. Earlier versions of the UMP firmware did not support
> > the link update interface. The best thing the driver can do for all
> > cases is give the firmware a chance to service the event but continue
> > as if the event were serviced if it did not get an explicit ACK.
>
> But that means that the driver will continuously spin 2.5ms every
> timer tick or so ? Or do I miss something ? Could it be possible to
> count timeouts and if after N attempts at an ack, they all timed out,
> disable the feature completely ?
Please see my other email. Matt and I will fix it in a way to minimize
the spin as much as possible regardless of firmware version.
>
> Or is there a way to test the version of the firmware ?
>
> In any case, the fix should go into -stable as the problem is hurting
> 2.6.26. Also, should we consider updating the tg3 firmware on those
> machines ?
>
Right, we'll take care of -stable as well. Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 9:18 ` Benjamin Herrenschmidt
2008-08-08 18:43 ` Matt Carlson
@ 2008-08-08 21:25 ` Nathan Lynch
1 sibling, 0 replies; 9+ messages in thread
From: Nathan Lynch @ 2008-08-08 21:25 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Segher Boessenkool, mcarlson, linuxppc-dev list, netdev,
Michael Chan
Benjamin Herrenschmidt wrote:
> On Fri, 2008-08-08 at 10:58 +0200, Segher Boessenkool wrote:
> > > I don't know yet for sure what happens, but a quick look at the commit
> > > seems to show that the driver synchronously spin-waits for up to 2.5ms
> >
> > That's what the comment says, but the code says 2.5 _seconds_:
> >
> > + /* Wait for up to 2.5 milliseconds */
> > + for (i = 0; i < 250000; i++) {
> > + if (!(tr32(GRC_RX_CPU_EVENT) & GRC_RX_CPU_DRIVER_EVENT))
> > + break;
> > + udelay(10);
> > + }
> >
> > (not that milliseconds wouldn't be bad already...)
>
> Right, indeed. I think we have a good candidate for the problem :-)
I put a printk in tg3_wait_for_event_ack and I can confirm that it's
timing out frequently (every few seconds) on my system. Changing the
number of loop iterations to 250 alleviates the problem.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Strange tg3 regression with UMP fw. link reporting
2008-08-08 18:43 ` Matt Carlson
2008-08-08 15:20 ` Michael Chan
@ 2008-08-08 22:05 ` Benjamin Herrenschmidt
2008-08-08 20:20 ` Michael Chan
1 sibling, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2008-08-08 22:05 UTC (permalink / raw)
To: Matt Carlson; +Cc: linuxppc-dev list, Michael Chan, Nathan Lynch, netdev
On Fri, 2008-08-08 at 11:43 -0700, Matt Carlson wrote:
>
> Segher is right. The code should be 2.5 milliseconds but is actually
> much longer. This fix is actually already in my patch queue and needs
> to be sent upstream.
>
> We really shouldn't be displaying any error messages in the event of a
> timeout though. Earlier versions of the UMP firmware did not support
> the link update interface. The best thing the driver can do for all
> cases is give the firmware a chance to service the event but continue
> as if the event were serviced if it did not get an explicit ACK.
But that means that the driver will continuously spin 2.5ms every
timer tick or so ? Or do I miss something ? Could it be possible to
count timeouts and if after N attempts at an ack, they all timed out,
disable the feature completely ?
Or is there a way to test the version of the firmware ?
In any case, the fix should go into -stable as the problem is hurting
2.6.26. Also, should we consider updating the tg3 firmware on those
machines ?
Ben.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-08-09 0:21 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-08 7:35 Strange tg3 regression with UMP fw. link reporting Benjamin Herrenschmidt
2008-08-08 8:58 ` Segher Boessenkool
2008-08-08 9:18 ` Benjamin Herrenschmidt
2008-08-08 18:43 ` Matt Carlson
2008-08-08 15:20 ` Michael Chan
2008-08-08 22:05 ` Benjamin Herrenschmidt
2008-08-08 20:20 ` Michael Chan
2008-08-08 21:25 ` Nathan Lynch
2008-08-08 9:21 ` Arnd Bergmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).