netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
       [not found] <200505101955.j4AJtX9x032464@shell0.pdx.osdl.net>
@ 2005-05-16  4:06 ` Jeff Garzik
  2005-05-16  5:08   ` Grant Grundler
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2005-05-16  4:06 UTC (permalink / raw)
  To: akpm; +Cc: T-Bone, grundler, varenet, Linux Kernel, Netdev

akpm@osdl.org wrote:
> +
> +					/* flush posted writes */
> +					ioread32(ioaddr + CSR15);
> +
> +					/* Sect 3.10.3 in DP83840A.pdf (p39) */
> +					udelay(500);
> +
> +					/* Section 4.2 in DP83840A.pdf (p43) */
> +					/* and IEEE 802.3 "22.2.4.1.1 Reset" */
> +					while (timeout-- &&
> +						(tulip_mdio_read (dev, phy_num, MII_BMCR) & BMCR_RESET))
> +						udelay(100);


Note that while the patch creates the correct behavior, the delays above 
occur inside spin_lock_irqsave() and/or timer context.

I have been to get HP to fix this patch's delay problem for -years-.

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-16  4:06 ` patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree Jeff Garzik
@ 2005-05-16  5:08   ` Grant Grundler
  2005-05-16 16:46     ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-05-16  5:08 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: akpm, T-Bone, grundler, varenet, Linux Kernel, Netdev

On Mon, May 16, 2005 at 12:06:48AM -0400, Jeff Garzik wrote:
> Note that while the patch creates the correct behavior, the delays above 
> occur inside spin_lock_irqsave() and/or timer context.

Yes. Agreed.  So what?

If tulip phy init code is hit so often *and* seeing the worst case
2ms delay that it's a problem, the delay doesn't matter.
Something else is fundamentally wrong.

Fixing code that doesn't comply with published specs and has
been proven to not work on at least 3 archs seems a bit more
important than an occasional < 1ms (normal case) delay.

> I have been to get HP to fix this patch's delay problem for -years-.

I was "encouraged" to rewrite the tulip phy init sequence in 2002
to use schedule_timeout() and heard a claim it would be trivial.

I looked into it and concluded it was NOT trivial.
I approached Jeff at OLS2002 (or 2003) and explained why I thought it
was NOT trivial.  Didn't get a response that resolved any of the
concerns I raised. I'd be willing to take another look at fresh
ideas once all of the tulip patches I have ouststanding go in.

If the original suggestion really were trivial, we wouldn't be having
this conversation now.  Some high school student would have taken care
of it by now, right?

thanks,
grant

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-16  5:08   ` Grant Grundler
@ 2005-05-16 16:46     ` Jeff Garzik
  2005-05-16 22:26       ` Grant Grundler
  2005-05-21 22:39       ` Francois Romieu
  0 siblings, 2 replies; 14+ messages in thread
From: Jeff Garzik @ 2005-05-16 16:46 UTC (permalink / raw)
  To: Grant Grundler; +Cc: akpm, T-Bone, varenet, Linux Kernel, Netdev

Simply ensure that tulip_select_media() is always called from a process 
context. Then can you delay all you want.  Several of the calls are 
already this way, so that leaves two cases:

1) called from timer context, from the media poll timer

2) called from spin_lock_irqsave() context, in the ->tx_timeout hook.

The first case can be fixed by moved all the timer code to a workqueue. 
  Then when the existing timer fires, kick the workqueue.

The second case can be fixed by kicking the workqueue upon tx_timeout 
(which is the reason why I did not suggest queue_delayed_work() use).

See, it's not rocket science :)

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-16 16:46     ` Jeff Garzik
@ 2005-05-16 22:26       ` Grant Grundler
  2005-05-20 18:58         ` Jeff Garzik
  2005-05-21 22:39       ` Francois Romieu
  1 sibling, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-05-16 22:26 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

On Mon, May 16, 2005 at 12:46:09PM -0400, Jeff Garzik wrote:
> Simply ensure that tulip_select_media() is always called from a process 
> context. Then can you delay all you want.  Several of the calls are 
> already this way, so that leaves two cases:
> 
> 1) called from timer context, from the media poll timer
> 
> 2) called from spin_lock_irqsave() context, in the ->tx_timeout hook.
> 
> The first case can be fixed by moved all the timer code to a workqueue. 
> Then when the existing timer fires, kick the workqueue.
> 
> The second case can be fixed by kicking the workqueue upon tx_timeout 
> (which is the reason why I did not suggest queue_delayed_work() use).

Thanks - the above guidance has much more detail than you offered before
and is much more useful.
Too bad that schedule_timeout() was the only option at the time. :^(

And I apologize I don't recall what the issues were with schedule_timeout().
I suspect they will rear their ugly head with the workqueue
implementation as well. But if they don't, that will be great.

> See, it's not rocket science :)

Well, then it's a great opportunity for someone interested in hacking
NIC drivers to cut their teeth on. :^)

After three years of using/maintaining the (trivial) tulip patch
in parisc-linux tree (and shipped with RH/SuSe ia64 releases),
I don't recall anyone complaining that udelays in tulip phy reset
caused them problems. Sorry, I'm unmotivated to revisit this.
Convince someone else to make tulip to use workqueues and I'll
resubmit a clean patch on top of that for the phy init sequences.

thanks,
grant

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-16 22:26       ` Grant Grundler
@ 2005-05-20 18:58         ` Jeff Garzik
  2005-05-20 19:15           ` Francois Romieu
  2005-05-20 21:12           ` Grant Grundler
  0 siblings, 2 replies; 14+ messages in thread
From: Jeff Garzik @ 2005-05-20 18:58 UTC (permalink / raw)
  To: Grant Grundler, akpm; +Cc: T-Bone, varenet, Linux Kernel, Netdev

Grant Grundler wrote:
> After three years of using/maintaining the (trivial) tulip patch
> in parisc-linux tree (and shipped with RH/SuSe ia64 releases),
> I don't recall anyone complaining that udelays in tulip phy reset
> caused them problems. Sorry, I'm unmotivated to revisit this.
> Convince someone else to make tulip to use workqueues and I'll
> resubmit a clean patch on top of that for the phy init sequences.


Long delays are unacceptable in new drivers, and we are working to 
remove them from older drivers.  Lack of complaints is irrelevant -- its 
a design requirement of all drivers.

Ingo and the real-time crowd are fighting against every delay, because 
every delay causes a spin, a blip in latency, an increase in CPU usage, 
and a complete stoppage of ALL work on a uniprocessor machine.

Your patch is not a special case.  We have been communicating this 
message on udelay/mdelay for -years-.  All your patch [as-is] does is 
cause more work for someone else.

This also presents a problem that Andrew points out on occasion:
what happens when a patch is useful, but the patch author isn't (for 
whatever reason) doing the legwork necessary to get it into the mainline 
kernel?  We certainly DON'T want to lose this patch, as the changes are 
useful.

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-20 18:58         ` Jeff Garzik
@ 2005-05-20 19:15           ` Francois Romieu
  2005-05-20 21:12           ` Grant Grundler
  1 sibling, 0 replies; 14+ messages in thread
From: Francois Romieu @ 2005-05-20 19:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

Jeff Garzik <jgarzik@pobox.com> :
[snip]

I should start chewing it later today: it just reached the top of the
pile (right before the "see if Chelsio publish something within 5 days"
item).

--
Ueimor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-20 18:58         ` Jeff Garzik
  2005-05-20 19:15           ` Francois Romieu
@ 2005-05-20 21:12           ` Grant Grundler
  2005-05-20 21:34             ` Jeff Garzik
  1 sibling, 1 reply; 14+ messages in thread
From: Grant Grundler @ 2005-05-20 21:12 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

On Fri, May 20, 2005 at 02:58:58PM -0400, Jeff Garzik wrote:
> Grant Grundler wrote:
> >After three years of using/maintaining the (trivial) tulip patch
> >in parisc-linux tree (and shipped with RH/SuSe ia64 releases),
> >I don't recall anyone complaining that udelays in tulip phy reset
> >caused them problems. Sorry, I'm unmotivated to revisit this.
> >Convince someone else to make tulip to use workqueues and I'll
> >resubmit a clean patch on top of that for the phy init sequences.
> 
> 
> Long delays are unacceptable in new drivers,

Agreed. But that didn't stop tg3 from having a 1.5 *SECOND*
spin delay during fiber phy init with interrupts off.
That is certainly a much newer driver than tulip.

(It's not obvious to me by code inspection which context
 tg3_setup_fiber_by_hand() gets called from now.)

> and we are working to remove them from older drivers.
>
> Lack of complaints is irrelevant -- its 
> a design requirement of all drivers.

It's totally relevant.
Complaints (bug reports) and patches drive change.  That's how both
commercial *and* non-commercial developers prioritize.

"Ingo and the real-time crowd" are a good example of a change
in priorities driven by non-commercial users.

> Ingo and the real-time crowd are fighting against every delay, because 
> every delay causes a spin, a blip in latency, an increase in CPU usage, 
> and a complete stoppage of ALL work on a uniprocessor machine.

Understood. But they were not the first ones. Donald Becker has a fairly
well known digust for use of CPU spin loops.  But we can't eliminate
*all* udelay just becuase of the way HW is designed and has bugs.
I think it would be reasonable to get rid of many udelay calls
(replace them with PCI read delay loops like Donald has advocated)
and get the rest into a context where it matters less.
I have no objection to people fixing those sorts of issues.


> Your patch is not a special case.  We have been communicating this 
> message on udelay/mdelay for -years-.  All your patch [as-is] does is 
> cause more work for someone else.

Not true. This patch brings the tulip driver into compliance with
published specs and makes the driver functional for parisc/mips/ia64 users.

> This also presents a problem that Andrew points out on occasion:
> what happens when a patch is useful, but the patch author isn't (for 
> whatever reason) doing the legwork necessary to get it into the mainline 
> kernel?

The "whatever reason" is clearly decided by the mainline kernel maintainer.
If we treat other people's labor as "free", then the right answer is
to drop the patch and wait for some subset of "we" to do the extra legwork.

Several people care if tulip phy init works right. OTOH, I'm only aware of
one person who specifically cares if tulip is holding the CPU hostage for
1 or 2 ms during the occasional phy init.

Is being a technology purist more important than moving forward with
what people care about?

> We certainly DON'T want to lose this patch, as the changes are 
> useful.

I think we also agree debugging the problem and providing a patch
that works is a worthy contribution.

We won't lose it. I maintain it in CVS on cvs.parisc-linux.org.

thanks,
grant

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-20 21:12           ` Grant Grundler
@ 2005-05-20 21:34             ` Jeff Garzik
  2005-05-21  0:51               ` Grant Grundler
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2005-05-20 21:34 UTC (permalink / raw)
  To: Grant Grundler; +Cc: akpm, T-Bone, varenet, Linux Kernel, Netdev

Grant Grundler wrote:
> On Fri, May 20, 2005 at 02:58:58PM -0400, Jeff Garzik wrote:
> 
>>Grant Grundler wrote:
>>
>>>After three years of using/maintaining the (trivial) tulip patch
>>>in parisc-linux tree (and shipped with RH/SuSe ia64 releases),
>>>I don't recall anyone complaining that udelays in tulip phy reset
>>>caused them problems. Sorry, I'm unmotivated to revisit this.
>>>Convince someone else to make tulip to use workqueues and I'll
>>>resubmit a clean patch on top of that for the phy init sequences.
>>
>>
>>Long delays are unacceptable in new drivers,
> 
> 
> Agreed. But that didn't stop tg3 from having a 1.5 *SECOND*
> spin delay during fiber phy init with interrupts off.
> That is certainly a much newer driver than tulip.
> 
> (It's not obvious to me by code inspection which context
>  tg3_setup_fiber_by_hand() gets called from now.)

Yes, tg3 is awful in this regard.  I have made a bit of progress by 
moving some of the stuff into a workqueue.

I got multiple reports from embedded tg3 platform developers, who had to 
really muck up the driver to fix the delay issue.  It was apparently 
_really_ noticeable on one embedded platform.


>>and we are working to remove them from older drivers.
>>
>>Lack of complaints is irrelevant -- its 
>>a design requirement of all drivers.
> 
> 
> It's totally relevant.
> Complaints (bug reports) and patches drive change.  That's how both
> commercial *and* non-commercial developers prioritize.
> 
> "Ingo and the real-time crowd" are a good example of a change
> in priorities driven by non-commercial users.

No, these are commercial users.  Embedded space really cares about this 
stuff.


>>Ingo and the real-time crowd are fighting against every delay, because 
>>every delay causes a spin, a blip in latency, an increase in CPU usage, 
>>and a complete stoppage of ALL work on a uniprocessor machine.
> 
> 
> Understood. But they were not the first ones. Donald Becker has a fairly
> well known digust for use of CPU spin loops.  But we can't eliminate
> *all* udelay just becuase of the way HW is designed and has bugs.
> I think it would be reasonable to get rid of many udelay calls
> (replace them with PCI read delay loops like Donald has advocated)
> and get the rest into a context where it matters less.
> I have no objection to people fixing those sorts of issues.

If you recognize the issue, you should object to new changes adding new 
issues!


>>Your patch is not a special case.  We have been communicating this 
>>message on udelay/mdelay for -years-.  All your patch [as-is] does is 
>>cause more work for someone else.
> 
> 
> Not true. This patch brings the tulip driver into compliance with
> published specs and makes the driver functional for parisc/mips/ia64 users.

Ok, yes, it fixes the tulip issue AND causes work for others.


>>This also presents a problem that Andrew points out on occasion:
>>what happens when a patch is useful, but the patch author isn't (for 
>>whatever reason) doing the legwork necessary to get it into the mainline 
>>kernel?
> 
> 
> The "whatever reason" is clearly decided by the mainline kernel maintainer.
> If we treat other people's labor as "free", then the right answer is
> to drop the patch and wait for some subset of "we" to do the extra legwork.

Um, this is how all kernel development works :)

Maintainers are not supposed to merge patches into upstream, if they 
have flaws still remaining to be corrected.


> Several people care if tulip phy init works right. OTOH, I'm only aware of
> one person who specifically cares if tulip is holding the CPU hostage for
> 1 or 2 ms during the occasional phy init.
> 
> Is being a technology purist more important than moving forward with
> what people care about?

There will _always_ be ugly patches that get hardware going for some users.

Tons of changes are kept outside the kernel until they are ready.  This 
is just one more example.

Merging code into the kernel is a big deal.  That code will have to be 
maintained for years, maybe decades.  "when in doubt, don't merge" is 
generally the right answer.

I don't want your patch to become an issue that embedded developers must 
address (like the tg3 example above), causing further patching and 
headache in that area.

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-20 21:34             ` Jeff Garzik
@ 2005-05-21  0:51               ` Grant Grundler
  0 siblings, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-05-21  0:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

On Fri, May 20, 2005 at 05:34:26PM -0400, Jeff Garzik wrote:
> Yes, tg3 is awful in this regard.  I have made a bit of progress by 
> moving some of the stuff into a workqueue.

ah good! At some point I should re-install the fiber bcm5701 cards
I have and see if they work better then.

> >It's totally relevant.
> >Complaints (bug reports) and patches drive change.  That's how both
> >commercial *and* non-commercial developers prioritize.
> >
> >"Ingo and the real-time crowd" are a good example of a change
> >in priorities driven by non-commercial users.
> 
> No, these are commercial users.  Embedded space really cares about this 
> stuff.

Sorry - maybe VM support for VIVT cache is a better example?
(I'm thinking sparc/parisc support)

Almost everything I think of has some form of commercial interest
either directly (embedded, HA, infiniband, Server IO, ia64/amd64, etc)
or indirectly (sponsorships at Universities or sourceforge projects).

> >Understood. But they were not the first ones. Donald Becker has a fairly
> >well known digust for use of CPU spin loops.  But we can't eliminate
> >*all* udelay just becuase of the way HW is designed and has bugs.
> >I think it would be reasonable to get rid of many udelay calls
> >(replace them with PCI read delay loops like Donald has advocated)
> >and get the rest into a context where it matters less.
> >I have no objection to people fixing those sorts of issues.
> 
> If you recognize the issue, you should object to new changes adding new 
> issues!

And ignore fixes for existing issues.
"The enemy of good is perfect"

...
> >Not true. This patch brings the tulip driver into compliance with
> >published specs and makes the driver functional for parisc/mips/ia64 users.
> 
> Ok, yes, it fixes the tulip issue AND causes work for others.

But it's not new work.

In timer.c, tulip_restart_rxtx() is called right after tulip_select_media().
tulip_restart_rxtx() has a potential 1300 usec delay loop. About
the same as the code in tulip_select_media() needs.

I now appreciate much more that a RH employee (IIRC, John Linville)
submitted that patch indirectly on my behalf. I filed the original
bugzilla and provided the patch based on work by Charlie Brett.


> >The "whatever reason" is clearly decided by the mainline kernel maintainer.
> >If we treat other people's labor as "free", then the right answer is
> >to drop the patch and wait for some subset of "we" to do the extra legwork.
> 
> Um, this is how all kernel development works :)

*nod*

> Maintainers are not supposed to merge patches into upstream, if they 
> have flaws still remaining to be corrected.

Many patches are not this black and white.
All patches have to survive at least one subjective evaluation:
	Is the cure worse than the illness?


> >Several people care if tulip phy init works right. OTOH, I'm only aware of
> >one person who specifically cares if tulip is holding the CPU hostage for
> >1 or 2 ms during the occasional phy init.
> >
> >Is being a technology purist more important than moving forward with
> >what people care about?
> 
> There will _always_ be ugly patches that get hardware going for some users.

"ugly" is subjective.  Especially in the context of tulip phy init sequence.
Anyone who finds beauty in that chunk of code is truly twisted. :^)

(It just reflects the ugly mess of HW that too many vendors shipped.
I don't ever expect tulip driver to be "clean" here.)

> Tons of changes are kept outside the kernel until they are ready.  This 
> is just one more example.
> 
> Merging code into the kernel is a big deal.  That code will have to be 
> maintained for years, maybe decades.  "when in doubt, don't merge" is 
> generally the right answer.

Agreed.

> I don't want your patch to become an issue that embedded developers must 
> address (like the tg3 example above), causing further patching and 
> headache in that area.

Certainly. No problem with that.

BTW, which embedded developers still care about tulip driver?
One could use this patch as a litmus test to see how much they care. :^)

I was expecting they'd be using gigabit NICs, Wi-Fi, GSM,
or bluetooth these days.

thanks,
grant

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-16 16:46     ` Jeff Garzik
  2005-05-16 22:26       ` Grant Grundler
@ 2005-05-21 22:39       ` Francois Romieu
  2005-05-22  4:56         ` Grant Grundler
  2005-05-27  2:16         ` Jeff Garzik
  1 sibling, 2 replies; 14+ messages in thread
From: Francois Romieu @ 2005-05-21 22:39 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

Jeff Garzik <jgarzik@pobox.com> :
[tulip_media_select]
> 1) called from timer context, from the media poll timer
> 
> 2) called from spin_lock_irqsave() context, in the ->tx_timeout hook.
> 
> The first case can be fixed by moved all the timer code to a workqueue. 
>  Then when the existing timer fires, kick the workqueue.
> 
> The second case can be fixed by kicking the workqueue upon tx_timeout 
> (which is the reason why I did not suggest queue_delayed_work() use).

First try below. It only moves tulip_select_media() to process context.
The original patch (with s/udelay/msleep/ or such) is not included.

Patch applies/compiles against 2.6.12-rc4.

Comments/suggestions ?

diff -puN a/drivers/net/tulip/tulip.h~tulip-000 b/drivers/net/tulip/tulip.h
--- a/drivers/net/tulip/tulip.h~tulip-000	2005-05-21 22:30:08.133891956 +0200
+++ b/drivers/net/tulip/tulip.h	2005-05-22 00:18:45.598682086 +0200
@@ -46,6 +46,7 @@ struct tulip_chip_table {
 	int valid_intrs;	/* CSR7 interrupt enable settings */
 	int flags;
 	void (*media_timer) (unsigned long data);
+	void (*media_task) (void *);
 };
 
 
@@ -368,6 +369,7 @@ struct tulip_private {
 	unsigned int medialock:1;	/* Don't sense media type. */
 	unsigned int mediasense:1;	/* Media sensing in progress. */
 	unsigned int nway:1, nwayset:1;		/* 21143 internal NWay. */
+	unsigned int timeout_recovery: 1;
 	unsigned int csr0;	/* CSR0 setting. */
 	unsigned int csr6;	/* Current CSR6 control settings. */
 	unsigned char eeprom[EEPROM_SIZE];	/* Serial EEPROM contents. */
@@ -386,6 +388,7 @@ struct tulip_private {
 	void __iomem *base_addr;
 	int csr12_shadow;
 	int pad0;		/* Used for 8-byte alignment */
+	struct work_struct media_work;                          
 };
 
 
@@ -400,7 +403,7 @@ struct eeprom_fixup {
 
 /* 21142.c */
 extern u16 t21142_csr14[];
-void t21142_timer(unsigned long data);
+void t21142_media_task(void *data);
 void t21142_start_nway(struct net_device *dev);
 void t21142_lnk_change(struct net_device *dev, int csr5);
 
@@ -438,7 +441,7 @@ void pnic_lnk_change(struct net_device *
 void pnic_timer(unsigned long data);
 
 /* timer.c */
-void tulip_timer(unsigned long data);
+void tulip_media_task(void *data);
 void mxic_timer(unsigned long data);
 void comet_timer(unsigned long data);
 
@@ -490,4 +493,13 @@ static inline void tulip_restart_rxtx(st
 	tulip_start_rxtx(tp);
 }
 
+static inline void tulip_tx_timeout_complete(struct tulip_private *tp, void __iomem *ioaddr)
+{
+        /* Stop and restart the chip's Tx processes. */
+        tulip_restart_rxtx(tp);
+        /* Trigger an immediate transmit demand. */
+        iowrite32(0, ioaddr + CSR1);
+                 
+        tp->stats.tx_errors++;
+}
 #endif /* __NET_TULIP_H__ */

diff -puN a/drivers/net/tulip/tulip_core.c~tulip-000 b/drivers/net/tulip/tulip_core.c
--- a/drivers/net/tulip/tulip_core.c~tulip-000	2005-05-21 20:57:38.385293188 +0200
+++ b/drivers/net/tulip/tulip_core.c	2005-05-22 00:24:29.144946592 +0200
@@ -132,6 +132,16 @@ int tulip_debug = 1;
 #endif
 
 
+static struct workqueue_struct *ktulipd_workqueue;
+
+static void tulip_timer(unsigned long data)
+{
+	struct net_device *dev = (struct net_device *)data;
+	struct tulip_private *tp = netdev_priv(dev);
+
+	if (likely(netif_running(dev)))
+		queue_work(ktulipd_workqueue, &tp->media_work);
+}
 
 /*
  * This table use during operation for capabilities and media timer.
@@ -145,63 +155,65 @@ struct tulip_chip_table tulip_tbl[] = {
 
   /* DC21140 */
   { "Digital DS21140 Tulip", 128, 0x0001ebef,
-	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_PCI_MWI, tulip_timer },
+	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_PCI_MWI, tulip_timer,
+	tulip_media_task },
 
   /* DC21142, DC21143 */
   { "Digital DS21143 Tulip", 128, 0x0801fbff,
 	HAS_MII | HAS_MEDIA_TABLE | ALWAYS_CHECK_MII | HAS_ACPI | HAS_NWAY
-	| HAS_INTR_MITIGATION | HAS_PCI_MWI, t21142_timer },
+	| HAS_INTR_MITIGATION | HAS_PCI_MWI, tulip_timer, t21142_media_task },
 
   /* LC82C168 */
   { "Lite-On 82c168 PNIC", 256, 0x0001fbef,
-	HAS_MII | HAS_PNICNWAY, pnic_timer },
+	HAS_MII | HAS_PNICNWAY, pnic_timer, },
 
   /* MX98713 */
   { "Macronix 98713 PMAC", 128, 0x0001ebef,
-	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM, mxic_timer },
+	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM, mxic_timer, },
 
   /* MX98715 */
   { "Macronix 98715 PMAC", 256, 0x0001ebef,
-	HAS_MEDIA_TABLE, mxic_timer },
+	HAS_MEDIA_TABLE, mxic_timer, },
 
   /* MX98725 */
   { "Macronix 98725 PMAC", 256, 0x0001ebef,
-	HAS_MEDIA_TABLE, mxic_timer },
+	HAS_MEDIA_TABLE, mxic_timer, },
 
   /* AX88140 */
   { "ASIX AX88140", 128, 0x0001fbff,
 	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | MC_HASH_ONLY
-	| IS_ASIX, tulip_timer },
+	| IS_ASIX, tulip_timer, tulip_media_task },
 
   /* PNIC2 */
   { "Lite-On PNIC-II", 256, 0x0801fbff,
-	HAS_MII | HAS_NWAY | HAS_8023X | HAS_PCI_MWI, pnic2_timer },
+	HAS_MII | HAS_NWAY | HAS_8023X | HAS_PCI_MWI, pnic2_timer, },
 
   /* COMET */
   { "ADMtek Comet", 256, 0x0001abef,
-	HAS_MII | MC_HASH_ONLY | COMET_MAC_ADDR, comet_timer },
+	HAS_MII | MC_HASH_ONLY | COMET_MAC_ADDR, comet_timer, },
 
   /* COMPEX9881 */
   { "Compex 9881 PMAC", 128, 0x0001ebef,
-	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM, mxic_timer },
+	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM, mxic_timer, },
 
   /* I21145 */
   { "Intel DS21145 Tulip", 128, 0x0801fbff,
 	HAS_MII | HAS_MEDIA_TABLE | ALWAYS_CHECK_MII | HAS_ACPI
-	| HAS_NWAY | HAS_PCI_MWI, t21142_timer },
+	| HAS_NWAY | HAS_PCI_MWI, tulip_timer, t21142_media_task },
 
   /* DM910X */
   { "Davicom DM9102/DM9102A", 128, 0x0001ebef,
 	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_ACPI,
-	tulip_timer },
+	tulip_timer, tulip_media_task },
 
   /* RS7112 */
   { "Conexant LANfinity", 256, 0x0001ebef,
-	HAS_MII | HAS_ACPI, tulip_timer },
+	HAS_MII | HAS_ACPI, tulip_timer, tulip_media_task },
 
    /* ULi526X */
    { "ULi M5261/M5263", 128, 0x0001ebef,
-        HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_ACPI, tulip_timer },
+	HAS_MII | HAS_MEDIA_TABLE | CSR12_IN_SROM | HAS_ACPI, tulip_timer,
+	tulip_media_task },
 };
 
 
@@ -265,6 +277,17 @@ static void set_rx_mode(struct net_devic
 static void poll_tulip(struct net_device *dev);
 #endif
 
+static inline int tulip_create_workqueue(void)
+{
+	ktulipd_workqueue = create_workqueue("ktulipd");
+	return ktulipd_workqueue ? 0 : -ENOMEM;
+}
+
+static inline void tulip_destroy_workqueue(void)
+{
+	destroy_workqueue(ktulipd_workqueue);
+}
+
 static void tulip_set_power_state (struct tulip_private *tp,
 				   int sleep, int snooze)
 {
@@ -526,20 +549,9 @@ static void tulip_tx_timeout(struct net_
 			   "SIA %8.8x %8.8x %8.8x %8.8x, resetting...\n",
 			   dev->name, ioread32(ioaddr + CSR5), ioread32(ioaddr + CSR12),
 			   ioread32(ioaddr + CSR13), ioread32(ioaddr + CSR14), ioread32(ioaddr + CSR15));
-		if ( ! tp->medialock  &&  tp->mtable) {
-			do
-				--tp->cur_index;
-			while (tp->cur_index >= 0
-				   && (tulip_media_cap[tp->mtable->mleaf[tp->cur_index].media]
-					   & MediaIsFD));
-			if (--tp->cur_index < 0) {
-				/* We start again, but should instead look for default. */
-				tp->cur_index = tp->mtable->leafcount - 1;
-			}
-			tulip_select_media(dev, 0);
-			printk(KERN_WARNING "%s: transmit timed out, switching to %s "
-				   "media.\n", dev->name, medianame[dev->if_port]);
-		}
+		tp->timeout_recovery = 1;
+		queue_work(ktulipd_workqueue, &tp->media_work);
+		goto out_unlock;
 	} else if (tp->chip_id == PNIC2) {
 		printk(KERN_WARNING "%s: PNIC2 transmit timed out, status %8.8x, "
 		       "CSR6/7 %8.8x / %8.8x CSR12 %8.8x, resetting...\n",
@@ -579,14 +591,9 @@ static void tulip_tx_timeout(struct net_
 	}
 #endif
 
-	/* Stop and restart the chip's Tx processes . */
-
-	tulip_restart_rxtx(tp);
-	/* Trigger an immediate transmit demand. */
-	iowrite32(0, ioaddr + CSR1);
-
-	tp->stats.tx_errors++;
+	tulip_tx_timeout_complete(tp, ioaddr);
 
+out_unlock:
 	spin_unlock_irqrestore (&tp->lock, flags);
 	dev->trans_start = jiffies;
 	netif_wake_queue (dev);
@@ -736,6 +743,8 @@ static void tulip_down (struct net_devic
 	void __iomem *ioaddr = tp->base_addr;
 	unsigned long flags;
 
+	flush_workqueue(ktulipd_workqueue);
+
 	del_timer_sync (&tp->timer);
 #ifdef CONFIG_TULIP_NAPI
 	del_timer_sync (&tp->oom_timer);
@@ -1408,6 +1417,8 @@ static int __devinit tulip_init_one (str
 	tp->timer.data = (unsigned long)dev;
 	tp->timer.function = tulip_tbl[tp->chip_id].media_timer;
 
+	INIT_WORK(&tp->media_work, tulip_tbl[tp->chip_id].media_task, dev);
+
 	dev->base_addr = (unsigned long)ioaddr;
 
 #ifdef CONFIG_TULIP_MWI
@@ -1838,6 +1849,8 @@ static struct pci_driver tulip_driver = 
 
 static int __init tulip_init (void)
 {
+	int ret;
+
 #ifdef MODULE
 	printk (KERN_INFO "%s", version);
 #endif
@@ -1846,14 +1859,23 @@ static int __init tulip_init (void)
 	tulip_rx_copybreak = rx_copybreak;
 	tulip_max_interrupt_work = max_interrupt_work;
 
+	ret = tulip_create_workqueue();
+	if (ret < 0)
+		goto out;
+
 	/* probe for and init boards */
-	return pci_module_init (&tulip_driver);
+	ret = pci_module_init(&tulip_driver);
+	if (ret < 0)
+		tulip_destroy_workqueue();
+out:
+	return ret;
 }
 
 
 static void __exit tulip_cleanup (void)
 {
 	pci_unregister_driver (&tulip_driver);
+	tulip_destroy_workqueue();
 }
 
 
diff -puN a/drivers/net/tulip/21142.c~tulip-000 b/drivers/net/tulip/21142.c
--- a/drivers/net/tulip/21142.c~tulip-000	2005-05-21 20:57:37.978358686 +0200
+++ b/drivers/net/tulip/21142.c	2005-05-22 00:10:21.717431845 +0200
@@ -26,9 +26,9 @@ static u16 t21142_csr15[] = { 0x0008, 0x
 
 /* Handle the 21143 uniquely: do autoselect with NWay, not the EEPROM list
    of available transceivers.  */
-void t21142_timer(unsigned long data)
+void t21142_media_task(void *data)
 {
-	struct net_device *dev = (struct net_device *)data;
+	struct net_device *dev = data;
 	struct tulip_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->base_addr;
 	int csr12 = ioread32(ioaddr + CSR12);
diff -puN a/drivers/net/tulip/timer.c~tulip-000 b/drivers/net/tulip/timer.c
--- a/drivers/net/tulip/timer.c~tulip-000	2005-05-21 20:57:38.251314753 +0200
+++ b/drivers/net/tulip/timer.c	2005-05-22 00:24:05.889719386 +0200
@@ -18,13 +18,14 @@
 #include "tulip.h"
 
 
-void tulip_timer(unsigned long data)
+void tulip_media_task(void *data)
 {
-	struct net_device *dev = (struct net_device *)data;
+	struct net_device *dev = data;
 	struct tulip_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->base_addr;
 	u32 csr12 = ioread32(ioaddr + CSR12);
 	int next_tick = 2*HZ;
+	unsigned long flags;
 
 	if (tulip_debug > 2) {
 		printk(KERN_DEBUG "%s: Media selection tick, %s, status %8.8x mode"
@@ -127,6 +128,14 @@ void tulip_timer(unsigned long data)
 	}
 	break;
 	}
+
+	spin_lock_irqsave (&tp->lock, flags);
+	if (tp->timeout_recovery) {
+		tp->timeout_recovery = 0;
+		tulip_tx_timeout_complete(tp, ioaddr);
+	}
+	spin_unlock_irqrestore (&tp->lock, flags);
+
 	/* mod_timer synchronizes us with potential add_timer calls
 	 * from interrupts.
 	 */
_

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-21 22:39       ` Francois Romieu
@ 2005-05-22  4:56         ` Grant Grundler
  2005-05-27  2:16         ` Jeff Garzik
  1 sibling, 0 replies; 14+ messages in thread
From: Grant Grundler @ 2005-05-22  4:56 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Jeff Garzik, Grant Grundler, akpm, T-Bone, varenet, Linux Kernel,
	Netdev

On Sun, May 22, 2005 at 12:39:59AM +0200, Francois Romieu wrote:
> Jeff Garzik <jgarzik@pobox.com> :
> [tulip_media_select]
> > 1) called from timer context, from the media poll timer
> > 
> > 2) called from spin_lock_irqsave() context, in the ->tx_timeout hook.
> > 
> > The first case can be fixed by moved all the timer code to a workqueue. 
> >  Then when the existing timer fires, kick the workqueue.
> > 
> > The second case can be fixed by kicking the workqueue upon tx_timeout 
> > (which is the reason why I did not suggest queue_delayed_work() use).
> 
> First try below. It only moves tulip_select_media() to process context.

Cool - thanks.

> The original patch (with s/udelay/msleep/ or such) is not included.

That's fine. I'll take care of that once Jeff is happy with this.


> Comments/suggestions ?

Basic workqueue create/destroy looks correct - but I've only played with
workqueues once before.
It wouldn't hurt if someone else double checked too.

Comments below are mostly about the other parts.


> +static inline int tulip_create_workqueue(void)
> +{
> +	ktulipd_workqueue = create_workqueue("ktulipd");
> +	return ktulipd_workqueue ? 0 : -ENOMEM;
> +}

This just obfuscates the code. It's only called in one place.
Please just directly call create_workqueue("ktulipd") from tulip_init()
and check the return value.

> +static inline void tulip_destroy_workqueue(void)
> +{
> +	destroy_workqueue(ktulipd_workqueue);
> +}

Same thing.

> @@ -526,20 +549,9 @@ static void tulip_tx_timeout(struct net_
...
> +		tp->timeout_recovery = 1;
> +		queue_work(ktulipd_workqueue, &tp->media_work);
> +		goto out_unlock;

This is the key bit.

> -	/* Stop and restart the chip's Tx processes . */
> -
> -	tulip_restart_rxtx(tp);
> -	/* Trigger an immediate transmit demand. */
> -	iowrite32(0, ioaddr + CSR1);
> -
> -	tp->stats.tx_errors++;
> +	tulip_tx_timeout_complete(tp, ioaddr);

This doesn't fix the existing issue with tulip_restart_rxtx().
Even without the patch to tulip_select_media(),
tulip_restart_rxtx() does not comply with jgarzik's linux driver
requirements becuase it can spin delay up to 1200us.


>  static void __exit tulip_cleanup (void)
>  {
>  	pci_unregister_driver (&tulip_driver);
> +	tulip_destroy_workqueue();
>  }

Only one workqueue for all instances of tulip cards, right?


...
> @@ -127,6 +128,14 @@ void tulip_timer(unsigned long data)
>  	}
>  	break;
>  	}
> +
> +	spin_lock_irqsave (&tp->lock, flags);
> +	if (tp->timeout_recovery) {
> +		tp->timeout_recovery = 0;
> +		tulip_tx_timeout_complete(tp, ioaddr);
> +	}
> +	spin_unlock_irqrestore (&tp->lock, flags);


This suffers the original issue: blocked IRQs while CPU might spin
for 1200us in tulip_tx_timeout_complete().

If tp->timeout_recovery acts as a sort of semaphore for us,
do we even need the spinlock?

I suspect "yes" because timeout_recovery is a bitfield and clearing
it is a read/modify/write operation. This is why I don't like bitfields.

ie. something like:
	if (tp->timeout_recovery) {
		tulip_tx_timeout_complete(tp, ioaddr);

		spin_lock_irqsave (&tp->lock, flags);
		tp->timeout_recovery = 0;	/* Bitfields are NOT atomic. */
		spin_unlock_irqrestore (&tp->lock, flags);
	}

thanks,
grant

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-21 22:39       ` Francois Romieu
  2005-05-22  4:56         ` Grant Grundler
@ 2005-05-27  2:16         ` Jeff Garzik
  2005-05-27  7:17           ` Francois Romieu
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2005-05-27  2:16 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

Francois Romieu wrote:
> Jeff Garzik <jgarzik@pobox.com> :
> [tulip_media_select]
> 
>>1) called from timer context, from the media poll timer
>>
>>2) called from spin_lock_irqsave() context, in the ->tx_timeout hook.
>>
>>The first case can be fixed by moved all the timer code to a workqueue. 
>> Then when the existing timer fires, kick the workqueue.
>>
>>The second case can be fixed by kicking the workqueue upon tx_timeout 
>>(which is the reason why I did not suggest queue_delayed_work() use).
> 
> 
> First try below. It only moves tulip_select_media() to process context.
> The original patch (with s/udelay/msleep/ or such) is not included.
> 
> Patch applies/compiles against 2.6.12-rc4.

Looks pretty good to me, at first look.

I'll give it some thought, and probably apply it, in a few days.

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-27  2:16         ` Jeff Garzik
@ 2005-05-27  7:17           ` Francois Romieu
  2005-10-04 13:18             ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: Francois Romieu @ 2005-05-27  7:17 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Francois Romieu, Grant Grundler, akpm, T-Bone, varenet,
	Linux Kernel, Netdev

Jeff Garzik <jgarzik@pobox.com> :
[...]
> Looks pretty good to me, at first look.
> 
> I'll give it some thought, and probably apply it, in a few days.

An updated version is cooking.

--
Ueimor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree
  2005-05-27  7:17           ` Francois Romieu
@ 2005-10-04 13:18             ` Jeff Garzik
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Garzik @ 2005-10-04 13:18 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Grant Grundler, akpm, T-Bone, varenet, Linux Kernel, Netdev

Francois Romieu wrote:
> Jeff Garzik <jgarzik@pobox.com> :
> [...]
> 
>>Looks pretty good to me, at first look.
>>
>>I'll give it some thought, and probably apply it, in a few days.
> 
> 
> An updated version is cooking.

Ever finish cooking this tulip patch?

	Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2005-10-04 13:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200505101955.j4AJtX9x032464@shell0.pdx.osdl.net>
2005-05-16  4:06 ` patch tulip-natsemi-dp83840a-phy-fix.patch added to -mm tree Jeff Garzik
2005-05-16  5:08   ` Grant Grundler
2005-05-16 16:46     ` Jeff Garzik
2005-05-16 22:26       ` Grant Grundler
2005-05-20 18:58         ` Jeff Garzik
2005-05-20 19:15           ` Francois Romieu
2005-05-20 21:12           ` Grant Grundler
2005-05-20 21:34             ` Jeff Garzik
2005-05-21  0:51               ` Grant Grundler
2005-05-21 22:39       ` Francois Romieu
2005-05-22  4:56         ` Grant Grundler
2005-05-27  2:16         ` Jeff Garzik
2005-05-27  7:17           ` Francois Romieu
2005-10-04 13:18             ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).