* Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
@ 2006-11-09 14:22 Basheer, Mansoor Ahamed
2006-11-09 23:46 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Basheer, Mansoor Ahamed @ 2006-11-09 14:22 UTC (permalink / raw)
To: netdev
Hi All
I found an issue with Realtek 8139 driver (2.6.10 branch) at high
bi-directional traffic. On transmit timeout, driver's timeout callback
re-enables the receive interrupt. On the next receive interrupt, the ISR
disables the receive interrupt "only" when the receive poll task is not
active. But the poll task is actually active and hence it doesn't
disable the receive interrupt. So the ISR returns without clearing the
receive interrupt. This un-serviced receive interrupt brings the system
into hung state.
My understanding here is, on receive interrupt the ISR should disable
the receive interrupt irrespective of the polling task's state (active
or inactive). I changed the code (as shown below) and it works
perfectly.
Is this a known issue? If so, is there a fix already available?
Also, we get frequent TX timeouts during high rate of traffic. What
could be the reason for this frequent TX timeouts?
--- old/8139too.c 2006-11-09 11:49:25.000000000 +0530
+++ new/8139too.c 2006-11-09 11:50:02.000000000 +0530
@@ -2200,8 +2200,8 @@
/* Receive packets are processed by poll routine.
If not running start it now. */
if (status & RxAckBits){
- if (netif_rx_schedule_prep(dev)) {
RTL_W16_F (IntrMask, rtl8139_norx_intr_mask);
+ if (netif_rx_schedule_prep(dev)) {
__netif_rx_schedule (dev);
}
}
Thanks
Mansoor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-09 14:22 Basheer, Mansoor Ahamed
@ 2006-11-09 23:46 ` Francois Romieu
2006-11-14 5:33 ` Basheer, Mansoor Ahamed
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2006-11-09 23:46 UTC (permalink / raw)
To: Basheer, Mansoor Ahamed; +Cc: netdev
Basheer, Mansoor Ahamed <mansoor.ahamed@ti.com> :
[...]
> My understanding here is, on receive interrupt the ISR should disable
> the receive interrupt irrespective of the polling task's state (active
> or inactive).
Apparently it could happen even with 2.6.19-rc5, yes.
> I changed the code (as shown below) and it works perfectly.
Afaics your change may disable the Rx irq right after the poll
routine enabled it again. It will not always work either.
The (slow) timeout watchdog could grab the poll handler and hack
the irq mask depending on whether poll was scheduled or not.
> Is this a known issue? If so, is there a fix already available?
>
> Also, we get frequent TX timeouts during high rate of traffic. What
> could be the reason for this frequent TX timeouts?
No idea. Have you considered upgrading to a kernel which is not almost
2 years old ?
--
Ueimor
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-09 23:46 ` Francois Romieu
@ 2006-11-14 5:33 ` Basheer, Mansoor Ahamed
0 siblings, 0 replies; 10+ messages in thread
From: Basheer, Mansoor Ahamed @ 2006-11-14 5:33 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev
Francois Romieu [mailto:romieu@fr.zoreil.com] wrote:
> Afaics your change may disable the Rx irq right after the poll
> routine enabled it again. It will not always work either.
>
> The (slow) timeout watchdog could grab the poll handler and hack
> the irq mask depending on whether poll was scheduled or not.
Based on your inputs I'm attaching below the updated patch.
Signed-off-by: Mansoor Ahamed <mansoor.ahamed@ti.com>
--- old/8139too.c 2006-11-14 10:44:27.000000000 +0530
+++ new/8139too.c 2006-11-14 10:44:18.000000000 +0530
@@ -1438,8 +1438,18 @@
if ((!(tmp & CmdRxEnb)) || (!(tmp & CmdTxEnb)))
RTL_W8 (ChipCmd, CmdRxEnb | CmdTxEnb);
- /* Enable all known interrupts by setting the interrupt mask. */
- RTL_W16 (IntrMask, rtl8139_intr_mask);
+ local_irq_disable();
+ /* Don't enable RX if RX was already scheduled */
+ if(test_bit(__LINK_STATE_START, &dev->state) &&
+ test_bit(__LINK_STATE_RX_SCHED, &dev->state) ) {
+ /* Enable all interrupts except RX by setting the
interrupt mask. */
+ RTL_W16 (IntrMask, rtl8139_norx_intr_mask);
+ }
+ else {
+ /* Enable all known interrupts by setting the interrupt
mask. */
+ RTL_W16 (IntrMask, rtl8139_intr_mask);
+ }
+ local_irq_enable();
}
Thanks
Mansoor
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
@ 2006-11-29 8:50 Basheer, Mansoor Ahamed
2006-11-29 20:24 ` Stephen Hemminger
0 siblings, 1 reply; 10+ messages in thread
From: Basheer, Mansoor Ahamed @ 2006-11-29 8:50 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev
Francois Romieu [mailto:romieu@fr.zoreil.com] wrote:
> Afaics your change may disable the Rx irq right after the poll routine
> enabled it again. It will not always work either.
>
> The (slow) timeout watchdog could grab the poll handler and hack the
> irq mask depending on whether poll was scheduled or not.
Could you please confirm whether the attached patch would work?
I tested it and it works for me.
Signed-off-by: Mansoor Ahamed <mansoor.ahamed@ti.com>
--- old/8139too.c 2006-11-14 10:44:27.000000000 +0530
+++ new/8139too.c 2006-11-14 10:44:18.000000000 +0530
@@ -1438,8 +1438,18 @@
if ((!(tmp & CmdRxEnb)) || (!(tmp & CmdTxEnb)))
RTL_W8 (ChipCmd, CmdRxEnb | CmdTxEnb);
- /* Enable all known interrupts by setting the interrupt mask. */
- RTL_W16 (IntrMask, rtl8139_intr_mask);
+ local_irq_disable();
+ /* Don't enable RX if RX was already scheduled */
+ if(test_bit(__LINK_STATE_START, &dev->state) &&
+ test_bit(__LINK_STATE_RX_SCHED, &dev->state) ) {
+ /* Enable all interrupts except RX by setting the
interrupt mask. */
+ RTL_W16 (IntrMask, rtl8139_norx_intr_mask);
+ }
+ else {
+ /* Enable all known interrupts by setting the interrupt
mask. */
+ RTL_W16 (IntrMask, rtl8139_intr_mask);
+ }
+ local_irq_enable();
}
Thanks
Mansoor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-29 8:50 Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic Basheer, Mansoor Ahamed
@ 2006-11-29 20:24 ` Stephen Hemminger
2006-11-29 22:44 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2006-11-29 20:24 UTC (permalink / raw)
To: Basheer, Mansoor Ahamed; +Cc: Francois Romieu, netdev
On Wed, 29 Nov 2006 14:20:31 +0530
"Basheer, Mansoor Ahamed" <mansoor.ahamed@ti.com> wrote:
> Francois Romieu [mailto:romieu@fr.zoreil.com] wrote:
>
> > Afaics your change may disable the Rx irq right after the poll routine
>
> > enabled it again. It will not always work either.
> >
> > The (slow) timeout watchdog could grab the poll handler and hack the
> > irq mask depending on whether poll was scheduled or not.
>
> Could you please confirm whether the attached patch would work?
> I tested it and it works for me.
>
>
> Signed-off-by: Mansoor Ahamed <mansoor.ahamed@ti.com>
>
> --- old/8139too.c 2006-11-14 10:44:27.000000000 +0530
> +++ new/8139too.c 2006-11-14 10:44:18.000000000 +0530
> @@ -1438,8 +1438,18 @@
> if ((!(tmp & CmdRxEnb)) || (!(tmp & CmdTxEnb)))
> RTL_W8 (ChipCmd, CmdRxEnb | CmdTxEnb);
>
> - /* Enable all known interrupts by setting the interrupt mask. */
> - RTL_W16 (IntrMask, rtl8139_intr_mask);
> + local_irq_disable();
> + /* Don't enable RX if RX was already scheduled */
> + if(test_bit(__LINK_STATE_START, &dev->state) &&
> + test_bit(__LINK_STATE_RX_SCHED, &dev->state) ) {
> + /* Enable all interrupts except RX by setting the
> interrupt mask. */
> + RTL_W16 (IntrMask, rtl8139_norx_intr_mask);
> + }
> + else {
> + /* Enable all known interrupts by setting the interrupt
> mask. */
> + RTL_W16 (IntrMask, rtl8139_intr_mask);
> + }
> + local_irq_enable();
> }
Sorry, that's not the right way. Testing for bits is not
SMP safe and is usually a bad idea. The rx_lock model is not the
best way. Try something like this:
--- a/drivers/net/8139too.c.orig 2006-11-29 12:22:32.000000000 -0800
+++ b/drivers/net/8139too.c 2006-11-29 12:22:06.000000000 -0800
@@ -589,7 +589,6 @@ struct rtl8139_private {
unsigned int default_port : 4; /* Last dev->if_port value. */
unsigned int have_thread : 1;
spinlock_t lock;
- spinlock_t rx_lock;
chip_t chipset;
u32 rx_config;
struct rtl_extra_stats xstats;
@@ -1009,7 +1008,6 @@ static int __devinit rtl8139_init_one (s
tp->msg_enable =
(debug < 0 ? RTL8139_DEF_MSG_ENABLE : ((1 << debug) - 1));
spin_lock_init (&tp->lock);
- spin_lock_init (&tp->rx_lock);
INIT_WORK(&tp->thread, rtl8139_thread, dev);
tp->mii.dev = dev;
tp->mii.mdio_read = mdio_read;
@@ -1654,6 +1652,9 @@ static void rtl8139_tx_timeout_task (voi
int i;
u8 tmp8;
+ if (!netif_running(dev))
+ return;
+
printk (KERN_DEBUG "%s: Transmit timeout, status %2.2x %4.4x %4.4x "
"media %2.2x.\n", dev->name, RTL_R8 (ChipCmd),
RTL_R16(IntrStatus), RTL_R16(IntrMask), RTL_R8(MediaStatus));
@@ -1673,7 +1674,9 @@ static void rtl8139_tx_timeout_task (voi
if (tmp8 & CmdTxEnb)
RTL_W8 (ChipCmd, CmdRxEnb);
- spin_lock_bh(&tp->rx_lock);
+ /* prevent NAPI poll from running */
+ netif_poll_disable();
+
/* Disable interrupts by clearing the interrupt mask. */
RTL_W16 (IntrMask, 0x0000);
@@ -1682,12 +1685,11 @@ static void rtl8139_tx_timeout_task (voi
rtl8139_tx_clear (tp);
spin_unlock_irq(&tp->lock);
+ netif_poll_enable();
+
/* ...and finally, reset everything */
- if (netif_running(dev)) {
- rtl8139_hw_start (dev);
- netif_wake_queue (dev);
- }
- spin_unlock_bh(&tp->rx_lock);
+ rtl8139_hw_start (dev);
+ netif_wake_queue (dev);
}
static void rtl8139_tx_timeout (struct net_device *dev)
@@ -2116,7 +2118,6 @@ static int rtl8139_poll(struct net_devic
int orig_budget = min(*budget, dev->quota);
int done = 1;
- spin_lock(&tp->rx_lock);
if (likely(RTL_R16(IntrStatus) & RxAckBits)) {
int work_done;
@@ -2138,7 +2139,6 @@ static int rtl8139_poll(struct net_devic
__netif_rx_complete(dev);
local_irq_enable();
}
- spin_unlock(&tp->rx_lock);
return !done;
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-29 20:24 ` Stephen Hemminger
@ 2006-11-29 22:44 ` Francois Romieu
2006-11-29 22:50 ` Stephen Hemminger
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2006-11-29 22:44 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Basheer, Mansoor Ahamed, netdev
Stephen Hemminger <shemminger@osdl.org> :
[...]
> @@ -1682,12 +1685,11 @@ static void rtl8139_tx_timeout_task (voi
> rtl8139_tx_clear (tp);
> spin_unlock_irq(&tp->lock);
>
> + netif_poll_enable();
^ -> dev
> +
> /* ...and finally, reset everything */
> - if (netif_running(dev)) {
> - rtl8139_hw_start (dev);
> - netif_wake_queue (dev);
> - }
> - spin_unlock_bh(&tp->rx_lock);
> + rtl8139_hw_start (dev);
> + netif_wake_queue (dev);
> }
rtl8139_hw_start() may mess with cur_rx, whence a race with rtl8139_rx()
if an in-flight interruption enables it a bit too fast. I'd rather go
with:
[...]
rtl8139_tx_clear (tp);
rtl8139_hw_start (dev);
netif_wake_queue (dev);
netif_poll_enable(dev);
spin_unlock_irq(&tp->lock);
}
Otherwise the patch is cool.
--
Ueimor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-29 22:44 ` Francois Romieu
@ 2006-11-29 22:50 ` Stephen Hemminger
2006-11-29 23:32 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2006-11-29 22:50 UTC (permalink / raw)
To: Francois Romieu; +Cc: Basheer, Mansoor Ahamed, netdev
On Wed, 29 Nov 2006 23:44:00 +0100
Francois Romieu <romieu@fr.zoreil.com> wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > @@ -1682,12 +1685,11 @@ static void rtl8139_tx_timeout_task (voi
> > rtl8139_tx_clear (tp);
> > spin_unlock_irq(&tp->lock);
> >
> > + netif_poll_enable();
> ^ -> dev
> > +
> > /* ...and finally, reset everything */
> > - if (netif_running(dev)) {
> > - rtl8139_hw_start (dev);
> > - netif_wake_queue (dev);
> > - }
> > - spin_unlock_bh(&tp->rx_lock);
> > + rtl8139_hw_start (dev);
> > + netif_wake_queue (dev);
> > }
>
> rtl8139_hw_start() may mess with cur_rx, whence a race with rtl8139_rx()
> if an in-flight interruption enables it a bit too fast. I'd rather go
> with:
but rt8139_rx is not possible here because we have blocked the poll
routine from starting. Basically it uses the NAPI rx scheduler bit
to replace the rx_lock.
It is totally, untested.
--
Stephen Hemminger <shemminger@osdl.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-29 22:50 ` Stephen Hemminger
@ 2006-11-29 23:32 ` Francois Romieu
2006-11-30 0:09 ` Stephen Hemminger
0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2006-11-29 23:32 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Basheer, Mansoor Ahamed, netdev
Stephen Hemminger <shemminger@osdl.org> :
> Francois Romieu <romieu@fr.zoreil.com> wrote:
> > Stephen Hemminger <shemminger@osdl.org> :
> > [...]
> > > @@ -1682,12 +1685,11 @@ static void rtl8139_tx_timeout_task (voi
> > > rtl8139_tx_clear (tp);
> > > spin_unlock_irq(&tp->lock);
> > >
> > > + netif_poll_enable();
> > ^ -> dev
> > > +
> > > /* ...and finally, reset everything */
> > > - if (netif_running(dev)) {
> > > - rtl8139_hw_start (dev);
> > > - netif_wake_queue (dev);
> > > - }
> > > - spin_unlock_bh(&tp->rx_lock);
> > > + rtl8139_hw_start (dev);
> > > + netif_wake_queue (dev);
> > > }
> >
> > rtl8139_hw_start() may mess with cur_rx, whence a race with rtl8139_rx()
> > if an in-flight interruption enables it a bit too fast. I'd rather go
> > with:
>
> but rt8139_rx is not possible here because we have blocked the poll
> routine from starting. Basically it uses the NAPI rx scheduler bit
> to replace the rx_lock.
1 - the irq handler is waiting for tp->lock
2 - rtl8139_tx_timeout_task releases the lock
3 - rtl8139_tx_timeout_task issues netif_poll_enable
4 - the irq handler schedules ->poll(), returns
5 - rtl8139_hw_start() races with ->poll(), aka rtl8139_rx(), for cur_rx
--
Ueimor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-29 23:32 ` Francois Romieu
@ 2006-11-30 0:09 ` Stephen Hemminger
2006-11-30 0:25 ` Francois Romieu
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2006-11-30 0:09 UTC (permalink / raw)
To: Francois Romieu; +Cc: Basheer, Mansoor Ahamed, netdev
On Thu, 30 Nov 2006 00:32:19 +0100
Francois Romieu <romieu@fr.zoreil.com> wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> > Francois Romieu <romieu@fr.zoreil.com> wrote:
> > > Stephen Hemminger <shemminger@osdl.org> :
> > > [...]
> > > > @@ -1682,12 +1685,11 @@ static void rtl8139_tx_timeout_task (voi
> > > > rtl8139_tx_clear (tp);
> > > > spin_unlock_irq(&tp->lock);
> > > >
> > > > + netif_poll_enable();
> > > ^ -> dev
> > > > +
> > > > /* ...and finally, reset everything */
> > > > - if (netif_running(dev)) {
> > > > - rtl8139_hw_start (dev);
> > > > - netif_wake_queue (dev);
> > > > - }
> > > > - spin_unlock_bh(&tp->rx_lock);
> > > > + rtl8139_hw_start (dev);
> > > > + netif_wake_queue (dev);
> > > > }
> > >
> > > rtl8139_hw_start() may mess with cur_rx, whence a race with rtl8139_rx()
> > > if an in-flight interruption enables it a bit too fast. I'd rather go
> > > with:
> >
> > but rt8139_rx is not possible here because we have blocked the poll
> > routine from starting. Basically it uses the NAPI rx scheduler bit
> > to replace the rx_lock.
>
> 1 - the irq handler is waiting for tp->lock
> 2 - rtl8139_tx_timeout_task releases the lock
> 3 - rtl8139_tx_timeout_task issues netif_poll_enable
> 4 - the irq handler schedules ->poll(), returns
Move the poll_enable to after hw_start() or put it inside hw_start.
-
Stephen Hemminger <shemminger@osdl.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic
2006-11-30 0:09 ` Stephen Hemminger
@ 2006-11-30 0:25 ` Francois Romieu
0 siblings, 0 replies; 10+ messages in thread
From: Francois Romieu @ 2006-11-30 0:25 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Basheer, Mansoor Ahamed, netdev
Stephen Hemminger <shemminger@osdl.org> :
[...]
> Move the poll_enable to after hw_start() or put it inside hw_start.
"after" probably The order would be the opposite of the one used
in rtl8139_poll (which does __netif_rx_complete then irq_unlock)
and it's past 1 AM. It starts to be a bit foggy.
--
Ueimor
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-11-30 0:27 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-29 8:50 Realtek 8139 driver (8139too.c) TX Timeout doesn't allow interrupt handler to disable receive interrupts at high bi-directional traffic Basheer, Mansoor Ahamed
2006-11-29 20:24 ` Stephen Hemminger
2006-11-29 22:44 ` Francois Romieu
2006-11-29 22:50 ` Stephen Hemminger
2006-11-29 23:32 ` Francois Romieu
2006-11-30 0:09 ` Stephen Hemminger
2006-11-30 0:25 ` Francois Romieu
-- strict thread matches above, loose matches on Subject: below --
2006-11-09 14:22 Basheer, Mansoor Ahamed
2006-11-09 23:46 ` Francois Romieu
2006-11-14 5:33 ` Basheer, Mansoor Ahamed
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).