* [PATCH net-next 1/1] net: fec: Fix NAPI race
From: Fugang Duan @ 2014-12-16 10:25 UTC
To: davem; +Cc: netdev, R49496, bhutchings, stephen, b38611
Run a camera capture test on an i.MX6q sabresd board, saving the captured data
to an NFS rootfs. The command is:
gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
After about 10 hours of running, a net watchdog timeout kernel dump appears:
...
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.24-01051-gdb840b7 #440
[<80014e6c>] (unwind_backtrace) from [<800118ac>] (show_stack+0x10/0x14)
[<800118ac>] (show_stack) from [<806ae3f0>] (dump_stack+0x78/0xc0)
[<806ae3f0>] (dump_stack) from [<8002b504>] (warn_slowpath_common+0x68/0x8c)
[<8002b504>] (warn_slowpath_common) from [<8002b558>] (warn_slowpath_fmt+0x30/0x40)
[<8002b558>] (warn_slowpath_fmt) from [<8055e0d4>] (dev_watchdog+0x2b4/0x2d8)
[<8055e0d4>] (dev_watchdog) from [<800352d8>] (call_timer_fn.isra.33+0x24/0x8c)
[<800352d8>] (call_timer_fn.isra.33) from [<800354c4>] (run_timer_softirq+0x184/0x220)
[<800354c4>] (run_timer_softirq) from [<8002f420>] (__do_softirq+0xc0/0x22c)
[<8002f420>] (__do_softirq) from [<8002f804>] (irq_exit+0xa8/0xf4)
[<8002f804>] (irq_exit) from [<8000ee5c>] (handle_IRQ+0x54/0xb4)
[<8000ee5c>] (handle_IRQ) from [<80008598>] (gic_handle_irq+0x28/0x5c)
[<80008598>] (gic_handle_irq) from [<800123c0>] (__irq_svc+0x40/0x74)
Exception stack(0x80d27f18 to 0x80d27f60)
7f00: 80d27f60 0000014c
7f20: 8858c60e 0000004d 884e4540 0000004d ab7250d0 80d34348 00000000 00000000
7f40: 00000001 00000000 00000017 80d27f60 800702a4 80476e6c 600f0013 ffffffff
[<800123c0>] (__irq_svc) from [<80476e6c>] (cpuidle_enter_state+0x50/0xe0)
[<80476e6c>] (cpuidle_enter_state) from [<80476fa8>] (cpuidle_idle_call+0xac/0x154)
[<80476fa8>] (cpuidle_idle_call) from [<8000f174>] (arch_cpu_idle+0x8/0x44)
[<8000f174>] (arch_cpu_idle) from [<80064c54>] (cpu_startup_entry+0x100/0x158)
[<80064c54>] (cpu_startup_entry) from [<80cd8a9c>] (start_kernel+0x304/0x368)
---[ end trace 09ebd32fb032f86d ]---
...
There might be a race in napi_schedule() that leaves interrupts disabled forever.
With this patch applied, the test case still works after more than 40 hours of running.
Signed-off-by: Fugang Duan <B38611@freescale.com>
---
drivers/net/ethernet/freescale/fec_main.c | 19 +++++++------------
1 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 8c5b15e..5c4a8bd 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1558,20 +1558,21 @@ fec_enet_interrupt(int irq, void *dev_id)
 {
 	struct net_device *ndev = dev_id;
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	const unsigned napi_mask = FEC_ENET_RXF | FEC_ENET_TXF;
 	uint int_events;
 	irqreturn_t ret = IRQ_NONE;

 	int_events = readl(fep->hwp + FEC_IEVENT);
-	writel(int_events & ~napi_mask, fep->hwp + FEC_IEVENT);
+	writel(int_events, fep->hwp + FEC_IEVENT);
 	fec_enet_collect_events(fep, int_events);

-	if (int_events & napi_mask) {
+	if (fep->work_tx || fep->work_rx) {
 		ret = IRQ_HANDLED;

-		/* Disable the NAPI interrupts */
-		writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
-		napi_schedule(&fep->napi);
+		if (napi_schedule_prep(&fep->napi)) {
+			/* Disable the NAPI interrupts */
+			writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
+			__napi_schedule(&fep->napi);
+		}
 	}

 	if (int_events & FEC_ENET_MII) {
@@ -1591,12 +1592,6 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	int pkts;

-	/*
-	 * Clear any pending transmit or receive interrupts before
-	 * processing the rings to avoid racing with the hardware.
-	 */
-	writel(FEC_ENET_RXF | FEC_ENET_TXF, fep->hwp + FEC_IEVENT);
-
 	pkts = fec_enet_rx(ndev, budget);

 	fec_enet_tx(ndev);
--
1.7.8
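
For readers following the diff above, its two changes (acking exactly the
event bits that were read, and masking interrupts only when
napi_schedule_prep() succeeds) can be sketched as a small userspace model.
This is only an illustration under assumptions: the toy_* names, the struct,
and the bit values are hypothetical and not taken from the driver, the real
napi_schedule_prep() checks more state than a single flag, and the thread does
not establish which exact window triggered the watchdog timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this toy model; not the real FEC layout. */
#define TOY_EV_RXF 0x1u
#define TOY_EV_TXF 0x2u

struct toy_dev {
	uint32_t ievent;	/* models a write-1-to-clear event register */
	uint32_t work;		/* events collected for the poll routine */
	bool scheduled;		/* stands in for NAPI_STATE_SCHED */
	bool masked;		/* true while RX/TX interrupts are masked */
};

/* Simplified stand-in for napi_schedule_prep(): claim the poll if it is free. */
bool toy_schedule_prep(struct toy_dev *d)
{
	if (d->scheduled)
		return false;
	d->scheduled = true;
	return true;
}

/* Handler shaped like the patched one; returns true if it scheduled the poll. */
bool toy_irq(struct toy_dev *d)
{
	uint32_t events = d->ievent;	/* snapshot, like readl(FEC_IEVENT) */

	d->ievent &= ~events;		/* ack exactly the bits that were read */
	d->work |= events;		/* remember them for the poll routine */

	if (d->work && toy_schedule_prep(d)) {
		d->masked = true;	/* mask only when this call owns the poll */
		return true;
	}
	return false;
}

/* Poll shaped like the patched one: no blind ack of the event register. */
void toy_poll(struct toy_dev *d)
{
	d->work = 0;			/* pretend the rings were processed */
	d->scheduled = false;		/* like napi_complete() */
	d->masked = false;		/* like restoring the interrupt mask */
}

/* Walk one interleaving; return the events still visible to software. */
uint32_t toy_scenario(void)
{
	struct toy_dev d = { 0 };

	d.ievent |= TOY_EV_RXF;		/* hardware: frame received */
	assert(toy_irq(&d));		/* first interrupt claims the poll */
	d.ievent |= TOY_EV_TXF;		/* hardware: TX completes before the poll */
	toy_poll(&d);			/* poll finishes and unmasks */
	return d.ievent | d.work;	/* TXF is still pending, not discarded */
}
```

In this model, toy_scenario() ends with the TX event still pending in the
event register, so it would raise a fresh interrupt once unmasked. A poll
routine that blindly cleared both bits on entry would have discarded it
without it ever being collected.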
* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
From: Fabio Estevam @ 2014-12-16 11:33 UTC
To: Fugang Duan
Cc: David S. Miller, netdev@vger.kernel.org, Estevam Fabio-R49496,
Ben Hutchings, Stephen Hemminger, robert.daniels,
Marek Vašut, Russell King
Hi Fugang,
On Tue, Dec 16, 2014 at 8:25 AM, Fugang Duan <b38611@freescale.com> wrote:
> Do camera capture test on i.MX6q sabresd board, and save the capture data to
> nfs rootfs. The command is:
> gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
>
> After about 10 hours running, there have net watchdog timeout kernel dump:
> ...
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
> NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
Adding more people who reported similar issues in the past.
Marek,
Does this patch solve the problem you reported at
http://www.spinics.net/lists/netdev/msg268167.html ?
* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
From: Russell King - ARM Linux @ 2014-12-16 11:41 UTC
To: Fabio Estevam
Cc: Fugang Duan, David S. Miller, netdev@vger.kernel.org,
Estevam Fabio-R49496, Ben Hutchings, Stephen Hemminger,
robert.daniels, Marek Vašut
On Tue, Dec 16, 2014 at 09:33:53AM -0200, Fabio Estevam wrote:
> Hi Fugang,
>
> On Tue, Dec 16, 2014 at 8:25 AM, Fugang Duan <b38611@freescale.com> wrote:
> > Do camera capture test on i.MX6q sabresd board, and save the capture data to
> > nfs rootfs. The command is:
> > gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> > queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> > blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> > queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
> >
> > After about 10 hours running, there have net watchdog timeout kernel dump:
> > ...
> > WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
> > NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
>
> Adding more people who reported similar issues in the past.
>
> Marek,
>
> Does this patch solve the problem you reported at
> http://www.spinics.net/lists/netdev/msg268167.html ?
My set of patches fixed stuff exactly like this...
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
From: Marek Vasut @ 2014-12-16 13:34 UTC
To: Russell King - ARM Linux
Cc: Fabio Estevam, Fugang Duan, David S. Miller,
netdev@vger.kernel.org, Estevam Fabio-R49496, Ben Hutchings,
Stephen Hemminger, robert.daniels
On Tuesday, December 16, 2014 at 12:41:31 PM, Russell King - ARM Linux wrote:
> On Tue, Dec 16, 2014 at 09:33:53AM -0200, Fabio Estevam wrote:
> > Hi Fugang,
> >
> > On Tue, Dec 16, 2014 at 8:25 AM, Fugang Duan <b38611@freescale.com> wrote:
> > > Do camera capture test on i.MX6q sabresd board, and save the capture
> > > data to nfs rootfs. The command is:
> > > gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 !
> > > tee name=t ! queue ! imxv4l2sink sync=false t. ! queue ! vpuenc !
> > > queue ! mux. pulsesrc num-buffers=3720937 blocksize=4096 !
> > > 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc !
> > > mpegaudioparse ! queue ! mux. qtmux name=mux ! filesink
> > > location=video_recording_long.mov
> > >
> > > After about 10 hours running, there have net watchdog timeout kernel
> > > dump: ...
> > > WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264
> > > dev_watchdog+0x2b4/0x2d8() NETDEV WATCHDOG: eth0 (fec): transmit queue
> > > 0 timed out
> >
> > Adding more people who reported similar issues in the past.
> >
> > Marek,
> >
> > Does this patch solve the problem you reported at
> > http://www.spinics.net/lists/netdev/msg268167.html ?
>
> My set of patches fixed stuff exactly like this...
I still keep your G+ post open, in case I ever manage to find free time to dive
into it. It'd be a terrible waste to let these patches go. Right now, I'm in the
process of finishing my degree (finally) so things are just crap, apologies.
Best regards,
Marek Vasut
* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
From: David Miller @ 2014-12-16 20:24 UTC
To: b38611; +Cc: netdev, R49496, bhutchings, stephen
From: Fugang Duan <b38611@freescale.com>
Date: Tue, 16 Dec 2014 18:25:58 +0800
> Do camera capture test on i.MX6q sabresd board, and save the capture data to
> nfs rootfs. The command is:
> gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
>
> After about 10 hours running, there have net watchdog timeout kernel dump:
...
> There might have a race in napi_schedule(), leaving interrupts disabled forever.
> After these patch, the case still work more than 40 hours running.
>
> Signed-off-by: Fugang Duan <B38611@freescale.com>
Applied, thanks.