* [PATCH net-next 1/1] net: fec: Fix NAPI race
@ 2014-12-16 10:25 Fugang Duan
  2014-12-16 11:33 ` Fabio Estevam
  2014-12-16 20:24 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Fugang Duan @ 2014-12-16 10:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, R49496, bhutchings, stephen, b38611

Run a camera capture test on an i.MX6q sabresd board, saving the captured data to
an NFS rootfs. The command is:
gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov

After about 10 hours of running, the net watchdog times out with this kernel dump:
...
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.24-01051-gdb840b7 #440
[<80014e6c>] (unwind_backtrace) from [<800118ac>] (show_stack+0x10/0x14)
[<800118ac>] (show_stack) from [<806ae3f0>] (dump_stack+0x78/0xc0)
[<806ae3f0>] (dump_stack) from [<8002b504>] (warn_slowpath_common+0x68/0x8c)
[<8002b504>] (warn_slowpath_common) from [<8002b558>] (warn_slowpath_fmt+0x30/0x40)
[<8002b558>] (warn_slowpath_fmt) from [<8055e0d4>] (dev_watchdog+0x2b4/0x2d8)
[<8055e0d4>] (dev_watchdog) from [<800352d8>] (call_timer_fn.isra.33+0x24/0x8c)
[<800352d8>] (call_timer_fn.isra.33) from [<800354c4>] (run_timer_softirq+0x184/0x220)
[<800354c4>] (run_timer_softirq) from [<8002f420>] (__do_softirq+0xc0/0x22c)
[<8002f420>] (__do_softirq) from [<8002f804>] (irq_exit+0xa8/0xf4)
[<8002f804>] (irq_exit) from [<8000ee5c>] (handle_IRQ+0x54/0xb4)
[<8000ee5c>] (handle_IRQ) from [<80008598>] (gic_handle_irq+0x28/0x5c)
[<80008598>] (gic_handle_irq) from [<800123c0>] (__irq_svc+0x40/0x74)
Exception stack(0x80d27f18 to 0x80d27f60)
7f00:                                                       80d27f60 0000014c
7f20: 8858c60e 0000004d 884e4540 0000004d ab7250d0 80d34348 00000000 00000000
7f40: 00000001 00000000 00000017 80d27f60 800702a4 80476e6c 600f0013 ffffffff
[<800123c0>] (__irq_svc) from [<80476e6c>] (cpuidle_enter_state+0x50/0xe0)
[<80476e6c>] (cpuidle_enter_state) from [<80476fa8>] (cpuidle_idle_call+0xac/0x154)
[<80476fa8>] (cpuidle_idle_call) from [<8000f174>] (arch_cpu_idle+0x8/0x44)
[<8000f174>] (arch_cpu_idle) from [<80064c54>] (cpu_startup_entry+0x100/0x158)
[<80064c54>] (cpu_startup_entry) from [<80cd8a9c>] (start_kernel+0x304/0x368)
---[ end trace 09ebd32fb032f86d ]---
...

There may be a race around napi_schedule() that leaves interrupts disabled forever.
With this patch applied, the same test keeps working for more than 40 hours.

Signed-off-by: Fugang Duan <B38611@freescale.com>
---
 drivers/net/ethernet/freescale/fec_main.c |   19 +++++++------------
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 8c5b15e..5c4a8bd 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1558,20 +1558,21 @@ fec_enet_interrupt(int irq, void *dev_id)
 {
 	struct net_device *ndev = dev_id;
 	struct fec_enet_private *fep = netdev_priv(ndev);
-	const unsigned napi_mask = FEC_ENET_RXF | FEC_ENET_TXF;
 	uint int_events;
 	irqreturn_t ret = IRQ_NONE;
 
 	int_events = readl(fep->hwp + FEC_IEVENT);
-	writel(int_events & ~napi_mask, fep->hwp + FEC_IEVENT);
+	writel(int_events, fep->hwp + FEC_IEVENT);
 	fec_enet_collect_events(fep, int_events);
 
-	if (int_events & napi_mask) {
+	if (fep->work_tx || fep->work_rx) {
 		ret = IRQ_HANDLED;
 
-		/* Disable the NAPI interrupts */
-		writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
-		napi_schedule(&fep->napi);
+		if (napi_schedule_prep(&fep->napi)) {
+			/* Disable the NAPI interrupts */
+			writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
+			__napi_schedule(&fep->napi);
+		}
 	}
 
 	if (int_events & FEC_ENET_MII) {
@@ -1591,12 +1592,6 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	int pkts;
 
-	/*
-	 * Clear any pending transmit or receive interrupts before
-	 * processing the rings to avoid racing with the hardware.
-	 */
-	writel(FEC_ENET_RXF | FEC_ENET_TXF, fep->hwp + FEC_IEVENT);
-
 	pkts = fec_enet_rx(ndev, budget);
 
 	fec_enet_tx(ndev);
-- 
1.7.8
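
The interrupt-handler change in the diff above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: it is
single-threaded, it does not attempt to reproduce the actual race, and every
name in it (struct model_dev, model_schedule_prep(), model_isr_old(),
model_isr_new(), the MODEL_IMASK_* values) is hypothetical and not part of the
driver or the kernel. It shows one behavioural difference only: the pre-patch
handler writes the interrupt mask whether or not it wins the NAPI scheduling
bit, while the patched handler writes the mask only when
napi_schedule_prep() succeeds.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for FEC_IMASK register values (not the real bits). */
#define MODEL_IMASK_ALL      0x3u /* RX/TX and MII interrupts enabled */
#define MODEL_IMASK_MII_ONLY 0x1u /* only the MII interrupt enabled */

/* Minimal model of the state the handler touches. */
struct model_dev {
	bool napi_scheduled; /* models the "NAPI already scheduled" state */
	unsigned int imask;  /* models the FEC_IMASK register */
	int polls_queued;    /* counts how often a poll was queued */
};

/*
 * Models napi_schedule_prep(): returns true only for the caller that
 * moves the state from "not scheduled" to "scheduled". The kernel does
 * this atomically; this model is single-threaded.
 */
static bool model_schedule_prep(struct model_dev *d)
{
	if (d->napi_scheduled)
		return false;
	d->napi_scheduled = true;
	return true;
}

/* Pre-patch shape: mask first, then try to schedule. */
static void model_isr_old(struct model_dev *d)
{
	d->imask = MODEL_IMASK_MII_ONLY;
	if (model_schedule_prep(d))
		d->polls_queued++;
}

/* Post-patch shape: mask and queue only if the scheduling bit was won. */
static void model_isr_new(struct model_dev *d)
{
	if (model_schedule_prep(d)) {
		d->imask = MODEL_IMASK_MII_ONLY;
		d->polls_queued++;
	}
}
```

Calling model_isr_old() on a device whose napi_scheduled flag is already set
changes imask without queueing a poll; model_isr_new() in the same state
leaves imask untouched. When the flag is clear, both variants mask the
interrupts and queue exactly one poll.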

* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
  2014-12-16 10:25 [PATCH net-next 1/1] net: fec: Fix NAPI race Fugang Duan
@ 2014-12-16 11:33 ` Fabio Estevam
  2014-12-16 11:41   ` Russell King - ARM Linux
  2014-12-16 20:24 ` David Miller
  1 sibling, 1 reply; 5+ messages in thread
From: Fabio Estevam @ 2014-12-16 11:33 UTC (permalink / raw)
  To: Fugang Duan
  Cc: David S. Miller, netdev@vger.kernel.org, Estevam Fabio-R49496,
	Ben Hutchings, Stephen Hemminger, robert.daniels,
	Marek Vašut, Russell King

Hi Fugang,

On Tue, Dec 16, 2014 at 8:25 AM, Fugang Duan <b38611@freescale.com> wrote:
> Do camera capture test on i.MX6q sabresd board, and save the capture data to
> nfs rootfs. The command is:
> gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
>
> After about 10 hours running, there have net watchdog timeout kernel dump:
> ...
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
> NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out

Adding more people who reported similar issues in the past.

Marek,

Does this patch solve the problem you reported at
http://www.spinics.net/lists/netdev/msg268167.html ?

* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
  2014-12-16 11:33 ` Fabio Estevam
@ 2014-12-16 11:41   ` Russell King - ARM Linux
  2014-12-16 13:34     ` Marek Vasut
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King - ARM Linux @ 2014-12-16 11:41 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Fugang Duan, David S. Miller, netdev@vger.kernel.org,
	Estevam Fabio-R49496, Ben Hutchings, Stephen Hemminger,
	robert.daniels, Marek Vašut

On Tue, Dec 16, 2014 at 09:33:53AM -0200, Fabio Estevam wrote:
> Hi Fugang,
> 
> On Tue, Dec 16, 2014 at 8:25 AM, Fugang Duan <b38611@freescale.com> wrote:
> > Do camera capture test on i.MX6q sabresd board, and save the capture data to
> > nfs rootfs. The command is:
> > gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> > queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> > blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> > queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
> >
> > After about 10 hours running, there have net watchdog timeout kernel dump:
> > ...
> > WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
> > NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
> 
> Adding more people who reported similar issues in the past.
> 
> Marek,
> 
> Does this patch solve the problem you reported at
> http://www.spinics.net/lists/netdev/msg268167.html ?

My set of patches fixed stuff exactly like this...

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
  2014-12-16 11:41   ` Russell King - ARM Linux
@ 2014-12-16 13:34     ` Marek Vasut
  0 siblings, 0 replies; 5+ messages in thread
From: Marek Vasut @ 2014-12-16 13:34 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Fabio Estevam, Fugang Duan, David S. Miller,
	netdev@vger.kernel.org, Estevam Fabio-R49496, Ben Hutchings,
	Stephen Hemminger, robert.daniels

On Tuesday, December 16, 2014 at 12:41:31 PM, Russell King - ARM Linux wrote:
> On Tue, Dec 16, 2014 at 09:33:53AM -0200, Fabio Estevam wrote:
> > Hi Fugang,
> > 
> > On Tue, Dec 16, 2014 at 8:25 AM, Fugang Duan <b38611@freescale.com> wrote:
> > > Do camera capture test on i.MX6q sabresd board, and save the capture
> > > data to nfs rootfs. The command is:
> > > gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 !
> > > tee name=t ! queue ! imxv4l2sink sync=false t. ! queue ! vpuenc !
> > > queue ! mux. pulsesrc num-buffers=3720937 blocksize=4096 !
> > > 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc !
> > > mpegaudioparse ! queue ! mux. qtmux name=mux ! filesink
> > > location=video_recording_long.mov
> > > 
> > > After about 10 hours running, there have net watchdog timeout kernel
> > > dump: ...
> > > WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264
> > > dev_watchdog+0x2b4/0x2d8() NETDEV WATCHDOG: eth0 (fec): transmit queue
> > > 0 timed out
> > 
> > Adding more people who reported similar issues in the past.
> > 
> > Marek,
> > 
> > Does this patch solve the problem you reported at
> > http://www.spinics.net/lists/netdev/msg268167.html ?
> 
> My set of patches fixed stuff exactly like this...

I still keep your G+ post open, in case I ever manage to find free time to dive
into it. It'd be a terrible waste to let these patches go. Right now, I'm in the
process of finishing my degree (finally) so things are just crap, apologies.

Best regards,
Marek Vasut

* Re: [PATCH net-next 1/1] net: fec: Fix NAPI race
  2014-12-16 10:25 [PATCH net-next 1/1] net: fec: Fix NAPI race Fugang Duan
  2014-12-16 11:33 ` Fabio Estevam
@ 2014-12-16 20:24 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2014-12-16 20:24 UTC (permalink / raw)
  To: b38611; +Cc: netdev, R49496, bhutchings, stephen

From: Fugang Duan <b38611@freescale.com>
Date: Tue, 16 Dec 2014 18:25:58 +0800

> Do camera capture test on i.MX6q sabresd board, and save the capture data to
> nfs rootfs. The command is:
> gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
> queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
> blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
> queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
> 
> After about 10 hours running, there have net watchdog timeout kernel dump:
 ...
> There might have a race in napi_schedule(), leaving interrupts disabled forever.
> After these patch, the case still work more than 40 hours running.
> 
> Signed-off-by: Fugang Duan <B38611@freescale.com>

Applied, thanks.
