From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lucas Stach
Subject: Re: [PATCH v4 1/1 net] net: fec: fix kernel oops when plug/unplug cable many times
Date: Tue, 07 May 2013 12:13:37 +0200
Message-ID: <1367921617.4126.36.camel@weser.hi.pengutronix.de>
References: <1367898205-6272-1-git-send-email-Frank.Li@freescale.com> <1367920728.4126.32.camel@weser.hi.pengutronix.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Frank Li, Francois Romieu, Robert Schwebel, David Miller, "netdev@vger.kernel.org", Fabio Estevam, Shawn Guo
To: Frank Li
Return-path:
Received: from metis.ext.pengutronix.de ([92.198.50.35]:51537 "EHLO metis.ext.pengutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754552Ab3EGKPA (ORCPT ); Tue, 7 May 2013 06:15:00 -0400
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tuesday, 07.05.2013 at 18:02 +0800, Frank Li wrote:
> 2013/5/7 Lucas Stach:
> > On Tuesday, 07.05.2013 at 11:43 +0800, Frank Li wrote:
> >> reproduce steps
> >> 1. flood ping from other machine
> >> ping -f -s 41000 IP
> >> 2. run below script
> >> while [ 1 ]; do ethtool -s eth0 autoneg off;
> >> sleep 3;ethtool -s eth0 autoneg on; sleep 4; done;
> >>
> >> You can see oops in one hour.
> >>
> >> The reason is fec_restart clear BD but NAPI may use it.
> >> The solution is disable NAPI and stop xmit when reset BD.
> >> disable NAPI may sleep, so fec_restart can't be call in
> >> atomic context.
> >>
> >> Signed-off-by: Frank Li
> > One minor nitpick below, otherwise
> > Reviewed-by: Lucas Stach
> > Tested-by: Lucas Stach
> >
> > Could this patch please be marked as a candidate for the 3.9 stable
> > tree? It fixes a real and severe problem for me, as I seem to be able to
> > trigger the bug much more easily than Frank.
> >
>
> How to mark as a candidate for the 3.9 stable?
>
See Documentation/stable_kernel_rules.txt

Basically for this patch just add a
Cc: # 3.9
below the sign-off area, but don't actually send the patch there. It
will get cherry-picked just based on the tag.

> >> ---
> >> Change from v1 to v2
> >> * Add netif_tx_lock(ndev) to avoid xmit running when reset hardware
> >> Change from v2 to v3
> >> * Move put real statements after function variable declarations according to David's comments
> >> * Remove lock in adjust_link according to Lucas Stach's comments
> >> Change from v3 to v4
> >> * rebase to latest net/master
> >> * remove hw_lock because not used again
> >> * reduce delay work to 0
> >> * group delay work related field to one structure
> >> * call netif_device_detach() in fec_restart
> >>
> >> drivers/net/ethernet/freescale/fec.h | 10 ++++--
> >> drivers/net/ethernet/freescale/fec_main.c | 44 +++++++++++++++++++++-------
> >> 2 files changed, 39 insertions(+), 15 deletions(-)
> >>
> > [...]
> >>
> >> static void
> >> @@ -644,8 +658,22 @@ fec_timeout(struct net_device *ndev)
> >>
> >> 	ndev->stats.tx_errors++;
> >>
> >> -	fec_restart(ndev, fep->full_duplex);
> >> -	netif_wake_queue(ndev);
> >> +	fep->delay_work.timeout = 1;
> > I would like to see a proper true/false used in conjunction with the
> > bool data type.
> >
> >> +	schedule_delayed_work(&(fep->delay_work.delay_work), 0);
> >> +}
> >> +
> >> +static void fec_enet_work(struct work_struct *work)
> >> +{
> >> +	struct fec_enet_private *fep =
> >> +		container_of(work,
> >> +			     struct fec_enet_private,
> >> +			     delay_work.delay_work.work);
> >> +
> >> +	if (fep->delay_work.timeout) {
> >> +		fep->delay_work.timeout = 0;
> > Same as above.
> >
> >> +		fec_restart(fep->netdev, fep->full_duplex);
> >> +		netif_wake_queue(fep->netdev);
> >> +	}
> >> }
> >>
> > [...]

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
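
To illustrate the true/false nitpick on the timeout flag, here is a minimal userspace sketch of the "set a flag in the timeout path, consume it in the worker" pattern from the quoted hunk. It is not the driver code: the struct and function names (fec_private_model, model_timeout, model_work) and the restarts counter are hypothetical stand-ins, and the real fec_restart(), netif_wake_queue() and schedule_delayed_work() calls are only modelled, not invoked.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the patch's grouped delay_work fields. */
struct fec_delayed_work_model {
	bool timeout;	/* set by the timeout path, cleared by the worker */
};

/* Hypothetical stand-in for struct fec_enet_private. */
struct fec_private_model {
	struct fec_delayed_work_model delay_work;
	int restarts;	/* counts simulated restarts, for illustration only */
};

/* Models fec_timeout(): only record the request, defer the restart. */
static void model_timeout(struct fec_private_model *fep)
{
	fep->delay_work.timeout = true;
}

/* Models fec_enet_work(): consume the flag, then do the restart. */
static void model_work(struct fec_private_model *fep)
{
	if (fep->delay_work.timeout) {
		fep->delay_work.timeout = false;
		fep->restarts++;	/* stands in for restart + queue wake */
	}
}
```

Calling model_work() without a preceding model_timeout() does nothing, and each model_timeout() results in at most one simulated restart, which is the behaviour the bool flag is meant to express.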