From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:50524 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932448AbeCGDam (ORCPT ); Tue, 6 Mar 2018 22:30:42 -0500 Subject: Patch "net: ethernet: ti: cpsw: fix net watchdog timeout" has been added to the 4.14-stable tree To: grygorii.strashko@ti.com, davem@davemloft.net, gregkh@linuxfoundation.org, ivan.khoronzhuk@linaro.org Cc: , From: Date: Tue, 06 Mar 2018 19:30:28 -0800 Message-ID: <152039342812867@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ANSI_X3.4-1968 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org List-ID: This is a note to let you know that I've just added the patch titled net: ethernet: ti: cpsw: fix net watchdog timeout to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: net-ethernet-ti-cpsw-fix-net-watchdog-timeout.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let know about it. >>From foo@baz Tue Mar 6 19:02:12 PST 2018 From: Grygorii Strashko Date: Tue, 6 Feb 2018 19:17:06 -0600 Subject: net: ethernet: ti: cpsw: fix net watchdog timeout From: Grygorii Strashko [ Upstream commit 62f94c2101f35cd45775df00ba09bde77580e26a ] It was discovered that simple program which indefinitely sends 200b UDP packets and runs on TI AM574x SoC (SMP) under RT Kernel triggers network watchdog timeout in TI CPSW driver (<6 hours run). The network watchdog timeout is triggered due to race between cpsw_ndo_start_xmit() and cpsw_tx_handler() [NAPI] cpsw_ndo_start_xmit() if (unlikely(!cpdma_check_free_tx_desc(txch))) { txq = netdev_get_tx_queue(ndev, q_idx); netif_tx_stop_queue(txq); ^^ as per [1] barier has to be used after set_bit() otherwise new value might not be visible to other cpus } cpsw_tx_handler() if (unlikely(netif_tx_queue_stopped(txq))) netif_tx_wake_queue(txq); and when it happens ndev TX queue became disabled forever while driver's HW TX queue is empty. Fix this, by adding smp_mb__after_atomic() after netif_tx_stop_queue() calls and double check for free TX descriptors after stopping ndev TX queue - if there are free TX descriptors wake up ndev TX queue. [1] https://www.kernel.org/doc/html/latest/core-api/atomic_ops.html Signed-off-by: Grygorii Strashko Reviewed-by: Ivan Khoronzhuk Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1618,6 +1618,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(s q_idx = q_idx % cpsw->tx_ch_num; txch = cpsw->txv[q_idx].ch; + txq = netdev_get_tx_queue(ndev, q_idx); ret = cpsw_tx_packet_submit(priv, skb, txch); if (unlikely(ret != 0)) { cpsw_err(priv, tx_err, "desc submit failed\n"); @@ -1628,15 +1629,26 @@ static netdev_tx_t cpsw_ndo_start_xmit(s * tell the kernel to stop sending us tx frames. */ if (unlikely(!cpdma_check_free_tx_desc(txch))) { - txq = netdev_get_tx_queue(ndev, q_idx); netif_tx_stop_queue(txq); + + /* Barrier, so that stop_queue visible to other cpus */ + smp_mb__after_atomic(); + + if (cpdma_check_free_tx_desc(txch)) + netif_tx_wake_queue(txq); } return NETDEV_TX_OK; fail: ndev->stats.tx_dropped++; - txq = netdev_get_tx_queue(ndev, skb_get_queue_mapping(skb)); netif_tx_stop_queue(txq); + + /* Barrier, so that stop_queue visible to other cpus */ + smp_mb__after_atomic(); + + if (cpdma_check_free_tx_desc(txch)) + netif_tx_wake_queue(txq); + return NETDEV_TX_BUSY; } Patches currently in stable-queue which might be from grygorii.strashko@ti.com are queue-4.14/net-ethernet-ti-cpsw-fix-net-watchdog-timeout.patch