From: Vlastimil Šetka
Subject: Altera TSE (altera_tse) net_rx_action WARNING - polling bug in altera_tse_main.c?
Date: Mon, 26 Jan 2015 13:39:33 +0100
Message-ID: <54C63585.1010908@vsis.cz>
To: vbridger@opensource.altera.com, netdev@vger.kernel.org

Hello,

I am using the Altera TSE kernel driver (altera_tse module) on the Altera socfpga platform (Cyclone V SoC with ARM Cortex-A9), and I believe I have found a bug in it. I have two TSE controllers instantiated in the FPGA; my FPGA hardware design is based on this tutorial:
http://www.rocketboards.org/foswiki/Projects/AlteraSoCTripleSpeedEthernetDesignExample

The kernel version is 3.10.37-ltsi with the RT patch, from
http://rocketboards.org/gitweb/?p=linux-socfpga.git;a=commit;h=7ea94617cfae6a62ee963adc1ae340196dbe2b34
with some altera_tse fixes backported from the current 3.19-rc5. I was not able to get the TSE Ethernet interfaces working on vanilla 3.19-rc5, probably because of some changes around interrupts and the device tree, but that is another story.

After some time (minutes to hours) of heavy traffic generated by iperf through an altera_tse Ethernet interface, I see a kernel warning on the console like this:

------------[ cut here ]------------
WARNING: at net/core/dev.c:4255 net_rx_action+0x268/0x28c()
Modules linked in: gpio_altera altera_sysid altera_tse
CPU: 0 PID: 5885 Comm: irq/75-eth2 Not tainted 3.10.37-ltsi-rt37-vs-2-1-00062-g861955e #1
[<800166c4>] (unwind_backtrace+0x0/0x100) from [<80012edc>] (show_stack+0x20/0x24)
[<80012edc>] (show_stack+0x20/0x24) from [<80503404>] (dump_stack+0x24/0x28)
[<80503404>] (dump_stack+0x24/0x28) from [<8002303c>] (warn_slowpath_common+0x64/0x7c)
[<8002303c>] (warn_slowpath_common+0x64/0x7c) from [<80023110>] (warn_slowpath_null+0x2c/0x34)
[<80023110>] (warn_slowpath_null+0x2c/0x34) from [<80404d48>] (net_rx_action+0x268/0x28c)
[<80404d48>] (net_rx_action+0x268/0x28c) from [<8002bd18>] (do_current_softirqs+0x1e4/0x388)
[<8002bd18>] (do_current_softirqs+0x1e4/0x388) from [<8002bf34>] (local_bh_enable+0x78/0x90)
[<8002bf34>] (local_bh_enable+0x78/0x90) from [<80086c9c>] (irq_forced_thread_fn+0x50/0x74)
[<80086c9c>] (irq_forced_thread_fn+0x50/0x74) from [<80086fbc>] (irq_thread+0x16c/0x1c8)
[<80086fbc>] (irq_thread+0x16c/0x1c8) from [<80048104>] (kthread+0xb4/0xb8)
[<80048104>] (kthread+0xb4/0xb8) from [<8000e718>] (ret_from_fork+0x14/0x20)
---[ end trace 0000000000000002 ]---

The warning point is:
	WARN_ON_ONCE(work > weight);
at
http://rocketboards.org/gitweb/?p=linux-socfpga.git;a=blob;f=net/core/dev.c;h=2193b5dc276ad6aa54adb1ee15ef3de625915fcd;hb=7ea94617cfae6a62ee963adc1ae340196dbe2b34#l4255

After the warning, the interface keeps working without problems.

I am not very familiar with the Linux network stack and device drivers, but I think I have found the root cause in:

# drivers/net/ethernet/altera/altera_tse_main.c
# http://rocketboards.org/gitweb/?p=linux-socfpga.git;a=blob;f=drivers/net/ethernet/altera/altera_tse_main.c;h=07c0b193c55722d18ff2723f0a7e137671746ba1;hb=7ea94617cfae6a62ee963adc1ae340196dbe2b34#l368
static int tse_rx(struct altera_tse_private *priv, int limit)

The `limit` parameter is not used anywhere in the function! When `tse_rx` is called from `tse_poll`, it can return more frames than the limit, which I think is what ultimately triggers the kernel warning:

# drivers/net/ethernet/altera/altera_tse_main.c
# http://rocketboards.org/gitweb/?p=linux-socfpga.git;a=blob;f=drivers/net/ethernet/altera/altera_tse_main.c;h=07c0b193c55722d18ff2723f0a7e137671746ba1;hb=7ea94617cfae6a62ee963adc1ae340196dbe2b34#l488
static int tse_poll(struct napi_struct *napi, int budget)
{
	...
	txcomplete = tse_tx_complete(priv);
	rxcomplete = tse_rx(priv, budget);

	if (rxcomplete >= budget || txcomplete > 0)
		return rxcomplete;

The condition `if (rxcomplete >= budget || txcomplete > 0) return rxcomplete;` also looks very strange to me. I am not sure whether it is buggy, but I think it should at least have a comment explaining how it works.

Vlastimil Setka
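
P.S. To illustrate the suspected mechanism outside the kernel, here is a minimal user-space sketch in C. It is not driver code: `struct fake_rx_ring`, `rx_ignore_limit`, `rx_honor_limit` and `would_warn` are hypothetical names invented for this example, and the ring is reduced to a counter of pending frames. The only thing taken from the kernel is the comparison `work > weight` behind the WARN_ON_ONCE quoted above. The sketch assumes that the fix would be to stop receiving once `limit` frames have been processed; whether that is the right change for altera_tse is for the maintainers to judge.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's RX ring: only a count of frames
 * waiting to be processed. Not the real altera_tse data structure. */
struct fake_rx_ring {
	int pending;
};

/* Mirrors the reported behaviour: `limit` is accepted but never checked,
 * so a single call drains everything that is pending. */
int rx_ignore_limit(struct fake_rx_ring *ring, int limit)
{
	int count = 0;

	(void)limit; /* deliberately unused, as in the reported tse_rx() */
	while (ring->pending > 0) {
		ring->pending--;
		count++;
	}
	return count;
}

/* Bounded variant (an assumption about the fix): stop once `limit`
 * frames have been processed, leaving the rest for the next poll. */
int rx_honor_limit(struct fake_rx_ring *ring, int limit)
{
	int count = 0;

	assert(limit >= 0);
	while (count < limit && ring->pending > 0) {
		ring->pending--;
		count++;
	}
	return count;
}

/* Models the check behind WARN_ON_ONCE(work > weight): returns 1 when a
 * poll callback reports more work than the budget it was given. */
int would_warn(int work, int weight)
{
	return work > weight;
}
```

With 100 frames pending and a budget of 64, `rx_ignore_limit` reports 100 frames of work, so `would_warn` returns 1. `rx_honor_limit` reports 64, leaves 36 frames pending, and `would_warn` returns 0.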