From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: stmmac: Race in coalesce timer and NAPI
Date: Fri, 21 Sep 2018 06:54:10 -0700
Message-ID: <109edd5e-15d2-e89a-9331-a63798eb292b@gmail.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
To: Jose Abreu , "netdev@vger.kernel.org" , Joao Pinto
Return-path:
Received: from mail-pl1-f177.google.com ([209.85.214.177]:33640 "EHLO mail-pl1-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727392AbeIUTnL (ORCPT ); Fri, 21 Sep 2018 15:43:11 -0400
Received: by mail-pl1-f177.google.com with SMTP id b97-v6so6030989plb.0 for ; Fri, 21 Sep 2018 06:54:12 -0700 (PDT)
In-Reply-To:
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 09/21/2018 02:19 AM, Jose Abreu wrote:
> Hello,
>
> I'm getting a race in stmmac coalesce timer and the
> napi_schedule() interrupt and I'm asking for advice. Currently,
> we are scheduling NAPI in coalesce timer but this leads to
> stmmac_tx_clean() deadlock because this function tries to acquire
> queue lock.

This is strange. Which lock are you talking about ?

The napi_schedule() stuff should be enough to protect your use case.

>
> I find that this is not expected because only one instance of
> NAPI should run at same time so I was wondering if it is possible
> that xmit() callback is causing the deadlock ?
>
> BTW, this is solved by:
> - Directly call stmmac_tx_clean() in timer function AND
> - Use netif_tx_trylock() in stmmac_tx_clean(). Then, if queue
> is already locked we re-arm coalesce timer or reschedule NAPI.
>
> This is easily reproducible in an ARM board with 8 core running
> at 100MHz each.
>
> Thanks and Best Regards,
> Jose Miguel Abreu
>

It looks to me stmmac_napi_poll() should not apply/consume any budget for TX completion.

The budget for a NAPI poll shared by RX and TX is really only for the RX side.
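
To illustrate the budget rule above, here is a minimal user-space C model. It is a sketch only, not stmmac driver code and not code from this thread: `struct model_queue`, `model_tx_clean()`, `model_rx()` and `model_poll()` are hypothetical names, and the counters stand in for real descriptor rings. The assumption is simply that TX cleanup runs to completion on every poll while only RX work is limited by, and reported against, the budget.

```c
#include <assert.h>

/* Hypothetical model of one queue; not the stmmac data structures. */
struct model_queue {
    int tx_done_pending; /* completed TX descriptors awaiting cleanup */
    int rx_pending;      /* received packets awaiting processing */
};

/* Reclaim every completed TX descriptor; deliberately takes no budget. */
static int model_tx_clean(struct model_queue *q)
{
    int cleaned = q->tx_done_pending;

    q->tx_done_pending = 0;
    return cleaned;
}

/* Process at most `budget` received packets; return how many were handled. */
static int model_rx(struct model_queue *q, int budget)
{
    int done = 0;

    while (done < budget && q->rx_pending > 0) {
        q->rx_pending--;
        done++;
    }
    return done;
}

/*
 * Poll routine in the shape of a NAPI poll callback: TX cleanup always
 * runs in full, and the return value counts RX work only.
 */
static int model_poll(struct model_queue *q, int budget)
{
    assert(budget >= 0);

    model_tx_clean(q);          /* not charged against the budget */
    return model_rx(q, budget); /* only RX consumes the budget */
}
```

In this model, a call with a budget of 0 still reclaims all pending TX completions and leaves the RX backlog untouched, because the RX loop never runs.
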
netpoll will specifically call the poll() with budget==0 to only drain TX