From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ben Greear Subject: Re: [PATCH] net-fq: Add WARN_ON check for null flow. Date: Thu, 7 Jun 2018 09:33:11 -0700 Message-ID: <03fb542b-2754-fa2c-435b-928fe2b2afbc@candelatech.com> References: <1528387585-5806-1-git-send-email-greearb@candelatech.com> <577ce32e-cbee-c5d8-da12-81075e924b23@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit To: Eric Dumazet , netdev@vger.kernel.org Return-path: Received: from mail2.candelatech.com ([208.74.158.173]:43658 "EHLO mail2.candelatech.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753165AbeFGQdO (ORCPT ); Thu, 7 Jun 2018 12:33:14 -0400 In-Reply-To: <577ce32e-cbee-c5d8-da12-81075e924b23@gmail.com> Sender: netdev-owner@vger.kernel.org List-ID: On 06/07/2018 09:17 AM, Eric Dumazet wrote: > > > On 06/07/2018 09:06 AM, greearb@candelatech.com wrote: >> From: Ben Greear >> >> While testing an ath10k firmware that often crashed under load, >> I was seeing kernel crashes as well. One of them appeared to >> be a dereference of a NULL flow object in fq_tin_dequeue. >> >> I have since fixed the firmware flaw, but I think it would be >> worth adding the WARN_ON in case the problem appears again. >> >> common_interrupt+0xf/0xf >> >> > > Please find the exact commit that brought this bug, > and add a corresponding Fixes: tag It will be a total pain to bisect this problem since my test case that causes this is running my modified firmware (and a buggy one at that), modified ath10k driver (to work with this firmware and support my test case easily), and the failure case appears to cause multiple different-but-probably-related crashes and often hangs or reboots the test system. Probably this is all caused by some nasty race or buggy logic related to dealing with a crashed ath10k firmware tearing down txq logic from the bottom up. There have been many such bugs in the past, I and others fixed a few, and very likely more remain. For what it is worth, I didn't see this crash in 4.13, and I spent some time testing buggy firmware there occasionally. If someone else has interest in debugging the ath10k driver, I will be happy to generate a mostly-stock firmware image with ability to crash in the TX path and give it to them. It will crash the stock upstream code reliably in my experience. Thanks, Ben > >> Signed-off-by: Ben Greear >> --- >> include/net/fq_impl.h | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h >> index be7c0fa..e40354d 100644 >> --- a/include/net/fq_impl.h >> +++ b/include/net/fq_impl.h >> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq, >> >> flow = list_first_entry(head, struct fq_flow, flowchain); >> >> + if (WARN_ON_ONCE(!flow)) >> + return NULL; >> + >> if (flow->deficit <= 0) { >> flow->deficit += fq->quantum; >> list_move_tail(&flow->flowchain, >> > -- Ben Greear Candela Technologies Inc http://www.candelatech.com