* [PATCH] net-fq: Add WARN_ON check for null flow.
@ 2018-06-07 16:06 greearb
2018-06-07 16:17 ` Eric Dumazet
2018-06-07 21:29 ` Cong Wang
0 siblings, 2 replies; 7+ messages in thread
From: greearb @ 2018-06-07 16:06 UTC (permalink / raw)
To: netdev; +Cc: Ben Greear
From: Ben Greear <greearb@candelatech.com>
While testing an ath10k firmware that often crashed under load,
I was seeing kernel crashes as well. One of them appeared to
be a dereference of a NULL flow object in fq_tin_dequeue.
I have since fixed the firmware flaw, but I think it would be
worth adding the WARN_ON in case the problem appears again.
BUG: unable to handle kernel NULL pointer dereference at 000000000000003c
IP: ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]
PGD 80000001417fe067 P4D 80000001417fe067 PUD 13db41067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen lm78 ]
CPU: 2 PID: 21733 Comm: ip Tainted: G W O 4.16.8+ #35
Hardware name: _ _/, BIOS 5.11 08/26/2016
RIP: 0010:ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]
RSP: 0018:ffff880172d03c30 EFLAGS: 00010286
RAX: ffff88013b2c0000 RBX: ffff88013b2c00b8 RCX: 0000000000000898
RDX: 0000000000000001 RSI: ffff88013b2c00d8 RDI: ffff88016ac40820
RBP: ffff88016ac42ba0 R08: 0000000000200000 R09: 0000000000000000
R10: 0000000000100000 R11: 0000001256c89fd8 R12: ffff88013b2c0000
R13: ffff88013b2c00d8 R14: 0000000000000000 R15: ffff88013b2c00d8
FS: 00007f04e3606700(0000) GS:ffff880172d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000003c CR3: 000000013b35a005 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
? update_load_avg+0x607/0x6f0
ath10k_mac_tx_push_txq+0x6e/0x220 [ath10k_core]
ath10k_mac_tx_push_pending+0x151/0x1e0 [ath10k_core]
ath10k_htt_txrx_compl_task+0x113e/0x1940 [ath10k_core]
? ath10k_ce_completed_send_next_nolock+0x6f/0x90 [ath10k_pci]
? ath10k_ce_completed_send_next+0x31/0x40 [ath10k_pci]
? ath10k_pci_htc_tx_cb+0x30/0xc0 [ath10k_pci]
? ath10k_bus_pci_write32+0x3c/0xa0 [ath10k_pci]
ath10k_pci_napi_poll+0x44/0xf0 [ath10k_pci]
net_rx_action+0x250/0x3b0
__do_softirq+0xc2/0x2c2
irq_exit+0x93/0xa0
do_IRQ+0x45/0xc0
common_interrupt+0xf/0xf
</IRQ>
Signed-off-by: Ben Greear <greearb@candelatech.com>
---
include/net/fq_impl.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..e40354d 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
flow = list_first_entry(head, struct fq_flow, flowchain);
+ if (WARN_ON_ONCE(!flow))
+ return NULL;
+
if (flow->deficit <= 0) {
flow->deficit += fq->quantum;
list_move_tail(&flow->flowchain,
--
2.4.11
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
2018-06-07 16:06 [PATCH] net-fq: Add WARN_ON check for null flow greearb
@ 2018-06-07 16:17 ` Eric Dumazet
2018-06-07 16:33 ` Ben Greear
2018-06-07 21:29 ` Cong Wang
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2018-06-07 16:17 UTC (permalink / raw)
To: greearb, netdev
On 06/07/2018 09:06 AM, greearb@candelatech.com wrote:
> From: Ben Greear <greearb@candelatech.com>
>
> While testing an ath10k firmware that often crashed under load,
> I was seeing kernel crashes as well. One of them appeared to
> be a dereference of a NULL flow object in fq_tin_dequeue.
>
> I have since fixed the firmware flaw, but I think it would be
> worth adding the WARN_ON in case the problem appears again.
>
> common_interrupt+0xf/0xf
> </IRQ>
>
Please find the exact commit that brought this bug,
and add a corresponding Fixes: tag
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
> include/net/fq_impl.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
> index be7c0fa..e40354d 100644
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>
> flow = list_first_entry(head, struct fq_flow, flowchain);
>
> + if (WARN_ON_ONCE(!flow))
> + return NULL;
> +
> if (flow->deficit <= 0) {
> flow->deficit += fq->quantum;
> list_move_tail(&flow->flowchain,
>
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
2018-06-07 16:17 ` Eric Dumazet
@ 2018-06-07 16:33 ` Ben Greear
0 siblings, 0 replies; 7+ messages in thread
From: Ben Greear @ 2018-06-07 16:33 UTC (permalink / raw)
To: Eric Dumazet, netdev
On 06/07/2018 09:17 AM, Eric Dumazet wrote:
>
>
> On 06/07/2018 09:06 AM, greearb@candelatech.com wrote:
>> From: Ben Greear <greearb@candelatech.com>
>>
>> While testing an ath10k firmware that often crashed under load,
>> I was seeing kernel crashes as well. One of them appeared to
>> be a dereference of a NULL flow object in fq_tin_dequeue.
>>
>> I have since fixed the firmware flaw, but I think it would be
>> worth adding the WARN_ON in case the problem appears again.
>>
>> common_interrupt+0xf/0xf
>> </IRQ>
>>
>
> Please find the exact commit that brought this bug,
> and add a corresponding Fixes: tag
It will be a total pain to bisect this problem since my test
case that causes this is running my modified firmware (and a buggy one at that),
modified ath10k driver (to work with this firmware and support my test case easily),
and the failure case appears to cause multiple different-but-probably-related
crashes and often hangs or reboots the test system.
Probably this is all caused by some nasty race or buggy logic related to
dealing with a crashed ath10k firmware tearing down txq logic from the
bottom up. There have been many such bugs in the past, I and others fixed a few,
and very likely more remain.
For what it is worth, I didn't see this crash in 4.13, and I spent some time
testing buggy firmware there occasionally.
If someone else has interest in debugging the ath10k driver, I will be happy to generate
a mostly-stock firmware image with ability to crash in the TX path and give it to them.
It will crash the stock upstream code reliably in my experience.
Thanks,
Ben
>
>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>> ---
>> include/net/fq_impl.h | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
>> index be7c0fa..e40354d 100644
>> --- a/include/net/fq_impl.h
>> +++ b/include/net/fq_impl.h
>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>
>> flow = list_first_entry(head, struct fq_flow, flowchain);
>>
>> + if (WARN_ON_ONCE(!flow))
>> + return NULL;
>> +
>> if (flow->deficit <= 0) {
>> flow->deficit += fq->quantum;
>> list_move_tail(&flow->flowchain,
>>
>
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
2018-06-07 16:06 [PATCH] net-fq: Add WARN_ON check for null flow greearb
2018-06-07 16:17 ` Eric Dumazet
@ 2018-06-07 21:29 ` Cong Wang
2018-06-07 21:41 ` Ben Greear
1 sibling, 1 reply; 7+ messages in thread
From: Cong Wang @ 2018-06-07 21:29 UTC (permalink / raw)
To: Ben Greear; +Cc: Linux Kernel Network Developers
On Thu, Jun 7, 2018 at 9:06 AM, <greearb@candelatech.com> wrote:
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>
> flow = list_first_entry(head, struct fq_flow, flowchain);
>
> + if (WARN_ON_ONCE(!flow))
> + return NULL;
> +
How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
2018-06-07 21:29 ` Cong Wang
@ 2018-06-07 21:41 ` Ben Greear
2018-06-07 21:52 ` Cong Wang
0 siblings, 1 reply; 7+ messages in thread
From: Ben Greear @ 2018-06-07 21:41 UTC (permalink / raw)
To: Cong Wang; +Cc: Linux Kernel Network Developers
On 06/07/2018 02:29 PM, Cong Wang wrote:
> On Thu, Jun 7, 2018 at 9:06 AM, <greearb@candelatech.com> wrote:
>> --- a/include/net/fq_impl.h
>> +++ b/include/net/fq_impl.h
>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>
>> flow = list_first_entry(head, struct fq_flow, flowchain);
>>
>> + if (WARN_ON_ONCE(!flow))
>> + return NULL;
>> +
>
> How could even possibly list_first_entry() returns NULL?
> You need list_first_entry_or_null().
>
I don't know for certain flow as null, but something was NULL in this method
near that line and it looked like a likely culprit.
I guess possibly tin or fq was passed in as NULL?
Anyway, if the patch seems worthless just ignore it. I'll leave it in my tree
since it should be harmless and will let you know if I ever hit it.
If someone else hits a similar crash, hopefully they can report it.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
2018-06-07 21:41 ` Ben Greear
@ 2018-06-07 21:52 ` Cong Wang
2018-06-07 22:08 ` Ben Greear
0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2018-06-07 21:52 UTC (permalink / raw)
To: Ben Greear; +Cc: Linux Kernel Network Developers
On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 06/07/2018 02:29 PM, Cong Wang wrote:
>>
>> On Thu, Jun 7, 2018 at 9:06 AM, <greearb@candelatech.com> wrote:
>>>
>>> --- a/include/net/fq_impl.h
>>> +++ b/include/net/fq_impl.h
>>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>
>>> flow = list_first_entry(head, struct fq_flow, flowchain);
>>>
>>> + if (WARN_ON_ONCE(!flow))
>>> + return NULL;
>>> +
>>
>>
>> How could even possibly list_first_entry() returns NULL?
>> You need list_first_entry_or_null().
>>
>
> I don't know for certain flow as null, but something was NULL in this method
> near that line and it looked like a likely culprit.
>
> I guess possibly tin or fq was passed in as NULL?
A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
too, but you are checking against 0 in your patch, that is the problem and
that is why list_first_entry_or_null() exists.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] net-fq: Add WARN_ON check for null flow.
2018-06-07 21:52 ` Cong Wang
@ 2018-06-07 22:08 ` Ben Greear
0 siblings, 0 replies; 7+ messages in thread
From: Ben Greear @ 2018-06-07 22:08 UTC (permalink / raw)
To: Cong Wang; +Cc: Linux Kernel Network Developers
On 06/07/2018 02:52 PM, Cong Wang wrote:
> On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear <greearb@candelatech.com> wrote:
>> On 06/07/2018 02:29 PM, Cong Wang wrote:
>>>
>>> On Thu, Jun 7, 2018 at 9:06 AM, <greearb@candelatech.com> wrote:
>>>>
>>>> --- a/include/net/fq_impl.h
>>>> +++ b/include/net/fq_impl.h
>>>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>>
>>>> flow = list_first_entry(head, struct fq_flow, flowchain);
>>>>
>>>> + if (WARN_ON_ONCE(!flow))
>>>> + return NULL;
>>>> +
>>>
>>>
>>> How could even possibly list_first_entry() returns NULL?
>>> You need list_first_entry_or_null().
>>>
>>
>> I don't know for certain flow as null, but something was NULL in this method
>> near that line and it looked like a likely culprit.
>>
>> I guess possibly tin or fq was passed in as NULL?
>
> A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
> too, but you are checking against 0 in your patch, that is the problem and
> that is why list_first_entry_or_null() exists.
>
Ahh, I see what you mean, and that is my mistake. In my case, it did seem to
be a mostly-null deref, not a 0x0 deref.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-06-07 22:08 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-07 16:06 [PATCH] net-fq: Add WARN_ON check for null flow greearb
2018-06-07 16:17 ` Eric Dumazet
2018-06-07 16:33 ` Ben Greear
2018-06-07 21:29 ` Cong Wang
2018-06-07 21:41 ` Ben Greear
2018-06-07 21:52 ` Cong Wang
2018-06-07 22:08 ` Ben Greear
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).