* Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.
From: Ben Greear @ 2018-06-08 15:17 UTC
  To: Michal Kazior
  Cc: Cong Wang, Linux Kernel Network Developers, linux-wireless@vger.kernel.org

On 06/07/2018 04:59 PM, Cong Wang wrote:
> On Thu, Jun 7, 2018 at 4:48 PM,  <greearb@candelatech.com> wrote:
>> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
>> index be7c0fa..cb911f0 100644
>> --- a/include/net/fq_impl.h
>> +++ b/include/net/fq_impl.h
>> @@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>                         return NULL;
>>         }
>>
>> -       flow = list_first_entry(head, struct fq_flow, flowchain);
>> +       flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
>> +
>> +       if (WARN_ON_ONCE(!flow))
>> +               return NULL;
>
> This does not make sense either. list_first_entry_or_null()
> returns NULL only when the list is empty, but we already check
> list_empty() right before this code, and it is protected by fq->lock.
>

Hello Michal,

git blame shows you as the author of the fq_impl.h code.

I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks kernel.
There was an apparent mostly-null deref in the fq_tin_dequeue method.
According to gdb, it was within 1 line of the dereference of 'flow'.

My hack above is probably not that useful. Cong thinks maybe the locking is bad.

If you get a chance, please review this thread and see if you have any ideas for
a better fix (or better debugging code).

As always, if you would like me to generate you a buggy firmware that will crash
in the tx path and cause all sorts of mayhem in the ath10k driver and wifi stack,
I will be happy to do so.

https://www.mail-archive.com/netdev@vger.kernel.org/msg239738.html

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
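Cong's objection above can be illustrated with a small userland sketch. It assumes simplified stand-ins for the kernel's list helpers: list_init, list_is_empty, list_add_tail_entry, entry_of, first_flow_or_null and struct flow are hypothetical names for illustration, not the kernel API, and locking is left out entirely. Under those assumptions, once the empty check has passed and nothing else touches the list, the "or null" lookup cannot return NULL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userland stand-in for a circular doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_entry(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Recover the containing structure from a pointer to its list member. */
#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical flow object; only the list linkage matters here. */
struct flow {
	int id;
	struct list_head flowchain;
};

/* Returns NULL only when the list is empty. */
static struct flow *first_flow_or_null(struct list_head *h)
{
	return list_is_empty(h) ? NULL : entry_of(h->next, struct flow, flowchain);
}

/*
 * Same shape as the patched code: empty check first, then the _or_null
 * lookup. Returns -1 on the early return, -2 if the NULL branch is taken,
 * otherwise the id of the first flow.
 */
static int dequeue_id(struct list_head *h)
{
	struct flow *f;

	if (list_is_empty(h))
		return -1;

	f = first_flow_or_null(h);
	if (!f)
		return -2; /* not reachable unless the list changes in between */

	return f->id;
}

/* Small usage example: one empty lookup, then one with a single flow. */
static int run_example(void)
{
	struct list_head head;
	struct flow f = { .id = 7 };

	list_init(&head);
	assert(dequeue_id(&head) == -1);
	list_add_tail_entry(&f.flowchain, &head);
	return dequeue_id(&head);
}
```

If the NULL branch did fire in the real code, that would suggest the list changed between the two checks, for example because a caller was not holding fq->lock, which is the locking concern raised in this thread.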
* Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.
From: Arend van Spriel @ 2018-06-08 21:40 UTC
  To: Ben Greear, Michał Kazior
  Cc: Cong Wang, Linux Kernel Network Developers, linux-wireless@vger.kernel.org

On 6/8/2018 5:17 PM, Ben Greear wrote:

I recalled an email from Michał about leaving Tieto, so I am adding the
alternate email address he provided back then.

Gr. AvS
* Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.
From: Michał Kazior @ 2018-06-10 17:10 UTC
  To: Arend van Spriel
  Cc: Ben Greear, Cong Wang, Linux Kernel Network Developers, linux-wireless@vger.kernel.org

Ben,

The patch is symptomatic. fq_tin_dequeue() already checks if the list
is empty before it tries to access the first entry. I see no point in
using the _or_null() + WARN_ON.

The 0x3c deref is likely an offset off of a NULL base pointer. Did you
check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
point to?

I suspect there's not enough synchronization between quiescing the
device/ath10k after fw crashes and performing mac80211's reconfig
procedure.


Michał
* Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.
From: Ben Greear @ 2018-06-11 13:18 UTC
  To: Michał Kazior, Arend van Spriel
  Cc: Cong Wang, Linux Kernel Network Developers, linux-wireless@vger.kernel.org

On 06/10/2018 10:10 AM, Michał Kazior wrote:
> Ben,
>
> The patch is symptomatic. fq_tin_dequeue() already checks if the list
> is empty before it tries to access the first entry. I see no point in
> using the _or_null() + WARN_ON.
>
> The 0x3c deref is likely an offset off of a NULL base pointer. Did you
> check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
> point to?

gdb pointed to one line above the flow dereference, which is why I was going to
put some debugging in there.

>
> I suspect there's not enough synchronization between quiescing the
> device/ath10k after fw crashes and performing mac80211's reconfig
> procedure.

I am already running this patch, which helps with some of that. It never made
it upstream, but it fixed problems for me earlier.

https://patchwork.kernel.org/patch/9457639/

There could easily be more issues in that logic. Someone else posted a patch to
disable mac80211 tx when the FW crashes, I think. I have not tried to backport that.

https://patchwork.kernel.org/patch/10411967/

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
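Michał's reading of the 0x3c address can be sketched as follows. struct example_obj is hypothetical and laid out only so that a member lands at offset 0x3c; the thread does not say which structure or member was actually involved. The point is that reading a member through a NULL base pointer faults at the member's offset rather than at address 0, which would fit the "mostly-null" address described above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x3c bytes of other members, then the accessed field. */
struct example_obj {
	unsigned char other_members[0x3c];
	int field;
};

/* Compile-time check that the illustrative layout matches the claim. */
static_assert(offsetof(struct example_obj, field) == 0x3c,
	      "field is expected at offset 0x3c");

/*
 * Address that an access to obj->field would touch for a given base
 * address. Pure arithmetic; nothing is dereferenced.
 */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct example_obj, field);
}
```

With a base of 0, member_address() yields 0x3c. Mapping ieee80211_tx_dequeue+0xfb back to a source line with gdb or addr2line, as Michał suggests, is what would identify the real structure and member.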
Thread overview: 4+ messages
     [not found] <1528415316-6379-1-git-send-email-greearb@candelatech.com>
     [not found] ` <CAM_iQpULrWMNtgDcrZkc-uLtB0XOVFeZxQ6cFgpXwv7DtA9jzA@mail.gmail.com>
2018-06-08 15:17   ` [PATCH v2] net-fq: Add WARN_ON check for null flow Ben Greear
2018-06-08 21:40     ` Arend van Spriel
2018-06-10 17:10       ` Michał Kazior
2018-06-11 13:18         ` Ben Greear