* [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
@ 2020-06-01 17:11 Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?=
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-01 17:11 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 1173 bytes --]
If start_scan_next_request() is called while a scan request
(NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
be sent again. Add a check in the function to avoid sending a request if
one is already in progress.
This also fixes a crash that occurs if the following conditions are met:
- the duplicated request is the only request in the scan request
queue, and
- both scan requests fail with an error not EBUSY.
In this case, the first callback to scan_request_triggered() will delete
the request from the scan request queue. The second callback will find
an empty queue and consequently pass a NULL scan_request pointer to
scan_request_failed(), causing a segmentation fault.
---
src/scan.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/scan.c b/src/scan.c
index 718f7497..106fa81c 100644
--- a/src/scan.c
+++ b/src/scan.c
@@ -839,6 +839,9 @@ static bool start_next_scan_request(struct scan_context *sc)
if (sc->state != SCAN_STATE_NOT_RUNNING)
return true;
+ if (sc->start_cmd_id)
+ return true;
+
while (sr) {
if (!scan_request_send_trigger(sc, sr))
return true;
--
2.26.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended 2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-01 17:11 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-04 18:19 ` Denis Kenzior 2020-06-02 8:45 ` [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Andrew Zaborowski 2020-06-04 18:31 ` Denis Kenzior 2 siblings, 1 reply; 8+ messages in thread From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-01 17:11 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 616 bytes --] If scanning is suspended, have scan_common() queue its scan request rather than issuing it immediately. This respects the assumption that scans are not requested while sc->suspended is true. --- src/scan.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/scan.c b/src/scan.c index 106fa81c..e10c5447 100644 --- a/src/scan.c +++ b/src/scan.c @@ -546,6 +546,9 @@ static uint32_t scan_common(uint64_t wdev_id, bool passive, if (!l_queue_isempty(sc->requests)) goto done; + if (sc->suspended) + goto done; + if (sc->state != SCAN_STATE_NOT_RUNNING) goto done; -- 2.26.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended 2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-04 18:19 ` Denis Kenzior 0 siblings, 0 replies; 8+ messages in thread From: Denis Kenzior @ 2020-06-04 18:19 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 410 bytes --] Hi Alvin, On 6/1/20 12:11 PM, Alvin Šipraga wrote: > If scanning is suspended, have scan_common() queue its scan request > rather than issuing it immediately. This respects the assumption that > scans are not requested while sc->suspended is true. > --- > src/scan.c | 3 +++ > 1 file changed, 3 insertions(+) > I reworded the commit header slightly and applied, thanks. Regards, -Denis ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests 2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-02 8:45 ` Andrew Zaborowski 2020-06-04 18:31 ` Denis Kenzior 2 siblings, 0 replies; 8+ messages in thread From: Andrew Zaborowski @ 2020-06-02 8:45 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 1584 bytes --] On Mon, 1 Jun 2020 at 19:11, Alvin Šipraga <alsi@bang-olufsen.dk> wrote: > If start_scan_next_request() is called while a scan request > (NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will > be sent again. Add a check in the function to avoid sending a request if > one is already in progress. > > This also fixes a crash that occurs if the following conditions are met: > - the duplicated request is the only request in the scan request > queue, and > - both scan requests fail with an error not EBUSY. > > In this case, the first callback to scan_request_triggered() will delete > the request from the scan request queue. The second callback will find > an empty queue and consequently pass a NULL scan_request pointer to > scan_request_failed(), causing a segmentation fault. > --- > src/scan.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/scan.c b/src/scan.c > index 718f7497..106fa81c 100644 > --- a/src/scan.c > +++ b/src/scan.c > @@ -839,6 +839,9 @@ static bool start_next_scan_request(struct scan_context *sc) > if (sc->state != SCAN_STATE_NOT_RUNNING) > return true; > > + if (sc->start_cmd_id) > + return true; How about checking sc->get_scan_cmd_id as well? I guess I'd put these two and the sc->state check in one if (). Also you can now drop those checks in scan_notify in the NL80211_CMD_SCAN_ABORTED case because as you mentioned in the other thread you're moving the checks from the callers into start_next_scan_request. Best regards ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests 2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-02 8:45 ` [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Andrew Zaborowski @ 2020-06-04 18:31 ` Denis Kenzior 2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2 siblings, 1 reply; 8+ messages in thread From: Denis Kenzior @ 2020-06-04 18:31 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 1825 bytes --] Hi Alvin, On 6/1/20 12:11 PM, Alvin Šipraga wrote: > If start_scan_next_request() is called while a scan request > (NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will > be sent again. Add a check in the function to avoid sending a request if > one is already in progress. > > This also fixes a crash that occurs if the following conditions are met: > - the duplicated request is the only request in the scan request > queue, and > - both scan requests fail with an error not EBUSY. Nice catch on this. Scanning code is some of the most complex in the project, so kudos. > > In this case, the first callback to scan_request_triggered() will delete > the request from the scan request queue. The second callback will find > an empty queue and consequently pass a NULL scan_request pointer to > scan_request_failed(), causing a segmentation fault. So for my own education, can you elaborate a bit more on how this is actually triggered? I'm guessing that there are pending scans (probably triggered by invocation of Scan() via D-Bus?). But an exact sequence of what happens would be nice. > --- > src/scan.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/scan.c b/src/scan.c > index 718f7497..106fa81c 100644 > --- a/src/scan.c > +++ b/src/scan.c > @@ -839,6 +839,9 @@ static bool start_next_scan_request(struct scan_context *sc) > if (sc->state != SCAN_STATE_NOT_RUNNING) > return true; > > + if (sc->start_cmd_id) > + return true; > + So I think Andrew's suggestion of also checking for get_scan_id being not zero also makes sense. It would keep things consistent with the comment in scan_notify where it says that we do not start the next request until GET_SCAN has been done. Regards, -Denis ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests 2020-06-04 18:31 ` Denis Kenzior @ 2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 0 siblings, 1 reply; 8+ messages in thread From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-06 17:33 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 1236 bytes --] Hi Andrew and Denis, Thanks for your comments. On 6/4/20 8:31 PM, Denis Kenzior wrote: >> In this case, the first callback to scan_request_triggered() will delete >> the request from the scan request queue. The second callback will find >> an empty queue and consequently pass a NULL scan_request pointer to >> scan_request_failed(), causing a segmentation fault. > > So for my own education, can you elaborate a bit more on how this is > actually triggered? I'm guessing that there are pending scans (probably > triggered by invocation of Scan() via D-Bus?). But an exact sequence of > what happens would be nice. The crash is a little hard to reproduce. I have only observed it while looping a Wi-Fi integration test we have, where it happens maybe 10-20% of the time. When I am back in the office on Monday I will try to prepare a detailed answer to your question. > So I think Andrew's suggestion of also checking for get_scan_id being > not zero also makes sense. It would keep things consistent with the > comment in scan_notify where it says that we do not start the next > request until GET_SCAN has been done. Agreed. I'll prepare a v3 patch and send it to you soon. Kind regards, Alvin ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests 2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-09 20:10 ` Denis Kenzior 0 siblings, 1 reply; 8+ messages in thread From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-09 11:49 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 4496 bytes --] Hi Denis, As promised, here is some more detail on how the duplicate scan request gets sent, and the subsequent crash. On 6/6/20 7:33 PM, Alvin Šipraga wrote: > On 6/4/20 8:31 PM, Denis Kenzior wrote: >>> In this case, the first callback to scan_request_triggered() will delete >>> the request from the scan request queue. The second callback will find >>> an empty queue and consequently pass a NULL scan_request pointer to >>> scan_request_failed(), causing a segmentation fault. >> >> So for my own education, can you elaborate a bit more on how this is >> actually triggered? I'm guessing that there are pending scans (probably >> triggered by invocation of Scan() via D-Bus?). But an exact sequence of >> what happens would be nice. > > The crash is a little hard to reproduce. I have only observed it while > looping a Wi-Fi integration test we have, where it happens maybe 10-20% > of the time. When I am back in the office on Monday I will try to > prepare a detailed answer to your question. I traced the following events leading to a duplicate scan request being sent. The scenario assumes that the scan request queue is empty in the beginning. station_enter_state: disconnected -> autoconnect_quick -> station_quick_scan_trigger -> station_scan_trigger: notify cb = station_quick_scan_results -> scan_passive -> scan_common -> scan_request_send_trigger -> l_queue_push_tail: scan request queue size = 1 scan_notify: quick scan triggered network_connect: requested over dbus -> network_connect_psk -> station_connect_network -> station_enter_state: autoconnect_quick -> connecting -> periodic_scan_stop -> scan_periodic_stop -> scan_cancel scan_notify: quick scan aborted -> scan_finished -> l_queue_remove: scan request queue size = 0 -> station_quick_scan_results: quick scan's notify callback . -> station_enter_state: connecting -> autoconnect_full . -> scan_periodic_start . -> scan_periodic_queue . -> scan_{active_full,passive}: idk which; doesn't matter . -> scan_common . -> scan_request_send_trigger . -> l_queue_push_tail: scan request queue size = 1 -> start_next_scan_request: head of queue has already been sent -> scan_request_send_trigger: send it again Now the two requests both fail because the driver is busy connecting: kernel: brcmfmac: brcmf_cfg80211_escan: Connecting: status (3) kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-11) kernel: brcmfmac: brcmf_cfg80211_escan: Connecting: status (3) kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-11) The errors get reported via the scan_request_triggered callback. The first callback deletes the scan_request from the scan request queue. The second callback gets a NULL scan_request pointer due to the queue being empty, and subsequently dereferences that pointer. Here is the stack trace after the segmentation fault: (gdb) bt #0 scan_request_failed (sr=sr(a)entry=0x0, err=err(a)entry=-11, sc=<optimized out>) at /usr/src/debug/iwd/1.7-r0/git/src/scan.c:157 #1 0x0000000000424bf0 in scan_request_triggered (msg=<optimized out>, userdata=0x37f320e0) at /usr/src/debug/iwd/1.7-r0/git/src/scan.c:222 #2 0x0000ffffb770bc0c in process_unicast (nlmsg=0xffffc2ab2dc0, genl=0x37f16830) at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/genl.c:979 #3 received_data (io=<optimized out>, user_data=0x37f16830) at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/genl.c:1087 #4 0x0000ffffb7707e78 in io_callback (fd=<optimized out>, events=1, user_data=0x37f168d0) at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/io.c:126 #5 0x0000ffffb77070d4 in l_main_iterate (timeout=<optimized out>) at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/main.c:473 #6 0x0000ffffb7707194 in l_main_run () at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/main.c:520 #7 0x0000ffffb77073ac in l_main_run_with_signal (callback=callback(a)entry=0x408ee0 <signal_handler>, user_data=user_data(a)entry=0x0) at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/main.c:642 #8 0x0000000000408b8c in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/iwd/1.7-r0/git/src/main.c:512 Hope that helps - let me know if you have any questions. And thanks for accepting the patches. Kind regards, Alvin ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests 2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-09 20:10 ` Denis Kenzior 0 siblings, 0 replies; 8+ messages in thread From: Denis Kenzior @ 2020-06-09 20:10 UTC (permalink / raw) To: iwd [-- Attachment #1: Type: text/plain, Size: 1997 bytes --] Hi Alvin, > I traced the following events leading to a duplicate scan request being sent. > The scenario assumes that the scan request queue is empty in the beginning. > > > station_enter_state: disconnected -> autoconnect_quick > -> station_quick_scan_trigger > -> station_scan_trigger: notify cb = station_quick_scan_results > -> scan_passive > -> scan_common > -> scan_request_send_trigger > -> l_queue_push_tail: scan request queue size = 1 > scan_notify: quick scan triggered > network_connect: requested over dbus > -> network_connect_psk > -> station_connect_network > -> station_enter_state: autoconnect_quick -> connecting > -> periodic_scan_stop > -> scan_periodic_stop > -> scan_cancel > scan_notify: quick scan aborted > -> scan_finished > -> l_queue_remove: scan request queue size = 0 > -> station_quick_scan_results: quick scan's notify callback > . -> station_enter_state: connecting -> autoconnect_full Aha, so this is the real culprit. I fixed this as well in commit 532f6b154e03b5fa222502098b5688a1784cc4e3. > . -> scan_periodic_start > . -> scan_periodic_queue > . -> scan_{active_full,passive}: idk which; doesn't matter > . -> scan_common > . -> scan_request_send_trigger > . -> l_queue_push_tail: scan request queue size = 1 Okay, so we start a periodic scan and in scan_finished call start_next_scan_request again. > -> start_next_scan_request: head of queue has already been sent > -> scan_request_send_trigger: send it again > > And since we don't check that one is already going, we issue the trigger again. Makes sense to me now. Much appreciated. Regards, -Denis ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-06-09 20:10 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-04 18:19 ` Denis Kenzior 2020-06-02 8:45 ` [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Andrew Zaborowski 2020-06-04 18:31 ` Denis Kenzior 2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?= 2020-06-09 20:10 ` Denis Kenzior
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox