* [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
@ 2020-06-01 17:11 Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?=
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-01 17:11 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 1173 bytes --]
If start_scan_next_request() is called while a scan request
(NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
be sent again. Add a check in the function to avoid sending a request if
one is already in progress.
This also fixes a crash that occurs if the following conditions are met:
- the duplicated request is the only request in the scan request
queue, and
- both scan requests fail with an error not EBUSY.
In this case, the first callback to scan_request_triggered() will delete
the request from the scan request queue. The second callback will find
an empty queue and consequently pass a NULL scan_request pointer to
scan_request_failed(), causing a segmentation fault.
---
src/scan.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/scan.c b/src/scan.c
index 718f7497..106fa81c 100644
--- a/src/scan.c
+++ b/src/scan.c
@@ -839,6 +839,9 @@ static bool start_next_scan_request(struct scan_context *sc)
if (sc->state != SCAN_STATE_NOT_RUNNING)
return true;
+ if (sc->start_cmd_id)
+ return true;
+
while (sr) {
if (!scan_request_send_trigger(sc, sr))
return true;
--
2.26.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended
2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?=
@ 2020-06-01 17:11 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-04 18:19 ` Denis Kenzior
2020-06-02 8:45 ` [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Andrew Zaborowski
2020-06-04 18:31 ` Denis Kenzior
2 siblings, 1 reply; 8+ messages in thread
From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-01 17:11 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 616 bytes --]
If scanning is suspended, have scan_common() queue its scan request
rather than issuing it immediately. This respects the assumption that
scans are not requested while sc->suspended is true.
---
src/scan.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/scan.c b/src/scan.c
index 106fa81c..e10c5447 100644
--- a/src/scan.c
+++ b/src/scan.c
@@ -546,6 +546,9 @@ static uint32_t scan_common(uint64_t wdev_id, bool passive,
if (!l_queue_isempty(sc->requests))
goto done;
+ if (sc->suspended)
+ goto done;
+
if (sc->state != SCAN_STATE_NOT_RUNNING)
goto done;
--
2.26.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?=
@ 2020-06-02 8:45 ` Andrew Zaborowski
2020-06-04 18:31 ` Denis Kenzior
2 siblings, 0 replies; 8+ messages in thread
From: Andrew Zaborowski @ 2020-06-02 8:45 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]
On Mon, 1 Jun 2020 at 19:11, Alvin Šipraga <alsi@bang-olufsen.dk> wrote:
> If start_scan_next_request() is called while a scan request
> (NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
> be sent again. Add a check in the function to avoid sending a request if
> one is already in progress.
>
> This also fixes a crash that occurs if the following conditions are met:
> - the duplicated request is the only request in the scan request
> queue, and
> - both scan requests fail with an error not EBUSY.
>
> In this case, the first callback to scan_request_triggered() will delete
> the request from the scan request queue. The second callback will find
> an empty queue and consequently pass a NULL scan_request pointer to
> scan_request_failed(), causing a segmentation fault.
> ---
> src/scan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/src/scan.c b/src/scan.c
> index 718f7497..106fa81c 100644
> --- a/src/scan.c
> +++ b/src/scan.c
> @@ -839,6 +839,9 @@ static bool start_next_scan_request(struct scan_context *sc)
> if (sc->state != SCAN_STATE_NOT_RUNNING)
> return true;
>
> + if (sc->start_cmd_id)
> + return true;
How about checking sc->get_scan_cmd_id as well? I guess I'd put these
two and the sc->state check in one if ().
Also you can now drop those checks in scan_notify in the
NL80211_CMD_SCAN_ABORTED case because as you mentioned in the other
thread you're moving the checks from the callers into
start_next_scan_request.
Best regards
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended
2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?=
@ 2020-06-04 18:19 ` Denis Kenzior
0 siblings, 0 replies; 8+ messages in thread
From: Denis Kenzior @ 2020-06-04 18:19 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 410 bytes --]
Hi Alvin,
On 6/1/20 12:11 PM, Alvin Šipraga wrote:
> If scanning is suspended, have scan_common() queue its scan request
> rather than issuing it immediately. This respects the assumption that
> scans are not requested while sc->suspended is true.
> ---
> src/scan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
I reworded the commit header slightly and applied, thanks.
Regards,
-Denis
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-02 8:45 ` [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Andrew Zaborowski
@ 2020-06-04 18:31 ` Denis Kenzior
2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2 siblings, 1 reply; 8+ messages in thread
From: Denis Kenzior @ 2020-06-04 18:31 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 1825 bytes --]
Hi Alvin,
On 6/1/20 12:11 PM, Alvin Šipraga wrote:
> If start_scan_next_request() is called while a scan request
> (NL80211_CMD_TRIGGER_SCAN) is still running, the same scan request will
> be sent again. Add a check in the function to avoid sending a request if
> one is already in progress.
>
> This also fixes a crash that occurs if the following conditions are met:
> - the duplicated request is the only request in the scan request
> queue, and
> - both scan requests fail with an error not EBUSY.
Nice catch on this. Scanning code is some of the most complex in the
project, so kudos.
>
> In this case, the first callback to scan_request_triggered() will delete
> the request from the scan request queue. The second callback will find
> an empty queue and consequently pass a NULL scan_request pointer to
> scan_request_failed(), causing a segmentation fault.
So for my own education, can you elaborate a bit more on how this is
actually triggered? I'm guessing that there are pending scans (probably
triggered by invocation of Scan() via D-Bus?). But an exact sequence of
what happens would be nice.
> ---
> src/scan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/src/scan.c b/src/scan.c
> index 718f7497..106fa81c 100644
> --- a/src/scan.c
> +++ b/src/scan.c
> @@ -839,6 +839,9 @@ static bool start_next_scan_request(struct scan_context *sc)
> if (sc->state != SCAN_STATE_NOT_RUNNING)
> return true;
>
> + if (sc->start_cmd_id)
> + return true;
> +
So I think Andrew's suggestion of also checking for get_scan_id being
not zero also makes sense. It would keep things consistent with the
comment in scan_notify where it says that we do not start the next
request until GET_SCAN has been done.
Regards,
-Denis
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
2020-06-04 18:31 ` Denis Kenzior
@ 2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
0 siblings, 1 reply; 8+ messages in thread
From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-06 17:33 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 1236 bytes --]
Hi Andrew and Denis,
Thanks for your comments.
On 6/4/20 8:31 PM, Denis Kenzior wrote:
>> In this case, the first callback to scan_request_triggered() will delete
>> the request from the scan request queue. The second callback will find
>> an empty queue and consequently pass a NULL scan_request pointer to
>> scan_request_failed(), causing a segmentation fault.
>
> So for my own education, can you elaborate a bit more on how this is
> actually triggered? I'm guessing that there are pending scans (probably
> triggered by invocation of Scan() via D-Bus?). But an exact sequence of
> what happens would be nice.
The crash is a little hard to reproduce. I have only observed it while
looping a Wi-Fi integration test we have, where it happens maybe 10-20%
of the time. When I am back in the office on Monday I will try to
prepare a detailed answer to your question.
> So I think Andrew's suggestion of also checking for get_scan_id being
> not zero also makes sense. It would keep things consistent with the
> comment in scan_notify where it says that we do not start the next
> request until GET_SCAN has been done.
Agreed. I'll prepare a v3 patch and send it to you soon.
Kind regards,
Alvin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
@ 2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-09 20:10 ` Denis Kenzior
0 siblings, 1 reply; 8+ messages in thread
From: Alvin =?unknown-8bit?q?=C5=A0ipraga?= @ 2020-06-09 11:49 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 4496 bytes --]
Hi Denis,
As promised, here is some more detail on how the duplicate scan request
gets sent, and the subsequent crash.
On 6/6/20 7:33 PM, Alvin Šipraga wrote:
> On 6/4/20 8:31 PM, Denis Kenzior wrote:
>>> In this case, the first callback to scan_request_triggered() will delete
>>> the request from the scan request queue. The second callback will find
>>> an empty queue and consequently pass a NULL scan_request pointer to
>>> scan_request_failed(), causing a segmentation fault.
>>
>> So for my own education, can you elaborate a bit more on how this is
>> actually triggered? I'm guessing that there are pending scans (probably
>> triggered by invocation of Scan() via D-Bus?). But an exact sequence of
>> what happens would be nice.
>
> The crash is a little hard to reproduce. I have only observed it while
> looping a Wi-Fi integration test we have, where it happens maybe 10-20%
> of the time. When I am back in the office on Monday I will try to
> prepare a detailed answer to your question.
I traced the following events leading to a duplicate scan request being
sent. The scenario assumes that the scan request queue is empty in the
beginning.
station_enter_state: disconnected -> autoconnect_quick
-> station_quick_scan_trigger
-> station_scan_trigger: notify cb = station_quick_scan_results
-> scan_passive
-> scan_common
-> scan_request_send_trigger
-> l_queue_push_tail: scan request queue size = 1
scan_notify: quick scan triggered
network_connect: requested over dbus
-> network_connect_psk
-> station_connect_network
-> station_enter_state: autoconnect_quick -> connecting
-> periodic_scan_stop
-> scan_periodic_stop
-> scan_cancel
scan_notify: quick scan aborted
-> scan_finished
-> l_queue_remove: scan request queue size = 0
-> station_quick_scan_results: quick scan's notify callback
. -> station_enter_state: connecting -> autoconnect_full
. -> scan_periodic_start
. -> scan_periodic_queue
. -> scan_{active_full,passive}: idk which; doesn't matter
. -> scan_common
. -> scan_request_send_trigger
. -> l_queue_push_tail: scan request queue size = 1
-> start_next_scan_request: head of queue has already been sent
-> scan_request_send_trigger: send it again
Now the two requests both fail because the driver is busy connecting:
kernel: brcmfmac: brcmf_cfg80211_escan: Connecting: status (3)
kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-11)
kernel: brcmfmac: brcmf_cfg80211_escan: Connecting: status (3)
kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-11)
The errors get reported via the scan_request_triggered callback. The
first callback deletes the scan_request from the scan request queue. The
second callback gets a NULL scan_request pointer due to the queue being
empty, and subsequently dereferences that pointer. Here is the stack
trace after the segmentation fault:
(gdb) bt
#0 scan_request_failed (sr=sr(a)entry=0x0, err=err(a)entry=-11,
sc=<optimized out>)
at /usr/src/debug/iwd/1.7-r0/git/src/scan.c:157
#1 0x0000000000424bf0 in scan_request_triggered (msg=<optimized out>,
userdata=0x37f320e0)
at /usr/src/debug/iwd/1.7-r0/git/src/scan.c:222
#2 0x0000ffffb770bc0c in process_unicast (nlmsg=0xffffc2ab2dc0,
genl=0x37f16830)
at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/genl.c:979
#3 received_data (io=<optimized out>, user_data=0x37f16830)
at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/genl.c:1087
#4 0x0000ffffb7707e78 in io_callback (fd=<optimized out>, events=1,
user_data=0x37f168d0)
at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/io.c:126
#5 0x0000ffffb77070d4 in l_main_iterate (timeout=<optimized out>)
at /usr/src/debug/ell/0.31-r0/ell-0.31/ell/main.c:473
#6 0x0000ffffb7707194 in l_main_run () at
/usr/src/debug/ell/0.31-r0/ell-0.31/ell/main.c:520
#7 0x0000ffffb77073ac in l_main_run_with_signal
(callback=callback(a)entry=0x408ee0 <signal_handler>,
user_data=user_data(a)entry=0x0) at
/usr/src/debug/ell/0.31-r0/ell-0.31/ell/main.c:642
#8 0x0000000000408b8c in main (argc=<optimized out>, argv=<optimized out>)
at /usr/src/debug/iwd/1.7-r0/git/src/main.c:512
Hope that helps - let me know if you have any questions. And thanks for
accepting the patches.
Kind regards,
Alvin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests
2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
@ 2020-06-09 20:10 ` Denis Kenzior
0 siblings, 0 replies; 8+ messages in thread
From: Denis Kenzior @ 2020-06-09 20:10 UTC (permalink / raw)
To: iwd
[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]
Hi Alvin,
> I traced the following events leading to a duplicate scan request being sent.
> The scenario assumes that the scan request queue is empty in the beginning.
>
>
> station_enter_state: disconnected -> autoconnect_quick
> -> station_quick_scan_trigger
> -> station_scan_trigger: notify cb = station_quick_scan_results
> -> scan_passive
> -> scan_common
> -> scan_request_send_trigger
> -> l_queue_push_tail: scan request queue size = 1
> scan_notify: quick scan triggered
> network_connect: requested over dbus
> -> network_connect_psk
> -> station_connect_network
> -> station_enter_state: autoconnect_quick -> connecting
> -> periodic_scan_stop
> -> scan_periodic_stop
> -> scan_cancel
> scan_notify: quick scan aborted
> -> scan_finished
> -> l_queue_remove: scan request queue size = 0
> -> station_quick_scan_results: quick scan's notify callback
> . -> station_enter_state: connecting -> autoconnect_full
Aha, so this is the real culprit. I fixed this as well in commit
532f6b154e03b5fa222502098b5688a1784cc4e3.
> . -> scan_periodic_start
> . -> scan_periodic_queue
> . -> scan_{active_full,passive}: idk which; doesn't matter
> . -> scan_common
> . -> scan_request_send_trigger
> . -> l_queue_push_tail: scan request queue size = 1
Okay, so we start a periodic scan and in scan_finished call
start_next_scan_request again.
> -> start_next_scan_request: head of queue has already been sent
> -> scan_request_send_trigger: send it again
>
>
And since we don't check that one is already going, we issue the trigger again.
Makes sense to me now. Much appreciated.
Regards,
-Denis
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-06-09 20:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-01 17:11 [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-01 17:11 ` [PATCH v2 2/2] scan: refactor scan_common to not scan while scanning is suspended Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-04 18:19 ` Denis Kenzior
2020-06-02 8:45 ` [PATCH v2 1/2] scan: refactor start_next_scan_request to not send duplicate requests Andrew Zaborowski
2020-06-04 18:31 ` Denis Kenzior
2020-06-06 17:33 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-09 11:49 ` Alvin =?unknown-8bit?q?=C5=A0ipraga?=
2020-06-09 20:10 ` Denis Kenzior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox