public inbox for iwd@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH 1/2] scan: check scan request in get_survey_done before deref
@ 2024-09-05 13:43 James Prestwood
  2024-09-05 13:43 ` [PATCH 2/2] scan: check pending requests after regdom update James Prestwood
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: James Prestwood @ 2024-09-05 13:43 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood, Daniel Bond

Due to the possibility of external scans the scan request pointer
could be NULL. Prior to surveys IWD would still get the results in
order for periodic scans to utilize them. This behavior can be
retained by checking both if we don't have a request or if the
request was canceled. This check is identical to the one in
get_scan_done.

This fixes a crash when checking if the NULL scan request has been
canceled:

0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614

Reported-by: Daniel Bond <danielbondno@gmail.com>
---
 src/scan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/scan.c b/src/scan.c
index debdeb1f..205365cd 100644
--- a/src/scan.c
+++ b/src/scan.c
@@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data)
 
 	sc->get_survey_cmd_id = 0;
 
-	if (!results->sr->canceled)
+	if (!results->sr || !results->sr->canceled)
 		get_results(results);
 	else
 		get_scan_done(user_data);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] scan: check pending requests after regdom update
  2024-09-05 13:43 [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
@ 2024-09-05 13:43 ` James Prestwood
  2024-09-05 13:56 ` [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
  2024-09-05 14:37 ` Denis Kenzior
  2 siblings, 0 replies; 7+ messages in thread
From: James Prestwood @ 2024-09-05 13:43 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

While there is proper handling for a regdom update during a
TRIGGER_SCAN scan, prior to NEW_SCAN_RESULTS there is no such
handling if the regdom update comes in during a GET_SCAN or
GET_SURVEY.

In both the 6ghz and non-6ghz code paths we have some issues:
  - For non-6ghz devices, or regdom updates that did not enable
    6ghz the wiphy state watch callback will automatically issues
    another GET_SURVEY/GET_SCAN without checking if there was
    already one pending. It does this using the current scan request
    which gets freed by the prior GET_SCAN/GET_SURVEY calls when
    they complete, causing invalid reads when the subsequent calls
    finish.
 - If 6ghz was enabled by the update we actually append another
   trigger command to the list and potentially run it if its the
   current request. This also will end up in the same situation as
   the request is freed by the pending GET_SURVEY/GET_SCAN calls.

For the non-6ghz case there is little to no harm in ignoring the
regdom update because its very unlikely it changed the allowed
frequencies.

For the 6ghz case we could potentially handle the new trigger scan
within get_scan_done, but thats beyond the scope of this change
and is likely quite intrusive.
---
 src/scan.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/scan.c b/src/scan.c
index 205365cd..ccd1e9e1 100644
--- a/src/scan.c
+++ b/src/scan.c
@@ -2120,6 +2120,22 @@ static void scan_wiphy_watch(struct wiphy *wiphy,
 	if (!sr)
 		return;
 
+	/*
+	 * If the regdom update finished with GET_SCAN/GET_SURVEY in flight
+	 * don't try and get the results again and allow those calls to finish.
+	 * For the non-6ghz case this has no downside as the results should not
+	 * differ.
+	 *
+	 * If 6ghz was enabled by this regdom update there is still not much we
+	 * can do since the scan itself is already completed. Appending to the
+	 * command list won't do anything.
+	 *
+	 * TODO: Handle the 6ghz case by checking for this case in get_scan_done
+	 *       and continuing to iterate the sr->cmds array.
+	 */
+	if (sc->get_scan_cmd_id || sc->get_survey_cmd_id)
+		return;
+
 	/*
 	 * This update did not allow 6GHz, or the original request was
 	 * not expecting 6GHz. The periodic scan should now be ended.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scan: check scan request in get_survey_done before deref
  2024-09-05 13:43 [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
  2024-09-05 13:43 ` [PATCH 2/2] scan: check pending requests after regdom update James Prestwood
@ 2024-09-05 13:56 ` James Prestwood
  2024-09-05 14:37 ` Denis Kenzior
  2 siblings, 0 replies; 7+ messages in thread
From: James Prestwood @ 2024-09-05 13:56 UTC (permalink / raw)
  To: iwd; +Cc: Daniel Bond

Daniel,

On 9/5/24 6:43 AM, James Prestwood wrote:
> Due to the possibility of external scans the scan request pointer
> could be NULL. Prior to surveys IWD would still get the results in
> order for periodic scans to utilize them. This behavior can be
> retained by checking both if we don't have a request or if the
> request was canceled. This check is identical to the one in
> get_scan_done.
>
> This fixes a crash when checking if the NULL scan request has been
> canceled:
>
> 0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
> 0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
> 0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
> 0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
> 0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
> 0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
> 0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
>
> Reported-by: Daniel Bond <danielbondno@gmail.com>
> ---
>   src/scan.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/scan.c b/src/scan.c
> index debdeb1f..205365cd 100644
> --- a/src/scan.c
> +++ b/src/scan.c
> @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data)
>   
>   	sc->get_survey_cmd_id = 0;
>   
> -	if (!results->sr->canceled)
> +	if (!results->sr || !results->sr->canceled)
>   		get_results(results);
>   	else
>   		get_scan_done(user_data);

Feel free to re-submit under your name as this is nearly the same as 
your patch, but the difference in logic change is important.

I also forgot about Co-authored-by, which maybe is more fitting for this 
case. I can send that in v2, or maybe it could be added before merging.

Thanks,

James


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scan: check scan request in get_survey_done before deref
  2024-09-05 13:43 [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
  2024-09-05 13:43 ` [PATCH 2/2] scan: check pending requests after regdom update James Prestwood
  2024-09-05 13:56 ` [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
@ 2024-09-05 14:37 ` Denis Kenzior
  2024-09-05 14:44   ` James Prestwood
  2 siblings, 1 reply; 7+ messages in thread
From: Denis Kenzior @ 2024-09-05 14:37 UTC (permalink / raw)
  To: James Prestwood, iwd; +Cc: Daniel Bond

Hi James,

On 9/5/24 8:43 AM, James Prestwood wrote:
> Due to the possibility of external scans the scan request pointer

Do we know why external scans are happening?  I can understand a one off because 
someone triggered iw scan, but from the bug report it sounded like it was 
crashing iwd repeatedly?

> could be NULL. Prior to surveys IWD would still get the results in
> order for periodic scans to utilize them. This behavior can be
> retained by checking both if we don't have a request or if the
> request was canceled. This check is identical to the one in
> get_scan_done.
> 
> This fixes a crash when checking if the NULL scan request has been
> canceled:
> 
> 0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at src/scan.c:2059
> 0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
> 0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
> 0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
> 0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
> 0x0000749646a263a4 in l_main_run_with_signal () from /usr/lib/libell.so.0
> 0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:614
> 
> Reported-by: Daniel Bond <danielbondno@gmail.com>
> ---
>   src/scan.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/scan.c b/src/scan.c
> index debdeb1f..205365cd 100644
> --- a/src/scan.c
> +++ b/src/scan.c
> @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data)
>   
>   	sc->get_survey_cmd_id = 0;
>   
> -	if (!results->sr->canceled)
> +	if (!results->sr || !results->sr->canceled)

I still don't understand why we're even bothering requesting a survey for a scan 
we didn't trigger?  In other words, we shouldn't even be in this function.

>   		get_results(results);
>   	else
>   		get_scan_done(user_data);

Regards,
-Denis

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scan: check scan request in get_survey_done before deref
  2024-09-05 14:37 ` Denis Kenzior
@ 2024-09-05 14:44   ` James Prestwood
  2024-09-05 14:57     ` Denis Kenzior
  0 siblings, 1 reply; 7+ messages in thread
From: James Prestwood @ 2024-09-05 14:44 UTC (permalink / raw)
  To: Denis Kenzior, iwd; +Cc: Daniel Bond

Hi Denis,

On 9/5/24 7:37 AM, Denis Kenzior wrote:
> Hi James,
>
> On 9/5/24 8:43 AM, James Prestwood wrote:
>> Due to the possibility of external scans the scan request pointer
>
> Do we know why external scans are happening?  I can understand a one 
> off because someone triggered iw scan, but from the bug report it 
> sounded like it was crashing iwd repeatedly?
NetworkManager? wpa_supplicant running? I really have no idea. Either 
way its out of our control.
>
>> could be NULL. Prior to surveys IWD would still get the results in
>> order for periodic scans to utilize them. This behavior can be
>> retained by checking both if we don't have a request or if the
>> request was canceled. This check is identical to the one in
>> get_scan_done.
>>
>> This fixes a crash when checking if the NULL scan request has been
>> canceled:
>>
>> 0x00005ffa6a0376de in get_survey_done (user_data=0x5ffa783a3f90) at 
>> src/scan.c:2059
>> 0x0000749646a29bbd in ?? () from /usr/lib/libell.so.0
>> 0x0000749646a243cb in ?? () from /usr/lib/libell.so.0
>> 0x0000749646a24655 in l_main_iterate () from /usr/lib/libell.so.0
>> 0x0000749646a24ace in l_main_run () from /usr/lib/libell.so.0
>> 0x0000749646a263a4 in l_main_run_with_signal () from 
>> /usr/lib/libell.so.0
>> 0x00005ffa6a00d642 in main (argc=<optimized out>, argv=<optimized 
>> out>) at src/main.c:614
>>
>> Reported-by: Daniel Bond <danielbondno@gmail.com>
>> ---
>>   src/scan.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/scan.c b/src/scan.c
>> index debdeb1f..205365cd 100644
>> --- a/src/scan.c
>> +++ b/src/scan.c
>> @@ -2056,7 +2056,7 @@ static void get_survey_done(void *user_data)
>>         sc->get_survey_cmd_id = 0;
>>   -    if (!results->sr->canceled)
>> +    if (!results->sr || !results->sr->canceled)
>
> I still don't understand why we're even bothering requesting a survey 
> for a scan we didn't trigger?  In other words, we shouldn't even be in 
> this function.
Because we still utilize external scan results for periodic scans. So if 
periodic scans are running we will still try and get the survey/results.
>
>>           get_results(results);
>>       else
>>           get_scan_done(user_data);
>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scan: check scan request in get_survey_done before deref
  2024-09-05 14:44   ` James Prestwood
@ 2024-09-05 14:57     ` Denis Kenzior
  2024-09-05 15:09       ` James Prestwood
  0 siblings, 1 reply; 7+ messages in thread
From: Denis Kenzior @ 2024-09-05 14:57 UTC (permalink / raw)
  To: James Prestwood, iwd; +Cc: Daniel Bond

Hi James,

>> Do we know why external scans are happening?  I can understand a one off 
>> because someone triggered iw scan, but from the bug report it sounded like it 
>> was crashing iwd repeatedly?
> NetworkManager? wpa_supplicant running? I really have no idea. Either way its 
> out of our control.

Would be good to know and fix that.

>>
>> I still don't understand why we're even bothering requesting a survey for a 
>> scan we didn't trigger?  In other words, we shouldn't even be in this function.
> Because we still utilize external scan results for periodic scans. So if 
> periodic scans are running we will still try and get the survey/results.

Should we though?  I'm not really sure how surveys are implemented at the driver 
level.  Are you sure this isn't introducing other timing problems?

Regards,
-Denis

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scan: check scan request in get_survey_done before deref
  2024-09-05 14:57     ` Denis Kenzior
@ 2024-09-05 15:09       ` James Prestwood
  0 siblings, 0 replies; 7+ messages in thread
From: James Prestwood @ 2024-09-05 15:09 UTC (permalink / raw)
  To: Denis Kenzior, iwd; +Cc: Daniel Bond

Hi Denis,

On 9/5/24 7:57 AM, Denis Kenzior wrote:
> Hi James,
>
>>> Do we know why external scans are happening?  I can understand a one 
>>> off because someone triggered iw scan, but from the bug report it 
>>> sounded like it was crashing iwd repeatedly?
>> NetworkManager? wpa_supplicant running? I really have no idea. Either 
>> way its out of our control.
>
> Would be good to know and fix that.
>
>>>
>>> I still don't understand why we're even bothering requesting a 
>>> survey for a scan we didn't trigger?  In other words, we shouldn't 
>>> even be in this function.
>> Because we still utilize external scan results for periodic scans. So 
>> if periodic scans are running we will still try and get the 
>> survey/results.
>
> Should we though?  I'm not really sure how surveys are implemented at 
> the driver level.  Are you sure this isn't introducing other timing 
> problems?

So looking at the nl80211 code it is actually making a driver call for 
surveys, versus GET_SCAN simply returns the list. I can't speak to the 
implementation but based on testing several cards it appears that 
GET_SURVEY was always gathering data from the last scan request. E.g. if 
you do a limited scan then survey it will only have updated results for 
those limited frequencies.

But anyways, if this seems sketchy to you I'm also fine skipping the 
survey and only doing GET_SCAN in the case of external scans. I guess 
without knowing what the driver is doing a survey could in theory take 
an excessive amount of time and cause unintended behavior, like if our 
own internal scan got issued during that.

Thanks,

James

>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-09-05 15:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-05 13:43 [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
2024-09-05 13:43 ` [PATCH 2/2] scan: check pending requests after regdom update James Prestwood
2024-09-05 13:56 ` [PATCH 1/2] scan: check scan request in get_survey_done before deref James Prestwood
2024-09-05 14:37 ` Denis Kenzior
2024-09-05 14:44   ` James Prestwood
2024-09-05 14:57     ` Denis Kenzior
2024-09-05 15:09       ` James Prestwood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox