* [PATCH] station: check for roam timeout before rearming
@ 2024-08-27 14:10 James Prestwood
2024-08-28 2:36 ` Denis Kenzior
0 siblings, 1 reply; 6+ messages in thread
From: James Prestwood @ 2024-08-27 14:10 UTC (permalink / raw)
To: iwd; +Cc: James Prestwood
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so now all instances where the timer
is rearmed we also check if there is already a timer set.
---
src/station.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/station.c b/src/station.c
index 30a1232a..e188aed4 100644
--- a/src/station.c
+++ b/src/station.c
@@ -2221,7 +2221,7 @@ static void station_roamed(struct station *station)
* Schedule another roaming attempt in case the signal continues to
* remain low. A subsequent high signal notification will cancel it.
*/
- if (station->signal_low)
+ if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout))
station_roam_timeout_rearm(station, roam_retry_interval);
if (station->netconfig)
@@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station *station)
station->roam_scan_full = false;
station->ap_directed_roaming = false;
- if (station->signal_low)
+ if (station->signal_low && !station->roam_trigger_timeout)
station_roam_timeout_rearm(station, roam_retry_interval);
}
@@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station *station)
if (station_cannot_roam(station))
return;
+ if (station->roam_trigger_timeout)
+ return;
+
/* Set a 5-second initial timeout */
station_roam_timeout_rearm(station, 5);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] station: check for roam timeout before rearming
2024-08-27 14:10 [PATCH] station: check for roam timeout before rearming James Prestwood
@ 2024-08-28 2:36 ` Denis Kenzior
2024-08-28 12:08 ` James Prestwood
0 siblings, 1 reply; 6+ messages in thread
From: Denis Kenzior @ 2024-08-28 2:36 UTC (permalink / raw)
To: James Prestwood, iwd
Hi James,
On 8/27/24 9:10 AM, James Prestwood wrote:
> A user reported a crash which was due to the roam trigger timeout
> being overwritten, followed by a disconnect. Post-disconnect the
> timer would fire and result in a crash. Its not clear exactly where
> the overwrite was happening but upon code inspection it could
> happen in the following scenario:
>
> 1. Beacon loss event, start roam timeout
> 2. Signal low event, no check if timeout is running and the timeout
> gets overwritten.
>
> The reported crash actually didn't appear to be from the above
> scenario but something else, so now all instances where the timer
> is rearmed we also check if there is already a timer set.
> ---
> src/station.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/station.c b/src/station.c
> index 30a1232a..e188aed4 100644
> --- a/src/station.c
> +++ b/src/station.c
> @@ -2221,7 +2221,7 @@ static void station_roamed(struct station *station)
> * Schedule another roaming attempt in case the signal continues to
> * remain low. A subsequent high signal notification will cancel it.
> */
> - if (station->signal_low)
> + if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout))
I'm reading this as:
if signal_low and roam_trigger_timeout is NULL:
1. print a warning
2. rearm the timer
Is 1. intended?
> station_roam_timeout_rearm(station, roam_retry_interval);
>
> if (station->netconfig)
> @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station *station)
> station->roam_scan_full = false;
> station->ap_directed_roaming = false;
>
> - if (station->signal_low)
> + if (station->signal_low && !station->roam_trigger_timeout)
> station_roam_timeout_rearm(station, roam_retry_interval);
> }
>
> @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station *station)
> if (station_cannot_roam(station))
> return;
>
> + if (station->roam_trigger_timeout)
> + return;
> +
One thing I'm a bit worried about with such a simple approach is that we might
have different roam timeouts. For example, the 60 second roam retry interval is
active and we get a low rssi event. In this case we probably should schedule a
roam sooner. Maybe we should book-keep timeout information better better and
make sure the timeout happens within the minimum time (active time remaining /
new timeout)?
> /* Set a 5-second initial timeout */
> station_roam_timeout_rearm(station, 5);
> }
Regards,
-Denis
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] station: check for roam timeout before rearming
2024-08-28 2:36 ` Denis Kenzior
@ 2024-08-28 12:08 ` James Prestwood
2024-08-28 14:41 ` Denis Kenzior
0 siblings, 1 reply; 6+ messages in thread
From: James Prestwood @ 2024-08-28 12:08 UTC (permalink / raw)
To: Denis Kenzior, iwd
Hi Denis,
On 8/27/24 7:36 PM, Denis Kenzior wrote:
> Hi James,
>
> On 8/27/24 9:10 AM, James Prestwood wrote:
>> A user reported a crash which was due to the roam trigger timeout
>> being overwritten, followed by a disconnect. Post-disconnect the
>> timer would fire and result in a crash. Its not clear exactly where
>> the overwrite was happening but upon code inspection it could
>> happen in the following scenario:
>>
>> 1. Beacon loss event, start roam timeout
>> 2. Signal low event, no check if timeout is running and the timeout
>> gets overwritten.
>>
>> The reported crash actually didn't appear to be from the above
>> scenario but something else, so now all instances where the timer
>> is rearmed we also check if there is already a timer set.
>> ---
>> src/station.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/station.c b/src/station.c
>> index 30a1232a..e188aed4 100644
>> --- a/src/station.c
>> +++ b/src/station.c
>> @@ -2221,7 +2221,7 @@ static void station_roamed(struct station
>> *station)
>> * Schedule another roaming attempt in case the signal
>> continues to
>> * remain low. A subsequent high signal notification will
>> cancel it.
>> */
>> - if (station->signal_low)
>> + if (station->signal_low &&
>> L_WARN_ON(!station->roam_trigger_timeout))
>
> I'm reading this as:
> if signal_low and roam_trigger_timeout is NULL:
> 1. print a warning
> 2. rearm the timer
>
> Is 1. intended?
>
>> station_roam_timeout_rearm(station, roam_retry_interval);
>> if (station->netconfig)
>> @@ -2260,7 +2260,7 @@ static void station_roam_retry(struct station
>> *station)
>> station->roam_scan_full = false;
>> station->ap_directed_roaming = false;
>> - if (station->signal_low)
>> + if (station->signal_low && !station->roam_trigger_timeout)
>> station_roam_timeout_rearm(station, roam_retry_interval);
>> }
>> @@ -3202,6 +3202,9 @@ static void station_low_rssi(struct station
>> *station)
>> if (station_cannot_roam(station))
>> return;
>> + if (station->roam_trigger_timeout)
>> + return;
>> +
>
> One thing I'm a bit worried about with such a simple approach is that
> we might have different roam timeouts. For example, the 60 second
> roam retry interval is active and we get a low rssi event. In this
> case we probably should schedule a roam sooner. Maybe we should
> book-keep timeout information better better and make sure the timeout
> happens within the minimum time (active time remaining / new timeout)?
Sure, the thought crossed my mind as well. Looks like there is no way to
get the active timeout within the l_timeout itself, it doesn't track
that so we can manually.
>
>> /* Set a 5-second initial timeout */
>> station_roam_timeout_rearm(station, 5);
>> }
>
> Regards,
> -Denis
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] station: check for roam timeout before rearming
2024-08-28 12:08 ` James Prestwood
@ 2024-08-28 14:41 ` Denis Kenzior
0 siblings, 0 replies; 6+ messages in thread
From: Denis Kenzior @ 2024-08-28 14:41 UTC (permalink / raw)
To: James Prestwood, iwd
Hi James,
>>
>> One thing I'm a bit worried about with such a simple approach is that we might
>> have different roam timeouts. For example, the 60 second roam retry interval
>> is active and we get a low rssi event. In this case we probably should
>> schedule a roam sooner. Maybe we should book-keep timeout information better
>> better and make sure the timeout happens within the minimum time (active time
>> remaining / new timeout)?
> Sure, the thought crossed my mind as well. Looks like there is no way to get the
> active timeout within the l_timeout itself, it doesn't track that so we can
> manually.
You might be able to add something into ell using timerfd_gettime.
Regards,
-Denis
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] station: check for roam timeout before rearming
@ 2024-08-29 11:27 James Prestwood
2024-09-03 15:18 ` Denis Kenzior
0 siblings, 1 reply; 6+ messages in thread
From: James Prestwood @ 2024-08-29 11:27 UTC (permalink / raw)
To: iwd; +Cc: James Prestwood
A user reported a crash which was due to the roam trigger timeout
being overwritten, followed by a disconnect. Post-disconnect the
timer would fire and result in a crash. Its not clear exactly where
the overwrite was happening but upon code inspection it could
happen in the following scenario:
1. Beacon loss event, start roam timeout
2. Signal low event, no check if timeout is running and the timeout
gets overwritten.
The reported crash actually didn't appear to be from the above
scenario but something else, so this logic is being hardened and
improved
Now if a roam timeout already exists and trying to be rearmed IWD
will check the time remaining on the current timer and either keep
the active timer or reschedule it to the lesser of the two values
(current or new rearm time). This will avoid cases such as a long
roam timer being active (e.g. 60 seconds) followed by a beacon or
packet loss event which should trigger a more agressive roam
schedule.
---
src/station.c | 48 +++++++++++++++++++++++++++++-------------------
1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/src/station.c b/src/station.c
index 30a1232a..8b29df77 100644
--- a/src/station.c
+++ b/src/station.c
@@ -102,7 +102,6 @@ struct station {
struct l_queue *owe_hidden_scan_ids;
/* Roaming related members */
- struct timespec roam_min_time;
struct l_timeout *roam_trigger_timeout;
uint32_t roam_scan_id;
uint8_t preauth_bssid[6];
@@ -1820,7 +1819,6 @@ static void station_roam_state_clear(struct station *station)
station->preparing_roam = false;
station->roam_scan_full = false;
station->signal_low = false;
- station->roam_min_time.tv_sec = 0;
station->netconfig_after_roam = false;
station->last_roam_scan = 0;
@@ -3041,20 +3039,33 @@ static void station_roam_trigger_cb(struct l_timeout *timeout, void *user_data)
static void station_roam_timeout_rearm(struct station *station, int seconds)
{
- struct timespec now, min_timeout;
+ uint64_t remaining;
- clock_gettime(CLOCK_MONOTONIC, &now);
+ if (!station->roam_trigger_timeout)
+ goto new_timeout;
- min_timeout = now;
- min_timeout.tv_sec += seconds;
+ /* If we cant get the remaining time just create a new timer */
+ if (L_WARN_ON(!l_timeout_remaining(station->roam_trigger_timeout,
+ &remaining))) {
+ l_timeout_remove(station->roam_trigger_timeout);
+ goto new_timeout;
+ }
+
+ /* Our current timeout is less than the rearm, keep current */
+ if (l_time_before(remaining, seconds * L_USEC_PER_SEC)) {
+ l_debug("Keeping current roam timeout of %lu seconds",
+ l_time_to_secs(remaining));
+ return;
+ }
+
+ l_debug("Rescheduling roam timeout from %lu to %u seconds",
+ l_time_to_secs(remaining), seconds);
+ l_timeout_modify(station->roam_trigger_timeout, seconds);
- if (station->roam_min_time.tv_sec < min_timeout.tv_sec ||
- (station->roam_min_time.tv_sec == min_timeout.tv_sec &&
- station->roam_min_time.tv_nsec < min_timeout.tv_nsec))
- station->roam_min_time = min_timeout;
+ return;
- seconds = station->roam_min_time.tv_sec - now.tv_sec +
- (station->roam_min_time.tv_nsec > now.tv_nsec ? 1 : 0);
+new_timeout:
+ l_debug("Arming new roam timer for %u seconds", seconds);
station->roam_trigger_timeout =
l_timeout_create(seconds, station_roam_trigger_cb,
@@ -3212,7 +3223,6 @@ static void station_ok_rssi(struct station *station)
station->roam_trigger_timeout = NULL;
station->signal_low = false;
- station->roam_min_time.tv_sec = 0;
}
static void station_event_roamed(struct station *station, struct scan_bss *new)
@@ -3558,14 +3568,17 @@ static void station_packets_lost(struct station *station, uint32_t num_pkts)
l_debug("Too many roam attempts in %u second timeframe, "
"delaying roam", LOSS_ROAM_RATE_LIMIT);
- if (station->roam_trigger_timeout)
- return;
-
station_roam_timeout_rearm(station, LOSS_ROAM_RATE_LIMIT);
return;
}
+ if (station->roam_trigger_timeout) {
+ l_debug("canceling roam timer to roam immediately");
+ l_timeout_remove(station->roam_trigger_timeout);
+ station->roam_trigger_timeout = NULL;
+ }
+
station_start_roam(station);
}
@@ -3578,9 +3591,6 @@ static void station_beacon_lost(struct station *station)
station_debug_event(station, "beacon-loss-roam");
- if (station->roam_trigger_timeout)
- return;
-
station_roam_timeout_rearm(station, LOSS_ROAM_RATE_LIMIT);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] station: check for roam timeout before rearming
2024-08-29 11:27 James Prestwood
@ 2024-09-03 15:18 ` Denis Kenzior
0 siblings, 0 replies; 6+ messages in thread
From: Denis Kenzior @ 2024-09-03 15:18 UTC (permalink / raw)
To: James Prestwood, iwd
Hi James,
On 8/29/24 6:27 AM, James Prestwood wrote:
> A user reported a crash which was due to the roam trigger timeout
> being overwritten, followed by a disconnect. Post-disconnect the
> timer would fire and result in a crash. Its not clear exactly where
> the overwrite was happening but upon code inspection it could
> happen in the following scenario:
>
> 1. Beacon loss event, start roam timeout
> 2. Signal low event, no check if timeout is running and the timeout
> gets overwritten.
>
> The reported crash actually didn't appear to be from the above
> scenario but something else, so this logic is being hardened and
> improved
>
> Now if a roam timeout already exists and trying to be rearmed IWD
> will check the time remaining on the current timer and either keep
> the active timer or reschedule it to the lesser of the two values
> (current or new rearm time). This will avoid cases such as a long
> roam timer being active (e.g. 60 seconds) followed by a beacon or
> packet loss event which should trigger a more agressive roam
> schedule.
> ---
> src/station.c | 48 +++++++++++++++++++++++++++++-------------------
> 1 file changed, 29 insertions(+), 19 deletions(-)
>
Applied, thanks.
Regards,
-Denis
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-09-03 15:18 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-27 14:10 [PATCH] station: check for roam timeout before rearming James Prestwood
2024-08-28 2:36 ` Denis Kenzior
2024-08-28 12:08 ` James Prestwood
2024-08-28 14:41 ` Denis Kenzior
-- strict thread matches above, loose matches on Subject: below --
2024-08-29 11:27 James Prestwood
2024-09-03 15:18 ` Denis Kenzior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox