From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 3/8] station: add handling for new NETCONFIG state
Date: Thu, 4 Jan 2024 10:31:03 -0800
Message-ID: <ee118809-a4a5-410b-b552-bdb1ad2cf347@gmail.com>
In-Reply-To: <9196b264-5f4c-4830-88eb-b273c39489ad@gmail.com>
On 1/4/24 10:14 AM, Denis Kenzior wrote:
> Hi James,
>
> On 1/3/24 12:46, James Prestwood wrote:
>> There was an unhandled corner case if netconfig was running and
>> multiple roam conditions happened in sequence, all before netconfig
>> had completed. A single roam before netconfig was already handled
>> (23f0f5717c) but this did not take into account any additional roam
>> conditions.
>
> So if netconfig hasn't completed, we're in the 'connecting' state.
> Any subsequent roams should still be treated as if we are in
> 'connecting' state. Are we transitioning from connecting -> roaming at
> the D-Bus API level? We shouldn't be.
Yes, as it stands today we are.
>
> Another weirdness is that I think we're sending the d-bus reply after
> connecting to the AP, but before netconfig runs.
That's probably true as well, but I'm not sure we can really send it after
netconfig unless we want to add a timeout error or something. If
netconfig takes a long time, D-Bus will be unhappy. IMO netconfig is
somewhat of a special case in this regard, and consumers of the API
should be waiting on the connected state, not only the D-Bus method return.
>
>>
>> If IWD is in this state, having started netconfig, then roamed, and
>> again restarted netconfig, it is still in a roaming state, which will
>> prevent any further roams. IWD will remain "stuck" on the current
>> BSS until netconfig completes or it gets disconnected.
>
> Makes sense, since roaming means netconfig isn't really doing anything.
>
>>
>> To fix this a new internal station state was added (no changes to
>> the DBus API) to distinguish between a purely WiFi connecting state
>> (STATION_STATE_CONNECTING/AUTO) and netconfig
>> (STATION_STATE_NETCONFIG). This allows IWD to roam as needed if
>> netconfig is still running.
>
> Okay, but how would you distinguish between connecting -> netconfig
> and netconfig->roaming->netconfig?
I hadn't needed to distinguish between them, but given the above point
about remaining in the connecting state, I think we'll need to.
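
Roughly what I have in mind, as a sketch only. The names below
(sketch_state, netconfig_done, sketch_dbus_state) are made up for
illustration and are not what is in station.c. The assumption is that we
remember whether netconfig has ever completed on this connection, and use
that to decide what to report over D-Bus:

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the internal station states */
enum sketch_state {
	SKETCH_CONNECTING,
	SKETCH_NETCONFIG,
	SKETCH_CONNECTED,
	SKETCH_ROAMING,
};

struct sketch_station {
	enum sketch_state state;
	/* Assumption: set the first time netconfig completes */
	bool netconfig_done;
};

/* State string a D-Bus client would see (illustrative only) */
static const char *sketch_dbus_state(const struct sketch_station *s)
{
	switch (s->state) {
	case SKETCH_CONNECTED:
		return "connected";
	case SKETCH_NETCONFIG:
	case SKETCH_ROAMING:
		/*
		 * connecting -> netconfig -> roaming -> netconfig never
		 * finished netconfig, so it still reads as "connecting".
		 * Only a roam on an already configured connection is
		 * reported as "roaming".
		 */
		return s->netconfig_done ? "roaming" : "connecting";
	case SKETCH_CONNECTING:
	default:
		return "connecting";
	}
}
```

With something like this the internal state can still move to a roaming
state before netconfig finishes, while the D-Bus property stays at
"connecting" until the first netconfig success.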
>
>>
>> The change is mainly just adding STATION_STATE_NETCONFIG anywhere
>> that STATION_STATE_CONNECTING is to maintain the same behavior,
>> except within the netconfig event handler. In this case we should
>> never get here without being in either a NETCONFIG or ROAMING
>> state.
>>
>> For some background this scenario happens if the DHCP server goes
>> down for an extended period, e.g. if it's being upgraded/serviced.
>> ---
>> src/station.c | 23 +++++++++++++++++------
>> 1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/station.c b/src/station.c
>> index 57d22e91..8f310ec8 100644
>> --- a/src/station.c
>> +++ b/src/station.c
>> @@ -1768,6 +1768,7 @@ static void
>> station_reset_connection_state(struct station *station)
>> if (station->state == STATION_STATE_CONNECTED ||
>> station->state == STATION_STATE_CONNECTING ||
>> station->state == STATION_STATE_CONNECTING_AUTO ||
>> + station->state == STATION_STATE_NETCONFIG ||
>> station_is_roaming(station))
>> network_disconnected(network);
>> }
>> @@ -2043,8 +2044,9 @@ static void
>> station_netconfig_event_handler(enum netconfig_event event,
>> dbus_pending_reply(&station->connect_pending, reply);
>> }
>> - if (L_IN_SET(station->state, STATION_STATE_CONNECTING,
>> - STATION_STATE_CONNECTING_AUTO))
>> + if (L_IN_SET(station->state, STATION_STATE_NETCONFIG,
>> + STATION_STATE_ROAMING, STATION_STATE_FT_ROAMING,
>> + STATION_STATE_FW_ROAMING))
>
> I understand why NETCONFIG state is in the set, but why the others?
This was because if we are roaming and netconfig fails for some reason
we want to disconnect, right?
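
To spell out the intent, here is a standalone sketch of the decision the
handler makes. The enum and function names are hypothetical stand-ins, not
the real station.c symbols, and I'm assuming a successful netconfig always
lands us in the connected state:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the internal station states */
enum sketch_state {
	SKETCH_DISCONNECTED,
	SKETCH_NETCONFIG,
	SKETCH_CONNECTED,
	SKETCH_ROAMING,
	SKETCH_FT_ROAMING,
	SKETCH_FW_ROAMING,
};

/* Hypothetical stand-ins for the netconfig events */
enum sketch_netconfig_event {
	SKETCH_EVENT_CONNECTED,
	SKETCH_EVENT_FAILED,
};

/* True for any of the three roaming variants */
static bool sketch_is_roaming(enum sketch_state state)
{
	return state == SKETCH_ROAMING || state == SKETCH_FT_ROAMING ||
		state == SKETCH_FW_ROAMING;
}

/* Returns the state the station should end up in after the event */
static enum sketch_state sketch_on_netconfig_event(enum sketch_state state,
					enum sketch_netconfig_event event)
{
	if (event == SKETCH_EVENT_CONNECTED)
		return SKETCH_CONNECTED;

	/*
	 * Failure while configuring, or while roaming with netconfig
	 * still outstanding: give up on this connection.
	 */
	if (state == SKETCH_NETCONFIG || sketch_is_roaming(state))
		return SKETCH_DISCONNECTED;

	/* Any other state is unexpected here, leave it untouched */
	return state;
}
```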
>
>> network_connect_failed(station->connected_network,
>> false);
>> @@ -2070,9 +2072,14 @@ static bool netconfig_after_roam(struct
>> station *station)
>> network_get_settings(network)))
>> return false;
>> - return netconfig_configure(station->netconfig,
>> + if (L_WARN_ON(!netconfig_configure(station->netconfig,
>> station_netconfig_event_handler,
>> - station);
>> + station)))
>
> You already have an L_WARN_ON in the single call site of
> netconfig_after_roam?
>
>> + return false;
>> +
>> + station_enter_state(station, STATION_STATE_NETCONFIG);
>> +
>> + return true;
>> }
>> static void station_roamed(struct station *station)
>> @@ -3255,6 +3262,8 @@ static void station_connect_ok(struct station
>> *station)
>> station_netconfig_event_handler,
>> station)))
>> return;
>> +
>> + station_enter_state(station, STATION_STATE_NETCONFIG);
>> } else
>> station_enter_state(station, STATION_STATE_CONNECTED);
>> }
>> @@ -4067,7 +4076,8 @@ static struct l_dbus_message
>> *station_dbus_scan(struct l_dbus *dbus,
>> return dbus_error_busy(message);
>> if (station->state == STATION_STATE_CONNECTING ||
>> - station->state == STATION_STATE_CONNECTING_AUTO)
>> + station->state == STATION_STATE_CONNECTING_AUTO ||
>> + station->state == STATION_STATE_NETCONFIG)
>
> Might as well use L_IN_SET here
>
>> return dbus_error_busy(message);
>> station->dbus_scan_subset_idx = 0;
>> @@ -5025,7 +5035,8 @@ static struct l_dbus_message
>> *station_debug_scan(struct l_dbus *dbus,
>> return dbus_error_busy(message);
>> if (station->state == STATION_STATE_CONNECTING ||
>> - station->state == STATION_STATE_CONNECTING_AUTO)
>> + station->state == STATION_STATE_CONNECTING_AUTO ||
>> + station->state == STATION_STATE_NETCONFIG)
>
> Also, shouldn't this also cover the roaming states? And do we still
> need this check given wiphy_work? Does netconfig use wiphy work to
> make sure nothing tries to scan or go off-channel?
Yes, we shouldn't scan or go off-channel while roaming. And no, netconfig
doesn't use the wiphy queue, but it really should.
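
If the check stays, a single helper covering connecting, netconfig and the
roaming states could look something like the sketch below. This is
illustrative only: the enum, the SKETCH_* macros and sketch_scan_is_busy()
are invented for the example, and a bitmask is just one way to write it
(L_IN_SET would do the same job in station.c):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the internal station states */
enum sketch_state {
	SKETCH_DISCONNECTED,
	SKETCH_CONNECTING,
	SKETCH_CONNECTING_AUTO,
	SKETCH_NETCONFIG,
	SKETCH_CONNECTED,
	SKETCH_ROAMING,
	SKETCH_FT_ROAMING,
	SKETCH_FW_ROAMING,
};

#define SKETCH_BIT(s) (1u << (s))

/* States in which a user-requested scan should be refused as busy */
#define SKETCH_SCAN_BUSY_MASK (SKETCH_BIT(SKETCH_CONNECTING) | \
				SKETCH_BIT(SKETCH_CONNECTING_AUTO) | \
				SKETCH_BIT(SKETCH_NETCONFIG) | \
				SKETCH_BIT(SKETCH_ROAMING) | \
				SKETCH_BIT(SKETCH_FT_ROAMING) | \
				SKETCH_BIT(SKETCH_FW_ROAMING))

/* Both scan entry points could share this instead of repeating the list */
static bool sketch_scan_is_busy(enum sketch_state state)
{
	return (SKETCH_SCAN_BUSY_MASK & SKETCH_BIT(state)) != 0;
}
```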
>
>> return dbus_error_busy(message);
>> if (!l_dbus_message_get_arguments(message, "aq", &iter))
So it sounds like I opened up a can of worms here :) We only noticed these
issues recently because of extended server upgrade times, i.e. the DHCP
server was down for a long time. Clients roamed, and if they didn't have
the recent changes to restart netconfig they'd be stuck connected without
an IP. I've since noticed some of these other limitations, but for now
being unable to roam is better than having no IP and requiring physical
attention.
> Regards,
> -Denis