public inbox for iwd@lists.linux.dev
From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 3/8] station: add handling for new NETCONFIG state
Date: Thu, 4 Jan 2024 10:31:03 -0800
Message-ID: <ee118809-a4a5-410b-b552-bdb1ad2cf347@gmail.com>
In-Reply-To: <9196b264-5f4c-4830-88eb-b273c39489ad@gmail.com>


On 1/4/24 10:14 AM, Denis Kenzior wrote:
> Hi James,
>
> On 1/3/24 12:46, James Prestwood wrote:
>> There was an unhandled corner case if netconfig was running and
>> multiple roam conditions happened in sequence, all before netconfig
>> had completed. A single roam before netconfig was already handled
>> (23f0f5717c) but this did not take into account any additional roam
>> conditions.
>
> So if netconfig hasn't completed, we're in the 'connecting' state.  
> Any subsequent roams should still be treated as if we are in 
> 'connecting' state. Are we transitioning from connecting -> roaming at 
> the D-Bus API level? We shouldn't be.
Yes, we are, as it stands today.
>
> Another weirdness is that I think we're sending the d-bus reply after 
> connecting to the AP, but before netconfig runs.
Probably this as well. But I'm not sure we can really send the reply
after netconfig unless we add a timeout error or something similar: if
netconfig takes a long time, the caller's pending D-Bus method call may
time out. IMO netconfig is somewhat of a special case in this regard,
and consumers of the API should wait for the connected state, not only
for the D-Bus method return.
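To illustrate that client-side expectation, here is a minimal sketch of
how an API consumer might treat the Connect() method return as "request
accepted" and consider the network usable only once the station's State
property reads "connected". The struct and function names are
hypothetical and the D-Bus plumbing is left out; it only models the
decision logic, assuming the "connecting", "connected", "roaming",
"disconnecting" and "disconnected" State values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical client-side view of a single connection attempt. */
struct client_view {
	bool connect_returned;	/* Connect() method return received */
	bool usable;		/* station reported it is fully connected */
};

void client_on_connect_reply(struct client_view *v)
{
	/* Only means the request was accepted; netconfig may still run. */
	v->connect_returned = true;
}

void client_on_state_changed(struct client_view *v, const char *state)
{
	assert(v != NULL && state != NULL);

	if (strcmp(state, "connected") == 0)
		v->usable = true;
	else if (strcmp(state, "roaming") != 0)
		v->usable = false;	/* connecting, disconnecting, ... */
	/* "roaming" keeps whatever was known before the roam started */
}

bool client_ready(const struct client_view *v)
{
	return v->connect_returned && v->usable;
}
```

A roam that starts before netconfig ever finished leaves usable false,
so such a client keeps waiting instead of assuming it has an IP.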
>
>>
>> If IWD is in this state, having started netconfig, then roamed, and
>> again restarted netconfig, it is still in a roaming state which will
>> prevent any further roams. IWD will remain "stuck" on the current
>> BSS until netconfig completes or it gets disconnected.
>
> Makes sense, since roaming means netconfig isn't really doing anything.
>
>>
>> To fix this a new internal station state was added (no changes to
>> the DBus API) to distinguish between a purely WiFi connecting state
>> (STATION_STATE_CONNECTING/AUTO) and netconfig
>> (STATION_STATE_NETCONFIG). This allows IWD to roam as needed if
>> netconfig is still running.
>
> Okay, but how would you distinguish between connecting -> netconfig 
> and netconfig->roaming->netconfig?
I hadn't needed to distinguish them, but given the above (we want to
remain in the connecting state at the D-Bus level), I think we'll need to.
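One possible way to make that distinction, sketched under stated
assumptions: keep a hypothetical netconfig_done flag next to the
internal state, and report any roam that happens before netconfig has
ever completed as "connecting" at the D-Bus level. The enum names are a
simplified stand-in for the STATION_STATE_* values in the patch, not
iwd's actual definitions, and the flag does not exist in station.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the internal STATION_STATE_* values. */
enum sta_state {
	ST_DISCONNECTED,
	ST_CONNECTING,
	ST_CONNECTING_AUTO,
	ST_NETCONFIG,
	ST_CONNECTED,
	ST_ROAMING,
	ST_FT_ROAMING,
	ST_FW_ROAMING,
	ST_COUNT,
};

/* Subset of the states visible over D-Bus. */
enum public_state {
	PUB_DISCONNECTED,
	PUB_CONNECTING,
	PUB_CONNECTED,
	PUB_ROAMING,
};

struct sta {
	enum sta_state state;
	bool netconfig_done;	/* hypothetical: set on first netconfig success */
};

enum public_state sta_public_state(const struct sta *s)
{
	assert(s->state < ST_COUNT);

	switch (s->state) {
	case ST_CONNECTING:
	case ST_CONNECTING_AUTO:
	case ST_NETCONFIG:
		return PUB_CONNECTING;
	case ST_CONNECTED:
		return PUB_CONNECTED;
	case ST_ROAMING:
	case ST_FT_ROAMING:
	case ST_FW_ROAMING:
		/* netconfig -> roaming -> netconfig stays "connecting" */
		return s->netconfig_done ? PUB_ROAMING : PUB_CONNECTING;
	default:
		return PUB_DISCONNECTED;
	}
}
```

With this split, connecting -> netconfig and netconfig -> roaming ->
netconfig look identical from the outside, while a roam after a
completed netconfig is still reported as "roaming".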
>
>>
>> The change is mainly just adding STATION_STATE_NETCONFIG anywhere
>> that STATION_STATE_CONNECTING is to maintain the same behavior,
>> except within the netconfig event handler. In this case we should
>> never get here without being in either a NETCONFIG or ROAMING
>> state.
>>
>> For some background this scenario happens if the DHCP server goes
>> down for an extended period, e.g. if it's being upgraded/serviced.
>> ---
>>   src/station.c | 23 +++++++++++++++++------
>>   1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/station.c b/src/station.c
>> index 57d22e91..8f310ec8 100644
>> --- a/src/station.c
>> +++ b/src/station.c
>> @@ -1768,6 +1768,7 @@ static void station_reset_connection_state(struct station *station)
>>       if (station->state == STATION_STATE_CONNECTED ||
>>               station->state == STATION_STATE_CONNECTING ||
>>               station->state == STATION_STATE_CONNECTING_AUTO ||
>> +            station->state == STATION_STATE_NETCONFIG ||
>>               station_is_roaming(station))
>>           network_disconnected(network);
>>   }
>> @@ -2043,8 +2044,9 @@ static void station_netconfig_event_handler(enum netconfig_event event,
>>               dbus_pending_reply(&station->connect_pending, reply);
>>           }
>>
>> -        if (L_IN_SET(station->state, STATION_STATE_CONNECTING,
>> -                STATION_STATE_CONNECTING_AUTO))
>> +        if (L_IN_SET(station->state, STATION_STATE_NETCONFIG,
>> +                STATION_STATE_ROAMING, STATION_STATE_FT_ROAMING,
>> +                STATION_STATE_FW_ROAMING))
>
> I understand why NETCONFIG state is in the set, but why the others?
This was because if we are roaming and netconfig fails for some reason,
we want to disconnect, right?
>
>>               network_connect_failed(station->connected_network,
>>                           false);
>>
>> @@ -2070,9 +2072,14 @@ static bool netconfig_after_roam(struct station *station)
>>                       network_get_settings(network)))
>>           return false;
>>
>> -    return netconfig_configure(station->netconfig,
>> +    if (L_WARN_ON(!netconfig_configure(station->netconfig,
>>                       station_netconfig_event_handler,
>> -                    station);
>> +                    station)))
>
> You already have an L_WARN_ON in the single call site of 
> netconfig_after_roam?
>
>> +        return false;
>> +
>> +    station_enter_state(station, STATION_STATE_NETCONFIG);
>> +
>> +    return true;
>>   }
>>
>>   static void station_roamed(struct station *station)
>> @@ -3255,6 +3262,8 @@ static void station_connect_ok(struct station *station)
>>                           station_netconfig_event_handler,
>>                           station)))
>>               return;
>> +
>> +        station_enter_state(station, STATION_STATE_NETCONFIG);
>>       } else
>>           station_enter_state(station, STATION_STATE_CONNECTED);
>>   }
>> @@ -4067,7 +4076,8 @@ static struct l_dbus_message *station_dbus_scan(struct l_dbus *dbus,
>>           return dbus_error_busy(message);
>>
>>       if (station->state == STATION_STATE_CONNECTING ||
>> -            station->state == STATION_STATE_CONNECTING_AUTO)
>> +            station->state == STATION_STATE_CONNECTING_AUTO ||
>> +            station->state == STATION_STATE_NETCONFIG)
>
> Might as well use L_IN_SET here
>
>>           return dbus_error_busy(message);
>>
>>       station->dbus_scan_subset_idx = 0;
>> @@ -5025,7 +5035,8 @@ static struct l_dbus_message *station_debug_scan(struct l_dbus *dbus,
>>           return dbus_error_busy(message);
>>
>>       if (station->state == STATION_STATE_CONNECTING ||
>> -            station->state == STATION_STATE_CONNECTING_AUTO)
>> +            station->state == STATION_STATE_CONNECTING_AUTO ||
>> +            station->state == STATION_STATE_NETCONFIG)
>
> Also, shouldn't this also cover the roaming states?  And do we still 
> need this check given wiphy_work?  Does netconfig use wiphy work to 
> make sure nothing tries to scan or go off-channel?
Yes, we shouldn't scan or go off-channel while roaming. And no,
netconfig doesn't use the wiphy work queue, but it really should.
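Denis's two suggestions above (use L_IN_SET, and also cover the roaming
states) could be folded into a single predicate shared by
station_dbus_scan and station_debug_scan. The sketch below is
standalone: ELL's L_IN_SET is replaced by a hypothetical portable
IN_SET macro, the state names are simplified stand-ins for
STATION_STATE_*, and whether scans should be refused in every one of
these states is the open question from this thread, not settled
behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the internal STATION_STATE_* values. */
enum sta_state {
	ST_DISCONNECTED,
	ST_CONNECTING,
	ST_CONNECTING_AUTO,
	ST_NETCONFIG,
	ST_CONNECTED,
	ST_ROAMING,
	ST_FT_ROAMING,
	ST_FW_ROAMING,
};

static inline bool in_set_impl(int v, const int *set, size_t n)
{
	assert(set != NULL && n > 0);

	for (size_t i = 0; i < n; i++)
		if (set[i] == v)
			return true;

	return false;
}

/* Hypothetical IN_SET: true if v equals any of the listed values. */
#define IN_SET(v, ...)							\
	in_set_impl((int)(v), (const int[]){ __VA_ARGS__ },		\
		    sizeof((const int[]){ __VA_ARGS__ }) / sizeof(int))

/* Scans (and other off-channel work) are refused in these states. */
bool scan_should_return_busy(enum sta_state s)
{
	return IN_SET(s, ST_CONNECTING, ST_CONNECTING_AUTO, ST_NETCONFIG,
			ST_ROAMING, ST_FT_ROAMING, ST_FW_ROAMING);
}
```

Both D-Bus handlers could then reduce to one call returning
dbus_error_busy(message) when the predicate is true, so a new internal
state only has to be added in one place.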
>
>>           return dbus_error_busy(message);
>>
>>       if (!l_dbus_message_get_arguments(message, "aq", &iter))

So it sounds like I opened up a can of worms here :) We only noticed
these issues recently because of extended server upgrade times, i.e. the
DHCP server was down for a long time. Clients roamed, and if they didn't
have the recent changes to restart netconfig after roaming, they'd be
stuck connected without an IP. Since then I've noticed some of these
other limitations, but for now, being unable to roam is at least better
than having no IP and requiring physical attention.
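The failure mode from the commit message can be reduced to a small
model: a roam is refused while the station is already in a roaming
state, and after a roam with netconfig still pending the station either
stays in that roaming state (before this patch) or re-enters NETCONFIG
(with it). This is a simplification for illustration only; the names
are hypothetical and it ignores scanning, roam thresholds and everything
else real roaming depends on.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the STATION_STATE_* values (hypothetical). */
enum sta_state { ST_CONNECTING, ST_NETCONFIG, ST_CONNECTED, ST_ROAMING };

/* Per the commit message: being in a roaming state blocks further roams. */
bool roam_blocked(enum sta_state s)
{
	return s == ST_ROAMING;
}

/*
 * State after a roam completes while netconfig is still pending and has
 * been restarted on the new BSS: without the fix the station is left in
 * ROAMING, with the fix it re-enters NETCONFIG.
 */
enum sta_state state_after_roam(bool with_fix)
{
	return with_fix ? ST_NETCONFIG : ST_ROAMING;
}

/* Count how many of `attempts` roam triggers actually start a roam. */
int roams_started(bool with_fix, int attempts)
{
	/* Associated, netconfig running, DHCP server unreachable. */
	enum sta_state s = with_fix ? ST_NETCONFIG : ST_CONNECTING;
	int started = 0;

	assert(attempts >= 0);

	for (int i = 0; i < attempts; i++) {
		if (roam_blocked(s))
			continue;	/* "stuck" on the current BSS */

		started++;
		/* Roam completes; netconfig is restarted. */
		s = state_after_roam(with_fix);
	}

	return started;
}
```

With three roam triggers while DHCP is unreachable, roams_started(false,
3) returns 1 and roams_started(true, 3) returns 3.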
> Regards,
> -Denis


Thread overview: 18+ messages
2024-01-03 18:46 [PATCH 1/8] station: handle netconfig after roaming for FW roams James Prestwood
2024-01-03 18:46 ` [PATCH 2/8] station: add additional internal state, STATION_STATE_NETCONFIG James Prestwood
2024-01-03 18:46 ` [PATCH 3/8] station: add handling for new NETCONFIG state James Prestwood
2024-01-04 18:14   ` Denis Kenzior
2024-01-04 18:31     ` James Prestwood [this message]
2024-01-04 18:55       ` Denis Kenzior
2024-01-04 19:55         ` James Prestwood
2024-01-04 21:01           ` Denis Kenzior
2024-01-03 18:46 ` [PATCH 4/8] station: add debug events for internal states James Prestwood
2024-01-04 17:57   ` Denis Kenzior
2024-01-03 18:46 ` [PATCH 5/8] auto-t: update roam test to use new debug events James Prestwood
2024-01-04 17:58   ` Denis Kenzior
2024-01-03 18:46 ` [PATCH 6/8] auto-t: add test for roaming + netconfig James Prestwood
2024-01-03 18:46 ` [PATCH 7/8] auto-t: improve failure handling in testPSK-roam James Prestwood
2024-01-04 18:00   ` Denis Kenzior
2024-01-03 18:46 ` [PATCH 8/8] auto-t: fix random testPSK-roam failure James Prestwood
2024-01-04 18:00   ` Denis Kenzior
2024-01-04 17:56 ` [PATCH 1/8] station: handle netconfig after roaming for FW roams Denis Kenzior
