public inbox for iwd@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH 0/1] Retry when ifup fails
@ 2023-12-14 19:03 James Prestwood
  2023-12-14 19:03 ` [PATCH 1/1] netdev: retry on failed ifup James Prestwood
  2023-12-14 20:09 ` [PATCH 0/1] Retry when ifup fails James Prestwood
  0 siblings, 2 replies; 3+ messages in thread
From: James Prestwood @ 2023-12-14 19:03 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

For some background, at its core this is a driver issue. The problem
seemed to happen much more frequently when power save was disabled
while the interface was down (this was reordered in a prior patch)
but I've received two reports of this happening now even after
that change was made. The driver logs are the same in both cases.
It appears to be some race between bringing the interface down and
up quickly.

This issue seems to have cropped up somewhat recently, in the last
few months (or nobody was reporting it). This specific code path
hasn't changed in a long time so I suspect other IWD changes, or gcc
variations slightly altered timing/scheduling and exposed this ath10k
bug.

After applying this patch I did see it happen again and the retry
was able to bring the interface back up successufully so to me this
seems like a viable option until the driver is fixed (if it ever is).

I do have an open thread with some ath10k engineers about this.

The one occurrence I've seen since the workaround was applied:

iwd[1571924]: src/manager.c:manager_new_p2p_interface_cb()
iwd[1571924]: src/p2p.c:p2p_device_update_from_genl() Created P2P device 15
kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
iwd[1571924]: Error bringing interface 14 up: Connection timed out, retrying in 1s
kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
iwd[1571924]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
iwd[1571924]: src/netdev.c:netdev_set_4addr() netdev: 14 use_4addr: 0
iwd[1571924]: src/netdev.c:netdev_initial_up_cb() Interface 14 initialized
iwd[1571924]: src/netconfig.c:netconfig_new() Creating netconfig for interface: 14
iwd[1571924]: src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick

James Prestwood (1):
  netdev: retry on failed ifup

 src/netdev.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] netdev: retry on failed ifup
  2023-12-14 19:03 [PATCH 0/1] Retry when ifup fails James Prestwood
@ 2023-12-14 19:03 ` James Prestwood
  2023-12-14 20:09 ` [PATCH 0/1] Retry when ifup fails James Prestwood
  1 sibling, 0 replies; 3+ messages in thread
From: James Prestwood @ 2023-12-14 19:03 UTC (permalink / raw)
  To: iwd; +Cc: James Prestwood

If the interface fails to be brought up (except for RFKILL) continue
to retry. Certain wireless hardware isn't very reliable when it
comes to this code path and its been seen (on ath10k) that rarely
this call fails. It was originally thought to be tied to power save
but recently it was happening without the disable power save setting.

For headless devices its very important IWD starts up and connects
so to guard against buggy hardware continue to retry bringing the
interface up.
---
 src/netdev.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/src/netdev.c b/src/netdev.c
index 522baf7a..6f666178 100644
--- a/src/netdev.c
+++ b/src/netdev.c
@@ -139,6 +139,7 @@ struct netdev {
 	struct l_timeout *sa_query_timeout;
 	struct l_timeout *sa_query_delay;
 	struct l_timeout *group_handshake_timeout;
+	struct l_timeout *ifup_retry_timeout;
 	uint16_t sa_query_id;
 	int8_t rssi_levels[16];
 	uint8_t rssi_levels_num;
@@ -1027,6 +1028,11 @@ static void netdev_free(void *data)
 		netdev->get_link_cmd_id = 0;
 	}
 
+	if (netdev->ifup_retry_timeout) {
+		l_timeout_remove(netdev->ifup_retry_timeout);
+		netdev->ifup_retry_timeout = NULL;
+	}
+
 	scan_wdev_remove(netdev->wdev_id);
 
 	watchlist_destroy(&netdev->station_watches);
@@ -5884,6 +5890,21 @@ static bool netdev_disable_power_save(struct netdev *netdev)
 	return true;
 }
 
+static void netdev_initial_up_cb(int error, uint16_t type, const void *data,
+					uint32_t len, void *user_data);
+
+static void netdev_initial_up_retry(struct l_timeout *timeout, void *user_data)
+{
+	struct netdev *netdev = user_data;
+
+	l_timeout_remove(timeout);
+	netdev->ifup_retry_timeout = NULL;
+
+	netdev->set_powered_cmd_id =
+		l_rtnl_set_powered(rtnl, netdev->index, true,
+					netdev_initial_up_cb, netdev, NULL);
+}
+
 static void netdev_initial_up_cb(int error, uint16_t type, const void *data,
 					uint32_t len, void *user_data)
 {
@@ -5891,14 +5912,20 @@ static void netdev_initial_up_cb(int error, uint16_t type, const void *data,
 
 	netdev->set_powered_cmd_id = 0;
 
-	if (!error)
+	switch (error) {
+	case 0:
 		netdev->ifi_flags |= IFF_UP;
-	else {
-		l_error("Error bringing interface %i up: %s", netdev->index,
-			strerror(-error));
-
-		if (error != -ERFKILL)
-			return;
+		break;
+	case -ERFKILL:
+		l_error("Error bringing interface %i up due to RFKILL",
+					netdev->index);
+		break;
+	default:
+		l_error("Error bringing interface %i up: %s, retrying in 1s",
+					netdev->index, strerror(-error));
+		netdev->ifup_retry_timeout = l_timeout_create(1,
+				netdev_initial_up_retry, netdev, NULL);
+		return;
 	}
 
 	l_rtnl_set_linkmode_and_operstate(rtnl, netdev->index,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/1] Retry when ifup fails
  2023-12-14 19:03 [PATCH 0/1] Retry when ifup fails James Prestwood
  2023-12-14 19:03 ` [PATCH 1/1] netdev: retry on failed ifup James Prestwood
@ 2023-12-14 20:09 ` James Prestwood
  1 sibling, 0 replies; 3+ messages in thread
From: James Prestwood @ 2023-12-14 20:09 UTC (permalink / raw)
  To: iwd

On 12/14/23 11:03, James Prestwood wrote:
> For some background, at its core this is a driver issue. The problem
> seemed to happen much more frequently when power save was disabled
> while the interface was down (this was reordered in a prior patch)
> but I've received two reports of this happening now even after
> that change was made. The driver logs are the same in both cases.
> It appears to be some race between bringing the interface down and
> up quickly.
>
> This issue seems to have cropped up somewhat recently, in the last
> few months (or nobody was reporting it). This specific code path
> hasn't changed in a long time so I suspect other IWD changes, or gcc
> variations slightly altered timing/scheduling and exposed this ath10k
> bug.
>
> After applying this patch I did see it happen again and the retry
> was able to bring the interface back up successufully so to me this
> seems like a viable option until the driver is fixed (if it ever is).
>
> I do have an open thread with some ath10k engineers about this.
>
> The one occurrence I've seen since the workaround was applied:
>
> iwd[1571924]: src/manager.c:manager_new_p2p_interface_cb()
> iwd[1571924]: src/p2p.c:p2p_device_update_from_genl() Created P2P device 15
> kernel: ath10k_pci 0000:02:00.0: wmi service ready event not received
> iwd[1571924]: Error bringing interface 14 up: Connection timed out, retrying in 1s
> kernel: ath10k_pci 0000:02:00.0: Could not init core: -110
> iwd[1571924]: src/netdev.c:netdev_link_notify() event 16 on ifindex 14
> iwd[1571924]: src/netdev.c:netdev_set_4addr() netdev: 14 use_4addr: 0
> iwd[1571924]: src/netdev.c:netdev_initial_up_cb() Interface 14 initialized
> iwd[1571924]: src/netconfig.c:netconfig_new() Creating netconfig for interface: 14
> iwd[1571924]: src/station.c:station_enter_state() Old State: disconnected, new state: autoconnect_quick

After closer inspection, while the ifup succeeded, the driver still 
seemed to be in a bad state. Probably better to hold off until more is 
known. I suspect the interface needs to be completely removed and 
created again in order to get the driver working when this happens.

Reported on linux-wireless:

https://lore.kernel.org/linux-wireless/abbb7874-7f7f-423b-b67c-6ef850ae5bd6@gmail.com/T/#u

>
> James Prestwood (1):
>    netdev: retry on failed ifup
>
>   src/netdev.c | 41 ++++++++++++++++++++++++++++++++++-------
>   1 file changed, 34 insertions(+), 7 deletions(-)
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-12-14 20:09 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-14 19:03 [PATCH 0/1] Retry when ifup fails James Prestwood
2023-12-14 19:03 ` [PATCH 1/1] netdev: retry on failed ifup James Prestwood
2023-12-14 20:09 ` [PATCH 0/1] Retry when ifup fails James Prestwood

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox