Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH wpan-next v3 04/11] net: mac802154: Follow the count of ongoing transmissions
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

In order to create a synchronous API for MLME command purposes, we need
to be able to track the end of the ongoing transmissions. Let's
introduce an atomic variable which is incremented when a transmission
starts and decremented when relevant so that we know at any moment
whether there is an ongoing transmission.

The counter gets decremented in the following situations:
- The operation is asynchronous and there was a failure during the
  offloading process.
- The operation is synchronous and the synchronous operation failed.
- The operation finished, either successfully or not.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h | 3 +++
 net/mac802154/tx.c      | 3 +++
 net/mac802154/util.c    | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 85f9e8417688..473ebcb9b155 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -214,6 +214,9 @@ struct wpan_phy {
 	/* the network namespace this phy lives in currently */
 	possible_net_t _net;
 
+	/* Transmission monitoring */
+	atomic_t ongoing_txs;
+
 	char priv[] __aligned(NETDEV_ALIGN);
 };
 
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 4a46ce8d2ac8..33f64ecd96c7 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -44,6 +44,7 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
 	ieee802154_wake_queue(&local->hw);
+	atomic_dec(&local->phy->ongoing_txs);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
 }
@@ -75,6 +76,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 	/* Stop the netif queue on each sub_if_data object. */
 	ieee802154_stop_queue(&local->hw);
+	atomic_inc(&local->phy->ongoing_txs);
 
 	/* Drivers should preferably implement the async callback. In some rare
 	 * cases they only provide a sync callback which we will use as a
@@ -98,6 +100,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 err_wake_netif_queue:
 	ieee802154_wake_queue(&local->hw);
+	atomic_dec(&local->phy->ongoing_txs);
 err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 9f024d85563b..76dc663e2af4 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -88,6 +88,7 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 	}
 
 	dev_consume_skb_any(skb);
+	atomic_dec(&hw->phy->ongoing_txs);
 }
 EXPORT_SYMBOL(ieee802154_xmit_complete);
 
@@ -99,6 +100,7 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	local->tx_result = reason;
 	ieee802154_wake_queue(hw);
 	dev_kfree_skb_any(skb);
+	atomic_dec(&hw->phy->ongoing_txs);
 }
 EXPORT_SYMBOL(ieee802154_xmit_error);
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 05/11] net: mac802154: Bring the ability to hold the transmit queue
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

Create a hold_txs atomic variable and increment/decrement it when
relevant, ie. when we want to hold the queue or release it: currently
all the "stopped" situations are suitable, but very soon we will more
extensively use this feature for MLME purposes.

Upon release, the atomic counter is decremented and checked. If it is
back to 0, then the netif queue gets woken up. This makes the whole
process fully transparent, provided that all the users of
ieee802154_wake/stop_queue() now call ieee802154_hold/release_queue()
instead.

In no situation individual drivers should call any of these helpers
manually in order to avoid messing with the counters. There are other
functions more suited for this purpose which have been introduced, such
as the _xmit_complete() and _xmit_error() helpers which will handle all
that for them.

One advantage is that, as no more drivers call the stop/wake helpers
directly, we can safely stop exporting them and only declare the
hold/release ones in a header only accessible to the core.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h      |  6 +++--
 include/net/mac802154.h      | 27 -------------------
 net/ieee802154/core.c        |  2 ++
 net/mac802154/cfg.c          |  4 +--
 net/mac802154/ieee802154_i.h | 19 +++++++++++++
 net/mac802154/tx.c           |  6 ++---
 net/mac802154/util.c         | 52 +++++++++++++++++++++++++++++++-----
 7 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 473ebcb9b155..7a191418f258 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -11,7 +11,7 @@
 
 #include <linux/ieee802154.h>
 #include <linux/netdevice.h>
-#include <linux/mutex.h>
+#include <linux/spinlock.h>
 #include <linux/bug.h>
 
 #include <net/nl802154.h>
@@ -214,8 +214,10 @@ struct wpan_phy {
 	/* the network namespace this phy lives in currently */
 	possible_net_t _net;
 
-	/* Transmission monitoring */
+	/* Transmission monitoring and control */
+	spinlock_t queue_lock;
 	atomic_t ongoing_txs;
+	atomic_t hold_txs;
 
 	char priv[] __aligned(NETDEV_ALIGN);
 };
diff --git a/include/net/mac802154.h b/include/net/mac802154.h
index bdac0ddbdcdb..357d25ef627a 100644
--- a/include/net/mac802154.h
+++ b/include/net/mac802154.h
@@ -460,33 +460,6 @@ void ieee802154_unregister_hw(struct ieee802154_hw *hw);
  */
 void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb,
 			   u8 lqi);
-/**
- * ieee802154_wake_queue - wake ieee802154 queue
- * @hw: pointer as obtained from ieee802154_alloc_hw().
- *
- * Tranceivers usually have either one transmit framebuffer or one framebuffer
- * for both transmitting and receiving. Hence, the core currently only handles
- * one frame at a time for each phy, which means we had to stop the queue to
- * avoid new skb to come during the transmission. The queue then needs to be
- * woken up after the operation.
- *
- * Drivers should use this function instead of netif_wake_queue.
- */
-void ieee802154_wake_queue(struct ieee802154_hw *hw);
-
-/**
- * ieee802154_stop_queue - stop ieee802154 queue
- * @hw: pointer as obtained from ieee802154_alloc_hw().
- *
- * Tranceivers usually have either one transmit framebuffer or one framebuffer
- * for both transmitting and receiving. Hence, the core currently only handles
- * one frame at a time for each phy, which means we need to tell upper layers to
- * stop giving us new skbs while we are busy with the transmitted one. The queue
- * must then be stopped before transmitting.
- *
- * Drivers should use this function instead of netif_stop_queue.
- */
-void ieee802154_stop_queue(struct ieee802154_hw *hw);
 
 /**
  * ieee802154_xmit_complete - frame transmission complete
diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
index de259b5170ab..47a4de6df88b 100644
--- a/net/ieee802154/core.c
+++ b/net/ieee802154/core.c
@@ -130,6 +130,8 @@ wpan_phy_new(const struct cfg802154_ops *ops, size_t priv_size)
 
 	init_waitqueue_head(&rdev->dev_wait);
 
+	spin_lock_init(&rdev->wpan_phy.queue_lock);
+
 	return &rdev->wpan_phy;
 }
 EXPORT_SYMBOL(wpan_phy_new);
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index 1e4a9f74ed43..b51100fd9e3f 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -46,7 +46,7 @@ static int ieee802154_suspend(struct wpan_phy *wpan_phy)
 	if (!local->open_count)
 		goto suspend;
 
-	ieee802154_stop_queue(&local->hw);
+	ieee802154_hold_queue(local);
 	synchronize_net();
 
 	/* stop hardware - this must stop RX */
@@ -72,7 +72,7 @@ static int ieee802154_resume(struct wpan_phy *wpan_phy)
 		return ret;
 
 wake_up:
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 	local->suspended = false;
 	return 0;
 }
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index a8b7b9049f14..0c7ff9e0b632 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -130,6 +130,25 @@ netdev_tx_t
 ieee802154_subif_start_xmit(struct sk_buff *skb, struct net_device *dev);
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer);
 
+/**
+ * ieee802154_hold_queue - hold ieee802154 queue
+ * @local: main mac object
+ *
+ * Hold a queue by incrementing an atomic counter and requesting the netif
+ * queues to be stopped. The queues cannot be woken up while the counter has not
+ * been reset with as any ieee802154_release_queue() calls as needed.
+ */
+void ieee802154_hold_queue(struct ieee802154_local *local);
+
+/**
+ * ieee802154_release_queue - release ieee802154 queue
+ * @local: main mac object
+ *
+ * Release a queue which is held by decrementing an atomic counter and wake it
+ * up only if the counter reaches 0.
+ */
+void ieee802154_release_queue(struct ieee802154_local *local);
+
 /* MIB callbacks */
 void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan);
 
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 33f64ecd96c7..6a53c83cf039 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -43,7 +43,7 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 	atomic_dec(&local->phy->ongoing_txs);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
@@ -75,7 +75,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	}
 
 	/* Stop the netif queue on each sub_if_data object. */
-	ieee802154_stop_queue(&local->hw);
+	ieee802154_hold_queue(local);
 	atomic_inc(&local->phy->ongoing_txs);
 
 	/* Drivers should preferably implement the async callback. In some rare
@@ -99,7 +99,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	return NETDEV_TX_OK;
 
 err_wake_netif_queue:
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 	atomic_dec(&local->phy->ongoing_txs);
 err_free_skb:
 	kfree_skb(skb);
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 76dc663e2af4..6176cc40df91 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -13,7 +13,17 @@
 /* privid for wpan_phys to determine whether they belong to us or not */
 const void *const mac802154_wpan_phy_privid = &mac802154_wpan_phy_privid;
 
-void ieee802154_wake_queue(struct ieee802154_hw *hw)
+/**
+ * ieee802154_wake_queue - wake ieee802154 queue
+ * @local: main mac object
+ *
+ * Tranceivers usually have either one transmit framebuffer or one framebuffer
+ * for both transmitting and receiving. Hence, the core currently only handles
+ * one frame at a time for each phy, which means we had to stop the queue to
+ * avoid new skb to come during the transmission. The queue then needs to be
+ * woken up after the operation.
+ */
+static void ieee802154_wake_queue(struct ieee802154_hw *hw)
 {
 	struct ieee802154_local *local = hw_to_local(hw);
 	struct ieee802154_sub_if_data *sdata;
@@ -27,9 +37,18 @@ void ieee802154_wake_queue(struct ieee802154_hw *hw)
 	}
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(ieee802154_wake_queue);
 
-void ieee802154_stop_queue(struct ieee802154_hw *hw)
+/**
+ * ieee802154_stop_queue - stop ieee802154 queue
+ * @local: main mac object
+ *
+ * Tranceivers usually have either one transmit framebuffer or one framebuffer
+ * for both transmitting and receiving. Hence, the core currently only handles
+ * one frame at a time for each phy, which means we need to tell upper layers to
+ * stop giving us new skbs while we are busy with the transmitted one. The queue
+ * must then be stopped before transmitting.
+ */
+static void ieee802154_stop_queue(struct ieee802154_hw *hw)
 {
 	struct ieee802154_local *local = hw_to_local(hw);
 	struct ieee802154_sub_if_data *sdata;
@@ -43,14 +62,33 @@ void ieee802154_stop_queue(struct ieee802154_hw *hw)
 	}
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(ieee802154_stop_queue);
+
+void ieee802154_hold_queue(struct ieee802154_local *local)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&local->phy->queue_lock, flags);
+	ieee802154_stop_queue(&local->hw);
+	atomic_inc(&local->phy->hold_txs);
+	spin_unlock_irqrestore(&local->phy->queue_lock, flags);
+}
+
+void ieee802154_release_queue(struct ieee802154_local *local)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&local->phy->queue_lock, flags);
+	if (!atomic_dec_and_test(&local->phy->hold_txs))
+		ieee802154_wake_queue(&local->hw);
+	spin_unlock_irqrestore(&local->phy->queue_lock, flags);
+}
 
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer)
 {
 	struct ieee802154_local *local =
 		container_of(timer, struct ieee802154_local, ifs_timer);
 
-	ieee802154_wake_queue(&local->hw);
+	ieee802154_release_queue(local);
 
 	return HRTIMER_NORESTART;
 }
@@ -84,7 +122,7 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 				      hw->phy->sifs_period * NSEC_PER_USEC,
 				      HRTIMER_MODE_REL);
 	} else {
-		ieee802154_wake_queue(hw);
+		ieee802154_release_queue(local);
 	}
 
 	dev_consume_skb_any(skb);
@@ -98,7 +136,7 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	struct ieee802154_local *local = hw_to_local(hw);
 
 	local->tx_result = reason;
-	ieee802154_wake_queue(hw);
+	ieee802154_release_queue(local);
 	dev_kfree_skb_any(skb);
 	atomic_dec(&hw->phy->ongoing_txs);
 }
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 06/11] net: mac802154: Create a hot tx path
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

Let's rename the current Tx path to show that this is the "hot" Tx
path. We will soon introduce a slower Tx path for MLME commands.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/tx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 6a53c83cf039..607019b8f8ab 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -106,6 +106,12 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	return NETDEV_TX_OK;
 }
 
+static netdev_tx_t
+ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	return ieee802154_tx(local, skb);
+}
+
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -113,7 +119,7 @@ ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb->skb_iif = dev->ifindex;
 
-	return ieee802154_tx(sdata->local, skb);
+	return ieee802154_hot_tx(sdata->local, skb);
 }
 
 netdev_tx_t
@@ -135,5 +141,5 @@ ieee802154_subif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb->skb_iif = dev->ifindex;
 
-	return ieee802154_tx(sdata->local, skb);
+	return ieee802154_hot_tx(sdata->local, skb);
 }
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 09/11] net: mac802154: Introduce a synchronous API for MLME commands
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

This is the slow path, we need to wait for each command to be processed
before continuing so let's introduce an helper which does the
transmission and blocks until it gets notified of its asynchronous
completion. This helper is going to be used when introducing scan
support.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h |  1 +
 net/mac802154/tx.c           | 46 ++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index a057827fc48a..b42c6ac789f5 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -125,6 +125,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
 void ieee802154_xmit_sync_worker(struct work_struct *work);
 int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
+int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct sk_buff *skb);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 38f74b8b6740..6cc4e5c7ba94 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -128,6 +128,52 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 	return ieee802154_sync_queue(local);
 }
 
+static int ieee802154_mlme_op_pre(struct ieee802154_local *local)
+{
+	return ieee802154_sync_and_hold_queue(local);
+}
+
+static int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	int ret;
+
+	/* Avoid possible calls to ->ndo_stop() when we asynchronously perform
+	 * MLME transmissions.
+	 */
+	rtnl_lock();
+
+	/* Ensure the device was not stopped, otherwise error out */
+	if (!local->open_count)
+		return -EBUSY;
+
+	ieee802154_tx(local, skb);
+	ret = ieee802154_sync_queue(local);
+
+	rtnl_unlock();
+
+	return ret;
+}
+
+static void ieee802154_mlme_op_post(struct ieee802154_local *local)
+{
+	ieee802154_release_queue(local);
+}
+
+int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct sk_buff *skb)
+{
+	int ret;
+
+	ret = ieee802154_mlme_op_pre(local);
+	if (ret)
+		return ret;
+
+	ret = ieee802154_mlme_tx(local, skb);
+
+	ieee802154_mlme_op_post(local);
+
+	return ret;
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 08/11] net: mac802154: Introduce a tx queue flushing mechanism
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

Right now we are able to stop a queue but we have no indication if a
transmission is ongoing or not.

Thanks to recent additions, we can track the number of ongoing
transmissions so we know if the last transmission is over. Adding on top
of it an internal wait queue also allows to be woken up asynchronously
when this happens. If, beforehands, we marked the queue to be held and
stopped it, we end up flushing and stopping the tx queue.

Thanks to this feature, we will soon be able to introduce a synchronous
transmit API.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h      |  1 +
 net/ieee802154/core.c        |  1 +
 net/mac802154/cfg.c          |  2 +-
 net/mac802154/ieee802154_i.h |  1 +
 net/mac802154/tx.c           | 26 ++++++++++++++++++++++++--
 net/mac802154/util.c         |  6 ++++--
 6 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 7a191418f258..8881b6126b58 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -218,6 +218,7 @@ struct wpan_phy {
 	spinlock_t queue_lock;
 	atomic_t ongoing_txs;
 	atomic_t hold_txs;
+	wait_queue_head_t sync_txq;
 
 	char priv[] __aligned(NETDEV_ALIGN);
 };
diff --git a/net/ieee802154/core.c b/net/ieee802154/core.c
index 47a4de6df88b..57546e07e06a 100644
--- a/net/ieee802154/core.c
+++ b/net/ieee802154/core.c
@@ -129,6 +129,7 @@ wpan_phy_new(const struct cfg802154_ops *ops, size_t priv_size)
 	wpan_phy_net_set(&rdev->wpan_phy, &init_net);
 
 	init_waitqueue_head(&rdev->dev_wait);
+	init_waitqueue_head(&rdev->wpan_phy.sync_txq);
 
 	spin_lock_init(&rdev->wpan_phy.queue_lock);
 
diff --git a/net/mac802154/cfg.c b/net/mac802154/cfg.c
index b51100fd9e3f..93df24f75572 100644
--- a/net/mac802154/cfg.c
+++ b/net/mac802154/cfg.c
@@ -46,7 +46,7 @@ static int ieee802154_suspend(struct wpan_phy *wpan_phy)
 	if (!local->open_count)
 		goto suspend;
 
-	ieee802154_hold_queue(local);
+	ieee802154_sync_and_hold_queue(local);
 	synchronize_net();
 
 	/* stop hardware - this must stop RX */
diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index e34db1d49ef4..a057827fc48a 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -124,6 +124,7 @@ extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
 void ieee802154_xmit_sync_worker(struct work_struct *work);
+int ieee802154_sync_and_hold_queue(struct ieee802154_local *local);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 607019b8f8ab..38f74b8b6740 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -44,7 +44,8 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
 	ieee802154_release_queue(local);
-	atomic_dec(&local->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&local->phy->ongoing_txs))
+		wake_up(&local->phy->sync_txq);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
 }
@@ -100,12 +101,33 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 err_wake_netif_queue:
 	ieee802154_release_queue(local);
-	atomic_dec(&local->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&local->phy->ongoing_txs))
+		wake_up(&local->phy->sync_txq);
 err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
 
+static int ieee802154_sync_queue(struct ieee802154_local *local)
+{
+	int ret;
+
+	ieee802154_hold_queue(local);
+	ieee802154_disable_queue(local);
+	wait_event(local->phy->sync_txq, !atomic_read(&local->phy->ongoing_txs));
+	ret = local->tx_result;
+	ieee802154_release_queue(local);
+
+	return ret;
+}
+
+int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
+{
+	ieee802154_hold_queue(local);
+
+	return ieee802154_sync_queue(local);
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 02645f57fc2a..cddb42984484 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -140,7 +140,8 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 	}
 
 	dev_consume_skb_any(skb);
-	atomic_dec(&hw->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&hw->phy->ongoing_txs))
+		wake_up(&hw->phy->sync_txq);
 }
 EXPORT_SYMBOL(ieee802154_xmit_complete);
 
@@ -152,7 +153,8 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	local->tx_result = reason;
 	ieee802154_release_queue(local);
 	dev_kfree_skb_any(skb);
-	atomic_dec(&hw->phy->ongoing_txs);
+	if (!atomic_dec_and_test(&hw->phy->ongoing_txs))
+		wake_up(&hw->phy->sync_txq);
 }
 EXPORT_SYMBOL(ieee802154_xmit_error);
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 10/11] net: mac802154: Add a warning in the hot path
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

We should never start a transmission after the queue has been stopped.

But because it might work we don't kill the function here but rather
warn loudly the user that something is wrong.

Set an atomic when the queue will remain stopped. Reset this atomic when
the queue actually gets restarded. Just check this atomic to know if the
transmission is legitimate, warn if it is not.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h |  1 +
 net/mac802154/tx.c      | 16 +++++++++++++++-
 net/mac802154/util.c    |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 8881b6126b58..f4e7b3fe7cf0 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -218,6 +218,7 @@ struct wpan_phy {
 	spinlock_t queue_lock;
 	atomic_t ongoing_txs;
 	atomic_t hold_txs;
+	unsigned long queue_stopped;
 	wait_queue_head_t sync_txq;
 
 	char priv[] __aligned(NETDEV_ALIGN);
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 6cc4e5c7ba94..e36aca788ea2 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -123,9 +123,13 @@ static int ieee802154_sync_queue(struct ieee802154_local *local)
 
 int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 {
+	int ret;
+
 	ieee802154_hold_queue(local);
+	ret = ieee802154_sync_queue(local);
+	set_bit(0, &local->phy->queue_stopped);
 
-	return ieee802154_sync_queue(local);
+	return ret;
 }
 
 static int ieee802154_mlme_op_pre(struct ieee802154_local *local)
@@ -174,9 +178,19 @@ int ieee802154_mlme_tx_one(struct ieee802154_local *local, struct sk_buff *skb)
 	return ret;
 }
 
+static bool ieee802154_queue_is_stopped(struct ieee802154_local *local)
+{
+	return test_bit(0, &local->phy->queue_stopped);
+}
+
 static netdev_tx_t
 ieee802154_hot_tx(struct ieee802154_local *local, struct sk_buff *skb)
 {
+	/* Warn if the net interface tries to transmit frames while the
+	 * ieee802154 core assumes the queue is stopped.
+	 */
+	WARN_ON_ONCE(ieee802154_queue_is_stopped(local));
+
 	return ieee802154_tx(local, skb);
 }
 
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index cddb42984484..faf12c2135bf 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -29,6 +29,7 @@ static void ieee802154_wake_queue(struct ieee802154_hw *hw)
 	struct ieee802154_sub_if_data *sdata;
 
 	rcu_read_lock();
+	clear_bit(0, &local->phy->queue_stopped);
 	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
 		if (!sdata->dev)
 			continue;
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 07/11] net: mac802154: Introduce a helper to disable the queue
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

Sometimes calling the stop queue helper is not enough because it does
not hold any lock. In order to be safe and avoid racy situations when
trying to (soon) sync the Tx queue, for instance before sending an MLME
frame, let's now introduce an helper which actually hold the necessary
locks when doing so.

Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h | 12 ++++++++++++
 net/mac802154/util.c         | 14 ++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index 0c7ff9e0b632..e34db1d49ef4 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -149,6 +149,18 @@ void ieee802154_hold_queue(struct ieee802154_local *local);
  */
 void ieee802154_release_queue(struct ieee802154_local *local);
 
+/**
+ * ieee802154_disable_queue - disable ieee802154 queue
+ * @local: main mac object
+ *
+ * When trying to sync the Tx queue, we cannot just stop the queue
+ * (which is basically a bit being set without proper lock handling)
+ * because it would be racy. We actually need to call netif_tx_disable()
+ * instead, which is done by this helper. Restarting the queue can
+ * however still be done with a regular wake call.
+ */
+void ieee802154_disable_queue(struct ieee802154_local *local);
+
 /* MIB callbacks */
 void mac802154_dev_set_page_channel(struct net_device *dev, u8 page, u8 chan);
 
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 6176cc40df91..02645f57fc2a 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -83,6 +83,20 @@ void ieee802154_release_queue(struct ieee802154_local *local)
 	spin_unlock_irqrestore(&local->phy->queue_lock, flags);
 }
 
+void ieee802154_disable_queue(struct ieee802154_local *local)
+{
+	struct ieee802154_sub_if_data *sdata;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		if (!sdata->dev)
+			continue;
+
+		netif_tx_disable(sdata->dev);
+	}
+	rcu_read_unlock();
+}
+
 enum hrtimer_restart ieee802154_xmit_ifs_timer(struct hrtimer *timer)
 {
 	struct ieee802154_local *local =
-- 
2.34.1


^ permalink raw reply related

* [PATCH wpan-next v3 11/11] net: mac802154: Add a warning in the slow path
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

In order to be able to detect possible conflicts between the net
interface core and the ieee802154 core, let's add a warning in the slow
path: we want to be sure that whenever we start an asynchronous MLME
transmission (which can be fully asynchronous) the net core somehow
agrees that this transmission is possible, ie. the device was not
stopped. Warning in this case would allow us to track down more easily
possible issues with the MLME logic if we ever get reports.

Unlike in the hot path, such a situation cannot be handled.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/tx.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index e36aca788ea2..53a8be822e33 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -132,6 +132,25 @@ int ieee802154_sync_and_hold_queue(struct ieee802154_local *local)
 	return ret;
 }
 
+static bool ieee802154_netif_is_down(struct ieee802154_local *local)
+{
+	struct ieee802154_sub_if_data *sdata;
+	bool is_down = true;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(sdata, &local->interfaces, list) {
+		if (!sdata->dev)
+			continue;
+
+		is_down = !(sdata->dev->flags & IFF_UP);
+		if (is_down)
+			break;
+	}
+	rcu_read_unlock();
+
+	return is_down;
+}
+
 static int ieee802154_mlme_op_pre(struct ieee802154_local *local)
 {
 	return ieee802154_sync_and_hold_queue(local);
@@ -150,6 +169,12 @@ static int ieee802154_mlme_tx(struct ieee802154_local *local, struct sk_buff *sk
 	if (!local->open_count)
 		return -EBUSY;
 
+	/* Warn if the ieee802154 core thinks MLME frames can be sent while the
+	 * net interface expects this cannot happen.
+	 */
+	if (WARN_ON_ONCE(ieee802154_netif_is_down(local)))
+		return -EHOSTDOWN;
+
 	ieee802154_tx(local, skb);
 	ret = ieee802154_sync_queue(local);
 
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
From: Guilherme G. Piccoli @ 2022-05-17 16:37 UTC (permalink / raw)
  To: Petr Mladek, Evan Green, David Gow, Julius Werner
  Cc: Scott Branden, bcm-kernel-feedback-list, Sebastian Reichel,
	Linux PM, Florian Fainelli, Andrew Morton, bhe, kexec, LKML,
	linuxppc-dev, linux-alpha, linux-arm Mailing List, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc,
	linux-remoteproc, linux-s390, linux-tegra, linux-um, linux-xtensa,
	netdev, openipmi-developer, rcu, sparclinux, xen-devel, x86,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	Andy Shevchenko, Arnd Bergmann, Borislav Petkov, Jonathan Corbet,
	d.hatayama, dave.hansen, dyoung, feng.tang, Greg Kroah-Hartman,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, Kees Cook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky,
	Alan Stern, Thomas Gleixner, vgoyal, vkuznets, Will Deacon,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi, Stephen Boyd
In-Reply-To: <YoOi9PFK/JnNwH+D@alley>

On 17/05/2022 10:28, Petr Mladek wrote:
> [...]
>>> Disagree here. I'm looping Google maintainers, so they can comment.
>>> (CCed Evan, David, Julius)
>>>
>>> This notifier is clearly a hypervisor notification mechanism. I've fixed
>>> a locking stuff there (in previous patch), I feel it's low-risk but even
>>> if it's mid-risk, the class of such callback remains a perfect fit with
>>> the hypervisor list IMHO.
>>
>> This logs a panic to our "eventlog", a tiny logging area in SPI flash
>> for critical and power-related events. In some cases this ends up
>> being the only clue we get in a Chromebook feedback report that a
>> panic occurred, so from my perspective moving it to the front of the
>> line seems like a good idea.
> 
> IMHO, this would really better fit into the pre-reboot notifier list:
> 
>    + the callback stores the log so it is similar to kmsg_dump()
>      or console_flush_on_panic()
> 
>    + the callback should be proceed after "info" notifiers
>      that might add some other useful information.
> 
> Honestly, I am not sure what exactly hypervisor callbacks do. But I
> think that they do not try to extract the kernel log because they
> would need to handle the internal format.
> 

I guess the main point in your response is : "I am not sure what exactly
hypervisor callbacks do". We need to be sure about the semantics of such
list, and agree on that.

So, my opinion about this first list, that we call "hypervisor list",
is: it contains callbacks that

(1) should run early, preferably before kdump (or even if kdump isn't
set, should run ASAP);

(2) these callbacks perform some communication with an abstraction that
runs "below" the kernel, like a firmware or hypervisor. Classic example:
pvpanic, that communicates with VMM (usually qemu) and allow such VMM to
snapshot the full guest memory, for example.

(3) Should be low-risk. What defines risk is the level of reliability of
subsequent operations - if the callback have 50% of chance of "bricking"
the system totally and prevent kdump / kmsg_dump() / reboot , this is
high risk one for example.

Some good fits IMO: pvpanic, sstate_panic_event() [sparc], fadump in
powerpc, etc.

So, this is a good case for the Google notifier as well - it's not
collecting data like the dmesg (hence your second bullet seems to not
apply here, info notifiers won't add info to be collected by gsmi). It
is a firmware/hypervisor/whatever-gsmi-is notification mechanism, that
tells such "lower" abstraction a panic occurred. It seems low risk and
we want it to run ASAP, if possible.

So, I'd like to keep it here, unless gsmi maintainers disagree or I'm
perhaps misunderstanding the meaning of this first list.
Cheers,

Guilherme

^ permalink raw reply

* Re: [PATCH 19/30] panic: Add the panic hypervisor notifier list
From: Guilherme G. Piccoli @ 2022-05-17 16:42 UTC (permalink / raw)
  To: Petr Mladek, Scott Branden, Sebastian Reichel, Florian Fainelli
  Cc: David Gow, Evan Green, Julius Werner, bcm-kernel-feedback-list,
	linux-pm, akpm, bhe, kexec, linux-kernel, linuxppc-dev,
	linux-alpha, linux-arm-kernel, linux-edac, linux-hyperv,
	linux-leds, linux-mips, linux-parisc, linux-remoteproc,
	linux-s390, linux-tegra, linux-um, linux-xtensa, netdev,
	openipmi-developer, rcu, sparclinux, xen-devel, x86, kernel-dev,
	kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Alexander Gordeev, Andrea Parri, Ard Biesheuvel,
	Benjamin Herrenschmidt, Brian Norris, Christian Borntraeger,
	Christophe JAILLET, David S. Miller, Dexuan Cui, Doug Berger,
	Haiyang Zhang, Hari Bathini, Heiko Carstens, Justin Chen,
	K. Y. Srinivasan, Lee Jones, Markus Mayer, Michael Ellerman,
	Mihai Carabas, Nicholas Piggin, Paul Mackerras, Pavel Machek,
	Shile Zhang, Stephen Hemminger, Sven Schnelle,
	Thomas Bogendoerfer, Tianyu Lan, Vasily Gorbik, Wang ShaoBo,
	Wei Liu, zhenwei pi
In-Reply-To: <YoOpyW1+q+Z5as78@alley>

On 17/05/2022 10:57, Petr Mladek wrote:
> [...]
>>>> --- a/drivers/misc/bcm-vk/bcm_vk_dev.c
>>>> +++ b/drivers/misc/bcm-vk/bcm_vk_dev.c
>>>> @@ -1446,7 +1446,7 @@ static int bcm_vk_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>> [... snip ...]
>>> It seems to reset some hardware or so. IMHO, it should go into the
>>> pre-reboot list.
>>
>> Mixed feelings here, I'm looping Broadcom maintainers to comment.
>> (CC Scott and Broadcom list)
>>
>> I'm afraid it breaks kdump if this device is not reset beforehand - it's
>> a doorbell write, so not high risk I think...
>>
>> But in case the not-reset device can be probed normally in kdump kernel,
>> then I'm fine in moving this to the reboot list! I don't have the HW to
>> test myself.
> 
> Good question. Well, it if has to be called before kdump then
> even "hypervisor" list is a wrong place because is not always
> called before kdump.

Agreed! I'll defer that to Scott and Broadcom folks to comment.
If it's not strictly necessary, I'll happily move it to the reboot list.

If necessary, we could use the machine_crash_kexec() approach, but we'll
fall into the case arm64 doesn't support it and I'm not sure if this
device is available for arm - again a question for the maintainers.


>  [...]
>>>> --- a/drivers/power/reset/ltc2952-poweroff.c
>>>> +++ b/drivers/power/reset/ltc2952-poweroff.c
>> [...]
>> This is setting a variable only, and once it's set (data->kernel_panic
>> is the bool's name), it just bails out the IRQ handler and a timer
>> setting - this timer seems kinda tricky, so bailing out ASAP makes sense
>> IMHO.
> 
> IMHO, the timer informs the hardware that the system is still alive
> in the middle of panic(). If the timer is not working then the
> hardware (chip) will think that the system frozen in panic()
> and will power off the system. See the comments in
> drivers/power/reset/ltc2952-poweroff.c:
> [.... snip ...]
> IMHO, we really have to keep it alive until we reach the reboot stage.
> 
> Another question is how it actually works when the interrupts are
> disabled during panic() and the timer callbacks are not handled.

Agreed here! Guess I can move this one the reboot list, fine by me.
Unless PM folks think otherwise.


> [...]
>> Disagree here, I'm CCing Florian for information.
>>
>> This notifier preserves RAM so it's *very interesting* if we have
>> kmsg_dump() for example, but maybe might be also relevant in case kdump
>> kernel is configured to store something in a persistent RAM (then,
>> without this notifier, after kdump reboots the system data would be lost).
> 
> I see. It is actually similar problem as with
> drivers/firmware/google/gsmi.c.
> 
> I does similar things like kmsg_dump() so it should be called in
> the same location (after info notifier list and before kdump).
> 
> A solution might be to put it at these notifiers at the very
> end of the "info" list or make extra "dump" notifier list.

Here I still disagree. I've commented in the other response thread
(about Google gsmi) about the semantics of the hypervisor list, but
again: this list should contain callbacks that

(a) Should run early, _by default_ before a kdump;
(b) Communicate with the firmware/hypervisor in a "low-risk" way;

Imagine a scenario where users configure kdump kernel to save something
in a persistent form in DRAM - it'd be like a late pstore, in the next
kernel. This callback enables that, it's meant to inform FW "hey, panic
happened, please from now on don't clear the RAM in the next FW-reboot".
I don't see a reason to postpone that - let's see if the maintainers
have an opinion.

Cheers,


Guilherme

^ permalink raw reply

* Re: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
From: Guilherme G. Piccoli @ 2022-05-17 16:45 UTC (permalink / raw)
  To: Petr Mladek, Luck, Tony, Dinh Nguyen
  Cc: akpm@linux-foundation.org, bhe@redhat.com,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com,
	linuxppc-dev@lists.ozlabs.org, linux-alpha@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-leds@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-tegra@vger.kernel.org, linux-um@lists.infradead.org,
	linux-xtensa@linux-xtensa.org, netdev@vger.kernel.org,
	openipmi-developer@lists.sourceforge.net, rcu@vger.kernel.org,
	sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org,
	x86@kernel.org, kernel-dev@igalia.com, kernel@gpiccoli.net,
	halves@canonical.com, fabiomirmar@gmail.com,
	alejandro.j.jimenez@oracle.com, andriy.shevchenko@linux.intel.com,
	arnd@arndb.de, bp@alien8.de, corbet@lwn.net,
	d.hatayama@jp.fujitsu.com, dave.hansen@linux.intel.com,
	dyoung@redhat.com, Tang, Feng, gregkh@linuxfoundation.org,
	mikelley@microsoft.com, hidehiro.kawai.ez@hitachi.com,
	jgross@suse.com, john.ogness@linutronix.de, keescook@chromium.org,
	luto@kernel.org, mhiramat@kernel.org, mingo@redhat.com,
	paulmck@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
	senozhatsky@chromium.org, stern@rowland.harvard.edu,
	tglx@linutronix.de, vgoyal@redhat.com, vkuznets@redhat.com,
	will@kernel.org, Alex Elder, Alexander Gordeev, Anton Ivanov,
	Benjamin Herrenschmidt, Bjorn Andersson, Boris Ostrovsky,
	Chris Zankel, Christian Borntraeger, Corey Minyard, Dexuan Cui,
	H. Peter Anvin, Haiyang Zhang, Heiko Carstens, Helge Deller,
	Ivan Kokshaysky, James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu
In-Reply-To: <YoOs9GJ5Ovq63u5Q@alley>

On 17/05/2022 11:11, Petr Mladek wrote:
> [...]
>>> Then notifiers could make an informed choice on whether to deep dive to
>>> get all the possible details (when there is no kdump) or just skim the high
>>> level stuff (to maximize chance of getting a successful kdump).
>>>
>>> -Tony
>>
>> Good idea Tony! What if I wire a kexec_crash_loaded() in the notifier?
> 
> I like this idea.
> 
> One small problem is that kexec_crash_loaded() has valid result
> only under kexec_mutex. On the other hand, it should stay true
> once loaded so that the small race window should be innocent.
> 
>> With that, are you/Petr/Dinh OK in moving it for the info list?
> 
> Sounds good to me.
> 
> Best Regards,
> Petr

Perfect, I'll do that for V2 then =)

Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
we run the code as-is? Does that make sense to you?

I'll postpone it to run almost in the end of info list (last position is
for panic_print).

Thanks,


Guilherme

^ permalink raw reply

* Re: [PATCH v2 01/14] thermal/core: Change thermal_zone_ops to thermal_sensor_ops
From: srinivas pandruvada @ 2022-05-17 16:50 UTC (permalink / raw)
  To: Rafael J. Wysocki, Daniel Lezcano
  Cc: Daniel Lezcano, Kevin Hilman, Alexandre Bailon, Linux PM,
	Linux Kernel Mailing List, Amit Kucheria, Zhang Rui,
	Jonathan Corbet, Len Brown, Raju Rangoju, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Ido Schimmel, Petr Machata,
	Luca Coelho, Kalle Valo, Peter Kaestle, Hans de Goede, Mark Gross,
	Sebastian Reichel, Miquel Raynal, Support Opensource, Shawn Guo,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team, Niklas Söderlund, Miri Korenblit,
	Johannes Berg, Sumeet Pawnikar, Dan Carpenter, Chuansheng Liu,
	Jiasheng Jiang, Antoine Tenart, Andy Shevchenko,
	open list:DOCUMENTATION, open list:ACPI THERMAL DRIVER,
	open list:CXGB4 ETHERNET DRIVER (CXGB4),
	open list:INTEL WIRELESS WIFI LINK (iwlwifi),
	open list:ACER ASPIRE ONE TEMPERATURE AND FAN DRIVER,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	open list:RENESAS R-CAR THERMAL DRIVERS
In-Reply-To: <CAJZ5v0ik_JQ4Awtw7iR68W4-9ZL8FRDsDd-kWmL-n09fgg3reg@mail.gmail.com>

On Tue, 2022-05-17 at 17:42 +0200, Rafael J. Wysocki wrote:
> On Sat, May 7, 2022 at 2:55 PM Daniel Lezcano
> <daniel.lezcano@linexp.org> wrote:
> > 
> > A thermal zone is software abstraction of a sensor associated with
> > properties and cooling devices if any.
> > 
> > The fact that we have thermal_zone and thermal_zone_ops mixed is
> > confusing and does not clearly identify the different components
> > entering in the thermal management process. A thermal zone appears
> > to
> > be a sensor while it is not.
> 
> Well, the majority of the operations in thermal_zone_ops don't apply
> to thermal sensors.  For example, ->set_trips(), ->get_trip_type(),
> ->get_trip_temp().
> 
In past we discussed adding thermal sensor sysfs with threshold to
notify temperature.

So sensor can have set/get_threshold() functions instead of the
set/get_trip for zones.

Like we have /sys/class/thermal_zone* we can have
/sys/class/thermal_sensor*.

Thermal sensor(s) are bound to  thermal zones. This can also include
multiple sensors in a zone and can create a virtual sensor also.

Thanks,
Srinivas

> > In order to set the scene for multiple thermal sensors aggregated
> > into
> > a single thermal zone. Rename the thermal_zone_ops to
> > thermal_sensor_ops, that will appear clearyl the thermal zone is
> > not a
> > sensor but an abstraction of one [or multiple] sensor(s).
> 
> So I'm not convinced that the renaming mentioned above is
> particularly
> clean either.
> 
> IMV the way to go would be to split the thermal sensor operations,
> like ->get_temp(), out of thermal_zone_ops.
> 
> But then it is not clear what a thermal zone with multiple sensors in
> it really means.  I guess it would require an aggregation function to
> combine the thermal sensors in it that would produce an effective
> temperature to check against the trip points.
> 
> Honestly, I don't think that setting a separate set of trips for each
> sensor in a thermal zone would make a lot of sense.


^ permalink raw reply

* Re: [PATCH] bpf: add access control for map
From: Alexei Starovoitov @ 2022-05-17 16:57 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Alexei Starovoitov, Network Development, bpf, LKML, Menglong Dong,
	Andrii Nakryiko, Daniel Borkmann, Alan Maguire
In-Reply-To: <20220517034107.92194-1-imagedong@tencent.com>

On Mon, May 16, 2022 at 8:44 PM <menglong8.dong@gmail.com> wrote:
>
> From: Menglong Dong <imagedong@tencent.com>
>
> Hello,
>
> I have a idea about the access control of eBPF map, could you help
> to see if it works?
>
> For now, only users with the CAP_SYS_ADMIN or CAP_BPF have the right
> to access the data in eBPF maps. So I'm thinking, are there any way
> to control the access to the maps, just like what we do to files?

The bpf objects pinned in bpffs should always be accessible
as files regardless of sysctl or cap-s.

> Therefore, we can decide who have the right to read the map and who
> can write.
>
> I think it is useful in some case. For example, I made a eBPF-based
> network statistics program, and the information is stored in an array
> map. And I want all users can read the information in the map, without
> changing the capacity of them. As the information is iunsensitive,
> normal users can read it. This make publish-consume mode possible,
> the eBPF program is publisher and the user space program is consumer.

Right. It is a choice of the bpf prog which data expose in the map.

> So this aim can be achieve, if we can control the access of maps as a
> file. There are many ways I thought, and I choosed one to implement:
>
> While pining the map, add the inode that is created to a list on the
> map. root can change the permission of the inode through the pin path.
> Therefore, we can try to find the inode corresponding to current user
> namespace in the list, and check whether user have permission to read
> or write.
>
> The steps can be:
>
> 1. create the map with BPF_F_UMODE flags, which imply that enable
>    access control in this map.
> 2. load and pin the map on /sys/fs/bpf/xxx.
> 3. change the umode of /sys/fs/bpf/xxx with 'chmod 744 /sys/fs/bpf/xxx',
>    therefor all user can read the map.

This behavior should be available by default.
Only sysctl was preventing it. It's being fixed by
the following patch. Please take a look at:
https://patchwork.kernel.org/project/netdevbpf/patch/1652788780-25520-2-git-send-email-alan.maguire@oracle.com/

Does it solve your use case?

> @@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
>         if (IS_ERR(raw))
>                 return PTR_ERR(raw);
>
> -       if (type == BPF_TYPE_PROG)
> +       if (type != BPF_TYPE_MAP && !bpf_capable())
> +               return -EPERM;

obj_get already implements normal ACL style access to files.
Let's not fragment this security model with extra cap checks.

> +
> +       switch (type) {
> +       case BPF_TYPE_PROG:
>                 ret = bpf_prog_new_fd(raw);
> -       else if (type == BPF_TYPE_MAP)
> +               break;
> +       case BPF_TYPE_MAP:
> +               if (bpf_map_permission(raw, f_flags)) {
> +                       bpf_any_put(raw, type);
> +                       return -EPERM;
> +               }

bpf_obj_do_get() already does such check.

> +int bpf_map_permission(struct bpf_map *map, int flags)
> +{
> +       struct bpf_map_inode *map_inode;
> +       struct user_namespace *ns;
> +
> +       if (capable(CAP_SYS_ADMIN))
> +               return 0;
> +
> +       if (!(map->map_flags & BPF_F_UMODE))
> +               return -1;
> +
> +       rcu_read_lock();
> +       list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
> +               ns = map_inode->inode->i_sb->s_user_ns;
> +               if (ns == current_user_ns())
> +                       goto found;
> +       }
> +       rcu_read_unlock();
> +       return -1;
> +found:
> +       rcu_read_unlock();
> +       return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
> +}

See path_permission() in bpf_obj_do_get().

>  static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
> @@ -3720,9 +3757,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
>             attr->open_flags & ~BPF_OBJ_FLAG_MASK)
>                 return -EINVAL;
>
> -       if (!capable(CAP_SYS_ADMIN))
> -               return -EPERM;
> -

This part we cannot relax.
What you're trying to do is to bypass path checks
by pointing at a map with its ID only.
That contradicts to your official goal in the cover letter.

bpf_map_get_fd_by_id() has to stay cap_sys_admin only.
Exactly for the reason that bpf subsystem has file ACL style.
fd_by_id is a debug interface used by tools like bpftool and
root admin that needs to see the system as a whole.
Normal tasks/processes need to use bpffs and pin files with
correct permissions to pass maps from one process to another.
Or use FD passing kernel facilities.

^ permalink raw reply

* Re: [PATCH net-next v4 1/2] net: Add a second bind table hashed by port and address
From: Eric Dumazet @ 2022-05-17 16:59 UTC (permalink / raw)
  To: Joanne Koong; +Cc: netdev, Martin KaFai Lau, Jakub Kicinski, David Miller
In-Reply-To: <20220512205041.1208962-2-joannelkoong@gmail.com>

On Thu, May 12, 2022 at 1:51 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> We currently have one tcp bind table (bhash) which hashes by port
> number only. In the socket bind path, we check for bind conflicts by
> traversing the specified port's inet_bind2_bucket while holding the
> bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()).
>
> In instances where there are tons of sockets hashed to the same port
> at different addresses, checking for a bind conflict is time-intensive
> and can cause softirq cpu lockups, as well as stops new tcp connections
> since __inet_inherit_port() also contests for the spinlock.
>
> This patch proposes adding a second bind table, bhash2, that hashes by
> port and ip address. Searching the bhash2 table leads to significantly
> faster conflict resolution and less time holding the spinlock.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

Scary patch, but I could not find obvious issues with it.
Let's give it a try, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH bpf-next v4 1/7] bpf: add bpf_skc_to_mptcp_sock_proto
From: Alexei Starovoitov @ 2022-05-17 17:00 UTC (permalink / raw)
  To: Geliang Tang
  Cc: Martin KaFai Lau, Geliang Tang, Mat Martineau, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	MPTCP Upstream, Nicolas Rybowski, Matthieu Baerts
In-Reply-To: <CA+WQbwvHidwt0ua=g67CJfmjtCow8SCvZp4Sz=2AZa+ocDxnpg@mail.gmail.com>

On Mon, May 16, 2022 at 10:26 PM Geliang Tang <geliangtang@gmail.com> wrote:
>
> Martin KaFai Lau <kafai@fb.com> 于2022年5月17日周二 09:07写道：
> >
> > On Fri, May 13, 2022 at 03:48:21PM -0700, Mat Martineau wrote:
> > [ ... ]
> >
> > > diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> > > index 8b1afd6f5cc4..2ba09de955c7 100644
> > > --- a/include/net/mptcp.h
> > > +++ b/include/net/mptcp.h
> > > @@ -284,4 +284,10 @@ static inline int mptcpv6_init(void) { return 0; }
> > >  static inline void mptcpv6_handle_mapped(struct sock *sk, bool mapped) { }
> > >  #endif
> > >
> > > +#if defined(CONFIG_MPTCP) && defined(CONFIG_BPF_SYSCALL)
> > > +struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk);
> > Can this be inline ?
>
> This function can't be inline since it uses struct mptcp_subflow_context.
>
> mptcp_subflow_context is defined in net/mptcp/protocol.h, and we don't
> want to export it to user space in net/mptcp/protocol.h.

The above function can be made static inline in a header file.
That doesn't automatically expose it to user space.

^ permalink raw reply

* RE: [PATCH 21/30] panic: Introduce the panic pre-reboot notifier list
From: Luck, Tony @ 2022-05-17 17:02 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Petr Mladek, Dinh Nguyen
  Cc: akpm@linux-foundation.org, bhe@redhat.com,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com,
	linuxppc-dev@lists.ozlabs.org, linux-alpha@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-leds@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org,
	linux-tegra@vger.kernel.org, linux-um@lists.infradead.org,
	linux-xtensa@linux-xtensa.org, netdev@vger.kernel.org,
	openipmi-developer@lists.sourceforge.net, rcu@vger.kernel.org,
	sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org,
	x86@kernel.org, kernel-dev@igalia.com, kernel@gpiccoli.net,
	halves@canonical.com, fabiomirmar@gmail.com,
	alejandro.j.jimenez@oracle.com, andriy.shevchenko@linux.intel.com,
	arnd@arndb.de, bp@alien8.de, corbet@lwn.net,
	d.hatayama@jp.fujitsu.com, dave.hansen@linux.intel.com,
	dyoung@redhat.com, Tang, Feng, gregkh@linuxfoundation.org,
	mikelley@microsoft.com, hidehiro.kawai.ez@hitachi.com,
	jgross@suse.com, john.ogness@linutronix.de, keescook@chromium.org,
	luto@kernel.org, mhiramat@kernel.org, mingo@redhat.com,
	paulmck@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
	senozhatsky@chromium.org, stern@rowland.harvard.edu,
	tglx@linutronix.de, vgoyal@redhat.com, vkuznets@redhat.com,
	will@kernel.org, Alex Elder, Alexander Gordeev, Anton Ivanov,
	Benjamin Herrenschmidt, Bjorn Andersson, Boris Ostrovsky,
	Chris Zankel, Christian Borntraeger, Corey Minyard, Dexuan Cui,
	H. Peter Anvin, Haiyang Zhang, Heiko Carstens, Helge Deller,
	Ivan Kokshaysky, James E.J. Bottomley, James Morse, Johannes Berg,
	K. Y. Srinivasan, Mathieu Poirier, Matt Turner,
	Mauro Carvalho Chehab, Max Filippov, Michael Ellerman,
	Paul Mackerras, Pavel Machek, Richard Weinberger, Robert Richter,
	Stefano Stabellini, Stephen Hemminger, Sven Schnelle,
	Vasily Gorbik, Wei Liu
In-Reply-To: <599b72f6-76a4-8e6d-5432-56fb1ffd7e0b@igalia.com>

> Tony / Dinh - can I just *skip* this notifier *if kdump* is set or else
> we run the code as-is? Does that make sense to you?

The "skip" option sounds like it needs some special flag associated with
an entry on the notifier chain. But there are other notifier chains ... so that
sounds messy to me.

Just all the notifiers in priority order. If any want to take different actions
based on kdump status, change the code. That seems more flexible than
an "all or nothing" approach by skipping.

-Tony

^ permalink raw reply

* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Jakub Kicinski @ 2022-05-17 17:32 UTC (permalink / raw)
  To: Kalle Valo
  Cc: davem, netdev, edumazet, pabeni, johannes, alex.aring, stefan,
	mareklindner, sw, a, sven, linux-wireless, linux-wpan
In-Reply-To: <87zgjgwza1.fsf@kernel.org>

On Tue, 17 May 2022 07:36:54 +0300 Kalle Valo wrote:
> > +void cfg80211_unregister_netdevice(struct net_device *dev)
> > +{
> > +	cfg80211_unregister_wdev(dev->ieee80211_ptr);
> > +}
> > +EXPORT_SYMBOL(cfg80211_unregister_netdevice);  
> 
> Why moving this to a proper function? Just curious, I couldn't figure it
> out.

Sorry, I went too far with the "explain what not why".
The header is included in places which get built without 
WIRELESS while the C source is not.

^ permalink raw reply

* Re: [PATCHv2 net] bonding: fix missed rcu protection
From: Jonathan Toppins @ 2022-05-17 17:32 UTC (permalink / raw)
  To: liuhangbin
  Cc: andy, davem, dsahern, eric.dumazet, j.vosburgh, jtoppins, kuba,
	netdev, pabeni, syzbot+92beb3d46aab498710fa, vfalico,
	vladimir.oltean, Eric Dumazet, linux-kernel
In-Reply-To: <20220517082312.805824-1-liuhangbin@gmail.com>

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---
RESEND, list still didn't receive my last version

The diffstat is slightly larger but IMO a slightly more readable version.
When I was reading v2 I found myself jumping around.
I only compile tested it, so YMMV.

If this amount of change is too much v2 from Hangbin looks correct to
me.

 drivers/net/bonding/bond_main.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38e152548126..f9d27b63c454 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5591,23 +5591,32 @@ static int bond_ethtool_get_ts_info(struct net_device *bond_dev,
 	const struct ethtool_ops *ops;
 	struct net_device *real_dev;
 	struct phy_device *phydev;
+	int ret = 0;
 
+	rcu_read_lock();
 	real_dev = bond_option_active_slave_get_rcu(bond);
-	if (real_dev) {
-		ops = real_dev->ethtool_ops;
-		phydev = real_dev->phydev;
-
-		if (phy_has_tsinfo(phydev)) {
-			return phy_ts_info(phydev, info);
-		} else if (ops->get_ts_info) {
-			return ops->get_ts_info(real_dev, info);
-		}
-	}
+	if (real_dev)
+		dev_hold(real_dev);
+	rcu_read_unlock();
+
+	if (!real_dev)
+		goto software;
 
+	ops = real_dev->ethtool_ops;
+	phydev = real_dev->phydev;
+
+	if (phy_has_tsinfo(phydev))
+		ret = phy_ts_info(phydev, info);
+	else if (ops->get_ts_info)
+		ret = ops->get_ts_info(real_dev, info);
+
+	dev_put(real_dev);
+	return ret;
+
+software:
 	info->so_timestamping = SOF_TIMESTAMPING_RX_SOFTWARE |
 				SOF_TIMESTAMPING_SOFTWARE;
 	info->phc_index = -1;
-
 	return 0;
 }
 
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Jakub Kicinski @ 2022-05-17 17:37 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Florian Fainelli, davem, netdev, edumazet, pabeni, alex.aring,
	stefan, mareklindner, sw, a, sven, linux-wireless, linux-wpan
In-Reply-To: <74bdbec0580ed05d0f18533eae9af50bc0a4a0ef.camel@sipsolutions.net>

On Tue, 17 May 2022 09:48:24 +0200 Johannes Berg wrote:
> On Mon, 2022-05-16 at 19:12 -0700, Florian Fainelli wrote:
> > 
> > On 5/16/2022 2:56 PM, Jakub Kicinski wrote:  
> > > Most protocol-specific pointers in struct net_device are under
> > > a respective ifdef. Wireless is the notable exception. Since
> > > there's a sizable number of custom-built kernels for datacenter
> > > workloads which don't build wireless it seems reasonable to
> > > ifdefy those pointers as well.
> > > 
> > > While at it move IPv4 and IPv6 pointers up, those are special
> > > for obvious reasons.
> > > 
> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
> > 
> > Could not we move to an union of pointers in the future since in many 
> > cases a network device can only have one of those pointers at any given 
> > time?  
> 
> Then at the very least we'd need some kind of type that we can assign to
> disambiguate, because today e.g. we have a netdev notifier (and other
> code) that could get a non-wireless netdev and check like this:
> 
> static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
>                                          unsigned long state, void *ptr)
> {
>         struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>         struct wireless_dev *wdev = dev->ieee80211_ptr;
> [...]
>         if (!wdev)
>                 return NOTIFY_DONE;

Can we use enum netdev_ml_priv_type netdev::ml_priv and
netdev::ml_priv_type for this?

^ permalink raw reply

* Re: [PATCH net-next] net: ifdefy the wireless pointers in struct net_device
From: Jakub Kicinski @ 2022-05-17 17:44 UTC (permalink / raw)
  To: Johannes Berg
  Cc: davem, netdev, edumazet, pabeni, alex.aring, stefan, mareklindner,
	sw, a, sven, linux-wireless, linux-wpan
In-Reply-To: <8b9d18e351cc58aed65c4a4c7f12f167984ee088.camel@sipsolutions.net>

On Tue, 17 May 2022 09:51:31 +0200 Johannes Berg wrote:
> On Mon, 2022-05-16 at 14:56 -0700, Jakub Kicinski wrote:
> > 
> > +#if IS_ENABLED(CONFIG_WIRELESS)
> >  	struct wireless_dev	*ieee80211_ptr;
> > +#endif  
> 
> Technically, you should be able to use CONFIG_CFG80211 here, but in
> practice I'd really hope nobody enables WIRELESS without CFG80211 :)

ack

> > +++ b/include/net/cfg80211.h
> > @@ -8004,10 +8004,7 @@ int cfg80211_register_netdevice(struct net_device *dev);
> >   *
> >   * Requires the RTNL and wiphy mutex to be held.
> >   */
> > -static inline void cfg80211_unregister_netdevice(struct net_device *dev)
> > -{
> > -	cfg80211_unregister_wdev(dev->ieee80211_ptr);
> > -}
> > +void cfg80211_unregister_netdevice(struct net_device *dev);  
> 
> Exported functions aren't free either - I think in this case I'd
> (slightly) prefer the extra ifdef.

fine

> Anyway, we can do this, but I also like Florian's suggestion about the
> union, and sent an attempt at a disambiguation patch there.

Would you be willing to do that as a follow up? Are you talking about
wifi only or all the proto pointers?

As a netdev maintainer I'd like to reduce the divergence in whether 
the proto pointers are ifdef'd or not.

^ permalink raw reply

* [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2022-05-17
From: Tony Nguyen @ 2022-05-17 17:45 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet; +Cc: Tony Nguyen, netdev, richardcochran

This series contains updates to ice driver only.

Arkadiusz prevents writing of timestamps when rings are being
configured to resolve null pointer dereference.

Paul changes a delayed call to baseline statistics to occur immediately
which was causing misreporting of statistics due to the delay.

Michal fixes incorrect restoration of interrupt moderation settings.

The following are changes since commit edf410cb74dc612fd47ef5be319c5a0bcd6e6ccd:
  net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup()
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE

Arkadiusz Kubalewski (1):
  ice: fix crash when writing timestamp on RX rings

Michal Wilczynski (1):
  ice: Fix interrupt moderation settings getting cleared

Paul Greenwalt (1):
  ice: fix possible under reporting of ethtool Tx and Rx statistics

 drivers/net/ethernet/intel/ice/ice_lib.c  | 16 ++++++++--------
 drivers/net/ethernet/intel/ice/ice_main.c |  7 ++++---
 drivers/net/ethernet/intel/ice/ice_ptp.c  | 19 +++++++++++++++----
 drivers/net/ethernet/intel/ice/ice_txrx.h | 11 ++++++++---
 4 files changed, 35 insertions(+), 18 deletions(-)

-- 
2.35.1


^ permalink raw reply

* [PATCH net 2/3] ice: fix possible under reporting of ethtool Tx and Rx statistics
From: Tony Nguyen @ 2022-05-17 17:45 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Paul Greenwalt, netdev, anthony.l.nguyen, richardcochran,
	Gurucharan
In-Reply-To: <20220517174547.1757401-1-anthony.l.nguyen@intel.com>

From: Paul Greenwalt <paul.greenwalt@intel.com>

The hardware statistics counters are not cleared during resets so the
drivers first access is to initialize the baseline and then subsequent
reads are for reporting the counters. The statistics counters are read
during the watchdog subtask when the interface is up. If the baseline
is not initialized before the interface is up, then there can be a brief
window in which some traffic can be transmitted/received before the
initial baseline reading takes place.

Directly initialize ethtool statistics in driver open so the baseline will
be initialized when the interface is up, and any dropped packets
incremented before the interface is up won't be reported.

Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 949669fed7d6..963a5f40e071 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6172,9 +6172,10 @@ static int ice_up_complete(struct ice_vsi *vsi)
 			ice_ptp_link_change(pf, pf->hw.pf_id, true);
 	}

-	/* clear this now, and the first stats read will be used as baseline */
-	vsi->stat_offsets_loaded = false;
-
+	/* Perform an initial read of the statistics registers now to
+	 * set the baseline so counters are ready when interface is up
+	 */
+	ice_update_eth_stats(vsi);
 	ice_service_task_schedule(pf);

 	return 0;
-- 
2.35.1

^ permalink raw reply related

* [PATCH net 1/3] ice: fix crash when writing timestamp on RX rings
From: Tony Nguyen @ 2022-05-17 17:45 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Arkadiusz Kubalewski, netdev, anthony.l.nguyen, richardcochran,
	Michal Schmidt, Dave Cain, Gurucharan
In-Reply-To: <20220517174547.1757401-1-anthony.l.nguyen@intel.com>

From: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>

Do not allow to write timestamps on RX rings if PF is being configured.
When PF is being configured RX rings can be freed or rebuilt. If at the
same time timestamps are updated, the kernel will crash by dereferencing
null RX ring pointer.

PID: 1449   TASK: ff187d28ed658040  CPU: 34  COMMAND: "ice-ptp-0000:51"
 #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be
 #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d
 #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd
 #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54
 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4
 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c
 #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4
 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e
    [exception RIP: ice_ptp_update_cached_phctime+91]
    RIP: ffffffffc076db8b  RSP: ff1966a94a713e98  RFLAGS: 00010246
    RAX: 16e3db9c6b7ccae4  RBX: ff187d269dd3c180  RCX: ff187d269cd4d018
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ff187d269cfcc644   R8: ff187d339b9641b0   R9: 0000000000000000
    R10: 0000000000000002  R11: 0000000000000000  R12: ff187d269cfcc648
    R13: ffffffff9f128784  R14: ffffffff9d101b70  R15: ff187d269cfcc640
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice]
 #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b
 #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d
 #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f

Fixes: 77a781155a65 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Dave Cain <dcain@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ptp.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index da025c204577..662947c882e8 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -500,12 +500,19 @@ ice_ptp_read_src_clk_reg(struct ice_pf *pf, struct ptp_system_timestamp *sts)
  * This function must be called periodically to ensure that the cached value
  * is never more than 2 seconds old. It must also be called whenever the PHC
  * time has been changed.
+ *
+ * Return:
+ * * 0 - OK, successfully updated
+ * * -EAGAIN - PF was busy, need to reschedule the update
  */
-static void ice_ptp_update_cached_phctime(struct ice_pf *pf)
+static int ice_ptp_update_cached_phctime(struct ice_pf *pf)
 {
 	u64 systime;
 	int i;
 
+	if (test_and_set_bit(ICE_CFG_BUSY, pf->state))
+		return -EAGAIN;
+
 	/* Read the current PHC time */
 	systime = ice_ptp_read_src_clk_reg(pf, NULL);
 
@@ -528,6 +535,9 @@ static void ice_ptp_update_cached_phctime(struct ice_pf *pf)
 			WRITE_ONCE(vsi->rx_rings[j]->cached_phctime, systime);
 		}
 	}
+	clear_bit(ICE_CFG_BUSY, pf->state);
+
+	return 0;
 }
 
 /**
@@ -2330,17 +2340,18 @@ static void ice_ptp_periodic_work(struct kthread_work *work)
 {
 	struct ice_ptp *ptp = container_of(work, struct ice_ptp, work.work);
 	struct ice_pf *pf = container_of(ptp, struct ice_pf, ptp);
+	int err;
 
 	if (!test_bit(ICE_FLAG_PTP, pf->flags))
 		return;
 
-	ice_ptp_update_cached_phctime(pf);
+	err = ice_ptp_update_cached_phctime(pf);
 
 	ice_ptp_tx_tstamp_cleanup(&pf->hw, &pf->ptp.port.tx);
 
-	/* Run twice a second */
+	/* Run twice a second or reschedule if phc update failed */
 	kthread_queue_delayed_work(ptp->kworker, &ptp->work,
-				   msecs_to_jiffies(500));
+				   msecs_to_jiffies(err ? 10 : 500));
 }
 
 /**
-- 
2.35.1


^ permalink raw reply related

* [PATCH net 3/3] ice: Fix interrupt moderation settings getting cleared
From: Tony Nguyen @ 2022-05-17 17:45 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet
  Cc: Michal Wilczynski, netdev, anthony.l.nguyen, richardcochran,
	Gurucharan
In-Reply-To: <20220517174547.1757401-1-anthony.l.nguyen@intel.com>

From: Michal Wilczynski <michal.wilczynski@intel.com>

Adaptive-rx and Adaptive-tx are interrupt moderation settings
that can be enabled/disabled using ethtool:
ethtool -C ethX adaptive-rx on/off adaptive-tx on/off

Unfortunately those settings are getting cleared after
changing number of queues, or in ethtool world 'channels':
ethtool -L ethX rx 1 tx 1

Clearing was happening due to introduction of bit fields
in ice_ring_container struct. This way only itr_setting
bits were rebuilt during ice_vsi_rebuild_set_coalesce().

Introduce an anonymous struct of bitfields and create a
union to refer to them as a single variable.
This way variable can be easily saved and restored.

Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c  | 16 ++++++++--------
 drivers/net/ethernet/intel/ice/ice_txrx.h | 11 ++++++++---
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 6d19c58ccacd..454e01ae09b9 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3043,8 +3043,8 @@ ice_vsi_rebuild_get_coalesce(struct ice_vsi *vsi,
 	ice_for_each_q_vector(vsi, i) {
 		struct ice_q_vector *q_vector = vsi->q_vectors[i];
 
-		coalesce[i].itr_tx = q_vector->tx.itr_setting;
-		coalesce[i].itr_rx = q_vector->rx.itr_setting;
+		coalesce[i].itr_tx = q_vector->tx.itr_settings;
+		coalesce[i].itr_rx = q_vector->rx.itr_settings;
 		coalesce[i].intrl = q_vector->intrl;
 
 		if (i < vsi->num_txq)
@@ -3100,21 +3100,21 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
 		 */
 		if (i < vsi->alloc_rxq && coalesce[i].rx_valid) {
 			rc = &vsi->q_vectors[i]->rx;
-			rc->itr_setting = coalesce[i].itr_rx;
+			rc->itr_settings = coalesce[i].itr_rx;
 			ice_write_itr(rc, rc->itr_setting);
 		} else if (i < vsi->alloc_rxq) {
 			rc = &vsi->q_vectors[i]->rx;
-			rc->itr_setting = coalesce[0].itr_rx;
+			rc->itr_settings = coalesce[0].itr_rx;
 			ice_write_itr(rc, rc->itr_setting);
 		}
 
 		if (i < vsi->alloc_txq && coalesce[i].tx_valid) {
 			rc = &vsi->q_vectors[i]->tx;
-			rc->itr_setting = coalesce[i].itr_tx;
+			rc->itr_settings = coalesce[i].itr_tx;
 			ice_write_itr(rc, rc->itr_setting);
 		} else if (i < vsi->alloc_txq) {
 			rc = &vsi->q_vectors[i]->tx;
-			rc->itr_setting = coalesce[0].itr_tx;
+			rc->itr_settings = coalesce[0].itr_tx;
 			ice_write_itr(rc, rc->itr_setting);
 		}
 
@@ -3128,12 +3128,12 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
 	for (; i < vsi->num_q_vectors; i++) {
 		/* transmit */
 		rc = &vsi->q_vectors[i]->tx;
-		rc->itr_setting = coalesce[0].itr_tx;
+		rc->itr_settings = coalesce[0].itr_tx;
 		ice_write_itr(rc, rc->itr_setting);
 
 		/* receive */
 		rc = &vsi->q_vectors[i]->rx;
-		rc->itr_setting = coalesce[0].itr_rx;
+		rc->itr_settings = coalesce[0].itr_rx;
 		ice_write_itr(rc, rc->itr_setting);
 
 		vsi->q_vectors[i]->intrl = coalesce[0].intrl;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index cead3eb149bd..ffb3f6a589da 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -384,9 +384,14 @@ struct ice_ring_container {
 	/* this matches the maximum number of ITR bits, but in usec
 	 * values, so it is shifted left one bit (bit zero is ignored)
 	 */
-	u16 itr_setting:13;
-	u16 itr_reserved:2;
-	u16 itr_mode:1;
+	union {
+		struct {
+			u16 itr_setting:13;
+			u16 itr_reserved:2;
+			u16 itr_mode:1;
+		};
+		u16 itr_settings;
+	};
 	enum ice_container_type type;
 };
 
-- 
2.35.1


^ permalink raw reply related

* Re: [PATCH v3 3/4] can: skb:: move can_dropped_invalid_skb and can_skb_headroom_valid to skb.c
From: Max Staudt @ 2022-05-17 17:52 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, Vincent MAILHOL, linux-can, linux-kernel,
	netdev
In-Reply-To: <e51a64cd-1130-9c65-2bde-483c63f6aa10@hartkopp.net>

On Tue, 17 May 2022 17:50:03 +0200
Oliver Hartkopp <socketcan@hartkopp.net> wrote:

> On 17.05.22 17:38, Max Staudt wrote:
> > I'm a bit lost - what would CAN_DEV_SW do?  
> 
> It should be just *one* enabler of building can-dev-ko
> 
> > If it enables can_dropped_invalid_skb(), then the HW drivers would
> > also need to depend on CAN_DEV_SW directly or indirectly, or am I
> > missing something?  
> 
> And CAN_DEV is another enabler that would build the skb stuff from 
> CAN_DEV_SW too (but also the netlink stuff).
> 
> That's what I meant with "some Makefile magic" which is then building 
> the can-dev.ko with the required features depending on CAN_DEV_SW, 
> CAN_DEV, CAN_DEV_RX_OFFLOAD, CAN_CALC_BITTIMING, etc

Ah, I see!
Sounds good :)


Max

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox