Netdev List


* [PATCH wpan-next v3 04/11] net: mac802154: Follow the count of ongoing transmissions
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

In order to create a synchronous API for MLME command purposes, we need
to be able to track the end of the ongoing transmissions. Let's
introduce an atomic variable which is incremented when a transmission
starts and decremented when relevant so that we know at any moment
whether there is an ongoing transmission.

The counter gets decremented in the following situations:
- The operation is asynchronous and there was a failure during the
  offloading process.
- The operation is synchronous and the synchronous operation failed.
- The operation finished, either successfully or not.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 include/net/cfg802154.h | 3 +++
 net/mac802154/tx.c      | 3 +++
 net/mac802154/util.c    | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index 85f9e8417688..473ebcb9b155 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -214,6 +214,9 @@ struct wpan_phy {
 	/* the network namespace this phy lives in currently */
 	possible_net_t _net;
 
+	/* Transmission monitoring */
+	atomic_t ongoing_txs;
+
 	char priv[] __aligned(NETDEV_ALIGN);
 };
 
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 4a46ce8d2ac8..33f64ecd96c7 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -44,6 +44,7 @@ void ieee802154_xmit_sync_worker(struct work_struct *work)
 err_tx:
 	/* Restart the netif queue on each sub_if_data object. */
 	ieee802154_wake_queue(&local->hw);
+	atomic_dec(&local->phy->ongoing_txs);
 	kfree_skb(skb);
 	netdev_dbg(dev, "transmission failed\n");
 }
@@ -75,6 +76,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 	/* Stop the netif queue on each sub_if_data object. */
 	ieee802154_stop_queue(&local->hw);
+	atomic_inc(&local->phy->ongoing_txs);
 
 	/* Drivers should preferably implement the async callback. In some rare
 	 * cases they only provide a sync callback which we will use as a
@@ -98,6 +100,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 err_wake_netif_queue:
 	ieee802154_wake_queue(&local->hw);
+	atomic_dec(&local->phy->ongoing_txs);
 err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
diff --git a/net/mac802154/util.c b/net/mac802154/util.c
index 9f024d85563b..76dc663e2af4 100644
--- a/net/mac802154/util.c
+++ b/net/mac802154/util.c
@@ -88,6 +88,7 @@ void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
 	}
 
 	dev_consume_skb_any(skb);
+	atomic_dec(&hw->phy->ongoing_txs);
 }
 EXPORT_SYMBOL(ieee802154_xmit_complete);
 
@@ -99,6 +100,7 @@ void ieee802154_xmit_error(struct ieee802154_hw *hw, struct sk_buff *skb,
 	local->tx_result = reason;
 	ieee802154_wake_queue(hw);
 	dev_kfree_skb_any(skb);
+	atomic_dec(&hw->phy->ongoing_txs);
 }
 EXPORT_SYMBOL(ieee802154_xmit_error);
 
-- 
2.34.1
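
The counting scheme described in this patch can be modelled outside the kernel. What follows is only a minimal userspace sketch, assuming C11 atomics in place of the kernel's atomic_t; struct phy_model and the tx_start()/tx_end()/tx_ongoing() helpers are hypothetical names that do not appear in the series.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the field added to struct wpan_phy. */
struct phy_model {
	atomic_int ongoing_txs;
};

/* A frame is handed to the driver: mirrors the atomic_inc() in ieee802154_tx(). */
static void tx_start(struct phy_model *phy)
{
	atomic_fetch_add(&phy->ongoing_txs, 1);
}

/* Completion or any error path: mirrors the atomic_dec() calls in the patch. */
static void tx_end(struct phy_model *phy)
{
	atomic_fetch_sub(&phy->ongoing_txs, 1);
}

/* True while at least one transmission has started and not yet ended. */
static bool tx_ongoing(struct phy_model *phy)
{
	return atomic_load(&phy->ongoing_txs) > 0;
}
```

Every tx_start() must be paired with exactly one tx_end(), whichever path the frame takes. That is why the patch adds a decrement to each failure path as well as to both completion helpers.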



* [PATCH wpan-next v3 03/11] net: mac802154: Enhance the error path in the main tx helper
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

Before adding more logic in the error path, let's move the wake queue
call there, rename the default label and create an additional one.

There is no functional change.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/tx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index a01689ddd547..4a46ce8d2ac8 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -65,7 +65,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 				consume_skb(skb);
 				skb = nskb;
 			} else {
-				goto err_tx;
+				goto err_free_skb;
 			}
 		}
 
@@ -84,10 +84,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 		unsigned int len = skb->len;
 
 		ret = drv_xmit_async(local, skb);
-		if (ret) {
-			ieee802154_wake_queue(&local->hw);
-			goto err_tx;
-		}
+		if (ret)
+			goto err_wake_netif_queue;
 
 		dev->stats.tx_packets++;
 		dev->stats.tx_bytes += len;
@@ -98,7 +96,9 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 
 	return NETDEV_TX_OK;
 
-err_tx:
+err_wake_netif_queue:
+	ieee802154_wake_queue(&local->hw);
+err_free_skb:
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
-- 
2.34.1
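
The stacked error labels introduced above follow a common C pattern: each label undoes one step and falls through to the next, so a failure jumps to the label matching how far the function got. Below is a minimal userspace sketch, assuming a hypothetical struct tx_model and tx_model_send() that stand in for the queue and skb handling. Unlike the real helper, which returns NETDEV_TX_OK on both paths, the model returns -1 on failure so that the paths can be told apart.

```c
#include <stdbool.h>

/* Hypothetical model of the state the error path has to undo. */
struct tx_model {
	bool queue_stopped;
	bool skb_freed;
};

/* Returns 0 on success, -1 on failure; the two flags select which step fails. */
static int tx_model_send(struct tx_model *m, bool fail_early, bool fail_xmit)
{
	if (fail_early)
		goto err_free_skb;	/* queue not stopped yet: only free the buffer */

	m->queue_stopped = true;

	if (fail_xmit)
		goto err_wake_queue;	/* queue stopped: wake it, then fall through */

	return 0;

err_wake_queue:
	m->queue_stopped = false;
err_free_skb:
	m->skb_freed = true;
	return -1;
}
```

On success the queue stays stopped in this model, because the wake-up happens later, when the transmission completes.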



* [PATCH wpan-next v3 02/11] net: mac802154: Rename the main tx_work struct
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

This entry is dedicated to synchronous transmissions done by drivers
without an async hook. Make it clearer that this is not a work item that
any driver can use by at least prefixing it with "sync_". While at it,
let's enhance the comment explaining why we choose one or the other.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h | 2 +-
 net/mac802154/main.c         | 2 +-
 net/mac802154/tx.c           | 9 ++++++---
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index d7632c6d225f..a8b7b9049f14 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -55,7 +55,7 @@ struct ieee802154_local {
 	struct sk_buff_head skb_queue;
 
 	struct sk_buff *tx_skb;
-	struct work_struct tx_work;
+	struct work_struct sync_tx_work;
 	/* A negative Linux error code or a null/positive MLME error status */
 	int tx_result;
 };
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index 392771bba9dd..40fab08df24b 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -95,7 +95,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
 
 	skb_queue_head_init(&local->skb_queue);
 
-	INIT_WORK(&local->tx_work, ieee802154_xmit_sync_worker);
+	INIT_WORK(&local->sync_tx_work, ieee802154_xmit_sync_worker);
 
 	/* init supported flags with 802.15.4 default ranges */
 	phy->supported.max_minbe = 8;
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 97df5985b830..a01689ddd547 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -25,7 +25,7 @@
 void ieee802154_xmit_sync_worker(struct work_struct *work)
 {
 	struct ieee802154_local *local =
-		container_of(work, struct ieee802154_local, tx_work);
+		container_of(work, struct ieee802154_local, sync_tx_work);
 	struct sk_buff *skb = local->tx_skb;
 	struct net_device *dev = skb->dev;
 	int res;
@@ -76,7 +76,10 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 	/* Stop the netif queue on each sub_if_data object. */
 	ieee802154_stop_queue(&local->hw);
 
-	/* async is priority, otherwise sync is fallback */
+	/* Drivers should preferably implement the async callback. In some rare
+	 * cases they only provide a sync callback which we will use as a
+	 * fallback.
+	 */
 	if (local->ops->xmit_async) {
 		unsigned int len = skb->len;
 
@@ -90,7 +93,7 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb)
 		dev->stats.tx_bytes += len;
 	} else {
 		local->tx_skb = skb;
-		queue_work(local->workqueue, &local->tx_work);
+		queue_work(local->workqueue, &local->sync_tx_work);
 	}
 
 	return NETDEV_TX_OK;
-- 
2.34.1



* [PATCH wpan-next v3 00/11] ieee802154: Synchronous Tx support
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal

Hello,

This series brings support for that famous synchronous Tx API for MLME
commands.

MLME commands will be used during scan operations. In this situation,
we need to be sure that all transfers finished and that no transfer
will be queued for a short moment.

Cheers,
Miquèl

Changes in v3:
* Tested with lockdep enabled, a more aggressive preemption level and
  the sleeping while atomic warnings enabled.
* Changed the hold/release queue mutex into a spinlock.
* Split the mlme_tx function into three, one to hold the queue, then
  another part that takes the rtnl and has the real content, and a
  last helper to release the queue.
* Fixed the warning condition in the slow path.
* Used an unsigned long and test/set_bit helpers to follow the queue
  state instead of an atomic_t.

Changes in v2:
* Updated the main tx function error path.
* Added a missing atomic_dec_and_test() call on the hold counter.
* Always called (upon a certain condition) the queue wakeup helper from
  the release queue helper (and similarly in the hold helper) and
  squashed two existing patches in it to simplify the series.
* Introduced a mutex to serialize accesses to the increment/decrement of
  the hold counter and the wake up call.
* Added a warning in case an MLME Tx gets triggered while the device was
  stopped.
* Used the rtnl to ensure the device cannot be stopped while an MLME
  transmission is ongoing.

Changes in v1 since this series got extracted from a bigger change:
* Introduced a new atomic variable to know when the queue is actually
  stopped. So far we only had an atomic to know when the queue was held
  (indicates a transitioning state towards a stopped queue only) and
  another atomic indicating if a transfer was still ongoing at this
  point (used by the wait logic as a condition to wake up).

Miquel Raynal (11):
  net: mac802154: Rename the synchronous xmit worker
  net: mac802154: Rename the main tx_work struct
  net: mac802154: Enhance the error path in the main tx helper
  net: mac802154: Follow the count of ongoing transmissions
  net: mac802154: Bring the ability to hold the transmit queue
  net: mac802154: Create a hot tx path
  net: mac802154: Introduce a helper to disable the queue
  net: mac802154: Introduce a tx queue flushing mechanism
  net: mac802154: Introduce a synchronous API for MLME commands
  net: mac802154: Add a warning in the hot path
  net: mac802154: Add a warning in the slow path

 include/net/cfg802154.h      |   9 ++-
 include/net/mac802154.h      |  27 -------
 net/ieee802154/core.c        |   3 +
 net/mac802154/cfg.c          |   4 +-
 net/mac802154/ieee802154_i.h |  37 ++++++++-
 net/mac802154/main.c         |   2 +-
 net/mac802154/tx.c           | 147 +++++++++++++++++++++++++++++++----
 net/mac802154/util.c         |  71 +++++++++++++++--
 8 files changed, 246 insertions(+), 54 deletions(-)

-- 
2.34.1


* [PATCH wpan-next v3 01/11] net: mac802154: Rename the synchronous xmit worker
From: Miquel Raynal @ 2022-05-17 16:34 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, linux-wpan
  Cc: David Girault, Romuald Despres, Frederic Blain, Nicolas Schodet,
	Thomas Petazzoni, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, Miquel Raynal
In-Reply-To: <20220517163450.240299-1-miquel.raynal@bootlin.com>

There are currently two driver hooks: one is synchronous, the other is
not. We cannot rely on driver implementations to provide a synchronous
API (which is related to the bus medium more than a wish to have a
synchronized implementation) so we are going to introduce a sync API
above any kind of driver transmit function. In order to clarify what
this worker is for (synchronous driver implementation), let's rename it
so that people don't get bothered by the fact that their driver does not
make use of the "xmit worker", which is too generic a name.

Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
 net/mac802154/ieee802154_i.h | 2 +-
 net/mac802154/main.c         | 2 +-
 net/mac802154/tx.c           | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/mac802154/ieee802154_i.h b/net/mac802154/ieee802154_i.h
index 1381e6a5e180..d7632c6d225f 100644
--- a/net/mac802154/ieee802154_i.h
+++ b/net/mac802154/ieee802154_i.h
@@ -123,7 +123,7 @@ ieee802154_sdata_running(struct ieee802154_sub_if_data *sdata)
 extern struct ieee802154_mlme_ops mac802154_mlme_wpan;
 
 void ieee802154_rx(struct ieee802154_local *local, struct sk_buff *skb);
-void ieee802154_xmit_worker(struct work_struct *work);
+void ieee802154_xmit_sync_worker(struct work_struct *work);
 netdev_tx_t
 ieee802154_monitor_start_xmit(struct sk_buff *skb, struct net_device *dev);
 netdev_tx_t
diff --git a/net/mac802154/main.c b/net/mac802154/main.c
index bd7bdb1219dd..392771bba9dd 100644
--- a/net/mac802154/main.c
+++ b/net/mac802154/main.c
@@ -95,7 +95,7 @@ ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops)
 
 	skb_queue_head_init(&local->skb_queue);
 
-	INIT_WORK(&local->tx_work, ieee802154_xmit_worker);
+	INIT_WORK(&local->tx_work, ieee802154_xmit_sync_worker);
 
 	/* init supported flags with 802.15.4 default ranges */
 	phy->supported.max_minbe = 8;
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index c829e4a75325..97df5985b830 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -22,7 +22,7 @@
 #include "ieee802154_i.h"
 #include "driver-ops.h"
 
-void ieee802154_xmit_worker(struct work_struct *work)
+void ieee802154_xmit_sync_worker(struct work_struct *work)
 {
 	struct ieee802154_local *local =
 		container_of(work, struct ieee802154_local, tx_work);
-- 
2.34.1



* Re: [PATCH v2 4/4] powerpc/52xx: Convert to use fwnode API
From: Andy Shevchenko @ 2022-05-17 16:30 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Wolfram Sang, Marc Kleine-Budde, Damien Le Moal, Mark Brown,
	chris.packham, Sergey Shtylyov, David S. Miller, Jakub Kicinski,
	Greg Kroah-Hartman, Jiri Slaby, linuxppc-dev, linux-kernel,
	linux-ide, linux-i2c, linux-can, netdev, linux-spi, linux-serial,
	Benjamin Herrenschmidt, Paul Mackerras, Anatolij Gustschin,
	Wolfgang Grandegger, Eric Dumazet, Paolo Abeni, Pantelis Antoniou
In-Reply-To: <874k1p6oa7.fsf@mpe.ellerman.id.au>

On Tue, May 17, 2022 at 09:38:56AM +1000, Michael Ellerman wrote:
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> > On Mon, May 16, 2022 at 05:05:12PM +0300, Andy Shevchenko wrote:
> >> On Mon, May 16, 2022 at 11:48:05PM +1000, Michael Ellerman wrote:
> >> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> >> > > We may convert the GPT driver to use fwnode API for the sake
> >> > > of consistency of the used APIs inside the driver.
> >> > 
> >> > I'm not sure about this one.
> >> > 
> >> > It's more consistent to use fwnode in this driver, but it's very
> >> > inconsistent with the rest of the powerpc code. We have basically no
> >> > uses of the fwnode APIs at the moment.
> >> 
> >> Fair point!
> >> 
> >> > It seems like a pretty straight-forward conversion, but there could
> >> > easily be a bug in there, I don't have any way to test it. Do you?
> >> 
> >> Nope, only compile testing. The important part of this series is to
> >> clean up of_node from GPIO library, so since here it's a user of
> >> it I want to do that. This patch is just ad-hoc conversion that I
> >> noticed is possible. But there is no any requirement to do so.
> >> 
> >> Lemme drop this from v3.
> >
> > I just realize that there is no point to send a v3. You can just apply
> > first 3 patches. Or is your comment against entire series?
> 
> No, my comment is just about this patch.
> 
> I don't mind converting to new APIs when it's blocking some other
> cleanup. But given the age of this code I think it's probably better to
> just leave the rest of it as-is, unless someone volunteers to test it.
> 
> So yeah I'll just take patches 1-3 of this v2 series, no need to resend.

Thanks!

One note though: fwnode_for_each_parent_node() is not yet available
upstream, but will be after v5.19-rc1. This means patch 3 can't be applied
without it. That's why LKP complained about patch 4 in this series.

That said, the easiest way is to postpone it till v5.19-rc1 is out.

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH 1/2] arm64: dts: armada-3720-turris-mox: Correct reg property for mdio devices
From: Marek Behún @ 2022-05-17 16:21 UTC (permalink / raw)
  To: Chris Packham
  Cc: davem, edumazet, kuba, pabeni, robh+dt, krzysztof.kozlowski+dt,
	andrew, gregory.clement, sebastian.hesselbarth, netdev,
	devicetree, linux-kernel, linux-arm-kernel
In-Reply-To: <20220516224801.1656752-2-chris.packham@alliedtelesis.co.nz>

On Tue, 17 May 2022 10:48:00 +1200
Chris Packham <chris.packham@alliedtelesis.co.nz> wrote:

> MDIO devices have #address-cells = <1>, #size-cells = <0>. Now that we
> have a schema enforcing this for marvell,orion-mdio we can see that the
> turris-mox has an unnecessary 2nd cell in the reg property of its
> switch nodes. Remove the unnecessary 2nd cell from the switches' reg
> property.
> 
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>

Reviewed-by: Marek Behún <kabel@kernel.org>


* Re: [PATCH net-next v3 08/10] ptp: ocp: fix PPS source selector reporting
From: Jakub Kicinski @ 2022-05-17 16:19 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: netdev, richardcochran, davem, pabeni, edumazet, kernel-team
In-Reply-To: <20220517153942.6ze5kj7hoj7z4caq@bsd-mbp.dhcp.thefacebook.com>

On Tue, 17 May 2022 08:39:42 -0700 Jonathan Lemon wrote:
> On Mon, May 16, 2022 at 06:54:28PM -0700, Jonathan Lemon wrote:
> > On Mon, May 16, 2022 at 05:43:17PM -0700, Jakub Kicinski wrote:  
> > > This one and patch 10 need Fixes tags  
> > 
> > This is for a debugfs entry.  I disagree that a Fixes: tag
> > is needed here.
> > 
> > I'll do it for patch 10 if you insist, but this only happens
> > if ptp_clock_register() fails, which is not normally seen.
> 
> Actually, patch 10 would be:
> 
> Fixes: c205d53c4923 ("ptp: ocp: Add firmware capability bits for feature gating")
> 
> Which is only in 5.18-rcX at this point.
> 
> Do we need a Fixes tag for code which hasn't made it into a
> full release yet?

Yup, having the Fixes tag makes it obvious to the maintainer that 
the tree selection is correct and helps backporters figure out if 
they need to worry that the patch didn't apply.



* Re: [PATCH bpf-next v4 3/6] bpf: Move is_valid_bpf_tramp_flags() to the public trampoline code
From: Alexei Starovoitov @ 2022-05-17 15:53 UTC (permalink / raw)
  To: Xu Kuohai
  Cc: bpf, linux-arm-kernel, linux-kernel, netdev, linux-kselftest,
	Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S . Miller, Hideaki YOSHIFUJI, David Ahern,
	Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, hpa,
	Shuah Khan, Jakub Kicinski, Jesper Dangaard Brouer, Mark Rutland,
	Pasha Tatashin, Ard Biesheuvel, Daniel Kiss, Steven Price,
	Sudeep Holla, Marc Zyngier, Peter Collingbourne, Mark Brown,
	Delyan Kratunov, Kumar Kartikeya Dwivedi
In-Reply-To: <20220517071838.3366093-4-xukuohai@huawei.com>

On Tue, May 17, 2022 at 03:18:35AM -0400, Xu Kuohai wrote:
>  
> +static bool is_valid_bpf_tramp_flags(unsigned int flags)
> +{
> +	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
> +	    (flags & BPF_TRAMP_F_SKIP_FRAME))
> +		return false;
> +
> +	/* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops,
> +	 * and it must be used alone.
> +	 */
> +	if ((flags & BPF_TRAMP_F_RET_FENTRY_RET) &&
> +	    (flags & ~BPF_TRAMP_F_RET_FENTRY_RET))
> +		return false;
> +
> +	return true;
> +}
> +
> +int bpf_prepare_trampoline(struct bpf_tramp_image *tr, void *image,
> +			   void *image_end, const struct btf_func_model *m,
> +			   u32 flags, struct bpf_tramp_links *tlinks,
> +			   void *orig_call)
> +{
> +	if (!is_valid_bpf_tramp_flags(flags))
> +		return -EINVAL;
> +
> +	return arch_prepare_bpf_trampoline(tr, image, image_end, m, flags,
> +					   tlinks, orig_call);
> +}

It's overkill to introduce a new helper function just to validate
flags that are almost compile-time constants.
The flags are not user supplied.
Please move /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops ... */
comment to bpf_struct_ops.c right before it calls arch_prepare_bpf_trampoline()
And add a comment to trampoline.c saying that BPF_TRAMP_F_RESTORE_REGS
and BPF_TRAMP_F_SKIP_FRAME should not be set together.
We could add a warn_on there or in arch code, but feels like overkill.


* Re: [PATCH v3 3/4] can: skb:: move can_dropped_invalid_skb and can_skb_headroom_valid to skb.c
From: Oliver Hartkopp @ 2022-05-17 15:50 UTC (permalink / raw)
  To: Max Staudt
  Cc: Marc Kleine-Budde, Vincent MAILHOL, linux-can, linux-kernel,
	netdev
In-Reply-To: <20220517173821.445c5e90.max@enpas.org>



On 17.05.22 17:38, Max Staudt wrote:
> On Tue, 17 May 2022 16:35:05 +0200
> Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> 
>> I think it should be even more simple.
>>
>> When you enter the current Kconfig page of "CAN Device Drivers" every
>> selection of vcan/vxcan/slcan should directly select CAN_DEV_SW.
> 
> I'm a bit lost - what would CAN_DEV_SW do?

It should be just *one* enabler of building can-dev.ko

> If it enables can_dropped_invalid_skb(), then the HW drivers would also
> need to depend on CAN_DEV_SW directly or indirectly, or am I missing
> something?

And CAN_DEV is another enabler that would build the skb stuff from 
CAN_DEV_SW too (but also the netlink stuff).

That's what I meant with "some Makefile magic" which is then building 
the can-dev.ko with the required features depending on CAN_DEV_SW, 
CAN_DEV, CAN_DEV_RX_OFFLOAD, CAN_CALC_BITTIMING, etc

Best,
Oliver



* Re: [PATCH net-next v3 02/10] ptp: ocp: add Celestica timecard PCI ids
From: Jonathan Lemon @ 2022-05-17 15:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, richardcochran, davem, pabeni, edumazet, kernel-team
In-Reply-To: <20220517082558.59991355@kernel.org>

On Tue, May 17, 2022 at 08:25:58AM -0700, Jakub Kicinski wrote:
> On Mon, 16 May 2022 18:46:44 -0700 Jonathan Lemon wrote:
> > On Mon, May 16, 2022 at 05:43:03PM -0700, Jakub Kicinski wrote:
> > > On Fri, 13 May 2022 15:59:16 -0700 Jonathan Lemon wrote:  
> > > > +#ifndef PCI_VENDOR_ID_CELESTICA
> > > > +#define PCI_VENDOR_ID_CELESTICA 0x18d4
> > > > +#endif
> > > > +
> > > > +#ifndef PCI_DEVICE_ID_CELESTICA_TIMECARD
> > > > +#define PCI_DEVICE_ID_CELESTICA_TIMECARD 0x1008
> > > > +#endif  
> > > 
> > > The ifdefs are unnecessary, these kind of constructs are often used out
> > > of tree when one does not control the headers, but not sure what purpose
> > > they'd serve upstream?  
> > 
> > include/linux/pci_ids.h says:
> > 
> >  *      Do not add new entries to this file unless the definitions
> >  *      are shared between multiple drivers.
> > 
> > Neither FACEBOOK (0x1d9b) nor CELESTICA (0x18d4) are present
> > in this file.  This seems to be a common idiom in several other
> > drivers.  Picking one at random:
> > 
> >    gve.h:#define PCI_VENDOR_ID_GOOGLE 0x1ae0
> > 
> > 
> > So these #defines are needed.
> 
> Indeed, but also I'm not complaining about defines but the ifdefs 
> in which they are wrapped :)

This is standard defensive coding practice.  I would vastly
prefer to leave them this way, and my hard-wired fingers 
would also not like to be retrained.

Next, you'll be telling me that all the kernel headers
should be using "#pragma once".
-- 
Jonathan
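
For readers unfamiliar with the idiom under discussion, the fragment below is a small sketch of how an #ifndef guard behaves. The first #define simulates a shared header that already provides the vendor ID; that arrangement is an assumption for illustration, since the thread notes pci_ids.h does not currently define it. struct pci_id_model and celestica_timecard_id() are hypothetical names.

```c
/* Simulates a shared header that already defines the vendor ID (assumption). */
#define PCI_VENDOR_ID_CELESTICA 0x18d4

/* Driver-local fallback: skipped, because the macro is already defined. */
#ifndef PCI_VENDOR_ID_CELESTICA
#define PCI_VENDOR_ID_CELESTICA 0x18d4
#endif

/* Driver-local fallback: takes effect, because nothing defined it before. */
#ifndef PCI_DEVICE_ID_CELESTICA_TIMECARD
#define PCI_DEVICE_ID_CELESTICA_TIMECARD 0x1008
#endif

/* Hypothetical holder type, only used to expose the two values. */
struct pci_id_model {
	unsigned int vendor;
	unsigned int device;
};

static struct pci_id_model celestica_timecard_id(void)
{
	struct pci_id_model id = {
		PCI_VENDOR_ID_CELESTICA,
		PCI_DEVICE_ID_CELESTICA_TIMECARD,
	};
	return id;
}
```

Either way the driver sees the same values. The guard only changes what happens if a shared header starts defining the macro later, which is the out-of-tree scenario Jakub refers to.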


* Re: [PATCH v2 01/14] thermal/core: Change thermal_zone_ops to thermal_sensor_ops
From: Rafael J. Wysocki @ 2022-05-17 15:42 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Daniel Lezcano, Rafael J. Wysocki, Kevin Hilman, Alexandre Bailon,
	Linux PM, Linux Kernel Mailing List, Amit Kucheria, Zhang Rui,
	Jonathan Corbet, Len Brown, Raju Rangoju, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Ido Schimmel, Petr Machata,
	Luca Coelho, Kalle Valo, Peter Kaestle, Hans de Goede, Mark Gross,
	Sebastian Reichel, Miquel Raynal, Support Opensource, Shawn Guo,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team, Niklas Söderlund, Miri Korenblit,
	Johannes Berg, Srinivas Pandruvada, Sumeet Pawnikar,
	Dan Carpenter, Chuansheng Liu, Jiasheng Jiang, Antoine Tenart,
	Andy Shevchenko, open list:DOCUMENTATION,
	open list:ACPI THERMAL DRIVER,
	open list:CXGB4 ETHERNET DRIVER (CXGB4),
	open list:INTEL WIRELESS WIFI LINK (iwlwifi),
	open list:ACER ASPIRE ONE TEMPERATURE AND FAN DRIVER,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	open list:RENESAS R-CAR THERMAL DRIVERS
In-Reply-To: <20220507125443.2766939-2-daniel.lezcano@linexp.org>

On Sat, May 7, 2022 at 2:55 PM Daniel Lezcano <daniel.lezcano@linexp.org> wrote:
>
> A thermal zone is a software abstraction of a sensor associated with
> properties and cooling devices, if any.
>
> The fact that we have thermal_zone and thermal_zone_ops mixed is
> confusing and does not clearly identify the different components
> entering in the thermal management process. A thermal zone appears to
> be a sensor while it is not.

Well, the majority of the operations in thermal_zone_ops don't apply
to thermal sensors.  For example, ->set_trips(), ->get_trip_type(),
->get_trip_temp().

> In order to set the scene for multiple thermal sensors aggregated into
> a single thermal zone, rename thermal_zone_ops to
> thermal_sensor_ops, which will make it clear that the thermal zone is
> not a sensor but an abstraction of one [or multiple] sensor(s).

So I'm not convinced that the renaming mentioned above is particularly
clean either.

IMV the way to go would be to split the thermal sensor operations,
like ->get_temp(), out of thermal_zone_ops.

But then it is not clear what a thermal zone with multiple sensors in
it really means.  I guess it would require an aggregation function to
combine the thermal sensors in it that would produce an effective
temperature to check against the trip points.

Honestly, I don't think that setting a separate set of trips for each
sensor in a thermal zone would make a lot of sense.
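
The aggregation function mentioned above could take several forms. The sketch below assumes the simplest one, taking the hottest reading as the zone's effective temperature; this is one possible policy, not something the thread settles on. zone_effective_temp() and zone_trip_crossed() are hypothetical names, and temperatures are in millidegrees Celsius.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical aggregation: the zone's effective temperature is the hottest
 * sensor reading. Returns INT_MIN when the zone has no sensors.
 */
static int zone_effective_temp(const int *sensor_temps, size_t n)
{
	int max = INT_MIN;

	for (size_t i = 0; i < n; i++)
		if (sensor_temps[i] > max)
			max = sensor_temps[i];

	return max;
}

/* A single per-zone trip point is checked against the aggregated value. */
static bool zone_trip_crossed(const int *sensor_temps, size_t n, int trip_temp)
{
	return n > 0 && zone_effective_temp(sensor_temps, n) >= trip_temp;
}
```

With this shape the trip points remain a property of the zone rather than of each sensor, which matches the concern raised in the last paragraph of the reply.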


* Re: [PATCH net-next v3 08/10] ptp: ocp: fix PPS source selector reporting
From: Jonathan Lemon @ 2022-05-17 15:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, richardcochran, davem, pabeni, edumazet, kernel-team
In-Reply-To: <20220517015428.l6ttuht3ufrl2deb@bsd-mbp.dhcp.thefacebook.com>

On Mon, May 16, 2022 at 06:54:28PM -0700, Jonathan Lemon wrote:
> On Mon, May 16, 2022 at 05:43:17PM -0700, Jakub Kicinski wrote:
> > On Fri, 13 May 2022 15:59:22 -0700 Jonathan Lemon wrote:
> > > The NTL timecard design has a PPS1 selector which selects
> > > the PPS source automatically, according to Section 1.9 of the
> > > documentation.
> > > 
> > >   If there is a SMA PPS input detected:
> > >      - send signal to MAC and PPS slave selector.
> > > 
> > >   If there is a MAC PPS input detected:
> > >      - send GNSS1 to the MAC
> > >      - send MAC to the PPS slave
> > > 
> > >   If there is a GNSS1 input detected:
> > >      - send GNSS1 to the MAC
> > >      - send GNSS1 to the PPS slave.MAC
> > > 
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > 
> > This one and patch 10 need Fixes tags
> 
> This is for a debugfs entry.  I disagree that a Fixes: tag
> is needed here.
> 
> I'll do it for patch 10 if you insist, but this only happens
> if ptp_clock_register() fails, which is not normally seen.

Actually, patch 10 would be:

Fixes: c205d53c4923 ("ptp: ocp: Add firmware capability bits for feature gating")

Which is only in 5.18-rcX at this point.

Do we need a Fixes tag for code which hasn't made it into a
full release yet?
-- 
Jonathan



* Re: [PATCH v3 3/4] can: skb:: move can_dropped_invalid_skb and can_skb_headroom_valid to skb.c
From: Max Staudt @ 2022-05-17 15:38 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, Vincent MAILHOL, linux-can, linux-kernel,
	netdev
In-Reply-To: <22590a57-c7c6-39c6-06d5-11c6e4e1534b@hartkopp.net>

On Tue, 17 May 2022 16:35:05 +0200
Oliver Hartkopp <socketcan@hartkopp.net> wrote:

> I think it should be even more simple.
> 
> When you enter the current Kconfig page of "CAN Device Drivers" every 
> selection of vcan/vxcan/slcan should directly select CAN_DEV_SW.

I'm a bit lost - what would CAN_DEV_SW do?

If it enables can_dropped_invalid_skb(), then the HW drivers would also
need to depend on CAN_DEV_SW directly or indirectly, or am I missing
something?



Max


* Re: [PATCH 17/30] tracing: Improve panic/die notifiers
From: Guilherme G. Piccoli @ 2022-05-17 15:33 UTC (permalink / raw)
  To: Petr Mladek, rostedt
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, senozhatsky, stern, tglx,
	vgoyal, vkuznets, will
In-Reply-To: <20220511114541.GC26047@pathway.suse.cz>

On 11/05/2022 08:45, Petr Mladek wrote:
> [...]
> DIE_OOPS and PANIC_NOTIFIER are from different enum.
> It feels like comparing apples with oranges here.
> 
> IMHO, the proper way to unify the two notifiers is
> a check of the @self parameter. Something like:
> 
> static int trace_die_panic_handler(struct notifier_block *self,
> 				unsigned long ev, void *unused)
> {
> 	if (self == &trace_die_notifier && ev != DIE_OOPS)
> 		goto out;
> 
> 	ftrace_dump(ftrace_dump_on_oops);
> out:
> 	return NOTIFY_DONE;
> }
> 
> Best Regards,
> Petr

OK Petr, thanks - will implement your suggestion in V2 (CC Steven)

Cheers!
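
To make the @self check concrete, here is a userspace sketch of one handler shared by two notifier blocks. The types are cut-down stand-ins for the kernel ones, the enum values are arbitrary, and dump_count replaces the ftrace_dump() call; trace_panic_notifier and everything apart from the handler name and the DIE_OOPS comparison are assumptions made for illustration.

```c
#include <stddef.h>

/* Cut-down stand-ins for kernel definitions; values are arbitrary here. */
enum { NOTIFY_DONE = 0 };
enum die_val_model { DIE_OOPS = 1, DIE_OTHER = 2 };

struct notifier_block;
typedef int (*notifier_fn_t)(struct notifier_block *self,
			     unsigned long ev, void *data);
struct notifier_block {
	notifier_fn_t notifier_call;
};

static int dump_count;	/* counts how often the "dump" would have run */

static int trace_die_panic_handler(struct notifier_block *self,
				   unsigned long ev, void *unused);

static struct notifier_block trace_die_notifier = {
	.notifier_call = trace_die_panic_handler,
};
static struct notifier_block trace_panic_notifier = {
	.notifier_call = trace_die_panic_handler,
};

static int trace_die_panic_handler(struct notifier_block *self,
				   unsigned long ev, void *unused)
{
	(void)unused;

	/* Only the die chain passes enum die_val events, so filter only there. */
	if (self == &trace_die_notifier && ev != DIE_OOPS)
		return NOTIFY_DONE;

	dump_count++;	/* stands in for ftrace_dump() */
	return NOTIFY_DONE;
}
```

Because the handler identifies its chain through the block pointer, it never compares an event from one enum against a constant from the other.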


* Re: [PATCH 15/30] bus: brcmstb_gisb: Clean-up panic/die notifiers
From: Guilherme G. Piccoli @ 2022-05-17 15:32 UTC (permalink / raw)
  To: Petr Mladek, Florian Fainelli
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-arm-kernel, linux-edac,
	linux-hyperv, linux-leds, linux-mips, linux-parisc, linux-pm,
	linux-remoteproc, linux-s390, linux-tegra, linux-um, linux-xtensa,
	netdev, openipmi-developer, rcu, sparclinux, xen-devel, x86,
	kernel-dev, kernel, halves, fabiomirmar, alejandro.j.jimenez,
	andriy.shevchenko, arnd, bp, corbet, d.hatayama, dave.hansen,
	dyoung, feng.tang, gregkh, mikelley, hidehiro.kawai.ez, jgross,
	john.ogness, keescook, luto, mhiramat, mingo, paulmck, peterz,
	rostedt, senozhatsky, stern, tglx, vgoyal, vkuznets, will,
	Brian Norris
In-Reply-To: <YnqEqDnMfUgC4dM6@alley>

On 10/05/2022 12:28, Petr Mladek wrote:
> [...]
> IMHO, the check of the @self parameter was the proper solution.
> 
> "gisb_die_notifier" list uses @val from enum die_val.
> "gisb_panic_notifier" list uses @val from enum panic_notifier_val.
> 
> These are unrelated types. It might easily break when
> someone defines the same constant also in enum die_val.
> 
> Best Regards,
> Petr

OK Petr, I'll drop this idea for V2 - will just remove the useless
header / prototype then. (CC Florian)


Cheers,


Guilherme

^ permalink raw reply

* Re: [PATCH net-next v3 08/10] ptp: ocp: fix PPS source selector reporting
From: Jakub Kicinski @ 2022-05-17 15:32 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: netdev, richardcochran, davem, pabeni, edumazet, kernel-team
In-Reply-To: <20220517015428.l6ttuht3ufrl2deb@bsd-mbp.dhcp.thefacebook.com>

On Mon, 16 May 2022 18:54:28 -0700 Jonathan Lemon wrote:
> On Mon, May 16, 2022 at 05:43:17PM -0700, Jakub Kicinski wrote:
> > On Fri, 13 May 2022 15:59:22 -0700 Jonathan Lemon wrote:  
> > > The NTL timecard design has a PPS1 selector which selects the
> > > PPS source automatically, according to Section 1.9 of the
> > > documentation.
> > > 
> > >   If there is a SMA PPS input detected:
> > >      - send signal to MAC and PPS slave selector.
> > > 
> > >   If there is a MAC PPS input detected:
> > >      - send GNSS1 to the MAC
> > >      - send MAC to the PPS slave
> > > 
> > >   If there is a GNSS1 input detected:
> > >      - send GNSS1 to the MAC
> > >      - send GNSS1 to the PPS slave.MAC
> > > 
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>  
> > 
> > This one and patch 10 need Fixes tags  
> 
> This is for a debugfs entry.  I disagree that a Fixes: tag
> is needed here.
> 
> I'll do it for patch 10 if you insist, but this only happens
> if ptp_clock_register() fails, which is not normally seen.

Fixes need a fixes tag and need to target the right tree.

^ permalink raw reply

* Re: [PATCH] wireless: Fix Makefile to be in alphabetical order
From: Kalle Valo @ 2022-05-17 15:30 UTC (permalink / raw)
  To: Srinivasan R
In-Reply-To: <MA1PR01MB26992E104B006B340C3C3A84C1CA9@MA1PR01MB2699.INDPRD01.PROD.OUTLOOK.COM>

Srinivasan R <srinir@outlook.com> wrote:

> Fix quantenna to be in the right order
> 
> Signed-off-by: Srinivasan R <srinir@outlook.com>

Patch applied to wireless-next.git, thanks.

8762246c7b23 wireless: Fix Makefile to be in alphabetical order

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/MA1PR01MB26992E104B006B340C3C3A84C1CA9@MA1PR01MB2699.INDPRD01.PROD.OUTLOOK.COM/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

* Re: [PATCH net v2] NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
From: Krzysztof Kozlowski @ 2022-05-17 15:28 UTC (permalink / raw)
  To: duoming
  Cc: linux-kernel, kuba, davem, edumazet, pabeni, gregkh,
	alexander.deucher, broonie, netdev
In-Reply-To: <71c24f38.1a1f4.180d29ff1fd.Coremail.duoming@zju.edu.cn>

On 17/05/2022 17:25, duoming@zju.edu.cn wrote:
> Hello,
> 
> On Tue, 17 May 2022 13:42:41 +0200 Krzysztof wrote:
> 
>> On 17/05/2022 12:55, Duoming Zhou wrote:
>>> There are sleep-in-atomic-context bugs when the request to the secure
>>> element of st21nfca times out. The root cause is that kzalloc and
>>> alloc_skb with GFP_KERNEL parameter and mutex_lock are called in
>>> st21nfca_se_wt_timeout which is a timer handler. The call tree shows
>>> the execution paths that could lead to bugs:
>>>
>>>    (Interrupt context)
>>> st21nfca_se_wt_timeout
>>>   nfc_hci_send_event
>>>     nfc_hci_hcp_message_tx
>>>       kzalloc(..., GFP_KERNEL) //may sleep
>>>       alloc_skb(..., GFP_KERNEL) //may sleep
>>>       mutex_lock() //may sleep
>>>
>>> This patch changes allocation mode of kzalloc and alloc_skb from
>>> GFP_KERNEL to GFP_ATOMIC and changes mutex_lock to spin_lock in
>>> order to prevent atomic context from sleeping.
>>>
>>> Fixes: 2130fb97fecf ("NFC: st21nfca: Adding support for secure element")
>>> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
>>> ---
>>> Changes in v2:
>>>   - Change mutex_lock to spin_lock.
>>>
>>>  include/net/nfc/hci.h |  3 ++-
>>>  net/nfc/hci/core.c    | 18 +++++++++---------
>>>  net/nfc/hci/hcp.c     | 10 +++++-----
>>>  3 files changed, 16 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/include/net/nfc/hci.h b/include/net/nfc/hci.h
>>> index 756c11084f6..8f66e6e6b91 100644
>>> --- a/include/net/nfc/hci.h
>>> +++ b/include/net/nfc/hci.h
>>> @@ -103,7 +103,8 @@ struct nfc_hci_dev {
>>>  
>>>  	bool shutting_down;
>>>  
>>> -	struct mutex msg_tx_mutex;
>>> +	/* The spinlock is used to protect resources related with hci message TX */
>>> +	spinlock_t msg_tx_spin;
>>>  
>>>  	struct list_head msg_tx_queue;
>>>  
>>> diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
>>> index ceb87db57cd..fa22f9fe5fc 100644
>>> --- a/net/nfc/hci/core.c
>>> +++ b/net/nfc/hci/core.c
>>> @@ -68,7 +68,7 @@ static void nfc_hci_msg_tx_work(struct work_struct *work)
>>>  	struct sk_buff *skb;
>>>  	int r = 0;
>>>  
>>> -	mutex_lock(&hdev->msg_tx_mutex);
>>> +	spin_lock(&hdev->msg_tx_spin);
>>>  	if (hdev->shutting_down)
>>>  		goto exit;
>>
>> How did you test your patch?
>>
>> Did you check, really check, that this can be an atomic (non-sleeping)
>> section?
>>
>> I have doubts because I found at least one path leading to device_lock
>> (which is a mutex) called within your new code.
> 
> The nfc_hci_hcp_message_tx() is called by both process context(hci_dev_up and so on)
> and interrupt context(st21nfca_se_wt_timeout()). The process context(hci_dev_up and so on)
> calls device_lock, but I think calling spin_lock() within device_lock() is ok. There is
> no device_lock() called within spin_lock(). 

There is.

nfc_hci_failure -> spin lock -> nfc_driver_failure -> nfc_targets_found
-> device_lock

I found it just by a very quick look, so I suspect there are several
other places, not really checked.

Best regards,
Krzysztof
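
A rough userspace model of the problem described above, assuming mock
lock helpers (every function name below is hypothetical and only mirrors
the shape of the call chain, not the NFC code). It counts how often a
sleeping lock is taken while a spinlock is held.

```c
static int atomic_depth;     /* >0 while a (mock) spinlock is held */
static int sleep_violations; /* sleeping calls made in atomic context */

static void mock_spin_lock(void)   { atomic_depth++; }
static void mock_spin_unlock(void) { atomic_depth--; }

/* Stand-in for a sleeping lock such as a mutex: records a violation
 * when it is taken while the context must not sleep. */
static void mock_mutex_lock(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

static void mock_mutex_unlock(void)
{
}

/* Hypothetical callee that takes a sleeping lock, like the device_lock path. */
static void report_targets(void)
{
	mock_mutex_lock();
	mock_mutex_unlock();
}

/* Hypothetical failure path that calls it with the spinlock held. */
static void failure_path(void)
{
	mock_spin_lock();
	report_targets();
	mock_spin_unlock();
}
```

Taking the mutex on its own is fine; the violation only appears when it
is reached from inside the spinlocked section, which is why every callee
of the converted code has to be checked.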

^ permalink raw reply

* Re: [PATCH net v2] NFC: hci: fix sleep in atomic context bugs in nfc_hci_hcp_message_tx
From: duoming @ 2022-05-17 15:25 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: linux-kernel, kuba, davem, edumazet, pabeni, gregkh,
	alexander.deucher, broonie, netdev
In-Reply-To: <2ce7a871-3e55-ae50-955c-bf04a443aba3@linaro.org>

Hello,

On Tue, 17 May 2022 13:42:41 +0200 Krzysztof wrote:

> On 17/05/2022 12:55, Duoming Zhou wrote:
> > There are sleep-in-atomic-context bugs when the request to the secure
> > element of st21nfca times out. The root cause is that kzalloc and
> > alloc_skb with GFP_KERNEL parameter and mutex_lock are called in
> > st21nfca_se_wt_timeout which is a timer handler. The call tree shows
> > the execution paths that could lead to bugs:
> > 
> >    (Interrupt context)
> > st21nfca_se_wt_timeout
> >   nfc_hci_send_event
> >     nfc_hci_hcp_message_tx
> >       kzalloc(..., GFP_KERNEL) //may sleep
> >       alloc_skb(..., GFP_KERNEL) //may sleep
> >       mutex_lock() //may sleep
> > 
> > This patch changes allocation mode of kzalloc and alloc_skb from
> > GFP_KERNEL to GFP_ATOMIC and changes mutex_lock to spin_lock in
> > order to prevent atomic context from sleeping.
> > 
> > Fixes: 2130fb97fecf ("NFC: st21nfca: Adding support for secure element")
> > Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
> > ---
> > Changes in v2:
> >   - Change mutex_lock to spin_lock.
> > 
> >  include/net/nfc/hci.h |  3 ++-
> >  net/nfc/hci/core.c    | 18 +++++++++---------
> >  net/nfc/hci/hcp.c     | 10 +++++-----
> >  3 files changed, 16 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/net/nfc/hci.h b/include/net/nfc/hci.h
> > index 756c11084f6..8f66e6e6b91 100644
> > --- a/include/net/nfc/hci.h
> > +++ b/include/net/nfc/hci.h
> > @@ -103,7 +103,8 @@ struct nfc_hci_dev {
> >  
> >  	bool shutting_down;
> >  
> > -	struct mutex msg_tx_mutex;
> > +	/* The spinlock is used to protect resources related with hci message TX */
> > +	spinlock_t msg_tx_spin;
> >  
> >  	struct list_head msg_tx_queue;
> >  
> > diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
> > index ceb87db57cd..fa22f9fe5fc 100644
> > --- a/net/nfc/hci/core.c
> > +++ b/net/nfc/hci/core.c
> > @@ -68,7 +68,7 @@ static void nfc_hci_msg_tx_work(struct work_struct *work)
> >  	struct sk_buff *skb;
> >  	int r = 0;
> >  
> > -	mutex_lock(&hdev->msg_tx_mutex);
> > +	spin_lock(&hdev->msg_tx_spin);
> >  	if (hdev->shutting_down)
> >  		goto exit;
> 
> How did you test your patch?
> 
> Did you check, really check, that this can be an atomic (non-sleeping)
> section?
> 
> I have doubts because I found at least one path leading to device_lock
> (which is a mutex) called within your new code.

The nfc_hci_hcp_message_tx() is called by both process context(hci_dev_up and so on)
and interrupt context(st21nfca_se_wt_timeout()). The process context(hci_dev_up and so on)
calls device_lock, but I think calling spin_lock() within device_lock() is ok. There is
no device_lock() called within spin_lock(). 

The spinlock could also improve the performance of the program, because processing the
hci messages should be finished in a short time.

> Before sending a new version, please wait for discussion to reach some
> consensus. The quality of these fixes is really poor. :(

Ok, I will wait for discussion to reach consensus.

Best regards,
Duoming Zhou

^ permalink raw reply

* Re: [PATCH net-next v3 02/10] ptp: ocp: add Celestica timecard PCI ids
From: Jakub Kicinski @ 2022-05-17 15:25 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: netdev, richardcochran, davem, pabeni, edumazet, kernel-team
In-Reply-To: <20220517014644.4jxm4evud46ybsh3@bsd-mbp.dhcp.thefacebook.com>

On Mon, 16 May 2022 18:46:44 -0700 Jonathan Lemon wrote:
> On Mon, May 16, 2022 at 05:43:03PM -0700, Jakub Kicinski wrote:
> > On Fri, 13 May 2022 15:59:16 -0700 Jonathan Lemon wrote:  
> > > +#ifndef PCI_VENDOR_ID_CELESTICA
> > > +#define PCI_VENDOR_ID_CELESTICA 0x18d4
> > > +#endif
> > > +
> > > +#ifndef PCI_DEVICE_ID_CELESTICA_TIMECARD
> > > +#define PCI_DEVICE_ID_CELESTICA_TIMECARD 0x1008
> > > +#endif  
> > 
> > The ifdefs are unnecessary, these kind of constructs are often used out
> > of tree when one does not control the headers, but not sure what purpose
> > they'd serve upstream?  
> 
> include/linux/pci_ids.h says:
> 
>  *      Do not add new entries to this file unless the definitions
>  *      are shared between multiple drivers.
> 
> Neither FACEBOOK (0x1d9b) nor CELESTICA (0x18d4) are present
> in this file.  This seems to be a common idiom in several other
> drivers.  Picking one at random:
> 
>    gve.h:#define PCI_VENDOR_ID_GOOGLE 0x1ae0
> 
> 
> So these #defines are needed.

Indeed, but also I'm not complaining about defines but the ifdefs 
in which they are wrapped :)
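
For illustration, a sketch of what the driver-local definitions could
look like without the #ifndef guards. The two values are taken from the
quoted patch; the helper function is hypothetical and only exercises the
constants.

```c
/* Driver-local IDs defined unconditionally; values are from the quoted patch. */
#define PCI_VENDOR_ID_CELESTICA 0x18d4
#define PCI_DEVICE_ID_CELESTICA_TIMECARD 0x1008

/* Hypothetical helper, only here to exercise the constants. */
static int is_celestica_timecard(unsigned int vendor, unsigned int device)
{
	return vendor == PCI_VENDOR_ID_CELESTICA &&
	       device == PCI_DEVICE_ID_CELESTICA_TIMECARD;
}
```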

^ permalink raw reply

* Re: [PATCH 12/12] RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter
From: Jason Gunthorpe @ 2022-05-17 15:24 UTC (permalink / raw)
  To: longli
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Leon Romanovsky, linux-hyperv, netdev, linux-kernel, linux-rdma
In-Reply-To: <1652778276-2986-13-git-send-email-longli@linuxonhyperv.com>

On Tue, May 17, 2022 at 02:04:36AM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> Add a RDMA VF driver for Microsoft Azure Network Adapter (MANA).

To set expectations, we are currently at rc7, so this will not make this
merge window. You will need to repost on the next rc1 in three weeks.


> new file mode 100644
> index 000000000000..0eac77c97658
> --- /dev/null
> +++ b/drivers/infiniband/hw/mana/cq.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
> + */
> +
> +#include "mana_ib.h"
> +
> +int mana_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> +		struct ib_udata *udata)
> +{
> +	struct mana_ib_create_cq ucmd = {};
> +	struct ib_device *ibdev = ibcq->device;
> +	struct mana_ib_dev *mdev =
> +		container_of(ibdev, struct mana_ib_dev, ib_dev);
> +	struct mana_ib_cq *cq = container_of(ibcq, struct mana_ib_cq, ibcq);
> +	int err;

We do try to follow the netdev formatting, christmas trees and all that.

> +
> +	err = ib_copy_from_udata(&ucmd, udata, min(sizeof(ucmd), udata->inlen));

Skeptical this min is correct, many other drivers get this wrong.

> +	if (err) {
> +		pr_err("Failed to copy from udata for create cq, %d\n", err);

Do not use pr_* - you should use one of the dev specific variants
inside a device driver. In this case ibdev_dbg

Also, do not ever print to the console on a user triggerable event
such as this. _dbg would be OK.

Scrub the driver for all occurrences.

> +	pr_debug("ucmd buf_addr 0x%llx\n", ucmd.buf_addr);

I'm not keen on leftover debugging, please remove it; in fact there are
way too many prints.

> +	cq->umem = ib_umem_get(ibdev, ucmd.buf_addr,
> +			       cq->cqe * COMP_ENTRY_SIZE,
> +			       IB_ACCESS_LOCAL_WRITE);

Please touch the file with clang-format and take all the things that
look good to you

> diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
> new file mode 100644
> index 000000000000..e288495e3ede
> --- /dev/null
> +++ b/drivers/infiniband/hw/mana/main.c
> @@ -0,0 +1,679 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/*
> + * Copyright (c) 2022, Microsoft Corporation. All rights reserved.
> + */
> +
> +#include "mana_ib.h"
> +
> +MODULE_DESCRIPTION("Microsoft Azure Network Adapter IB driver");
> +MODULE_LICENSE("Dual BSD/GPL");
> +
> +static const struct auxiliary_device_id mana_id_table[] = {
> +	{ .name = "mana.rdma", },
> +	{},
> +};

stylistically this stuff is usually at the bottom of the file, right
before its first use

> +static inline enum atb_page_size mana_ib_get_atb_page_size(u64 page_sz)
> +{
> +	int pos = 0;
> +
> +	page_sz = (page_sz >> 12); //start with 4k
> +
> +	while (page_sz) {
> +		pos++;
> +		page_sz = (page_sz >> 1);
> +	}
> +	return (enum atb_page_size)(pos - 1);
> +}

Isn't this ffs, log, or something that isn't a while loop?
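
A userspace sketch of the loop-free form, assuming page_sz is at least
4096. The first function reproduces the loop from the quoted patch for
comparison; the second uses the GCC/Clang builtin __builtin_clzll, where
kernel code would more likely reach for ilog2(). Function names are
illustrative.

```c
#include <stdint.h>

/* Loop from the quoted patch, reproduced for comparison. */
static int page_size_index_loop(uint64_t page_sz)
{
	int pos = 0;

	page_sz >>= 12; /* start with 4k */
	while (page_sz) {
		pos++;
		page_sz >>= 1;
	}
	return pos - 1;
}

/* Same result via floor(log2(page_sz)) - 12, valid for page_sz >= 4096.
 * __builtin_clzll is undefined for 0, so callers must not pass 0. */
static int page_size_index_log2(uint64_t page_sz)
{
	return (63 - __builtin_clzll(page_sz)) - 12;
}
```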

> +static int _mana_ib_gd_create_dma_region(struct mana_ib_dev *dev,
> +					 const dma_addr_t *page_addr_array,
> +					 size_t num_pages_total,
> +					 u64 address, u64 length,
> +					 mana_handle_t *gdma_region,
> +					 u64 page_sz)
> +{
> +	struct gdma_dev *mdev = dev->gdma_dev;
> +	struct gdma_context *gc = mdev->gdma_context;
> +	struct hw_channel_context *hwc = gc->hwc.driver_data;
> +	size_t num_pages_cur, num_pages_to_handle;
> +	unsigned int create_req_msg_size;
> +	unsigned int i;
> +	struct gdma_dma_region_add_pages_req *add_req = NULL;
> +	int err;
> +
> +	struct gdma_create_dma_region_req *create_req;
> +	struct gdma_create_dma_region_resp create_resp = {};
> +
> +	size_t max_pgs_create_cmd = (hwc->max_req_msg_size -
> +				     sizeof(*create_req)) / sizeof(u64);

These extra blank lines are not kernel style, please check everything.

> +	num_pages_to_handle = min_t(size_t, num_pages_total,
> +				    max_pgs_create_cmd);
> +	create_req_msg_size = struct_size(create_req, page_addr_list,
> +					  num_pages_to_handle);
> +
> +	create_req = kzalloc(create_req_msg_size, GFP_KERNEL);
> +	if (!create_req)
> +		return -ENOMEM;
> +
> +	mana_gd_init_req_hdr(&create_req->hdr, GDMA_CREATE_DMA_REGION,
> +			     create_req_msg_size, sizeof(create_resp));
> +
> +	create_req->length = length;
> +	create_req->offset_in_page = address & (page_sz - 1);
> +	create_req->gdma_page_type = mana_ib_get_atb_page_size(page_sz);
> +	create_req->page_count = num_pages_total;
> +	create_req->page_addr_list_len = num_pages_to_handle;
> +
> +	pr_debug("size_dma_region %llu num_pages_total %lu, "
> +		 "page_sz 0x%llx offset_in_page %u\n",
> +		length, num_pages_total, page_sz, create_req->offset_in_page);
> +
> +	pr_debug("num_pages_to_handle %lu, gdma_page_type %u",
> +		 num_pages_to_handle, create_req->gdma_page_type);
> +
> +	for (i = 0; i < num_pages_to_handle; ++i) {
> +		dma_addr_t cur_addr = page_addr_array[i];
> +
> +		create_req->page_addr_list[i] = cur_addr;
> +
> +		pr_debug("page num %u cur_addr 0x%llx\n", i, cur_addr);
> +	}

Er, so we allocated memory and wrote the DMA addresses now you copy
them to another memory?

> +
> +	err = mana_gd_send_request(gc, create_req_msg_size, create_req,
> +				   sizeof(create_resp), &create_resp);
> +	kfree(create_req);
> +
> +	if (err || create_resp.hdr.status) {
> +		dev_err(gc->dev, "Failed to create DMA region: %d, 0x%x\n",
> +			err, create_resp.hdr.status);
> +		goto error;
> +	}
> +
> +	*gdma_region = create_resp.dma_region_handle;
> +	pr_debug("Created DMA region with handle 0x%llx\n", *gdma_region);
> +
> +	num_pages_cur = num_pages_to_handle;
> +
> +	if (num_pages_cur < num_pages_total) {
> +
> +		unsigned int add_req_msg_size;
> +		size_t max_pgs_add_cmd = (hwc->max_req_msg_size -
> +					  sizeof(*add_req)) / sizeof(u64);
> +
> +		num_pages_to_handle = min_t(size_t,
> +					    num_pages_total - num_pages_cur,
> +					    max_pgs_add_cmd);
> +
> +		// Calculate the max num of pages that will be handled
> +		add_req_msg_size = struct_size(add_req, page_addr_list,
> +					       num_pages_to_handle);
> +
> +		add_req = kmalloc(add_req_msg_size, GFP_KERNEL);
> +		if (!add_req) {
> +			err = -ENOMEM;
> +			goto error;
> +		}
> +
> +		while (num_pages_cur < num_pages_total) {
> +			struct gdma_general_resp add_resp = {};
> +			u32 expected_status;
> +			int expected_ret;
> +
> +			if (num_pages_cur + num_pages_to_handle <
> +					num_pages_total) {
> +				// This value means that more pages are needed
> +				expected_status = GDMA_STATUS_MORE_ENTRIES;
> +				expected_ret = 0x0;
> +			} else {
> +				expected_status = 0x0;
> +				expected_ret = 0x0;
> +			}
> +
> +			memset(add_req, 0, add_req_msg_size);
> +
> +			mana_gd_init_req_hdr(&add_req->hdr,
> +					     GDMA_DMA_REGION_ADD_PAGES,
> +					     add_req_msg_size,
> +					     sizeof(add_resp));
> +			add_req->dma_region_handle = *gdma_region;
> +			add_req->page_addr_list_len = num_pages_to_handle;
> +
> +			for (i = 0; i < num_pages_to_handle; ++i) {
> +				dma_addr_t cur_addr =
> +					page_addr_array[num_pages_cur + i];

And then again?

That isn't how this is supposed to work, you should iterate over the
umem in one pass and split up the output as you go. Allocating
potentially giant temporary arrays should be avoided.
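
A simplified userspace sketch of the single-pass approach, under stated
assumptions: MAX_PER_REQ, send_batch() and the address arithmetic are
placeholders for the real request size limit, the hardware channel call
and the umem block iterator. Only one fixed-size request buffer is ever
held, whatever the total page count.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PER_REQ 4 /* hypothetical per-request capacity */

struct batch_stats {
	size_t requests; /* number of requests "sent" */
	size_t pages;    /* total pages carried by those requests */
};

/* Stand-in for sending one request to the hardware channel. */
static void send_batch(const uint64_t *addrs, size_t n, struct batch_stats *st)
{
	(void)addrs;
	st->requests++;
	st->pages += n;
}

/* Walk the blocks once; base + i * blk_sz stands in for the address
 * that a block iterator would yield. */
static void send_all_blocks(uint64_t base, uint64_t blk_sz, size_t nblocks,
			    struct batch_stats *st)
{
	uint64_t req[MAX_PER_REQ]; /* fixed-size request buffer */
	size_t fill = 0;

	for (size_t i = 0; i < nblocks; i++) {
		req[fill++] = base + i * blk_sz;
		if (fill == MAX_PER_REQ) {
			send_batch(req, fill, st); /* flush a full request */
			fill = 0;
		}
	}
	if (fill)
		send_batch(req, fill, st); /* flush the remainder */
}
```

With ten blocks and a capacity of four, this sends three requests and
never allocates memory proportional to the user-controlled page count.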


> +				add_req->page_addr_list[i] = cur_addr;
> +
> +				pr_debug("page_addr_list %lu addr 0x%llx\n",
> +					 num_pages_cur + i, cur_addr);
> +			}
> +
> +			err = mana_gd_send_request(gc, add_req_msg_size,
> +						   add_req, sizeof(add_resp),
> +						   &add_resp);
> +			if (err != expected_ret ||
> +			    add_resp.hdr.status != expected_status) {
> +				dev_err(gc->dev,
> +					"Failed to put DMA pages %u: %d,0x%x\n",
> +					i, err, add_resp.hdr.status);
> +				err = -EPROTO;
> +				goto free_req;
> +			}
> +
> +			num_pages_cur += num_pages_to_handle;
> +			num_pages_to_handle = min_t(size_t,
> +						    num_pages_total -
> +							num_pages_cur,
> +						    max_pgs_add_cmd);
> +			add_req_msg_size = sizeof(*add_req) +
> +				num_pages_to_handle * sizeof(u64);
> +		}
> +free_req:
> +		kfree(add_req);
> +	}
> +
> +error:
> +	return err;
> +}
> +
> +
> +int mana_ib_gd_create_dma_region(struct mana_ib_dev *dev, struct ib_umem *umem,
> +				 mana_handle_t *dma_region_handle, u64 page_sz)

Since this takes in a umem it should also compute the page_sz for that
umem.

I see this driver seems to have various limitations, so the input
argument here should be the pgsz bitmask that is compatible with the
object being created.

> +{
> +	size_t num_pages = ib_umem_num_dma_blocks(umem, page_sz);
> +	struct ib_block_iter biter;
> +	dma_addr_t *page_addr_array;
> +	unsigned int i = 0;
> +	int err;
> +
> +	pr_debug("num pages %lu umem->address 0x%lx\n",
> +		 num_pages, umem->address);
> +
> +	page_addr_array = kmalloc_array(num_pages,
> +					sizeof(*page_addr_array), GFP_KERNEL);
> +	if (!page_addr_array)
> +		return -ENOMEM;

This will OOM easily, num_pages is user controlled.

> +
> +	rdma_umem_for_each_dma_block(umem, &biter, page_sz)
> +		page_addr_array[i++] = rdma_block_iter_dma_address(&biter);
> +
> +	err = _mana_ib_gd_create_dma_region(dev, page_addr_array, num_pages,
> +					    umem->address, umem->length,
> +					    dma_region_handle, page_sz);
> +
> +	kfree(page_addr_array);
> +
> +	return err;
> +}
> +int mana_ib_gd_create_pd(struct mana_ib_dev *dev, u64 *pd_handle, u32 *pd_id,
> +			 enum gdma_pd_flags flags)
> +{
> +	struct gdma_dev *mdev = dev->gdma_dev;
> +	struct gdma_context *gc = mdev->gdma_context;
> +	int err;
> +
> +	struct gdma_create_pd_req req = {};
> +	struct gdma_create_pd_resp resp = {};
> +
> +	mana_gd_init_req_hdr(&req.hdr, GDMA_CREATE_PD,
> +			     sizeof(req), sizeof(resp));
> +
> +	req.flags = flags;
> +	err = mana_gd_send_request(gc, sizeof(req), &req, sizeof(resp), &resp);
> +
> +	if (!err && !resp.hdr.status) {
> +		*pd_handle = resp.pd_handle;
> +		*pd_id = resp.pd_id;
> +		pr_debug("pd_handle 0x%llx pd_id %d\n", *pd_handle, *pd_id);

Kernel style is 'success oriented flow':

 if (err) {
    return err;
 }
 // success
 return 0;

Audit everything
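
As a sketch of that pattern applied to a create-PD style helper, assuming
a mock transport (mock_send_request() and struct pd_resp are invented for
this example, and mapping a bad status to -EPROTO is just one possible
choice):

```c
#include <errno.h>

struct pd_resp {
	int status; /* nonzero means the device rejected the request */
	unsigned long long pd_handle;
	unsigned int pd_id;
};

/* Hypothetical transport: returns err and fills resp. */
static int mock_send_request(int err, int status, struct pd_resp *resp)
{
	resp->status = status;
	resp->pd_handle = 0xabcULL;
	resp->pd_id = 7;
	return err;
}

static int create_pd(int err_in, int status_in,
		     unsigned long long *pd_handle, unsigned int *pd_id)
{
	struct pd_resp resp = { 0 };
	int err;

	err = mock_send_request(err_in, status_in, &resp);
	if (err)
		return err;
	if (resp.status)
		return -EPROTO;

	/* success: outputs are written only on this path */
	*pd_handle = resp.pd_handle;
	*pd_id = resp.pd_id;
	return 0;
}
```

Errors leave early, so the main body of the function reads top to bottom
as the success case.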

> +static int mana_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
> +{
> +	struct mana_ib_ucontext *mana_ucontext =
> +		container_of(ibcontext, struct mana_ib_ucontext, ibucontext);
> +	struct ib_device *ibdev = ibcontext->device;
> +	struct mana_ib_dev *mdev =
> +		container_of(ibdev, struct mana_ib_dev, ib_dev);
> +	struct gdma_context *gc = mdev->gdma_dev->gdma_context;
> +	pgprot_t prot;
> +	phys_addr_t pfn;
> +	int ret;

This needs to validate vma->pgoff

> +	// map to the page indexed by ucontext->doorbell

Not kernel style, be sure to run checkpatch and fix the egregious things.

> +static void mana_ib_disassociate_ucontext(struct ib_ucontext *ibcontext)
> +{
> +}

Does this driver actually support disassociate? Don't define this
function if it doesn't.

I didn't see any mmap zapping so I guess it doesn't.

> +static const struct ib_device_ops mana_ib_dev_ops = {
> +	.owner = THIS_MODULE,
> +	.driver_id = RDMA_DRIVER_MANA,
> +	.uverbs_abi_ver = MANA_IB_UVERBS_ABI_VERSION,
> +
> +	.alloc_pd = mana_ib_alloc_pd,
> +	.dealloc_pd = mana_ib_dealloc_pd,
> +
> +	.alloc_ucontext = mana_ib_alloc_ucontext,
> +	.dealloc_ucontext = mana_ib_dealloc_ucontext,
> +
> +	.create_cq = mana_ib_create_cq,
> +	.destroy_cq = mana_ib_destroy_cq,
> +
> +	.create_qp = mana_ib_create_qp,
> +	.modify_qp = mana_ib_modify_qp,
> +	.destroy_qp = mana_ib_destroy_qp,
> +
> +	.disassociate_ucontext = mana_ib_disassociate_ucontext,
> +
> +	.mmap = mana_ib_mmap,
> +
> +	.reg_user_mr = mana_ib_reg_user_mr,
> +	.dereg_mr = mana_ib_dereg_mr,
> +
> +	.create_wq = mana_ib_create_wq,
> +	.modify_wq = mana_ib_modify_wq,
> +	.destroy_wq = mana_ib_destroy_wq,
> +
> +	.create_rwq_ind_table = mana_ib_create_rwq_ind_table,
> +	.destroy_rwq_ind_table = mana_ib_destroy_rwq_ind_table,
> +
> +	.get_port_immutable = mana_ib_get_port_immutable,
> +	.query_device = mana_ib_query_device,
> +	.query_port = mana_ib_query_port,
> +	.query_gid = mana_ib_query_gid,

Usually drivers are just sorting this list

> +#define PAGE_SZ_BM (SZ_4K | SZ_8K | SZ_16K | SZ_32K | SZ_64K | SZ_128K \
> +		    | SZ_256K | SZ_512K | SZ_1M | SZ_2M)
> +
> +// Maximum size of a memory registration is 1G bytes
> +#define MANA_IB_MAX_MR_SIZE	(1024 * 1024 * 1024)

Even with large pages? Weird limit..

You also need to open a PR to the rdma-core github with the userspace for
this and pyverbs test suite support

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks
From: Guilherme G. Piccoli @ 2022-05-17 15:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: akpm, bhe, kexec, linux-kernel, bcm-kernel-feedback-list,
	linuxppc-dev, linux-alpha, linux-edac, linux-hyperv, linux-leds,
	linux-mips, linux-parisc, linux-pm, linux-remoteproc, linux-s390,
	linux-tegra, linux-um, linux-xtensa, netdev, openipmi-developer,
	rcu, sparclinux, xen-devel, x86, kernel-dev, kernel, halves,
	fabiomirmar, alejandro.j.jimenez, andriy.shevchenko, arnd, bp,
	corbet, d.hatayama, dave.hansen, dyoung, feng.tang, gregkh,
	mikelley, hidehiro.kawai.ez, jgross, john.ogness, keescook, luto,
	mhiramat, mingo, paulmck, peterz, rostedt, senozhatsky, stern,
	tglx, vgoyal, vkuznets, will
In-Reply-To: <YoOe7ifxfW8CEHdt@alley>

On 17/05/2022 10:11, Petr Mladek wrote:
> [...]
>> You mentioned 2 cases:
>>
>> (a) Same notifier_list used in different situations;
>>
>> (b) Same *notifier callback* used in different lists;
>>
>> Mine is case (b), right? Can you show me an example of case (a)?
> 
> There are many examples of case (a):
> 
> [... snip ...] 
> These all call the same list/chain in different situations.
> The situation is distinguished by @val.
> 
> 
>> You can see in the following patches (or grep the kernel) that people are using
>> this identification parameter to determine which kind of OOPS trigger
>> the callback to condition the execution of the function to specific
>> cases.
> 
> Could you please show me some existing code for case (b)?
> I am not able to find any except in your patches.
> 

Hi Petr, thanks for the examples - I agree with you. In the end, seems
I'm kind of abusing the API. This id is used to distinguish different
situations in which the callback is called, but in the same
"realm"/notifier list.

In my case I have different lists calling the same callback and
(ab-)using the id to make distinction. I can rework the patches using
pointer comparison, it's fine =)

So, I'll drop this patch in V2.

> Anyway, the solution in 16th patch is bad, definitely.
> hv_die_panic_notify_crash() uses "val" to distinguish
> both:
> 
>      + "panic_notifier_list" vs "die_chain"
>      + die_val when called via "die_chain"
> 
> The API around "die_chain" is not aware of enum panic_notifier_val
> and the API using "panic_notifier_list" is not aware of enum die_val.
> As I said, it is mixing apples and oranges and it is error prone.
> 

OK, I'll re-work that patch - there's more there to be changed, that one
is complex heheh

Cheers!

^ permalink raw reply

* Re: [PATCH 03/12] net: mana: Handle vport sharing between devices
From: Stephen Hemminger @ 2022-05-17 15:19 UTC (permalink / raw)
  To: longli
  Cc: longli, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Dexuan Cui, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Jason Gunthorpe, Leon Romanovsky, linux-hyperv, netdev,
	linux-kernel, linux-rdma
In-Reply-To: <1652778276-2986-4-git-send-email-longli@linuxonhyperv.com>

On Tue, 17 May 2022 02:04:27 -0700
longli@linuxonhyperv.com wrote:

> diff --git a/drivers/net/ethernet/microsoft/mana/mana.h b/drivers/net/ethernet/microsoft/mana/mana.h
> index 51bff91b63ee..26f14fcb6a61 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana.h
> +++ b/drivers/net/ethernet/microsoft/mana/mana.h
> @@ -375,6 +375,7 @@ struct mana_port_context {
>  	unsigned int num_queues;
>  
>  	mana_handle_t port_handle;
> +	atomic_t port_use_count;

Could this be a refcount_t instead?
The refcount_t has protections against under/overflow.
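
To illustrate the kind of protection meant here, a small userspace
model. It is not the kernel's refcount_t implementation and is not even
atomic; the type and function names are made up, and it only shows the
idea of refusing to count through zero or past the maximum.

```c
#include <limits.h>
#include <stdbool.h>

/* Userspace model only; not the kernel's refcount_t implementation. */
struct model_refcount {
	unsigned int refs;
	bool saturated; /* set once misuse is detected; count is then pinned */
};

static void model_refcount_inc(struct model_refcount *r)
{
	/* Incrementing from 0 (use after release) or past the maximum is a bug. */
	if (r->saturated || r->refs == 0 || r->refs == UINT_MAX) {
		r->saturated = true;
		return;
	}
	r->refs++;
}

/* Returns true when the last reference is dropped. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	if (r->saturated || r->refs == 0) {
		r->saturated = true; /* underflow */
		return false;
	}
	return --r->refs == 0;
}
```

A plain counter would silently wrap on an extra decrement; here the
misuse is flagged and the count stops moving.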

^ permalink raw reply
