Netdev List
 help / color / mirror / Atom feed
* Re: [RFC] dt-bindings: net: phy: Add subnode for LED configuration
From: Matthias Kaehlcke @ 2019-07-25 17:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190724180430.GB28488@lunn.ch>

Hi Andrew,

On Wed, Jul 24, 2019 at 08:04:30PM +0200, Andrew Lunn wrote:
> On Mon, Jul 22, 2019 at 03:37:41PM -0700, Matthias Kaehlcke wrote:
> > The LED behavior of some Ethernet PHYs is configurable. Add an
> > optional 'leds' subnode with a child node for each LED to be
> > configured. The binding aims to be compatible with the common
> > LED binding (see devicetree/bindings/leds/common.txt).
> > 
> > A LED can be configured to be 'on' when a link with a certain speed
> > is active, or to blink on RX/TX activity. For the configuration to
> > be effective it needs to be supported by the hardware and the
> > corresponding PHY driver.
> > 
> > Suggested-by: Andrew Lunn <andrew@lunn.ch>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > This RFC is a follow up of the discussion on "[PATCH v2 6/7]
> > dt-bindings: net: realtek: Add property to configure LED mode"
> > (https://lore.kernel.org/patchwork/patch/1097185/).
> > 
> > For now posting as RFC to get a basic agreement on the bindings
> > before proceding with the implementation in phylib and a specific
> > driver.
> > ---
> >  Documentation/devicetree/bindings/net/phy.txt | 33 +++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> > index 9b9e5b1765dd..ad495d3abbbb 100644
> > --- a/Documentation/devicetree/bindings/net/phy.txt
> > +++ b/Documentation/devicetree/bindings/net/phy.txt
> > @@ -46,6 +46,25 @@ Optional Properties:
> >    Mark the corresponding energy efficient ethernet mode as broken and
> >    request the ethernet to stop advertising it.
> >  
> > +- leds: A sub-node which is a container of only LED nodes. Each child
> > +    node represents a PHY LED.
> > +
> > +  Required properties for LED child nodes:
> > +  - reg: The ID number of the LED, typically corresponds to a hardware ID.
> > +
> > +  Optional properties for child nodes:
> > +  - label: The label for this LED. If omitted, the label is taken from the node
> > +    name (excluding the unit address). It has to uniquely identify a device,
> > +    i.e. no other LED class device can be assigned the same label.
> 
> Hi Matthias
> 
> I've thought about label a bit more. 
> 
> > +			label = "ethphy0:left:green";
> 
> We need to be careful with names here. systemd etc renames
> interfaces. ethphy0 could in fact be connected to enp3s0, or eth0
> might get renamed to eth1, etc. So i think we should avoid things like
> ethphy0.

Agreed, this could be problematic.

> Also, i'm not sure we actually need a label, at least not to
> start with.Do we have any way to expose it to the user?

As of now I don't plan to expose the label to userspace by the PHY
driver/framework itself.

From my side we can omit the label for now and add it later if needed.

> If we do ever make it part of the generic LED framework, we can then
> use the label. At that point, i would probably combine the label with
> the interface name in a dynamic way to avoid issues like this.

Sounds good.

Thanks

Matthias

^ permalink raw reply

* [PATCH 2/3] ath10k: Use standard regulator bulk API in snoc
From: Bjorn Andersson @ 2019-07-25 17:47 UTC (permalink / raw)
  To: Kalle Valo, Govind Singh
  Cc: David S. Miller, ath10k, linux-wireless, netdev, linux-kernel,
	linux-arm-msm
In-Reply-To: <20190725174755.23432-1-bjorn.andersson@linaro.org>

The regulator_get_optional() exists for cases where the driver needs do
behave differently depending on some regulator supply being present or
not, as we don't use this we can use the standard regulator_get() and
rely on its handling of unspecified regulators.

While the driver currently doesn't specify any loads the regulator
framework was updated last year to only account for load of enabled
regulators, so should the need appear it's better to apply load numbers
during initialization that dynamically.

With this the regulator wrappers have been reduced the become identical
to the standard bulk API provided by the regulator framework, so use
these instead of rolling our own.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/net/wireless/ath/ath10k/snoc.c | 184 ++++---------------------
 drivers/net/wireless/ath/ath10k/snoc.h |  13 +-
 2 files changed, 27 insertions(+), 170 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 3d93ab09a298..1c9ff7e53e2f 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -36,11 +36,11 @@ static char *const ce_name[] = {
 	"WLAN_CE_11",
 };
 
-static struct ath10k_vreg_info vreg_cfg[] = {
-	{NULL, "vdd-0.8-cx-mx", 0, 0, false},
-	{NULL, "vdd-1.8-xo", 0, 0, false},
-	{NULL, "vdd-1.3-rfa", 0, 0, false},
-	{NULL, "vdd-3.3-ch0", 0, 0, false},
+static const char * const ath10k_regulators[] = {
+	"vdd-0.8-cx-mx",
+	"vdd-1.8-xo",
+	"vdd-1.3-rfa",
+	"vdd-3.3-ch0",
 };
 
 static struct ath10k_clk_info clk_cfg[] = {
@@ -1346,43 +1346,6 @@ static void ath10k_snoc_release_resource(struct ath10k *ar)
 		ath10k_ce_free_pipe(ar, i);
 }
 
-static int ath10k_get_vreg_info(struct ath10k *ar, struct device *dev,
-				struct ath10k_vreg_info *vreg_info)
-{
-	struct regulator *reg;
-	int ret = 0;
-
-	reg = devm_regulator_get_optional(dev, vreg_info->name);
-
-	if (IS_ERR(reg)) {
-		ret = PTR_ERR(reg);
-
-		if (ret  == -EPROBE_DEFER) {
-			ath10k_err(ar, "EPROBE_DEFER for regulator: %s\n",
-				   vreg_info->name);
-			return ret;
-		}
-		if (vreg_info->required) {
-			ath10k_err(ar, "Regulator %s doesn't exist: %d\n",
-				   vreg_info->name, ret);
-			return ret;
-		}
-		ath10k_dbg(ar, ATH10K_DBG_SNOC,
-			   "Optional regulator %s doesn't exist: %d\n",
-			   vreg_info->name, ret);
-		goto done;
-	}
-
-	vreg_info->reg = reg;
-
-done:
-	ath10k_dbg(ar, ATH10K_DBG_SNOC,
-		   "snog vreg %s load_ua %u settle_delay %lu\n",
-		   vreg_info->name, vreg_info->load_ua, vreg_info->settle_delay);
-
-	return 0;
-}
-
 static int ath10k_get_clk_info(struct ath10k *ar, struct device *dev,
 			       struct ath10k_clk_info *clk_info)
 {
@@ -1411,114 +1374,6 @@ static int ath10k_get_clk_info(struct ath10k *ar, struct device *dev,
 	return ret;
 }
 
-static int __ath10k_snoc_vreg_on(struct ath10k *ar,
-				 struct ath10k_vreg_info *vreg_info)
-{
-	int ret;
-
-	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc regulator %s being enabled\n",
-		   vreg_info->name);
-
-	if (vreg_info->load_ua) {
-		ret = regulator_set_load(vreg_info->reg, vreg_info->load_ua);
-		if (ret < 0) {
-			ath10k_err(ar, "failed to set regulator %s load: %d\n",
-				   vreg_info->name, vreg_info->load_ua);
-			goto err_set_load;
-		}
-	}
-
-	ret = regulator_enable(vreg_info->reg);
-	if (ret) {
-		ath10k_err(ar, "failed to enable regulator %s\n",
-			   vreg_info->name);
-		goto err_enable;
-	}
-
-	if (vreg_info->settle_delay)
-		udelay(vreg_info->settle_delay);
-
-	return 0;
-
-err_enable:
-	regulator_set_load(vreg_info->reg, 0);
-err_set_load:
-
-	return ret;
-}
-
-static int __ath10k_snoc_vreg_off(struct ath10k *ar,
-				  struct ath10k_vreg_info *vreg_info)
-{
-	int ret;
-
-	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc regulator %s being disabled\n",
-		   vreg_info->name);
-
-	ret = regulator_disable(vreg_info->reg);
-	if (ret)
-		ath10k_err(ar, "failed to disable regulator %s\n",
-			   vreg_info->name);
-
-	ret = regulator_set_load(vreg_info->reg, 0);
-	if (ret < 0)
-		ath10k_err(ar, "failed to set load %s\n", vreg_info->name);
-
-	return ret;
-}
-
-static int ath10k_snoc_vreg_on(struct ath10k *ar)
-{
-	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
-	struct ath10k_vreg_info *vreg_info;
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(vreg_cfg); i++) {
-		vreg_info = &ar_snoc->vreg[i];
-
-		if (!vreg_info->reg)
-			continue;
-
-		ret = __ath10k_snoc_vreg_on(ar, vreg_info);
-		if (ret)
-			goto err_reg_config;
-	}
-
-	return 0;
-
-err_reg_config:
-	for (i = i - 1; i >= 0; i--) {
-		vreg_info = &ar_snoc->vreg[i];
-
-		if (!vreg_info->reg)
-			continue;
-
-		__ath10k_snoc_vreg_off(ar, vreg_info);
-	}
-
-	return ret;
-}
-
-static int ath10k_snoc_vreg_off(struct ath10k *ar)
-{
-	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
-	struct ath10k_vreg_info *vreg_info;
-	int ret = 0;
-	int i;
-
-	for (i = ARRAY_SIZE(vreg_cfg) - 1; i >= 0; i--) {
-		vreg_info = &ar_snoc->vreg[i];
-
-		if (!vreg_info->reg)
-			continue;
-
-		ret = __ath10k_snoc_vreg_off(ar, vreg_info);
-	}
-
-	return ret;
-}
-
 static int ath10k_snoc_clk_init(struct ath10k *ar)
 {
 	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
@@ -1591,11 +1446,12 @@ static int ath10k_snoc_clk_deinit(struct ath10k *ar)
 
 static int ath10k_hw_power_on(struct ath10k *ar)
 {
+	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
 	int ret;
 
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "soc power on\n");
 
-	ret = ath10k_snoc_vreg_on(ar);
+	ret = regulator_bulk_enable(ar_snoc->num_vregs, ar_snoc->vregs);
 	if (ret)
 		return ret;
 
@@ -1606,21 +1462,19 @@ static int ath10k_hw_power_on(struct ath10k *ar)
 	return ret;
 
 vreg_off:
-	ath10k_snoc_vreg_off(ar);
+	regulator_bulk_disable(ar_snoc->num_vregs, ar_snoc->vregs);
 	return ret;
 }
 
 static int ath10k_hw_power_off(struct ath10k *ar)
 {
-	int ret;
+	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
 
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "soc power off\n");
 
 	ath10k_snoc_clk_deinit(ar);
 
-	ret = ath10k_snoc_vreg_off(ar);
-
-	return ret;
+	return regulator_bulk_disable(ar_snoc->num_vregs, ar_snoc->vregs);
 }
 
 static const struct of_device_id ath10k_snoc_dt_match[] = {
@@ -1691,12 +1545,20 @@ static int ath10k_snoc_probe(struct platform_device *pdev)
 		goto err_release_resource;
 	}
 
-	ar_snoc->vreg = vreg_cfg;
-	for (i = 0; i < ARRAY_SIZE(vreg_cfg); i++) {
-		ret = ath10k_get_vreg_info(ar, dev, &ar_snoc->vreg[i]);
-		if (ret)
-			goto err_free_irq;
+	ar_snoc->num_vregs = ARRAY_SIZE(ath10k_regulators);
+	ar_snoc->vregs = devm_kcalloc(&pdev->dev, ar_snoc->num_vregs,
+				      sizeof(*ar_snoc->vregs), GFP_KERNEL);
+	if (!ar_snoc->vregs) {
+		ret = -ENOMEM;
+		goto err_free_irq;
 	}
+	for (i = 0; i < ar_snoc->num_vregs; i++)
+		ar_snoc->vregs[i].supply = ath10k_regulators[i];
+
+	ret = devm_regulator_bulk_get(&pdev->dev, ar_snoc->num_vregs,
+				      ar_snoc->vregs);
+	if (ret < 0)
+		goto err_free_irq;
 
 	ar_snoc->clk = clk_cfg;
 	for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
diff --git a/drivers/net/wireless/ath/ath10k/snoc.h b/drivers/net/wireless/ath/ath10k/snoc.h
index 1bf7bd4ef2a3..3965ddf66d74 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.h
+++ b/drivers/net/wireless/ath/ath10k/snoc.h
@@ -42,14 +42,6 @@ struct ath10k_snoc_ce_irq {
 	u32 irq_line;
 };
 
-struct ath10k_vreg_info {
-	struct regulator *reg;
-	const char *name;
-	u32 load_ua;
-	unsigned long settle_delay;
-	bool required;
-};
-
 struct ath10k_clk_info {
 	struct clk *handle;
 	const char *name;
@@ -64,6 +56,8 @@ enum ath10k_snoc_flags {
 	ATH10K_SNOC_FLAG_8BIT_HOST_CAP_QUIRK,
 };
 
+struct regulator_bulk_data;
+
 struct ath10k_snoc {
 	struct platform_device *dev;
 	struct ath10k *ar;
@@ -75,7 +69,8 @@ struct ath10k_snoc {
 	struct ath10k_snoc_ce_irq ce_irqs[CE_COUNT_MAX];
 	struct ath10k_ce ce;
 	struct timer_list rx_post_retry;
-	struct ath10k_vreg_info *vreg;
+	struct regulator_bulk_data *vregs;
+	size_t num_vregs;
 	struct ath10k_clk_info *clk;
 	struct ath10k_qmi *qmi;
 	unsigned long flags;
-- 
2.18.0


^ permalink raw reply related

* [PATCH 1/3] ath10k: snoc: skip regulator operations
From: Bjorn Andersson @ 2019-07-25 17:47 UTC (permalink / raw)
  To: Kalle Valo, Govind Singh
  Cc: David S. Miller, ath10k, linux-wireless, netdev, linux-kernel,
	linux-arm-msm
In-Reply-To: <20190725174755.23432-1-bjorn.andersson@linaro.org>

The regulator operations is trying to set a voltage to a fixed value, by
giving some wiggle room. But some board designs specifies regulator
voltages outside this limited range. One such example is the Lenovo Yoga
C630, with vdd-3.3-ch0 in particular specified at 3.1V.

But consumers with fixed voltage requirements should just rely on the
board configuration to provide the power at the required level, so this
code should be removed.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---

This patch is required for Lenovo Yoga C630 to succeed in power on ath10k, it
can be merged independently of the two following cleanup patches.

 drivers/net/wireless/ath/ath10k/snoc.c | 27 ++++++--------------------
 drivers/net/wireless/ath/ath10k/snoc.h |  2 --
 2 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index fc15a0037f0e..3d93ab09a298 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -37,10 +37,10 @@ static char *const ce_name[] = {
 };
 
 static struct ath10k_vreg_info vreg_cfg[] = {
-	{NULL, "vdd-0.8-cx-mx", 800000, 850000, 0, 0, false},
-	{NULL, "vdd-1.8-xo", 1800000, 1850000, 0, 0, false},
-	{NULL, "vdd-1.3-rfa", 1300000, 1350000, 0, 0, false},
-	{NULL, "vdd-3.3-ch0", 3300000, 3350000, 0, 0, false},
+	{NULL, "vdd-0.8-cx-mx", 0, 0, false},
+	{NULL, "vdd-1.8-xo", 0, 0, false},
+	{NULL, "vdd-1.3-rfa", 0, 0, false},
+	{NULL, "vdd-3.3-ch0", 0, 0, false},
 };
 
 static struct ath10k_clk_info clk_cfg[] = {
@@ -1377,9 +1377,8 @@ static int ath10k_get_vreg_info(struct ath10k *ar, struct device *dev,
 
 done:
 	ath10k_dbg(ar, ATH10K_DBG_SNOC,
-		   "snog vreg %s min_v %u max_v %u load_ua %u settle_delay %lu\n",
-		   vreg_info->name, vreg_info->min_v, vreg_info->max_v,
-		   vreg_info->load_ua, vreg_info->settle_delay);
+		   "snog vreg %s load_ua %u settle_delay %lu\n",
+		   vreg_info->name, vreg_info->load_ua, vreg_info->settle_delay);
 
 	return 0;
 }
@@ -1420,15 +1419,6 @@ static int __ath10k_snoc_vreg_on(struct ath10k *ar,
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc regulator %s being enabled\n",
 		   vreg_info->name);
 
-	ret = regulator_set_voltage(vreg_info->reg, vreg_info->min_v,
-				    vreg_info->max_v);
-	if (ret) {
-		ath10k_err(ar,
-			   "failed to set regulator %s voltage-min: %d voltage-max: %d\n",
-			   vreg_info->name, vreg_info->min_v, vreg_info->max_v);
-		return ret;
-	}
-
 	if (vreg_info->load_ua) {
 		ret = regulator_set_load(vreg_info->reg, vreg_info->load_ua);
 		if (ret < 0) {
@@ -1453,7 +1443,6 @@ static int __ath10k_snoc_vreg_on(struct ath10k *ar,
 err_enable:
 	regulator_set_load(vreg_info->reg, 0);
 err_set_load:
-	regulator_set_voltage(vreg_info->reg, 0, vreg_info->max_v);
 
 	return ret;
 }
@@ -1475,10 +1464,6 @@ static int __ath10k_snoc_vreg_off(struct ath10k *ar,
 	if (ret < 0)
 		ath10k_err(ar, "failed to set load %s\n", vreg_info->name);
 
-	ret = regulator_set_voltage(vreg_info->reg, 0, vreg_info->max_v);
-	if (ret)
-		ath10k_err(ar, "failed to set voltage %s\n", vreg_info->name);
-
 	return ret;
 }
 
diff --git a/drivers/net/wireless/ath/ath10k/snoc.h b/drivers/net/wireless/ath/ath10k/snoc.h
index 9db823e46314..1bf7bd4ef2a3 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.h
+++ b/drivers/net/wireless/ath/ath10k/snoc.h
@@ -45,8 +45,6 @@ struct ath10k_snoc_ce_irq {
 struct ath10k_vreg_info {
 	struct regulator *reg;
 	const char *name;
-	u32 min_v;
-	u32 max_v;
 	u32 load_ua;
 	unsigned long settle_delay;
 	bool required;
-- 
2.18.0


^ permalink raw reply related

* [PATCH 3/3] ath10k: Use standard bulk clock API in snoc
From: Bjorn Andersson @ 2019-07-25 17:47 UTC (permalink / raw)
  To: Kalle Valo, Govind Singh
  Cc: David S. Miller, ath10k, linux-wireless, netdev, linux-kernel,
	linux-arm-msm
In-Reply-To: <20190725174755.23432-1-bjorn.andersson@linaro.org>

No frequency is currently specified for the single clock defined in the
snoc driver, so the clock wrappers reimplements the standard bulk API
provided by the clock framework. Change to this.

The single clock defined is marked as optional so this version of the
get API is used, but might need to be reconsidered in the future.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/net/wireless/ath/ath10k/snoc.c | 125 ++++---------------------
 drivers/net/wireless/ath/ath10k/snoc.h |  11 +--
 2 files changed, 21 insertions(+), 115 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 1c9ff7e53e2f..80ce68c0f75e 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -43,8 +43,8 @@ static const char * const ath10k_regulators[] = {
 	"vdd-3.3-ch0",
 };
 
-static struct ath10k_clk_info clk_cfg[] = {
-	{NULL, "cxo_ref_clk_pin", 0, false},
+static const char * const ath10k_clocks[] = {
+	"cxo_ref_clk_pin",
 };
 
 static void ath10k_snoc_htc_tx_cb(struct ath10k_ce_pipe *ce_state);
@@ -1346,104 +1346,6 @@ static void ath10k_snoc_release_resource(struct ath10k *ar)
 		ath10k_ce_free_pipe(ar, i);
 }
 
-static int ath10k_get_clk_info(struct ath10k *ar, struct device *dev,
-			       struct ath10k_clk_info *clk_info)
-{
-	struct clk *handle;
-	int ret = 0;
-
-	handle = devm_clk_get(dev, clk_info->name);
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		if (clk_info->required) {
-			ath10k_err(ar, "snoc clock %s isn't available: %d\n",
-				   clk_info->name, ret);
-			return ret;
-		}
-		ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc ignoring clock %s: %d\n",
-			   clk_info->name,
-			   ret);
-		return 0;
-	}
-
-	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc clock %s freq %u\n",
-		   clk_info->name, clk_info->freq);
-
-	clk_info->handle = handle;
-
-	return ret;
-}
-
-static int ath10k_snoc_clk_init(struct ath10k *ar)
-{
-	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
-	struct ath10k_clk_info *clk_info;
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
-		clk_info = &ar_snoc->clk[i];
-
-		if (!clk_info->handle)
-			continue;
-
-		ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc clock %s being enabled\n",
-			   clk_info->name);
-
-		if (clk_info->freq) {
-			ret = clk_set_rate(clk_info->handle, clk_info->freq);
-
-			if (ret) {
-				ath10k_err(ar, "failed to set clock %s freq %u\n",
-					   clk_info->name, clk_info->freq);
-				goto err_clock_config;
-			}
-		}
-
-		ret = clk_prepare_enable(clk_info->handle);
-		if (ret) {
-			ath10k_err(ar, "failed to enable clock %s\n",
-				   clk_info->name);
-			goto err_clock_config;
-		}
-	}
-
-	return 0;
-
-err_clock_config:
-	for (i = i - 1; i >= 0; i--) {
-		clk_info = &ar_snoc->clk[i];
-
-		if (!clk_info->handle)
-			continue;
-
-		clk_disable_unprepare(clk_info->handle);
-	}
-
-	return ret;
-}
-
-static int ath10k_snoc_clk_deinit(struct ath10k *ar)
-{
-	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
-	struct ath10k_clk_info *clk_info;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
-		clk_info = &ar_snoc->clk[i];
-
-		if (!clk_info->handle)
-			continue;
-
-		ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc clock %s being disabled\n",
-			   clk_info->name);
-
-		clk_disable_unprepare(clk_info->handle);
-	}
-
-	return 0;
-}
-
 static int ath10k_hw_power_on(struct ath10k *ar)
 {
 	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
@@ -1455,7 +1357,7 @@ static int ath10k_hw_power_on(struct ath10k *ar)
 	if (ret)
 		return ret;
 
-	ret = ath10k_snoc_clk_init(ar);
+	ret = clk_bulk_prepare_enable(ar_snoc->num_clks, ar_snoc->clks);
 	if (ret)
 		goto vreg_off;
 
@@ -1472,7 +1374,7 @@ static int ath10k_hw_power_off(struct ath10k *ar)
 
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "soc power off\n");
 
-	ath10k_snoc_clk_deinit(ar);
+	clk_bulk_disable_unprepare(ar_snoc->num_clks, ar_snoc->clks);
 
 	return regulator_bulk_disable(ar_snoc->num_vregs, ar_snoc->vregs);
 }
@@ -1560,13 +1462,22 @@ static int ath10k_snoc_probe(struct platform_device *pdev)
 	if (ret < 0)
 		goto err_free_irq;
 
-	ar_snoc->clk = clk_cfg;
-	for (i = 0; i < ARRAY_SIZE(clk_cfg); i++) {
-		ret = ath10k_get_clk_info(ar, dev, &ar_snoc->clk[i]);
-		if (ret)
-			goto err_free_irq;
+	ar_snoc->num_clks = ARRAY_SIZE(ath10k_clocks);
+	ar_snoc->clks = devm_kcalloc(&pdev->dev, ar_snoc->num_clks,
+				     sizeof(*ar_snoc->clks), GFP_KERNEL);
+	if (!ar_snoc->clks) {
+		ret = -ENOMEM;
+		goto err_free_irq;
 	}
 
+	for (i = 0; i < ar_snoc->num_clks; i++)
+		ar_snoc->clks[i].id = ath10k_clocks[i];
+
+	ret = devm_clk_bulk_get_optional(&pdev->dev, ar_snoc->num_clks,
+					 ar_snoc->clks);
+	if (ret)
+		goto err_free_irq;
+
 	ret = ath10k_hw_power_on(ar);
 	if (ret) {
 		ath10k_err(ar, "failed to power on device: %d\n", ret);
diff --git a/drivers/net/wireless/ath/ath10k/snoc.h b/drivers/net/wireless/ath/ath10k/snoc.h
index 3965ddf66d74..d2449a3b4a8f 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.h
+++ b/drivers/net/wireless/ath/ath10k/snoc.h
@@ -42,13 +42,6 @@ struct ath10k_snoc_ce_irq {
 	u32 irq_line;
 };
 
-struct ath10k_clk_info {
-	struct clk *handle;
-	const char *name;
-	u32 freq;
-	bool required;
-};
-
 enum ath10k_snoc_flags {
 	ATH10K_SNOC_FLAG_REGISTERED,
 	ATH10K_SNOC_FLAG_UNREGISTERING,
@@ -56,6 +49,7 @@ enum ath10k_snoc_flags {
 	ATH10K_SNOC_FLAG_8BIT_HOST_CAP_QUIRK,
 };
 
+struct clk_bulk_data;
 struct regulator_bulk_data;
 
 struct ath10k_snoc {
@@ -71,7 +65,8 @@ struct ath10k_snoc {
 	struct timer_list rx_post_retry;
 	struct regulator_bulk_data *vregs;
 	size_t num_vregs;
-	struct ath10k_clk_info *clk;
+	struct clk_bulk_data *clks;
+	size_t num_clks;
 	struct ath10k_qmi *qmi;
 	unsigned long flags;
 };
-- 
2.18.0


^ permalink raw reply related

* [PATCH 0/3] ath10k: Clean up regulator and clock handling
From: Bjorn Andersson @ 2019-07-25 17:47 UTC (permalink / raw)
  To: Kalle Valo, Govind Singh
  Cc: David S. Miller, ath10k, linux-wireless, netdev, linux-kernel,
	linux-arm-msm

The first patch in this series removes the regulator_set_voltage() of a fixed
voltate, as fixed regulator constraints should be specified on a board level
and on certain boards - such as the Lenovo Yoga C630 - the voltage specified
for the 3.3V regulator is outside the given range.

The following two patches cleans up regulator and clock usage by using the bulk
API provided by the two frameworks.

Bjorn Andersson (3):
  ath10k: snoc: skip regulator operations
  ath10k: Use standard regulator bulk API in snoc
  ath10k: Use standard bulk clock API in snoc

 drivers/net/wireless/ath/ath10k/snoc.c | 324 ++++---------------------
 drivers/net/wireless/ath/ath10k/snoc.h |  26 +-
 2 files changed, 48 insertions(+), 302 deletions(-)

-- 
2.18.0


^ permalink raw reply

* Re: [PATCH bpf-next v2 5/7] selftests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
From: Song Liu @ 2019-07-25 17:15 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S. Miller, ast@kernel.org,
	daniel@iogearbox.net, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190725153342.3571-6-sdf@google.com>



> On Jul 25, 2019, at 8:33 AM, Stanislav Fomichev <sdf@google.com> wrote:
> 
> bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
> in flags. Also, set ip_proto earlier, this makes sure we have correct
> value with fragmented packets.
> 
> Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
> tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
> 
> eth_get_headlen calls flow dissector with
> FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
> have different set of input flags against it.
> 
> v2:
> * sefltests -> selftests (Willem de Bruijn)
> * Reword a comment about eth_get_headlen flags (Song Liu)
> 
> Acked-by: Willem de Bruijn <willemb@google.com>
> Cc: Song Liu <songliubraving@fb.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Song Liu <songliubraving@fb.com>


> ---
> .../selftests/bpf/prog_tests/flow_dissector.c | 132 +++++++++++++++++-
> tools/testing/selftests/bpf/progs/bpf_flow.c  |  28 +++-
> 2 files changed, 153 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> index c938283ac232..f93a115db650 100644
> --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> @@ -5,6 +5,10 @@
> #include <linux/if_tun.h>
> #include <sys/uio.h>
> 
> +#ifndef IP_MF
> +#define IP_MF 0x2000
> +#endif
> +
> #define CHECK_FLOW_KEYS(desc, got, expected)				\
> 	CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0,		\
> 	      desc,							\
> @@ -49,6 +53,18 @@ struct ipv6_pkt {
> 	struct tcphdr tcp;
> } __packed;
> 
> +struct ipv6_frag_pkt {
> +	struct ethhdr eth;
> +	struct ipv6hdr iph;
> +	struct frag_hdr {
> +		__u8 nexthdr;
> +		__u8 reserved;
> +		__be16 frag_off;
> +		__be32 identification;
> +	} ipf;
> +	struct tcphdr tcp;
> +} __packed;
> +
> struct dvlan_ipv6_pkt {
> 	struct ethhdr eth;
> 	__u16 vlan_tci;
> @@ -65,9 +81,11 @@ struct test {
> 		struct ipv4_pkt ipv4;
> 		struct svlan_ipv4_pkt svlan_ipv4;
> 		struct ipv6_pkt ipv6;
> +		struct ipv6_frag_pkt ipv6_frag;
> 		struct dvlan_ipv6_pkt dvlan_ipv6;
> 	} pkt;
> 	struct bpf_flow_keys keys;
> +	__u32 flags;
> };
> 
> #define VLAN_HLEN	4
> @@ -143,6 +161,102 @@ struct test tests[] = {
> 			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
> 		},
> 	},
> +	{
> +		.name = "ipv4-frag",
> +		.pkt.ipv4 = {
> +			.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> +			.iph.ihl = 5,
> +			.iph.protocol = IPPROTO_TCP,
> +			.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +			.iph.frag_off = __bpf_constant_htons(IP_MF),
> +			.tcp.doff = 5,
> +			.tcp.source = 80,
> +			.tcp.dest = 8080,
> +		},
> +		.keys = {
> +			.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +			.nhoff = ETH_HLEN,
> +			.thoff = ETH_HLEN + sizeof(struct iphdr),
> +			.addr_proto = ETH_P_IP,
> +			.ip_proto = IPPROTO_TCP,
> +			.n_proto = __bpf_constant_htons(ETH_P_IP),
> +			.is_frag = true,
> +			.is_first_frag = true,
> +			.sport = 80,
> +			.dport = 8080,
> +		},
> +		.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +	},
> +	{
> +		.name = "ipv4-no-frag",
> +		.pkt.ipv4 = {
> +			.eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> +			.iph.ihl = 5,
> +			.iph.protocol = IPPROTO_TCP,
> +			.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> +			.iph.frag_off = __bpf_constant_htons(IP_MF),
> +			.tcp.doff = 5,
> +			.tcp.source = 80,
> +			.tcp.dest = 8080,
> +		},
> +		.keys = {
> +			.nhoff = ETH_HLEN,
> +			.thoff = ETH_HLEN + sizeof(struct iphdr),
> +			.addr_proto = ETH_P_IP,
> +			.ip_proto = IPPROTO_TCP,
> +			.n_proto = __bpf_constant_htons(ETH_P_IP),
> +			.is_frag = true,
> +			.is_first_frag = true,
> +		},
> +	},
> +	{
> +		.name = "ipv6-frag",
> +		.pkt.ipv6_frag = {
> +			.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> +			.iph.nexthdr = IPPROTO_FRAGMENT,
> +			.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> +			.ipf.nexthdr = IPPROTO_TCP,
> +			.tcp.doff = 5,
> +			.tcp.source = 80,
> +			.tcp.dest = 8080,
> +		},
> +		.keys = {
> +			.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +			.nhoff = ETH_HLEN,
> +			.thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> +				sizeof(struct frag_hdr),
> +			.addr_proto = ETH_P_IPV6,
> +			.ip_proto = IPPROTO_TCP,
> +			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
> +			.is_frag = true,
> +			.is_first_frag = true,
> +			.sport = 80,
> +			.dport = 8080,
> +		},
> +		.flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> +	},
> +	{
> +		.name = "ipv6-no-frag",
> +		.pkt.ipv6_frag = {
> +			.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> +			.iph.nexthdr = IPPROTO_FRAGMENT,
> +			.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> +			.ipf.nexthdr = IPPROTO_TCP,
> +			.tcp.doff = 5,
> +			.tcp.source = 80,
> +			.tcp.dest = 8080,
> +		},
> +		.keys = {
> +			.nhoff = ETH_HLEN,
> +			.thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> +				sizeof(struct frag_hdr),
> +			.addr_proto = ETH_P_IPV6,
> +			.ip_proto = IPPROTO_TCP,
> +			.n_proto = __bpf_constant_htons(ETH_P_IPV6),
> +			.is_frag = true,
> +			.is_first_frag = true,
> +		},
> +	},
> };
> 
> static int create_tap(const char *ifname)
> @@ -225,6 +339,13 @@ void test_flow_dissector(void)
> 			.data_size_in = sizeof(tests[i].pkt),
> 			.data_out = &flow_keys,
> 		};
> +		static struct bpf_flow_keys ctx = {};
> +
> +		if (tests[i].flags) {
> +			tattr.ctx_in = &ctx;
> +			tattr.ctx_size_in = sizeof(ctx);
> +			ctx.flags = tests[i].flags;
> +		}
> 
> 		err = bpf_prog_test_run_xattr(&tattr);
> 		CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
> @@ -251,10 +372,19 @@ void test_flow_dissector(void)
> 	CHECK(err, "ifup", "err %d errno %d\n", err, errno);
> 
> 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
> -		struct bpf_flow_keys flow_keys = {};
> +		/* Keep in sync with 'flags' from eth_get_headlen. */
> +		__u32 eth_get_headlen_flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
> 		struct bpf_prog_test_run_attr tattr = {};
> +		struct bpf_flow_keys flow_keys = {};
> 		__u32 key = 0;
> 
> +		/* For skb-less case we can't pass input flags; run
> +		 * only the tests that have a matching set of flags.
> +		 */
> +
> +		if (tests[i].flags != eth_get_headlen_flags)
> +			continue;
> +
> 		err = tx_tap(tap_fd, &tests[i].pkt, sizeof(tests[i].pkt));
> 		CHECK(err < 0, "tx_tap", "err %d errno %d\n", err, errno);
> 
> diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
> index 5ae485a6af3f..0eabe5e57944 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_flow.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
> @@ -153,7 +153,6 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
> 	struct tcphdr *tcp, _tcp;
> 	struct udphdr *udp, _udp;
> 
> -	keys->ip_proto = proto;
> 	switch (proto) {
> 	case IPPROTO_ICMP:
> 		icmp = bpf_flow_dissect_get_header(skb, sizeof(*icmp), &_icmp);
> @@ -231,7 +230,6 @@ static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
> {
> 	struct bpf_flow_keys *keys = skb->flow_keys;
> 
> -	keys->ip_proto = nexthdr;
> 	switch (nexthdr) {
> 	case IPPROTO_HOPOPTS:
> 	case IPPROTO_DSTOPTS:
> @@ -266,6 +264,7 @@ PROG(IP)(struct __sk_buff *skb)
> 	keys->addr_proto = ETH_P_IP;
> 	keys->ipv4_src = iph->saddr;
> 	keys->ipv4_dst = iph->daddr;
> +	keys->ip_proto = iph->protocol;
> 
> 	keys->thoff += iph->ihl << 2;
> 	if (data + keys->thoff > data_end)
> @@ -273,13 +272,19 @@ PROG(IP)(struct __sk_buff *skb)
> 
> 	if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
> 		keys->is_frag = true;
> -		if (iph->frag_off & bpf_htons(IP_OFFSET))
> +		if (iph->frag_off & bpf_htons(IP_OFFSET)) {
> 			/* From second fragment on, packets do not have headers
> 			 * we can parse.
> 			 */
> 			done = true;
> -		else
> +		} else {
> 			keys->is_first_frag = true;
> +			/* No need to parse fragmented packet unless
> +			 * explicitly asked for.
> +			 */
> +			if (!(keys->flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
> +				done = true;
> +		}
> 	}
> 
> 	if (done)
> @@ -301,6 +306,7 @@ PROG(IPV6)(struct __sk_buff *skb)
> 	memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
> 
> 	keys->thoff += sizeof(struct ipv6hdr);
> +	keys->ip_proto = ip6h->nexthdr;
> 
> 	return parse_ipv6_proto(skb, ip6h->nexthdr);
> }
> @@ -317,7 +323,8 @@ PROG(IPV6OP)(struct __sk_buff *skb)
> 	/* hlen is in 8-octets and does not include the first 8 bytes
> 	 * of the header
> 	 */
> -	skb->flow_keys->thoff += (1 + ip6h->hdrlen) << 3;
> +	keys->thoff += (1 + ip6h->hdrlen) << 3;
> +	keys->ip_proto = ip6h->nexthdr;
> 
> 	return parse_ipv6_proto(skb, ip6h->nexthdr);
> }
> @@ -333,9 +340,18 @@ PROG(IPV6FR)(struct __sk_buff *skb)
> 
> 	keys->thoff += sizeof(*fragh);
> 	keys->is_frag = true;
> -	if (!(fragh->frag_off & bpf_htons(IP6_OFFSET)))
> +	keys->ip_proto = fragh->nexthdr;
> +
> +	if (!(fragh->frag_off & bpf_htons(IP6_OFFSET))) {
> 		keys->is_first_frag = true;
> 
> +		/* No need to parse fragmented packet unless
> +		 * explicitly asked for.
> +		 */
> +		if (!(keys->flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG))
> +			return export_flow_keys(keys, BPF_OK);
> +	}
> +
> 	return parse_ipv6_proto(skb, fragh->nexthdr);
> }
> 
> -- 
> 2.22.0.657.g960e92d24f-goog
> 


^ permalink raw reply

* pull-request: bpf 2019-07-25
From: Alexei Starovoitov @ 2019-07-25 17:35 UTC (permalink / raw)
  To: davem; +Cc: daniel, netdev, bpf, kernel-team

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix segfault in libbpf, from Andrii.

2) fix gso_segs access, from Eric.

3) tls/sockmap fixes, from Jakub and John.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9:

  tcp: fix tcp_set_congestion_control() use from bpf hook (2019-07-18 20:33:48 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to cb8ffde5694ae5fffb456eae932aac442aa3a207:

  libbpf: silence GCC8 warning about string truncation (2019-07-25 10:13:31 -0700)

----------------------------------------------------------------
Alexei Starovoitov (1):
      Merge branch 'fix-gso_segs'

Andrii Nakryiko (3):
      libbpf: fix SIGSEGV when BTF loading fails, but .BTF.ext exists
      libbpf: sanitize VAR to conservative 1-byte INT
      libbpf: silence GCC8 warning about string truncation

Arnaldo Carvalho de Melo (2):
      libbpf: Fix endianness macro usage for some compilers
      libbpf: Avoid designated initializers for unnamed union members

Daniel Borkmann (1):
      Merge branch 'bpf-sockmap-tls-fixes'

Eric Dumazet (2):
      bpf: fix access to skb_shared_info->gso_segs
      selftests/bpf: add another gso_segs access

Ilya Leoshkevich (2):
      selftests/bpf: fix sendmsg6_prog on s390
      bpf: fix narrower loads on s390

Ilya Maximets (1):
      libbpf: fix using uninitialized ioctl results

Jakub Kicinski (7):
      net/tls: don't arm strparser immediately in tls_set_sw_offload()
      net/tls: don't call tls_sk_proto_close for hw record offload
      selftests/tls: add a test for ULP but no keys
      selftests/tls: test error codes around TLS ULP installation
      selftests/tls: add a bidirectional test
      selftests/tls: close the socket with open record
      selftests/tls: add shutdown tests

John Fastabend (7):
      net/tls: remove close callback sock unlock/lock around TX work flush
      net/tls: remove sock unlock/lock around strp_done()
      net/tls: fix transition through disconnect with close
      bpf: sockmap, sock_map_delete needs to use xchg
      bpf: sockmap, synchronize_rcu before free'ing map
      bpf: sockmap, only create entry if ulp is not already enabled
      bpf: sockmap/tls, close can race with map free

 Documentation/networking/tls-offload.rst          |   6 +
 include/linux/filter.h                            |  13 ++
 include/linux/skmsg.h                             |   8 +-
 include/net/tcp.h                                 |   3 +
 include/net/tls.h                                 |  15 +-
 kernel/bpf/verifier.c                             |   4 +-
 net/core/filter.c                                 |   6 +-
 net/core/skmsg.c                                  |   4 +-
 net/core/sock_map.c                               |  19 ++-
 net/ipv4/tcp_ulp.c                                |  13 ++
 net/tls/tls_main.c                                | 142 ++++++++++++----
 net/tls/tls_sw.c                                  |  83 ++++++---
 tools/lib/bpf/btf.c                               |   5 +-
 tools/lib/bpf/libbpf.c                            |  34 ++--
 tools/lib/bpf/xsk.c                               |  11 +-
 tools/testing/selftests/bpf/progs/sendmsg6_prog.c |   3 +-
 tools/testing/selftests/bpf/verifier/ctx_skb.c    |  11 ++
 tools/testing/selftests/net/tls.c                 | 194 ++++++++++++++++++++++
 18 files changed, 480 insertions(+), 94 deletions(-)

^ permalink raw reply

* Re: TSN - tc usage for a tbs setup
From: Vinicius Costa Gomes @ 2019-07-25 17:30 UTC (permalink / raw)
  To: Stéphane Ancelot, netdev
In-Reply-To: <43c8c7bd-f281-a4dc-badc-9672aaccbd6a@numalliance.com>

Hi,

Stéphane Ancelot <sancelot@numalliance.com> writes:

> Hi,
>
> I am trying to setup my network queue for offline time based configuration.
>
> initial setup is :
>
> tc qdisc show dev eth1:
>
> qdisc mq 0: root
>
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 
> 1 1 1
>
>
> I won't need pfifo , I have to send one frame at a precise xmit time 
> (high prio), and then maybe some other frames (with low priority)
>
>
> I want to setup offload time based  xmit.
>
> /sbin/tc qdisc add dev eth1 root handle 100:1 etf delta 100000 clockid 
> CLOCK_REALTIME offload
>

Because the common (expected?) use case for ETF is using it on a system
that is running ptp4l (for example), and so, has the NIC PHC clock using
the TAI clock reference, we only accept the clockid to be CLOCK_TAI.
(Perhaps you are using an old version of iproute2, because a clearer
message should have been printed together with the error as well, anyway
there should be something in dmesg too)

That said, when I need to run some experiments with ETF, and do not care
about having the PHC clock is sync with anything else, I use phc2sys to
force the TAI offset to be zero. Something like this:

$ phc2sys -c $IFACE -s CLOCK_REALTIME -O 0 -m

And install ETF as "usual", something like this:

$ tc qdisc add dev $IFACE root handle 100:1 etf delta 100000 clockid CLOCK_TAI offload

Hope this helps.


Cheers,
--
Vinicius

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
From: Jonathan Lemon @ 2019-07-25 17:30 UTC (permalink / raw)
  To: Richardson, Bruce
  Cc: Laatz, Kevin, netdev, ast, daniel, Topel, Bjorn, Karlsson, Magnus,
	jakub.kicinski, saeedm, maximmi, stephen, Loftus, Ciara, bpf,
	intel-wired-lan
In-Reply-To: <59AF69C657FD0841A61C55336867B5B07EDB5C3F@IRSMSX103.ger.corp.intel.com>



On 25 Jul 2019, at 8:56, Richardson, Bruce wrote:

>> -----Original Message-----
>> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
>> Sent: Thursday, July 25, 2019 4:39 PM
>> To: Laatz, Kevin <kevin.laatz@intel.com>
>> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
>> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
>> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
>> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
>> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
>> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
>> lan@lists.osuosl.org
>> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
>> support
>>
>>
>>
>> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
>>
>>> This patch set adds the ability to use unaligned chunks in the XDP umem.
>>>
>>> Currently, all chunk addresses passed to the umem are masked to be
>>> chunk size aligned (max is PAGE_SIZE). This limits where we can place
>>> chunks within the umem as well as limiting the packet sizes that are
>> supported.
>>>
>>> The changes in this patch set removes these restrictions, allowing XDP
>>> to be more flexible in where it can place a chunk within a umem. By
>>> relaxing where the chunks can be placed, it allows us to use an
>>> arbitrary buffer size and place that wherever we have a free address
>>> in the umem. These changes add the ability to support arbitrary frame
>>> sizes up to 4k
>>> (PAGE_SIZE) and make it easy to integrate with other existing
>>> frameworks that have their own memory management systems, such as DPDK.
>>> In DPDK, for example, there is already support for AF_XDP with zero-
>> copy.
>>> However, with this patch set the integration will be much more seamless.
>>> You can find the DPDK AF_XDP driver at:
>>> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>>>
>>> Since we are now dealing with arbitrary frame sizes, we need also need
>>> to update how we pass around addresses. Currently, the addresses can
>>> simply be masked to 2k to get back to the original address. This
>>> becomes less trivial when using frame sizes that are not a 'power of
>>> 2' size. This patch set modifies the Rx/Tx descriptor format to use
>>> the upper 16-bits of the addr field for an offset value, leaving the
>>> lower 48-bits for the address (this leaves us with 256 Terabytes,
>>> which should be enough!). We only need to use the upper 16-bits to store
>> the offset when running in unaligned mode.
>>> Rather than adding the offset (headroom etc) to the address, we will
>>> store it in the upper 16-bits of the address field. This way, we can
>>> easily add the offset to the address where we need it, using some bit
>>> manipulation and addition, and we can also easily get the original
>>> address wherever we need it (for example in i40e_zca_fr-- ee) by
>>> simply masking to get the lower 48-bits of the address field.
>>
>> I wonder if it would be better to break backwards compatibility here and
>> say that a handle is going to change from [addr] to [base | offset], or
>> even [index | offset], where address = (index * chunk size) + offset, and
>> then use accessor macros to manipulate the queue entries.
>>
>> This way, the XDP hotpath can adjust the handle with simple arithmetic,
>> bypassing the "if (unaligned)", check, as it changes the offset directly.
>>
>> Using a chunk index instead of a base address is safer, otherwise it is
>> too easy to corrupt things.
>> --
>
> The trouble with using a chunk index is that it assumes that all chunks are
> contiguous, which is not always going to be the case. For example, for
> userspace apps the easiest way to get memory that is IOVA/physically
> contiguous is to use hugepages, but even then we still need to skip space
> when crossing a 2MB barrier.
>
> Specifically in this example case, with a 3k buffer size and 2MB hugepages,
> we'd get 666 buffers on a single page, but then waste a few KB before
> starting the 667th buffer at byte 0 on the second 2MB page.
> This is the setup used in DPDK, for instance, when we allocate memory for
> use in buffer pools.
>
> Therefore, I think it's better to just have the kernel sanity checking the
> request for safety and leave userspace the freedom to decide where in its
> memory area it wants to place the buffers.

That makes sense, thanks.  My concern is that this makes it difficult for
the kernel to validate the buffer start address, but perhaps that isn't a
concern.

I still believe it would simplify things if the format was [base | offset],
any opinion on that?
-- 
Jonathan

^ permalink raw reply

* Re: [PATCH v2 bpf] libbpf: silence GCC8 warning about string truncation
From: Alexei Starovoitov @ 2019-07-25 17:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Magnus Karlsson
In-Reply-To: <20190724214753.1816451-1-andriin@fb.com>

On Wed, Jul 24, 2019 at 10:46 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
> GCC8 still complains about *expected* string truncation:
>
>   xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
>   from a string of length 15 [-Werror=stringop-truncation]
>     strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
>
> This patch gets rid of the issue altogether by using memcpy instead.
> There is no performance regression, as strncpy will still copy and fill
> all of the bytes anyway.
>
> v1->v2:
> - rebase against bpf tree.
>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Acked-by: Song Liu <songliubraving@fb.com>

Applied. Thanks

^ permalink raw reply

* Re: [PATCH bpf-next v2 0/7] bpf/flow_dissector: support input flags
From: Petar Penkov @ 2019-07-25 17:05 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, Song Liu, Willem de Bruijn
In-Reply-To: <20190725153342.3571-1-sdf@google.com>

Thanks! For the series:

Acked-by: Petar Penkov <ppenkov@google.com>

On Thu, Jul 25, 2019 at 8:33 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible.
> BPF flow dissector always parses as deep as possible which is sub-optimal.
> Pass input flags to the BPF flow dissector as well so it can make the same
> decisions.
>
> Series outline:
> * remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
> * export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
>   flow dissector
> * add documentation for the export flags
> * support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
> * sync uapi to tools
> * support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
> * support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
> * support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest
>
> Pros:
> * makes BPF flow dissector faster by avoiding burning extra cycles
> * existing BPF progs continue to work by ignoring the flags and always
>   parsing as deep as possible
>
> Cons:
> * new UAPI which we need to support (OTOH, if we need to deprecate some
>   flags, we can just stop setting them upon calling BPF programs)
>
> Some numbers (with .repeat = 4000000 in test_flow_dissector):
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>
> The improvement is around 10%, but it's in a tight cache-hot
> BPF_PROG_TEST_RUN loop.
>
> Cc: Song Liu <songliubraving@fb.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
>
> Stanislav Fomichev (7):
>   bpf/flow_dissector: pass input flags to BPF flow dissector program
>   bpf/flow_dissector: document flags
>   bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN
>   tools/bpf: sync bpf_flow_keys flags
>   selftests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
>   bpf/flow_dissector: support ipv6 flow_label and
>     FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
>   selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP
>
>  Documentation/bpf/prog_flow_dissector.rst     |  18 ++
>  include/linux/skbuff.h                        |   2 +-
>  include/net/flow_dissector.h                  |   4 -
>  include/uapi/linux/bpf.h                      |   6 +
>  net/bpf/test_run.c                            |  39 ++-
>  net/core/flow_dissector.c                     |  14 +-
>  tools/include/uapi/linux/bpf.h                |   6 +
>  .../selftests/bpf/prog_tests/flow_dissector.c | 242 +++++++++++++++++-
>  tools/testing/selftests/bpf/progs/bpf_flow.c  |  46 +++-
>  9 files changed, 359 insertions(+), 18 deletions(-)
>
> --
> 2.22.0.657.g960e92d24f-goog

^ permalink raw reply

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
From: Laatz, Kevin @ 2019-07-25 17:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Topel, Bjorn, Karlsson, Magnus, jonathan.lemon@gmail.com,
	saeedm@mellanox.com, maximmi@mellanox.com,
	stephen@networkplumber.org, Richardson, Bruce, Loftus, Ciara,
	bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190724192253.00ac07bd@cakuba.netronome.com>

On 25/07/2019 03:22, Jakub Kicinski wrote:
> On Wed, 24 Jul 2019 05:10:35 +0000, Kevin Laatz wrote:
>> Currently, addresses are chunk size aligned. This means, we are very
>> restricted in terms of where we can place chunk within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>>    - Add checks for the flags coming from userspace
>>    - Fix how we get chunk_size in xsk_diag.c
>>    - Add defines for masking the new descriptor format
>>    - Modified the rx functions to use new descriptor format
>>    - Modified the tx functions to use new descriptor format
>>
>> v3:
>>    - Add helper function to do address/offset masking/addition
>> ---
>>   include/net/xdp_sock.h      | 17 ++++++++
>>   include/uapi/linux/if_xdp.h |  9 ++++
>>   net/xdp/xdp_umem.c          | 18 +++++---
>>   net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>>   net/xdp/xsk_diag.c          |  2 +-
>>   net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>>   6 files changed, 170 insertions(+), 30 deletions(-)
>>
>> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
>> index 69796d264f06..738996c0f995 100644
>> --- a/include/net/xdp_sock.h
>> +++ b/include/net/xdp_sock.h
>> @@ -19,6 +19,7 @@ struct xsk_queue;
>>   struct xdp_umem_page {
>>   	void *addr;
>>   	dma_addr_t dma;
>> +	bool next_pg_contig;
> IIRC accesses to xdp_umem_page case a lot of cache misses, so having
> this structure grow from 16 to 24B is a little unfortunate :(
> Can we try to steal lower bits of addr or dma? Or perhaps not pre
> compute this info at all?

Pre-computing is saving us from re-computing the same information 
multiple times for the same chunk in this case. Keeping that in mind, I 
would be more in favor of stealing a bit.

Will look into a nicer solution for the v4 :)

>>   };
>>   
>>   struct xdp_umem_fq_reuse {
>> @@ -48,6 +49,7 @@ struct xdp_umem {
>>   	bool zc;
>>   	spinlock_t xsk_list_lock;
>>   	struct list_head xsk_list;
>> +	u32 flags;
>>   };
>>   
>>   struct xdp_sock {
>> @@ -144,6 +146,15 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>>   
>>   	rq->handles[rq->length++] = addr;
>>   }
>> +
>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
>> +					 u64 offset)
>> +{
>> +	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>> +		return handle |= (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>> +	else
>> +		return handle += offset;
>> +}
>>   #else
>>   static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>>   {
>> @@ -241,6 +252,12 @@ static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
>>   {
>>   }
>>   
>> +static inline u64 xsk_umem_handle_offset(struct xdp_umem *umem, u64 handle,
>> +					 u64 offset)
>> +{
>> +	return NULL;
> 	return 0?
Will change for the v4, thanks.
>
>> +}
>> +
>>   #endif /* CONFIG_XDP_SOCKETS */
>>   
>>   #endif /* _LINUX_XDP_SOCK_H */
>> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
>> index faaa5ca2a117..f8dc68fcdf78 100644
>> --- a/include/uapi/linux/if_xdp.h
>> +++ b/include/uapi/linux/if_xdp.h
>> @@ -17,6 +17,9 @@
>>   #define XDP_COPY	(1 << 1) /* Force copy-mode */
>>   #define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
>>   
>> +/* Flags for xsk_umem_config flags */
>> +#define XDP_UMEM_UNALIGNED_CHUNKS (1 << 0)
>> +
>>   struct sockaddr_xdp {
>>   	__u16 sxdp_family;
>>   	__u16 sxdp_flags;
>> @@ -53,6 +56,7 @@ struct xdp_umem_reg {
>>   	__u64 len; /* Length of packet data area */
>>   	__u32 chunk_size;
>>   	__u32 headroom;
>> +	__u32 flags;
>>   };
>>   
>>   struct xdp_statistics {
>> @@ -74,6 +78,11 @@ struct xdp_options {
>>   #define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
>>   #define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
>>   
>> +/* Masks for unaligned chunks mode */
>> +#define XSK_UNALIGNED_BUF_OFFSET_SHIFT 48
>> +#define XSK_UNALIGNED_BUF_ADDR_MASK \
>> +	((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1)
>> +
>>   /* Rx/Tx descriptor */
>>   struct xdp_desc {
>>   	__u64 addr;
>> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>> index 83de74ca729a..952ca22103e9 100644
>> --- a/net/xdp/xdp_umem.c
>> +++ b/net/xdp/xdp_umem.c
>> @@ -299,6 +299,7 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
>>   
>>   static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   {
>> +	bool unaligned_chunks = mr->flags & XDP_UMEM_UNALIGNED_CHUNKS;
>>   	u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
>>   	unsigned int chunks, chunks_per_page;
>>   	u64 addr = mr->addr, size = mr->len;
>> @@ -314,7 +315,10 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   		return -EINVAL;
>>   	}
>>   
>> -	if (!is_power_of_2(chunk_size))
>> +	if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNKS))
> parens unnecessary, consider adding a define for known flags.
Will fix in the v4.
>
>> +		return -EINVAL;
>> +
>> +	if (!unaligned_chunks && !is_power_of_2(chunk_size))
>>   		return -EINVAL;
>>   
>>   	if (!PAGE_ALIGNED(addr)) {
>> @@ -331,9 +335,11 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   	if (chunks == 0)
>>   		return -EINVAL;
>>   
>> -	chunks_per_page = PAGE_SIZE / chunk_size;
>> -	if (chunks < chunks_per_page || chunks % chunks_per_page)
>> -		return -EINVAL;
>> +	if (!unaligned_chunks) {
>> +		chunks_per_page = PAGE_SIZE / chunk_size;
>> +		if (chunks < chunks_per_page || chunks % chunks_per_page)
>> +			return -EINVAL;
>> +	}
>>   
>>   	headroom = ALIGN(headroom, 64);
>>   
>> @@ -342,13 +348,15 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>>   		return -EINVAL;
>>   
>>   	umem->address = (unsigned long)addr;
>> -	umem->chunk_mask = ~((u64)chunk_size - 1);
>> +	umem->chunk_mask = unaligned_chunks ? XSK_UNALIGNED_BUF_ADDR_MASK
>> +					    : ~((u64)chunk_size - 1);
>>   	umem->size = size;
>>   	umem->headroom = headroom;
>>   	umem->chunk_size_nohr = chunk_size - headroom;
>>   	umem->npgs = size / PAGE_SIZE;
>>   	umem->pgs = NULL;
>>   	umem->user = NULL;
>> +	umem->flags = mr->flags;
>>   	INIT_LIST_HEAD(&umem->xsk_list);
>>   	spin_lock_init(&umem->xsk_list_lock);
>>   
>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>> index 59b57d708697..b3ab653091c4 100644
>> --- a/net/xdp/xsk.c
>> +++ b/net/xdp/xsk.c
>> @@ -45,7 +45,7 @@ EXPORT_SYMBOL(xsk_umem_has_addrs);
>>   
>>   u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
>>   {
>> -	return xskq_peek_addr(umem->fq, addr);
>> +	return xskq_peek_addr(umem->fq, addr, umem);
>>   }
>>   EXPORT_SYMBOL(xsk_umem_peek_addr);
>>   
>> @@ -55,21 +55,42 @@ void xsk_umem_discard_addr(struct xdp_umem *umem)
>>   }
>>   EXPORT_SYMBOL(xsk_umem_discard_addr);
>>   
>> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
>> + * each page. This is only required in copy mode.
>> + */
>> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
>> +			     u32 len, u32 metalen)
>> +{
>> +	void *to_buf = xdp_umem_get_data(umem, addr);
>> +
>> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
>> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
>> +		u64 page_start = addr & (PAGE_SIZE - 1);
>> +		u64 first_len = PAGE_SIZE - (addr - page_start);
>> +
>> +		memcpy(to_buf, from_buf, first_len + metalen);
>> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
>> +
>> +		return;
>> +	}
>> +
>> +	memcpy(to_buf, from_buf, len + metalen);
>> +}
> Why handle this case gracefully? Real XSK use is the zero copy mode,
> having extra code to make copy mode more permissive seems a little
> counter productive IMHO.
Agree that zero copy mode is the main use and that this is somewhat 
unnecessary. However, since we are now allowing for unaligned chunks 
which can cross page boundaries, there's no harm in adding this extra 
check and handling it gracefully.
>
>>   static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>>   {
>> -	void *to_buf, *from_buf;
>> +	u64 offset = xs->umem->headroom;
>> +	void *from_buf;
>>   	u32 metalen;
>>   	u64 addr;
>>   	int err;
>>   
>> -	if (!xskq_peek_addr(xs->umem->fq, &addr) ||
>> +	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>>   	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>>   		xs->rx_dropped++;
>>   		return -ENOSPC;
>>   	}
>>   
>> -	addr += xs->umem->headroom;
>> -
>>   	if (unlikely(xdp_data_meta_unsupported(xdp))) {
>>   		from_buf = xdp->data;
>>   		metalen = 0;
>> @@ -78,9 +99,13 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>>   		metalen = xdp->data - xdp->data_meta;
>>   	}
>>   
>> -	to_buf = xdp_umem_get_data(xs->umem, addr);
>> -	memcpy(to_buf, from_buf, len + metalen);
>> -	addr += metalen;
>> +	__xsk_rcv_memcpy(xs->umem, addr + offset, from_buf, len, metalen);
>> +
>> +	offset += metalen;
>> +	if (xs->umem->flags & XDP_UMEM_UNALIGNED_CHUNKS)
>> +		addr |= offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
>> +	else
>> +		addr += offset;
>>   	err = xskq_produce_batch_desc(xs->rx, addr, len);
>>   	if (!err) {
>>   		xskq_discard_addr(xs->umem->fq);
>> @@ -127,6 +152,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>>   	u32 len = xdp->data_end - xdp->data;
>>   	void *buffer;
>>   	u64 addr;
>> +	u64 offset = xs->umem->headroom;
> reverse xmas tree, please

Will fix in the v4.

Thanks for reviewing!


^ permalink raw reply

* Re: [PATCH bpf-next v3 06/11] mlx5e: modify driver for handling offsets
From: Laatz, Kevin @ 2019-07-25 17:00 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Topel, Bjorn, Karlsson, Magnus, jakub.kicinski@netronome.com,
	jonathan.lemon@gmail.com, Saeed Mahameed,
	stephen@networkplumber.org, Richardson, Bruce, Loftus, Ciara,
	bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <c5704b74-8efe-af2a-68e6-716fa89a5665@mellanox.com>

On 25/07/2019 11:15, Maxim Mikityanskiy wrote:
> On 2019-07-24 08:10, Kevin Laatz wrote:
>> With the addition of the unaligned chunks option, we need to make sure we
>> handle the offsets accordingly based on the mode we are currently running
>> in. This patch modifies the driver to appropriately mask the address for
>> each case.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>>
>> ---
>> v3:
>>     - Use new helper function to handle offset
>> ---
>>    drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 8 ++++++--
>>    drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 9 +++++++--
>>    2 files changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> index b0b982cf69bb..d5245893d2c8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
>> @@ -122,6 +122,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>>    		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
>>    {
>>    	struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
>> +	struct xdp_umem *umem = rq->umem;
>>    	struct xdp_buff xdp;
>>    	u32 act;
>>    	int err;
>> @@ -138,8 +139,11 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>>    	xdp.rxq = &rq->xdp_rxq;
>>    
>>    	act = bpf_prog_run_xdp(prog, &xdp);
>> -	if (xsk)
>> -		xdp.handle += xdp.data - xdp.data_hard_start;
>> +	if (xsk) {
>> +		u64 off = xdp.data - xdp.data_hard_start;
>> +
>> +		xdp.handle = xsk_umem_handle_offset(umem, xdp.handle, off);
>> +	}
> What's missed is that umem_headroom is added to handle directly in
> mlx5e_xsk_page_alloc_umem. In my understanding umem_headroom should go
> to the offset part (high 16 bits) of the handle, to
> xsk_umem_handle_offset has to support increasing the offset.

Will look into it and make the changes for the v4

>>    	switch (act) {
>>    	case XDP_PASS:
>>    		*rx_headroom = xdp.data - xdp.data_hard_start;
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
>> index 35e188cf4ea4..f596e63cba00 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c
>> @@ -61,6 +61,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>>    	struct mlx5e_xdp_xmit_data xdptxd;
>>    	bool work_done = true;
>>    	bool flush = false;
>> +	u64 addr, offset;
>>    
>>    	xdpi.mode = MLX5E_XDP_XMIT_MODE_XSK;
>>    
>> @@ -82,8 +83,12 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget)
>>    			break;
>>    		}
>>    
>> -		xdptxd.dma_addr = xdp_umem_get_dma(umem, desc.addr);
>> -		xdptxd.data = xdp_umem_get_data(umem, desc.addr);
>> +		/* for unaligned chunks need to take offset from upper bits */
>> +		offset = (desc.addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT);
>> +		addr = (desc.addr & XSK_UNALIGNED_BUF_ADDR_MASK);
>> +
>> +		xdptxd.dma_addr = xdp_umem_get_dma(umem, addr + offset);
>> +		xdptxd.data = xdp_umem_get_data(umem, addr + offset);
> Why can't these calculations be encapsulated into
> xdp_umem_get_{dma,data}? I think they are common for all drivers, aren't
> they?
>
> Even if there is some reason not to put this bitshifting stuff into
> xdp_umem_get_* functions, I suggest to encapsulate it into a function
> anyway, because it's a good idea to keep those calculations in a single
> place.
Nice suggestion! I will move it to the xdp_umem_get_* functions in the 
v4. Thanks


^ permalink raw reply

* Re: [PATCH bpf-next v3 08/11] samples/bpf: add unaligned chunks mode support to xdpsock
From: Laatz, Kevin @ 2019-07-25 17:00 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Topel, Bjorn, Karlsson, Magnus, jakub.kicinski@netronome.com,
	jonathan.lemon@gmail.com, Saeed Mahameed,
	stephen@networkplumber.org, Richardson, Bruce, Loftus, Ciara,
	bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <fd7b6b71-5bfd-986a-b288-b9e3478acebb@mellanox.com>


On 25/07/2019 10:43, Maxim Mikityanskiy wrote:
> On 2019-07-24 08:10, Kevin Laatz wrote:
>> This patch adds support for the unaligned chunks mode. The addition of the
>> unaligned chunks option will allow users to run the application with more
>> relaxed chunk placement in the XDP umem.
>>
>> Unaligned chunks mode can be used with the '-u' or '--unaligned' command
>> line options.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> ---
>>    samples/bpf/xdpsock_user.c | 17 +++++++++++++++--
>>    1 file changed, 15 insertions(+), 2 deletions(-)
> <...>
>
>> @@ -372,6 +378,7 @@ static void usage(const char *prog)
>>    		"  -z, --zero-copy      Force zero-copy mode.\n"
>>    		"  -c, --copy           Force copy mode.\n"
>>    		"  -f, --frame-size=n   Set the frame size (must be a power of two, default is %d).\n"
> Help text for -f has to be updated, it doesn't have to be a power of two
> if -u is specified.

Will fix in the v4, thanks!




^ permalink raw reply

* Re: [PATCH bpf-next v3 03/11] xsk: add support to allow unaligned chunk placement
From: Laatz, Kevin @ 2019-07-25 17:00 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Topel, Bjorn, Karlsson, Magnus, jakub.kicinski@netronome.com,
	jonathan.lemon@gmail.com, Saeed Mahameed,
	stephen@networkplumber.org, Richardson, Bruce, Loftus, Ciara,
	bpf@vger.kernel.org, intel-wired-lan@lists.osuosl.org
In-Reply-To: <3af74e26-8899-cf1e-6fd4-5ea0bd349fc3@mellanox.com>


On 25/07/2019 10:27, Maxim Mikityanskiy wrote:
> On 2019-07-24 08:10, Kevin Laatz wrote:
>> Currently, addresses are chunk size aligned. This means, we are very
>> restricted in terms of where we can place chunk within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>>     - Add checks for the flags coming from userspace
>>     - Fix how we get chunk_size in xsk_diag.c
>>     - Add defines for masking the new descriptor format
>>     - Modified the rx functions to use new descriptor format
>>     - Modified the tx functions to use new descriptor format
>>
>> v3:
>>     - Add helper function to do address/offset masking/addition
>> ---
>>    include/net/xdp_sock.h      | 17 ++++++++
>>    include/uapi/linux/if_xdp.h |  9 ++++
>>    net/xdp/xdp_umem.c          | 18 +++++---
>>    net/xdp/xsk.c               | 86 ++++++++++++++++++++++++++++++-------
>>    net/xdp/xsk_diag.c          |  2 +-
>>    net/xdp/xsk_queue.h         | 68 +++++++++++++++++++++++++----
>>    6 files changed, 170 insertions(+), 30 deletions(-)
>>
> <...>
>
>> +/* If a buffer crosses a page boundary, we need to do 2 memcpy's, one for
>> + * each page. This is only required in copy mode.
>> + */
>> +static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
>> +			     u32 len, u32 metalen)
>> +{
>> +	void *to_buf = xdp_umem_get_data(umem, addr);
>> +
>> +	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
>> +		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
>> +		u64 page_start = addr & (PAGE_SIZE - 1);
>> +		u64 first_len = PAGE_SIZE - (addr - page_start);
> Let addr = 0x12345, PAGE_SIZE = 0x1000, len = 0x1000. Your calculations
> lead to page_start = 0x345, first_len = 0x1000 - 0x12000, which is
> negative. I think page_start is calculated incorrectly (is ~ missing?).

Correct, the ~ is missing in the page_start calculation. Nice spot, thanks!

>
>> +
>> +		memcpy(to_buf, from_buf, first_len + metalen);
>> +		memcpy(next_pg_addr, from_buf + first_len, len - first_len);
>> +
>> +		return;
>> +	}
>> +
>> +	memcpy(to_buf, from_buf, len + metalen);
>> +}
>> +
> <...>
>
>> +static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
>> +						u64 length,
>> +						struct xdp_umem *umem)
>> +{
>> +	addr += addr >> XSK_UNALIGNED_BUF_OFFSET_SHIFT;
>> +	addr &= XSK_UNALIGNED_BUF_ADDR_MASK;
>> +	if (addr >= q->size ||
> Addresses like 0x00aaffffffffffff will pass the validation (0xaa +
> 0xffffffffffff will overflow mod 2^48 and become a small number),
> whereas such addresses don't look valid for me.

If you are referring to the addr >= q->size check... that check was 
already in xskq_is_valid_addr (which I based this function on). If this 
doesn't make sense, then it should be removed/fixed for both.

>
>> +	    xskq_crosses_non_contig_pg(umem, addr, length)) {
> If the region is not contiguous, we cant RX into it - that's clear.
> However, how can the userspace determine whether these two pages of UMEM
> are mapped contiguously in the DMA space? Are we going to silently drop
> descriptors to non-contiguous frames and leak them? Please explain how
> to use it correctly from the application side.

Correct, if it is not contiguous then we cannot Rx into it.

Userspace apps should be aware of the page contiguity when using 
zero-copy and if not, then should not be using the unaligned mode. 
Existing frameworks that have their own memory management subsystem, 
such as DPDK, are aware of page contiguity and manage this themselves 
while simpler apps that don't have this kind of visibility can use 
hugepages, as is shown in the xdpsock sample changes in this set.




^ permalink raw reply

* Re: [PATCH] sis900: add support for ethtool --eeprom-dump
From: Sergej Benilov @ 2019-07-25 16:41 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: venza, netdev
In-Reply-To: <20190725162543.GJ21952@lunn.ch>

On Thu, 25 Jul 2019 at 18:25, Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +static int sis900_read_eeprom(struct net_device *net_dev, u8 *buf)
> > +{
> > +     struct sis900_private *sis_priv = netdev_priv(net_dev);
> > +     void __iomem *ioaddr = sis_priv->ioaddr;
> > +     int wait, ret = -EAGAIN;
> > +     u16 signature;
> > +     u16 *ebuf = (u16 *)buf;
> > +     int i;
> > +
> > +     if (sis_priv->chipset_rev == SIS96x_900_REV) {
> > +             sw32(mear, EEREQ);
> > +             for (wait = 0; wait < 2000; wait++) {
> > +                     if (sr32(mear) & EEGNT) {
> > +                             /* read 16 bits, and index by 16 bits */
> > +                             for (i = 0; i < sis_priv->eeprom_size / 2; i++)
> > +                                     ebuf[i] = (u16)read_eeprom(ioaddr, i);
> > +                     ret = 0;
> > +                     break;
> > +                     }
> > +             udelay(1);
> > +             }
> > +     sw32(mear, EEDONE);
>
> The indentation looks all messed up here.

This has passed ./scripts/checkpatch.pl, as you had suggested for the
previous patch.

>
> > +     } else {
> > +             signature = (u16)read_eeprom(ioaddr, EEPROMSignature);
> > +             if (signature != 0xffff && signature != 0x0000) {
> > +                     /* read 16 bits, and index by 16 bits */
> > +                     for (i = 0; i < sis_priv->eeprom_size / 2; i++)
> > +                             ebuf[i] = (u16)read_eeprom(ioaddr, i);
> > +                     ret = 0;
> > +             }
> > +     }
> > +     return ret;
> > +}
> > +
> > +#define SIS900_EEPROM_MAGIC  0xBABE
> > +static int sis900_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, u8 *data)
> > +{
> > +     struct sis900_private *sis_priv = netdev_priv(dev);
> > +     u8 *eebuf;
> > +     int res;
> > +
> > +     eebuf = kmalloc(sis_priv->eeprom_size, GFP_KERNEL);
> > +     if (!eebuf)
> > +             return -ENOMEM;
> > +
> > +     eeprom->magic = SIS900_EEPROM_MAGIC;
> > +     spin_lock_irq(&sis_priv->lock);
> > +     res = sis900_read_eeprom(dev, eebuf);
> > +     spin_unlock_irq(&sis_priv->lock);
> > +     if (!res)
> > +             memcpy(data, eebuf + eeprom->offset, eeprom->len);
> > +     kfree(eebuf);
>
> Why do you not put the data directly into data and avoid this memory
> allocation, and memcpy?

Because EEPROM data from 'eeprom->offset' offset and of 'eeprom->len'
length only is expected to be returned in 'data'.

>
>             Andrew

^ permalink raw reply

* [PATCH] ip6_tunnel: fix possible use-after-free on xmit
From: Haishuang Yan @ 2019-07-25 16:40 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov; +Cc: netdev, linux-kernel, Haishuang Yan

ip4ip6/ip6ip6 tunnels run iptunnel_handle_offloads on xmit which
can cause a possible use-after-free accessing iph/ipv6h pointer
since the packet will be 'uncloned' running pskb_expand_head if
it is a cloned gso skb.

Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/ip6_tunnel.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3134fbb..754a484 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1278,12 +1278,11 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	}
 
 	fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL);
+	dsfield = INET_ECN_encapsulate(dsfield, ipv4_get_dsfield(iph));
 
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
-	dsfield = INET_ECN_encapsulate(dsfield, ipv4_get_dsfield(iph));
-
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
@@ -1367,12 +1366,11 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	}
 
 	fl6.flowi6_uid = sock_net_uid(dev_net(dev), NULL);
+	dsfield = INET_ECN_encapsulate(dsfield, ipv6_get_dsfield(ipv6h));
 
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
-	dsfield = INET_ECN_encapsulate(dsfield, ipv6_get_dsfield(ipv6h));
-
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
-- 
1.8.3.1




^ permalink raw reply related

* Re: [PATCH v3 0/7] Convert skb_frag_t to bio_vec
From: Jonathan Lemon @ 2019-07-25 16:26 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: davem, hch, netdev
In-Reply-To: <20190723030831.11879-1-willy@infradead.org>



On 22 Jul 2019, at 20:08, Matthew Wilcox wrote:

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> The skb_frag_t and bio_vec are fundamentally the same (page, offset,
> length) tuple.  This patch series unifies the two, leaving the
> skb_frag_t typedef in place.  This has the immediate advantage that
> we already have iov_iter support for bvecs and don't need to add
> support for iterating skbuffs.  It enables a long-term plan to use
> bvecs more broadly within the kernel and should make network-storage
> drivers able to do less work converting between skbuffs and biovecs.
>
> It will consume more memory on 32-bit kernels.  If that proves
> problematic, we can look at ways of addressing it.
>
> v3: Rebase on latest Linus with net-next merged.
>   - Reorder the uncontroversial 'Use skb accessors' patches first so you
>     can apply just those two if you want to hold off on the full
>     conversion.
>   - Convert all the users of 'struct skb_frag_struct' to skb_frag_t.

Thanks for doing this!  One step closer to simplifying code.

For the series:
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>



>
> Matthew Wilcox (Oracle) (7):
>   net: Use skb accessors in network drivers
>   net: Use skb accessors in network core
>   net: Increase the size of skb_frag_t
>   net: Reorder the contents of skb_frag_t
>   net: Rename skb_frag page to bv_page
>   net: Rename skb_frag_t size to bv_len
>   net: Convert skb_frag_t to bio_vec
>
>  drivers/crypto/chelsio/chtls/chtls_io.c       |  6 ++--
>  drivers/hsi/clients/ssi_protocol.c            |  3 +-
>  drivers/infiniband/hw/hfi1/vnic_sdma.c        |  2 +-
>  drivers/net/ethernet/3com/3c59x.c             |  2 +-
>  drivers/net/ethernet/agere/et131x.c           |  6 ++--
>  drivers/net/ethernet/amd/xgbe/xgbe-desc.c     |  2 +-
>  drivers/net/ethernet/amd/xgbe/xgbe-drv.c      |  2 +-
>  .../net/ethernet/apm/xgene/xgene_enet_main.c  |  3 +-
>  drivers/net/ethernet/atheros/alx/main.c       |  4 +--
>  .../net/ethernet/atheros/atl1c/atl1c_main.c   |  4 +--
>  .../net/ethernet/atheros/atl1e/atl1e_main.c   |  3 +-
>  drivers/net/ethernet/atheros/atlx/atl1.c      |  3 +-
>  drivers/net/ethernet/broadcom/bgmac.c         |  2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
>  drivers/net/ethernet/brocade/bna/bnad.c       |  2 +-
>  drivers/net/ethernet/calxeda/xgmac.c          |  2 +-
>  .../net/ethernet/cavium/liquidio/lio_main.c   | 23 ++++++-------
>  .../ethernet/cavium/liquidio/lio_vf_main.c    | 23 ++++++-------
>  .../ethernet/cavium/thunder/nicvf_queues.c    |  4 +--
>  drivers/net/ethernet/chelsio/cxgb3/sge.c      |  2 +-
>  drivers/net/ethernet/cortina/gemini.c         |  5 ++-
>  drivers/net/ethernet/emulex/benet/be_main.c   |  2 +-
>  drivers/net/ethernet/freescale/enetc/enetc.c  |  2 +-
>  drivers/net/ethernet/freescale/fec_main.c     |  4 +--
>  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c |  2 +-
>  drivers/net/ethernet/hisilicon/hns/hns_enet.c |  4 +--
>  .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  8 ++---
>  drivers/net/ethernet/huawei/hinic/hinic_tx.c  |  2 +-
>  drivers/net/ethernet/ibm/emac/core.c          |  2 +-
>  drivers/net/ethernet/intel/e1000/e1000_main.c |  3 +-
>  drivers/net/ethernet/intel/e1000e/netdev.c    |  3 +-
>  drivers/net/ethernet/intel/fm10k/fm10k_main.c |  5 +--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  4 +--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  2 +-
>  drivers/net/ethernet/intel/iavf/iavf_txrx.c   |  4 +--
>  drivers/net/ethernet/intel/iavf/iavf_txrx.h   |  2 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.c     |  6 ++--
>  drivers/net/ethernet/intel/igb/igb_main.c     |  5 +--
>  drivers/net/ethernet/intel/igbvf/netdev.c     |  2 +-
>  drivers/net/ethernet/intel/igc/igc_main.c     |  5 +--
>  drivers/net/ethernet/intel/ixgb/ixgb_main.c   |  4 +--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  9 ++---
>  .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 +-
>  drivers/net/ethernet/jme.c                    |  5 ++-
>  drivers/net/ethernet/marvell/mvneta.c         |  4 +--
>  .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  7 ++--
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c   |  4 +--
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c    |  4 +--
>  .../net/ethernet/mellanox/mlx5/core/en_tx.c   |  2 +-
>  drivers/net/ethernet/microchip/lan743x_main.c |  5 ++-
>  .../net/ethernet/myricom/myri10ge/myri10ge.c  | 10 +++---
>  .../ethernet/netronome/nfp/nfp_net_common.c   |  6 ++--
>  .../ethernet/qlogic/netxen/netxen_nic_main.c  |  4 +--
>  .../net/ethernet/qlogic/qlcnic/qlcnic_io.c    |  2 +-
>  drivers/net/ethernet/qualcomm/emac/emac-mac.c | 12 +++----
>  .../net/ethernet/synopsys/dwc-xlgmac-desc.c   |  2 +-
>  .../net/ethernet/synopsys/dwc-xlgmac-net.c    |  2 +-
>  drivers/net/ethernet/tehuti/tehuti.c          |  2 +-
>  drivers/net/usb/usbnet.c                      |  4 +--
>  drivers/net/vmxnet3/vmxnet3_drv.c             |  7 ++--
>  drivers/net/wireless/ath/wil6210/debugfs.c    |  3 +-
>  drivers/net/wireless/ath/wil6210/txrx.c       |  9 +++--
>  drivers/net/wireless/ath/wil6210/txrx_edma.c  |  2 +-
>  drivers/net/xen-netback/netback.c             |  4 +--
>  drivers/s390/net/qeth_core_main.c             |  2 +-
>  drivers/scsi/fcoe/fcoe_transport.c            |  2 +-
>  drivers/staging/octeon/ethernet-tx.c          |  5 ++-
>  .../staging/unisys/visornic/visornic_main.c   |  4 +--
>  drivers/target/iscsi/cxgbit/cxgbit_target.c   | 13 +++----
>  include/linux/bvec.h                          |  5 ++-
>  include/linux/skbuff.h                        | 34 ++++++-------------
>  net/core/skbuff.c                             | 26 ++++++++------
>  net/core/tso.c                                |  8 ++---
>  net/ipv4/tcp.c                                | 14 ++++----
>  net/kcm/kcmsock.c                             |  8 ++---
>  net/tls/tls_device.c                          | 14 ++++----
>  76 files changed, 202 insertions(+), 220 deletions(-)
>
> -- 
> 2.20.1

^ permalink raw reply

* Re: [PATCH] sis900: add support for ethtool --eeprom-dump
From: Andrew Lunn @ 2019-07-25 16:25 UTC (permalink / raw)
  To: Sergej Benilov; +Cc: venza, netdev
In-Reply-To: <20190725161809.14650-1-sergej.benilov@googlemail.com>

> +static int sis900_read_eeprom(struct net_device *net_dev, u8 *buf)
> +{
> +	struct sis900_private *sis_priv = netdev_priv(net_dev);
> +	void __iomem *ioaddr = sis_priv->ioaddr;
> +	int wait, ret = -EAGAIN;
> +	u16 signature;
> +	u16 *ebuf = (u16 *)buf;
> +	int i;
> +
> +	if (sis_priv->chipset_rev == SIS96x_900_REV) {
> +		sw32(mear, EEREQ);
> +		for (wait = 0; wait < 2000; wait++) {
> +			if (sr32(mear) & EEGNT) {
> +				/* read 16 bits, and index by 16 bits */
> +				for (i = 0; i < sis_priv->eeprom_size / 2; i++)
> +					ebuf[i] = (u16)read_eeprom(ioaddr, i);
> +			ret = 0;
> +			break;
> +			}
> +		udelay(1);
> +		}
> +	sw32(mear, EEDONE);

The indentation looks all messed up here.

> +	} else {
> +		signature = (u16)read_eeprom(ioaddr, EEPROMSignature);
> +		if (signature != 0xffff && signature != 0x0000) {
> +			/* read 16 bits, and index by 16 bits */
> +			for (i = 0; i < sis_priv->eeprom_size / 2; i++)
> +				ebuf[i] = (u16)read_eeprom(ioaddr, i);
> +			ret = 0;
> +		}
> +	}
> +	return ret;
> +}
> +
> +#define SIS900_EEPROM_MAGIC	0xBABE
> +static int sis900_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, u8 *data)
> +{
> +	struct sis900_private *sis_priv = netdev_priv(dev);
> +	u8 *eebuf;
> +	int res;
> +
> +	eebuf = kmalloc(sis_priv->eeprom_size, GFP_KERNEL);
> +	if (!eebuf)
> +		return -ENOMEM;
> +
> +	eeprom->magic = SIS900_EEPROM_MAGIC;
> +	spin_lock_irq(&sis_priv->lock);
> +	res = sis900_read_eeprom(dev, eebuf);
> +	spin_unlock_irq(&sis_priv->lock);
> +	if (!res)
> +		memcpy(data, eebuf + eeprom->offset, eeprom->len);
> +	kfree(eebuf);

Why do you not put the data directly into data and avoid this memory
allocation, and memcpy?

	    Andrew

^ permalink raw reply

* Re: [PATCH V2 3/3] net: dsa: ksz: Add Microchip KSZ8795 DSA driver
From: Andrew Lunn @ 2019-07-25 16:18 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev, Tristram Ha, David S . Miller, Florian Fainelli,
	Vivien Didelot, Woojung Huh
In-Reply-To: <20190725150552.6901-4-marex@denx.de>

On Thu, Jul 25, 2019 at 05:05:52PM +0200, Marek Vasut wrote:

Hi Marek

> +static void ksz8795_phy_setup(struct ksz_device *dev, int port,
> +			      struct phy_device *phy)
> +{
> +	if (port < dev->phy_port_cnt) {
> +		/*
> +		 * SUPPORTED_Asym_Pause and SUPPORTED_Pause can be removed to
> +		 * disable flow control when rate limiting is used.
> +		 */
> +		linkmode_copy(phy->advertising, phy->supported);
> +	}
> +}

Please could you drop this, since your testing seems to indicate it is
not needed.

    Thanks
	Andrew

^ permalink raw reply

* [PATCH] sis900: add support for ethtool --eeprom-dump
From: Sergej Benilov @ 2019-07-25 16:18 UTC (permalink / raw)
  To: venza, netdev, andrew; +Cc: Sergej Benilov

Implement ethtool's EEPROM dump command (ethtool -e|--eeprom-dump).

Signed-off-by: Sergej Benilov <sergej.benilov@googlemail.com>
---
 drivers/net/ethernet/sis/sis900.c | 68 +++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/drivers/net/ethernet/sis/sis900.c b/drivers/net/ethernet/sis/sis900.c
index 6e07f5ebacfc..cf9d75ffefc9 100644
--- a/drivers/net/ethernet/sis/sis900.c
+++ b/drivers/net/ethernet/sis/sis900.c
@@ -191,6 +191,8 @@ struct sis900_private {
 	unsigned int tx_full; /* The Tx queue is full. */
 	u8 host_bridge_rev;
 	u8 chipset_rev;
+	/* EEPROM data */
+	int eeprom_size;
 };
 
 MODULE_AUTHOR("Jim Huang <cmhuang@sis.com.tw>, Ollie Lho <ollie@sis.com.tw>");
@@ -475,6 +477,8 @@ static int sis900_probe(struct pci_dev *pci_dev,
 	sis_priv->pci_dev = pci_dev;
 	spin_lock_init(&sis_priv->lock);
 
+	sis_priv->eeprom_size = 24;
+
 	pci_set_drvdata(pci_dev, net_dev);
 
 	ring_space = pci_alloc_consistent(pci_dev, TX_TOTAL_SIZE, &ring_dma);
@@ -2122,6 +2126,68 @@ static void sis900_get_wol(struct net_device *net_dev, struct ethtool_wolinfo *w
 	wol->supported = (WAKE_PHY | WAKE_MAGIC);
 }
 
+static int sis900_get_eeprom_len(struct net_device *dev)
+{
+	struct sis900_private *sis_priv = netdev_priv(dev);
+
+	return sis_priv->eeprom_size;
+}
+
+static int sis900_read_eeprom(struct net_device *net_dev, u8 *buf)
+{
+	struct sis900_private *sis_priv = netdev_priv(net_dev);
+	void __iomem *ioaddr = sis_priv->ioaddr;
+	int wait, ret = -EAGAIN;
+	u16 signature;
+	u16 *ebuf = (u16 *)buf;
+	int i;
+
+	if (sis_priv->chipset_rev == SIS96x_900_REV) {
+		sw32(mear, EEREQ);
+		for (wait = 0; wait < 2000; wait++) {
+			if (sr32(mear) & EEGNT) {
+				/* read 16 bits, and index by 16 bits */
+				for (i = 0; i < sis_priv->eeprom_size / 2; i++)
+					ebuf[i] = (u16)read_eeprom(ioaddr, i);
+			ret = 0;
+			break;
+			}
+		udelay(1);
+		}
+	sw32(mear, EEDONE);
+	} else {
+		signature = (u16)read_eeprom(ioaddr, EEPROMSignature);
+		if (signature != 0xffff && signature != 0x0000) {
+			/* read 16 bits, and index by 16 bits */
+			for (i = 0; i < sis_priv->eeprom_size / 2; i++)
+				ebuf[i] = (u16)read_eeprom(ioaddr, i);
+			ret = 0;
+		}
+	}
+	return ret;
+}
+
+#define SIS900_EEPROM_MAGIC	0xBABE
+static int sis900_get_eeprom(struct net_device *dev, struct ethtool_eeprom *eeprom, u8 *data)
+{
+	struct sis900_private *sis_priv = netdev_priv(dev);
+	u8 *eebuf;
+	int res;
+
+	eebuf = kmalloc(sis_priv->eeprom_size, GFP_KERNEL);
+	if (!eebuf)
+		return -ENOMEM;
+
+	eeprom->magic = SIS900_EEPROM_MAGIC;
+	spin_lock_irq(&sis_priv->lock);
+	res = sis900_read_eeprom(dev, eebuf);
+	spin_unlock_irq(&sis_priv->lock);
+	if (!res)
+		memcpy(data, eebuf + eeprom->offset, eeprom->len);
+	kfree(eebuf);
+	return res;
+}
+
 static const struct ethtool_ops sis900_ethtool_ops = {
 	.get_drvinfo 	= sis900_get_drvinfo,
 	.get_msglevel	= sis900_get_msglevel,
@@ -2132,6 +2198,8 @@ static const struct ethtool_ops sis900_ethtool_ops = {
 	.set_wol	= sis900_set_wol,
 	.get_link_ksettings = sis900_get_link_ksettings,
 	.set_link_ksettings = sis900_set_link_ksettings,
+	.get_eeprom_len = sis900_get_eeprom_len,
+	.get_eeprom = sis900_get_eeprom,
 };
 
 /**
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
From: Richardson, Bruce @ 2019-07-25 15:56 UTC (permalink / raw)
  To: Jonathan Lemon, Laatz, Kevin
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Topel, Bjorn, Karlsson, Magnus, jakub.kicinski@netronome.com,
	saeedm@mellanox.com, maximmi@mellanox.com,
	stephen@networkplumber.org, Loftus, Ciara, bpf@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org
In-Reply-To: <94EAD717-F632-499F-8BBD-FFF5A5333CBF@gmail.com>



> -----Original Message-----
> From: Jonathan Lemon [mailto:jonathan.lemon@gmail.com]
> Sent: Thursday, July 25, 2019 4:39 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>
> Cc: netdev@vger.kernel.org; ast@kernel.org; daniel@iogearbox.net; Topel,
> Bjorn <bjorn.topel@intel.com>; Karlsson, Magnus
> <magnus.karlsson@intel.com>; jakub.kicinski@netronome.com;
> saeedm@mellanox.com; maximmi@mellanox.com; stephen@networkplumber.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; bpf@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org
> Subject: Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement
> support
> 
> 
> 
> On 23 Jul 2019, at 22:10, Kevin Laatz wrote:
> 
> > This patch set adds the ability to use unaligned chunks in the XDP umem.
> >
> > Currently, all chunk addresses passed to the umem are masked to be
> > chunk size aligned (max is PAGE_SIZE). This limits where we can place
> > chunks within the umem as well as limiting the packet sizes that are
> supported.
> >
> > The changes in this patch set removes these restrictions, allowing XDP
> > to be more flexible in where it can place a chunk within a umem. By
> > relaxing where the chunks can be placed, it allows us to use an
> > arbitrary buffer size and place that wherever we have a free address
> > in the umem. These changes add the ability to support arbitrary frame
> > sizes up to 4k
> > (PAGE_SIZE) and make it easy to integrate with other existing
> > frameworks that have their own memory management systems, such as DPDK.
> > In DPDK, for example, there is already support for AF_XDP with zero-
> copy.
> > However, with this patch set the integration will be much more seamless.
> > You can find the DPDK AF_XDP driver at:
> > https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
> >
> > Since we are now dealing with arbitrary frame sizes, we need also need
> > to update how we pass around addresses. Currently, the addresses can
> > simply be masked to 2k to get back to the original address. This
> > becomes less trivial when using frame sizes that are not a 'power of
> > 2' size. This patch set modifies the Rx/Tx descriptor format to use
> > the upper 16-bits of the addr field for an offset value, leaving the
> > lower 48-bits for the address (this leaves us with 256 Terabytes,
> > which should be enough!). We only need to use the upper 16-bits to store
> the offset when running in unaligned mode.
> > Rather than adding the offset (headroom etc) to the address, we will
> > store it in the upper 16-bits of the address field. This way, we can
> > easily add the offset to the address where we need it, using some bit
> > manipulation and addition, and we can also easily get the original
> > address wherever we need it (for example in i40e_zca_fr-- ee) by
> > simply masking to get the lower 48-bits of the address field.
> 
> I wonder if it would be better to break backwards compatibility here and
> say that a handle is going to change from [addr] to [base | offset], or
> even [index | offset], where address = (index * chunk size) + offset, and
> then use accessor macros to manipulate the queue entries.
> 
> This way, the XDP hotpath can adjust the handle with simple arithmetic,
> bypassing the "if (unaligned)", check, as it changes the offset directly.
> 
> Using a chunk index instead of a base address is safer, otherwise it is
> too easy to corrupt things.
> --

The trouble with using a chunk index is that it assumes that all chunks are
contiguous, which is not always going to be the case. For example, for
userspace apps the easiest way to get memory that is IOVA/physically 
contiguous is to use hugepages, but even then we still need to skip space
when crossing a 2MB barrier.

Specifically in this example case, with a 3k buffer size and 2MB hugepages,
we'd get 666 buffers on a single page, but then waste a few KB before
starting the 667th buffer at byte 0 on the second 2MB page.
This is the setup used in DPDK, for instance, when we allocate memory for
use in buffer pools. 

Therefore, I think it's better to just have the kernel sanity checking the
request for safety and leave userspace the freedom to decide where in its
memory area it wants to place the buffers.

Regards,
/Bruce

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH v2 00/10] XDP unaligned chunk placement support
From: Jonathan Lemon @ 2019-07-25 15:43 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Alexei Starovoitov, Kevin Laatz, Jakub Kicinski, Daniel Borkmann,
	Network Development, ciara.loftus, Alexei Starovoitov,
	intel-wired-lan, bruce.richardson, bpf, Björn Töpel,
	Karlsson, Magnus
In-Reply-To: <CAJ8uoz26Byzhc4My70h3nmfd3pQB1ahFf=ZzN2heN4asrJ-NQg@mail.gmail.com>



On 24 Jul 2019, at 6:25, Magnus Karlsson wrote:

> On Tue, Jul 23, 2019 at 11:08 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> Johnathan, Bjorn, Jakub,
>> Please review!
>> The patch set has been pending for a week.
>
> There is a v3 coming out shortly so I suggest we wait for that. It
> will have Mellanox support for this feature too and some clean ups. I
> refrained from posting a review on the mailing list due to the merge
> window being closed last week, but maybe that was not correct. Should
> I still post reviews for new features submitted during the closed
> merge window period? I am happy to do it since I do not have any other
> tasks during that time. It is a quite period for me. Just let me know.

Same here - last time I posted a patch when the merge window was closed,
it was signaled to me (Hi Jakub!) to wait until it reopened.
-- 
Jonathan



>
> /Magnus
>
>> On Tue, Jul 16, 2019 at 4:21 AM Kevin Laatz <kevin.laatz@intel.com> 
>> wrote:
>>>
>>> This patch set adds the ability to use unaligned chunks in the XDP 
>>> umem.
>>>
>>> Currently, all chunk addresses passed to the umem are masked to be 
>>> chunk
>>> size aligned (default is 2k, max is PAGE_SIZE). This limits where we 
>>> can
>>> place chunks within the umem as well as limiting the packet sizes 
>>> that are
>>> supported.
>>>
>>> The changes in this patch set removes these restrictions, allowing 
>>> XDP to
>>> be more flexible in where it can place a chunk within a umem. By 
>>> relaxing
>>> where the chunks can be placed, it allows us to use an arbitrary 
>>> buffer
>>> size and place that wherever we have a free address in the umem. 
>>> These
>>> changes add the ability to support arbitrary frame sizes up to 4k
>>> (PAGE_SIZE) and make it easy to integrate with other existing 
>>> frameworks
>>> that have their own memory management systems, such as DPDK.
>>>
>>> Since we are now dealing with arbitrary frame sizes, we need also 
>>> need to
>>> update how we pass around addresses. Currently, the addresses can 
>>> simply be
>>> masked to 2k to get back to the original address. This becomes less 
>>> trivial
>>> when using frame sizes that are not a 'power of 2' size. This patch 
>>> set
>>> modifies the Rx/Tx descriptor format to use the upper 16-bits of the 
>>> addr
>>> field for an offset value, leaving the lower 48-bits for the address 
>>> (this
>>> leaves us with 256 Terabytes, which should be enough!). We only need 
>>> to use
>>> the upper 16-bits to store the offset when running in unaligned 
>>> mode.
>>> Rather than adding the offset (headroom etc) to the address, we will 
>>> store
>>> it in the upper 16-bits of the address field. This way, we can 
>>> easily add
>>> the offset to the address where we need it, using some bit 
>>> manipulation and
>>> addition, and we can also easily get the original address wherever 
>>> we need
>>> it (for example in i40e_zca_free) by simply masking to get the lower
>>> 48-bits of the address field.
>>>
>>> The numbers below were recorded with the following set up:
>>>   - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
>>>   - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 
>>> (rev 02)
>>>   - Driver: i40e
>>>   - Application: xdpsock with l2fwd (single interface)
>>>
>>> These are solely for comparing performance with and without the 
>>> patches.
>>> The largest drop was ~1% (in zero-copy mode).
>>>
>>> +-------------------------+------------+-----------------+-------------+
>>> | Buffer size: 2048       | SKB mode   | Zero-copy       | Copy      
>>>   |
>>> +-------------------------+------------+-----------------+-------------+
>>> | Aligned (baseline)      | 1.7 Mpps   | 15.3 Mpps       | 2.08 Mpps 
>>>   |
>>> +-------------------------+------------+-----------------+-------------+
>>> | Aligned (with patches)  | 1.7 Mpps   | 15.1 Mpps       | 2.08 Mpps 
>>>   |
>>> +-------------------------+------------+-----------------+-------------+
>>> | Unaligned               | 1.7 Mpps   | 14.5 Mpps       | 2.08 Mpps 
>>>   |
>>> +-------------------------+------------+-----------------+-------------+
>>>
>>> NOTE: We are currently working on the changes required in the 
>>> Mellanox
>>> driver. We will include these in the v3.
>>>
>>> Structure of the patchset:
>>> Patch 1:
>>>   - Remove unnecessary masking and headroom addition during 
>>> zero-copy Rx
>>>     buffer recycling in i40e. This change is required in order for 
>>> the
>>>     buffer recycling to work in the unaligned chunk mode.
>>>
>>> Patch 2:
>>>   - Remove unnecessary masking and headroom addition during
>>>     zero-copy Rx buffer recycling in ixgbe. This change is required 
>>> in
>>>     order for the  buffer recycling to work in the unaligned chunk 
>>> mode.
>>>
>>> Patch 3:
>>>   - Add infrastructure for unaligned chunks. Since we are dealing 
>>> with
>>>     unaligned chunks that could potentially cross a physical page 
>>> boundary,
>>>     we add checks to keep track of that information. We can later 
>>> use this
>>>     information to correctly handle buffers that are placed at an 
>>> address
>>>     where they cross a page boundary.  This patch also modifies the
>>>     existing Rx and Tx functions to use the new descriptor format. 
>>> To
>>>     handle addresses correctly, we need to mask appropriately based 
>>> on
>>>     whether we are in aligned or unaligned mode.
>>>
>>> Patch 4:
>>>   - This patch updates the i40e driver to make use of the new 
>>> descriptor
>>>     format. The new format is particularly useful here since we can 
>>> now
>>>     retrieve the original address in places like i40e_zca_free with 
>>> ease.
>>>     This saves us doing various calculations to get the original 
>>> address
>>>     back.
>>>
>>> Patch 5:
>>>   - This patch updates the ixgbe driver to make use of the new 
>>> descriptor
>>>     format. The new format is particularly useful here since we can 
>>> now
>>>     retrieve the original address in places like ixgbe_zca_free with 
>>> ease.
>>>     This saves us doing various calculations to get the original 
>>> address
>>>     back.
>>>
>>> Patch 6:
>>>   - Add flags for umem configuration to libbpf
>>>
>>> Patch 7:
>>>   - Modify xdpsock application to add a command line option for
>>>     unaligned chunks
>>>
>>> Patch 8:
>>>   - Since we can now run the application in unaligned chunk mode, we 
>>> need
>>>     to make sure we recycle the buffers appropriately.
>>>
>>> Patch 9:
>>>   - Adds hugepage support to the xdpsock application
>>>
>>> Patch 10:
>>>   - Documentation update to include the unaligned chunk scenario. We 
>>> need
>>>     to explicitly state that the incoming addresses are only masked 
>>> in the
>>>     aligned chunk mode and not the unaligned chunk mode.
>>>
>>> ---
>>> v2:
>>>   - fixed checkpatch issues
>>>   - fixed Rx buffer recycling for unaligned chunks in xdpsock
>>>   - removed unused defines
>>>   - fixed how chunk_size is calculated in xsk_diag.c
>>>   - added some performance numbers to cover letter
>>>   - modified descriptor format to make it easier to retrieve 
>>> original
>>>     address
>>>   - removed patch adding off_t off to the zero copy allocator. This 
>>> is no
>>>     longer needed with the new descriptor format.
>>>
>>> Kevin Laatz (10):
>>>   i40e: simplify Rx buffer recycle
>>>   ixgbe: simplify Rx buffer recycle
>>>   xsk: add support to allow unaligned chunk placement
>>>   i40e: modify driver for handling offsets
>>>   ixgbe: modify driver for handling offsets
>>>   libbpf: add flags to umem config
>>>   samples/bpf: add unaligned chunks mode support to xdpsock
>>>   samples/bpf: add buffer recycling for unaligned chunks to xdpsock
>>>   samples/bpf: use hugepages in xdpsock app
>>>   doc/af_xdp: include unaligned chunk case
>>>
>>>  Documentation/networking/af_xdp.rst          | 10 ++-
>>>  drivers/net/ethernet/intel/i40e/i40e_xsk.c   | 39 +++++----
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 39 +++++----
>>>  include/net/xdp_sock.h                       |  2 +
>>>  include/uapi/linux/if_xdp.h                  |  9 ++
>>>  net/xdp/xdp_umem.c                           | 17 ++--
>>>  net/xdp/xsk.c                                | 89 
>>> ++++++++++++++++----
>>>  net/xdp/xsk_diag.c                           |  2 +-
>>>  net/xdp/xsk_queue.h                          | 70 +++++++++++++--
>>>  samples/bpf/xdpsock_user.c                   | 61 ++++++++++----
>>>  tools/include/uapi/linux/if_xdp.h            |  4 +
>>>  tools/lib/bpf/xsk.c                          |  3 +
>>>  tools/lib/bpf/xsk.h                          |  2 +
>>>  13 files changed, 266 insertions(+), 81 deletions(-)
>>>
>>> --
>>> 2.17.1
>>>
>> _______________________________________________
>> Intel-wired-lan mailing list
>> Intel-wired-lan@osuosl.org
>> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply

* Re: [PATCH 3/3] net: dsa: ksz: Add Microchip KSZ8795 DSA driver
From: Marek Vasut @ 2019-07-25 15:36 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Tristram Ha, David S . Miller, Florian Fainelli,
	Vivien Didelot, Woojung Huh
In-Reply-To: <20190725145924.GB25700@lunn.ch>

On 7/25/19 4:59 PM, Andrew Lunn wrote:
> On Thu, Jul 25, 2019 at 04:56:37PM +0200, Marek Vasut wrote:
>> On 7/25/19 4:03 PM, Andrew Lunn wrote:
>>> On Wed, Jul 24, 2019 at 03:40:48PM +0200, Marek Vasut wrote:
>>>> From: Tristram Ha <Tristram.Ha@microchip.com>
>>>> +static void ksz8795_phy_setup(struct ksz_device *dev, int port,
>>>> +			      struct phy_device *phy)
>>>> +{
>>>> +	if (port < dev->phy_port_cnt) {
>>>> +		/*
>>>> +		 * SUPPORTED_Asym_Pause and SUPPORTED_Pause can be removed to
>>>> +		 * disable flow control when rate limiting is used.
>>>> +		 */
>>>> +		linkmode_copy(phy->advertising, phy->supported);
>>>> +	}
>>>> +}
>>>
>>> Hi Marek
>>>
>>> Do you know why this is needed?
>>
>> Unfortunately, no.
>>
>> It seems it copies supported features of the PHY to advertised features
>> of the PHY for ports which are downstream (i.e. not the CPU port).
> 
> Hi Marek
> 
> Could you test it without this copy? Do you get sensible values from
> ethtool? Does the pause configuration look sensible?

They do look OK even without the code.

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/11] XDP unaligned chunk placement support
From: Jonathan Lemon @ 2019-07-25 15:39 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
	saeedm, maximmi, stephen, bruce.richardson, ciara.loftus, bpf,
	intel-wired-lan
In-Reply-To: <20190724051043.14348-1-kevin.laatz@intel.com>



On 23 Jul 2019, at 22:10, Kevin Laatz wrote:

> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (max is PAGE_SIZE). This limits where we can place chunks
> within the umem as well as limiting the packet sizes that are supported.
>
> The changes in this patch set removes these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
> In DPDK, for example, there is already support for AF_XDP with zero-copy.
> However, with this patch set the integration will be much more seamless.
> You can find the DPDK AF_XDP driver at:
> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>
> Since we are now dealing with arbitrary frame sizes, we need also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_fr-- ee) by simply masking to get the lower
> 48-bits of the address field.

I wonder if it would be better to break backwards compatibility here and
say that a handle is going to change from [addr] to [base | offset], or
even [index | offset], where address = (index * chunk size) + offset, and
then use accessor macros to manipulate the queue entries.

This way, the XDP hotpath can adjust the handle with simple arithmetic,
bypassing the "if (unaligned)", check, as it changes the offset directly.

Using a chunk index instead of a base address is safer, otherwise it is
too easy to corrupt things.
-- 
Jonathan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox