Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v2] dpaa_eth: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 16:47 UTC (permalink / raw)
  Cc: Madalin Bucur, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Add #include in dpaa_eth.h.

 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 6 +++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index f38c3fa7d705..2df6e745cb3f 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -485,7 +485,7 @@ static struct dpaa_bp *dpaa_bpid2pool(int bpid)
 static bool dpaa_bpid2pool_use(int bpid)
 {
 	if (dpaa_bpid2pool(bpid)) {
-		atomic_inc(&dpaa_bp_array[bpid]->refs);
+		refcount_inc(&dpaa_bp_array[bpid]->refs);
 		return true;
 	}
 
@@ -496,7 +496,7 @@ static bool dpaa_bpid2pool_use(int bpid)
 static void dpaa_bpid2pool_map(int bpid, struct dpaa_bp *dpaa_bp)
 {
 	dpaa_bp_array[bpid] = dpaa_bp;
-	atomic_set(&dpaa_bp->refs, 1);
+	refcount_set(&dpaa_bp->refs, 1);
 }
 
 static int dpaa_bp_alloc_pool(struct dpaa_bp *dpaa_bp)
@@ -584,7 +584,7 @@ static void dpaa_bp_free(struct dpaa_bp *dpaa_bp)
 	if (!bp)
 		return;
 
-	if (!atomic_dec_and_test(&bp->refs))
+	if (!refcount_dec_and_test(&bp->refs))
 		return;
 
 	if (bp->free_buf_cb)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index af320f83c742..f7e59e8db075 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -32,6 +32,7 @@
 #define __DPAA_H
 
 #include <linux/netdevice.h>
+#include <linux/refcount.h>
 #include <soc/fsl/qman.h>
 #include <soc/fsl/bman.h>
 
@@ -99,7 +100,7 @@ struct dpaa_bp {
 	int (*seed_cb)(struct dpaa_bp *);
 	/* bpool can be emptied before freeing by this cb */
 	void (*free_buf_cb)(const struct dpaa_bp *, struct bm_buffer *);
-	atomic_t refs;
+	refcount_t refs;
 };
 
 struct dpaa_rx_errors {
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v4 2/4] net: phy: Add function to retrieve LED configuration from the DT
From: Andrew Lunn @ 2019-08-02 16:38 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190801190759.28201-3-mka@chromium.org>

On Thu, Aug 01, 2019 at 12:07:57PM -0700, Matthias Kaehlcke wrote:
> Add a phylib function for retrieving PHY LED configuration that
> is specified in the device tree using the generic binding. LEDs
> can be configured to be 'on' for a certain link speed or to blink
> when there is TX/RX activity.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> Changes in v4:
> - patch added to the series
> ---
>  drivers/net/phy/phy_device.c | 50 ++++++++++++++++++++++++++++++++++++
>  include/linux/phy.h          | 15 +++++++++++
>  2 files changed, 65 insertions(+)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 6b5cb87f3866..b4b48de45712 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2188,6 +2188,56 @@ static bool phy_drv_supports_irq(struct phy_driver *phydrv)
>  	return phydrv->config_intr && phydrv->ack_interrupt;
>  }
>  
> +int of_get_phy_led_cfg(struct phy_device *phydev, int led,
> +		       struct phy_led_config *cfg)
> +{
> +	struct device_node *np, *child;
> +	const char *trigger;
> +	int ret;
> +
> +	if (!IS_ENABLED(CONFIG_OF_MDIO))
> +		return -ENOENT;
> +
> +	np = of_find_node_by_name(phydev->mdio.dev.of_node, "leds");
> +	if (!np)
> +		return -ENOENT;
> +
> +	for_each_child_of_node(np, child) {
> +		u32 val;
> +
> +		if (!of_property_read_u32(child, "reg", &val)) {
> +			if (val == (u32)led)
> +				break;
> +		}
> +	}

Hi Matthias

This is leaking references to np and child. In the past we have not
cared about this too much, but we are now getting patches adding the
missing releases. So it would be good to fix this.

	Andrew

^ permalink raw reply

* [RFC] net: dsa: mv88e6xxx: ptp: improve phc2sys precision for mv88e6xxx  switch in combination with imx6-fec
From: Hubert Feurstein @ 2019-08-02 16:32 UTC (permalink / raw)
  To: Richard Cochran, Andrew Lunn, netdev, linux-kernel
  Cc: Hubert Feurstein, Vivien Didelot, Florian Fainelli,
	David S. Miller, Vladimir Oltean

With this patch the phc2sys synchronisation precision improved to +/-500ns.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---

This patch should only be the base for a discussion about improving precision of
phc2sys (from the linuxptp package) in combination with a mv88e6xxx switch and
imx6-fec.

When I started my work on PTP at the beginning of this week, I was positively 
supprised about the sync precision of ptp4l. After adding support for the 
MV88E6220 I was able to synchronize two of our boards within +/- 10 nanoseconds.
Remebering that the PTP system in the MV88E6220 is clocked with 100MHz, this is
I think the best what can be expected. Big thanks to Richard and the other 
developers who made this possible.

But then I tried to synchornize the PTP clock with the system clock by using 
phc2sys (phc2sys -rr -amq -l6) and I quickly was very disapointed about the
precision.

  Min:		-17829 ns
  Max: 		21694 ns
  StdDev:	8520 ns
  Count:	127

So I tried to find the reason for this. And as you probably already know, the
reason for that is the MDIO latency, which is terrible (up to 800 usecs).

As a next step, I tried to to implement the gettimex64 callback (see: "[PATCH] 
net: dsa: mv88e6xxx: extend PTP gettime function to read system clock"). With
this in place (and a patched linuxptp-master version which really uses the
PTP_SYS_OFFSET_EXTENDED-ioctl), I got the following results:

  Min:		-12144 ns
  Max:		10891 ns
  StdDev:	4046,71 ns
  Count:	112

So, things improved, but this is still unacceptable. It was still not possible 
to compensate the MDIO latency issue.

According to my understanding, the timestamps (by using 
ptp_read_system_{pre|post}ts) have to be captured at a place where we have an
constant offset related to the PHC in the switch. The only point where these
timestamps can be captured is the mdio_write callback in the imx_fec. Because,
reading the PHC timestamp will result in the follwing MDIO accesses:

  (several) reads of the AVB_CMD register (to poll for the busy-flag)
  write AVB_CMD (AVBOp=110b Read with post-incerement of PHC timestamp)
  read AVB_DATA (PTP Global Time [15:0])
  read AVB_DATA (PTP Global Time [31:16])
  
With this sequence in mind, the Marvell switch has to snapshot the PHC 
timestamp at the write-AVB_CMD in order to be able to get sane values later by 
reading AVB_DATA. So the best place to capture the system timestamps is this
one and only write operation directly in the imx_fec. By using the patch below 
(without the changes to the system clock resolution) I got the following 
results:

  Min:		-464 ns
  Max:		525 ns
  StdDev:	210,31 ns
  Count:	401
  
I would say that is a huge improvement.

I realized, that the system clock (at least on the imx6) has a resolution of
333ns. So I tried to speed up this clock by using the PER-clock instead of
OSC_PER. This gave me 15ns resolution. The results were:

  Min:		-476 ns
  Max:		439 ns
  StdDev:	176,52 ns
  Count:	630
  
So, things got improved again a little bit (at least the StdDev).

According to my understanding, this is almost the best which is possible, 
because there is one more clock which influences the results. This is the MDIO 
bus clock, which is just 2.5MHz (or 400ns). So, I would say that +/- 400ns 
jitter is caused only by the MDIO bus clock, since the changes in imx_fec should 
not introduce any unpredictable latencies.

My question to the experienced kernel developers is, how can this be implemented
in a more generic way? Because this hack only works under these circumstances,
and of course can never be part of the mainline kernel.

I am not 100% sure that all my statements or assumptions are correct, so feel 
free to correct me.

Hubert

 drivers/clocksource/timer-imx-gpt.c       |  9 ++++++++-
 drivers/net/dsa/mv88e6xxx/chip.h          |  2 ++
 drivers/net/dsa/mv88e6xxx/ptp.c           | 11 +++++++----
 drivers/net/dsa/mv88e6xxx/smi.c           |  2 +-
 drivers/net/ethernet/freescale/fec_main.c | 24 ++++++++++++++++++-----
 drivers/net/phy/mdio_bus.c                | 16 +++++++++++++++
 include/linux/mdio.h                      |  2 ++
 include/linux/phy.h                       |  1 +
 8 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/drivers/clocksource/timer-imx-gpt.c b/drivers/clocksource/timer-imx-gpt.c
index 706c0d0ff56c..84695a2d8ff7 100644
--- a/drivers/clocksource/timer-imx-gpt.c
+++ b/drivers/clocksource/timer-imx-gpt.c
@@ -471,8 +471,15 @@ static int __init mxc_timer_init_dt(struct device_node *np,  enum imx_gpt_type t
 
 	/* Try osc_per first, and fall back to per otherwise */
 	imxtm->clk_per = of_clk_get_by_name(np, "osc_per");
-	if (IS_ERR(imxtm->clk_per))
+
+	/* Force PER clock to be used, so we get 15ns instead of 333ns */
+	//if (IS_ERR(imxtm->clk_per)) {
+	if (1) {
 		imxtm->clk_per = of_clk_get_by_name(np, "per");
+		pr_info("==> Using PER clock\n");
+	} else {
+		pr_info("==> Using OSC_PER clock\n");
+	}
 
 	imxtm->type = type;
 
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 01963ee94c50..9e14dc406415 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -277,6 +277,8 @@ struct mv88e6xxx_chip {
 	struct ptp_clock_info	ptp_clock_info;
 	struct delayed_work	tai_event_work;
 	struct ptp_pin_desc	pin_config[MV88E6XXX_MAX_GPIO];
+	struct ptp_system_timestamp *ptp_sts;
+
 	u16 trig_config;
 	u16 evcap_config;
 	u16 enable_count;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 073cbd0bb91b..cf6e52ee9e0a 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 	return 0;
 }
 
-static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
-				 struct timespec64 *ts)
+static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
+				  struct timespec64 *ts,
+				  struct ptp_system_timestamp *sts)
 {
 	struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
 	u64 ns;
 
 	mv88e6xxx_reg_lock(chip);
+	chip->ptp_sts = sts;
 	ns = timecounter_read(&chip->tstamp_tc);
+	chip->ptp_sts = NULL;
 	mv88e6xxx_reg_unlock(chip);
 
 	*ts = ns_to_timespec64(ns);
@@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
 	struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
 	struct timespec64 ts;
 
-	mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
+	mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
 
 	schedule_delayed_work(&chip->overflow_work,
 			      MV88E6XXX_TAI_OVERFLOW_PERIOD);
@@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	chip->ptp_clock_info.max_adj    = MV88E6XXX_MAX_ADJ_PPB;
 	chip->ptp_clock_info.adjfine	= mv88e6xxx_ptp_adjfine;
 	chip->ptp_clock_info.adjtime	= mv88e6xxx_ptp_adjtime;
-	chip->ptp_clock_info.gettime64	= mv88e6xxx_ptp_gettime;
+	chip->ptp_clock_info.gettimex64	= mv88e6xxx_ptp_gettimex;
 	chip->ptp_clock_info.settime64	= mv88e6xxx_ptp_settime;
 	chip->ptp_clock_info.enable	= ptp_ops->ptp_enable;
 	chip->ptp_clock_info.verify	= ptp_ops->ptp_verify;
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 5fc78a063843..801fd4abba5a 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -45,7 +45,7 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
 {
 	int ret;
 
-	ret = mdiobus_write_nested(chip->bus, dev, reg, data);
+	ret = mdiobus_write_nested_ptp(chip->bus, dev, reg, data, chip->ptp_sts);
 	if (ret < 0)
 		return ret;
 
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2f6057e7335d..20f589dc5b8b 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1814,11 +1814,25 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 
 	reinit_completion(&fep->mdio_done);
 
-	/* start a write op */
-	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
-		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
-		FEC_MMFR_TA | FEC_MMFR_DATA(value),
-		fep->hwp + FEC_MII_DATA);
+	if (bus->ptp_sts) {
+		unsigned long flags = 0;
+		local_irq_save(flags);
+		__iowmb();
+		/* >Take the timestamp *after* the memory barrier */
+		ptp_read_system_prets(bus->ptp_sts);
+		writel_relaxed(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
+			FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+			FEC_MMFR_TA | FEC_MMFR_DATA(value),
+			fep->hwp + FEC_MII_DATA);
+		ptp_read_system_postts(bus->ptp_sts);
+		local_irq_restore(flags);
+	} else {
+		/* start a write op */
+		writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
+			FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
+			FEC_MMFR_TA | FEC_MMFR_DATA(value),
+			fep->hwp + FEC_MII_DATA);
+	}
 
 	/* wait for end of transfer */
 	time_left = wait_for_completion_timeout(&fep->mdio_done,
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index bd04fe762056..f076487db11f 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -672,6 +672,22 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 }
 EXPORT_SYMBOL(mdiobus_write_nested);
 
+int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts)
+{
+	int err;
+
+	BUG_ON(in_interrupt());
+
+	mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
+	bus->ptp_sts = ptp_sts;
+	err = __mdiobus_write(bus, addr, regnum, val);
+	bus->ptp_sts = NULL;
+	mutex_unlock(&bus->mdio_lock);
+
+	return err;
+}
+EXPORT_SYMBOL(mdiobus_write_nested_ptp);
+
 /**
  * mdiobus_write - Convenience function for writing a given MII mgmt register
  * @bus: the mii_bus struct
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad88c81..7f9767babdc3 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -9,6 +9,7 @@
 #include <uapi/linux/mdio.h>
 #include <linux/mod_devicetable.h>
 
+struct ptp_system_timestamp;
 struct gpio_desc;
 struct mii_bus;
 
@@ -310,6 +311,7 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
 int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_nested_ptp(struct mii_bus *bus, int addr, u32 regnum, u16 val, struct ptp_system_timestamp *ptp_sts);
 
 int mdiobus_register_device(struct mdio_device *mdiodev);
 int mdiobus_unregister_device(struct mdio_device *mdiodev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73f93..fd4e33654863 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -252,6 +252,7 @@ struct mii_bus {
 	int reset_delay_us;
 	/* RESET GPIO descriptor pointer */
 	struct gpio_desc *reset_gpiod;
+	struct ptp_system_timestamp *ptp_sts;
 };
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
 
-- 
2.22.0


^ permalink raw reply related

* Re: linux-next: Tree for Aug 2 (staging/octeon/)
From: Randy Dunlap @ 2019-08-02 16:22 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, devel@driverdev.osuosl.org,
	Matthew Wilcox, netdev@vger.kernel.org, Nathan Chancellor
In-Reply-To: <20190802155223.41b0be6e@canb.auug.org.au>

On 8/1/19 10:52 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20190801:
> 

on x86_64:
when CONFIG_OF is not set/enabled.


WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y] && 64BIT [=y] && HAS_IOMEM [=y] && OF_MDIO [=n]
  Selected by [y]:
  - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC || COMPILE_TEST [=y]) && NETDEVICES [=y]

ld: drivers/net/phy/mdio-octeon.o: in function `octeon_mdiobus_probe':
mdio-octeon.c:(.text+0x31c): undefined reference to `cavium_mdiobus_read'
ld: mdio-octeon.c:(.text+0x35a): undefined reference to `cavium_mdiobus_write'


OCTEON_ETHERNET should not select MDIO_OCTEON when OF_MDIO is not set/enabled.


-- 
~Randy

^ permalink raw reply

* Re: [patch 1/1] drivers/net/ethernet/marvell/mvmdio.c: Fix non OF case
From: Arnaud Patard @ 2019-08-02 16:21 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190802144353.GG2099@lunn.ch>

Andrew Lunn <andrew@lunn.ch> writes:

Hi,

> On Fri, Aug 02, 2019 at 10:32:40AM +0200, Arnaud Patard wrote:
>> Orion5.x systems are still using machine files and not device-tree.
>> Commit 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be
>> specified for orion-mdio") has replaced devm_clk_get() with of_clk_get(),
>> leading to a oops at boot and not working network, as reported in 
>> https://lists.debian.org/debian-arm/2019/07/msg00088.html and possibly in
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908712.
>> 
>> Link: https://lists.debian.org/debian-arm/2019/07/msg00088.html
>> Fixes: 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio")
>> Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
>> Index: linux-next/drivers/net/ethernet/marvell/mvmdio.c
>> ===================================================================
>> --- linux-next.orig/drivers/net/ethernet/marvell/mvmdio.c
>> +++ linux-next/drivers/net/ethernet/marvell/mvmdio.c
>> @@ -319,20 +319,33 @@ static int orion_mdio_probe(struct platf
>>  
>>  	init_waitqueue_head(&dev->smi_busy_wait);
>>  
>> -	for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
>> -		dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
>> -		if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
>> +	if (pdev->dev.of_node) {
>> +		for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
>> +			dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
>> +			if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
>> +				ret = -EPROBE_DEFER;
>> +				goto out_clk;
>> +			}
>> +			if (IS_ERR(dev->clk[i]))
>> +				break;
>> +			clk_prepare_enable(dev->clk[i]);
>> +		}
>> +
>> +		if (!IS_ERR(of_clk_get(pdev->dev.of_node,
>> +				       ARRAY_SIZE(dev->clk))))
>> +			dev_warn(&pdev->dev,
>> +				 "unsupported number of clocks, limiting to the first "
>> +				 __stringify(ARRAY_SIZE(dev->clk)) "\n");
>> +	} else {
>> +		dev->clk[0] = clk_get(&pdev->dev, NULL);
>> +		if (PTR_ERR(dev->clk[0]) == -EPROBE_DEFER) {
>>  			ret = -EPROBE_DEFER;
>>  			goto out_clk;
>>  		}
>
> Hi Arnaud
>
> It is a long time since i looked at Orion5x. Is this else clause even
> needed? If my memory is right, i don't think it needs to enable tclk?
> It was kirkwood which first added gateable clocks. And all kirkwood
> boards are not DT.

I was not sure if the else clause was needed or not so I only restored
similar behaviour to what was there before the commit 96cb4342382290c9.

By looking at the commit logs, the commit adding the clock support
3d604da1e9547c09 (net: mvmdio: get and enable optional clock) doesn't
mention the SoC names. So, I've no idea if it's needed or not.

Arnaud

^ permalink raw reply

* Re: [PATCH] niu: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 16:16 UTC (permalink / raw)
  Cc: David S . Miller, Netdev, LKML
In-Reply-To: <20190802125720.22363-1-hslester96@gmail.com>

Chuhong Yuan <hslester96@gmail.com> 于2019年8月2日周五 下午8:57写道：
>
> refcount_t is better for reference counters since its
> implementation can prevent overflows.
> So convert atomic_t ref counters to refcount_t.
>
> Also convert refcount from 0-based to 1-based.

It seems that directly converting refcount from 0-based to
1-based is infeasible.
I am sorry for this mistake.

Regards,
Chuhong

>
> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> ---
>  drivers/net/ethernet/sun/niu.c | 6 +++---
>  drivers/net/ethernet/sun/niu.h | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
> index 0bc5863bffeb..5bf096e51db7 100644
> --- a/drivers/net/ethernet/sun/niu.c
> +++ b/drivers/net/ethernet/sun/niu.c
> @@ -9464,7 +9464,7 @@ static struct niu_parent *niu_new_parent(struct niu *np,
>         memcpy(&p->id, id, sizeof(*id));
>         p->plat_type = ptype;
>         INIT_LIST_HEAD(&p->list);
> -       atomic_set(&p->refcnt, 0);
> +       refcount_set(&p->refcnt, 1);
>         list_add(&p->list, &niu_parent_list);
>         spin_lock_init(&p->lock);
>
> @@ -9524,7 +9524,7 @@ static struct niu_parent *niu_get_parent(struct niu *np,
>                                         port_name);
>                 if (!err) {
>                         p->ports[port] = np;
> -                       atomic_inc(&p->refcnt);
> +                       refcount_inc(&p->refcnt);
>                 }
>         }
>         mutex_unlock(&niu_parent_lock);
> @@ -9552,7 +9552,7 @@ static void niu_put_parent(struct niu *np)
>         p->ports[port] = NULL;
>         np->parent = NULL;
>
> -       if (atomic_dec_and_test(&p->refcnt)) {
> +       if (refcount_dec_and_test(&p->refcnt)) {
>                 list_del(&p->list);
>                 platform_device_unregister(p->plat_dev);
>         }
> diff --git a/drivers/net/ethernet/sun/niu.h b/drivers/net/ethernet/sun/niu.h
> index 04c215f91fc0..755e6dd4c903 100644
> --- a/drivers/net/ethernet/sun/niu.h
> +++ b/drivers/net/ethernet/sun/niu.h
> @@ -3071,7 +3071,7 @@ struct niu_parent {
>
>         struct niu              *ports[NIU_MAX_PORTS];
>
> -       atomic_t                refcnt;
> +       refcount_t              refcnt;
>         struct list_head        list;
>
>         spinlock_t              lock;
> --
> 2.20.1
>

^ permalink raw reply

* Re: [PATCH] net/mlx4_core: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 16:10 UTC (permalink / raw)
  Cc: Tariq Toukan, David S . Miller, Netdev, linux-rdma, LKML
In-Reply-To: <20190802121020.1181-1-hslester96@gmail.com>

Chuhong Yuan <hslester96@gmail.com> 于2019年8月2日周五 下午8:10写道：
>
> refcount_t is better for reference counters since its
> implementation can prevent overflows.
> So convert atomic_t ref counters to refcount_t.
>
> Also convert refcount from 0-based to 1-based.
>

It seems that directly converting refcount from 0-based
to 1-based is infeasible.
I am sorry for this mistake.

Regards,
Chuhong

> Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> ---
>  .../ethernet/mellanox/mlx4/resource_tracker.c | 90 +++++++++----------
>  1 file changed, 45 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
> index 4356f3a58002..d7e26935fd76 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
> @@ -114,7 +114,7 @@ struct res_qp {
>         struct list_head        mcg_list;
>         spinlock_t              mcg_spl;
>         int                     local_qpn;
> -       atomic_t                ref_count;
> +       refcount_t              ref_count;
>         u32                     qpc_flags;
>         /* saved qp params before VST enforcement in order to restore on VGT */
>         u8                      sched_queue;
> @@ -143,7 +143,7 @@ static inline const char *mtt_states_str(enum res_mtt_states state)
>  struct res_mtt {
>         struct res_common       com;
>         int                     order;
> -       atomic_t                ref_count;
> +       refcount_t              ref_count;
>  };
>
>  enum res_mpt_states {
> @@ -179,7 +179,7 @@ enum res_cq_states {
>  struct res_cq {
>         struct res_common       com;
>         struct res_mtt         *mtt;
> -       atomic_t                ref_count;
> +       refcount_t              ref_count;
>  };
>
>  enum res_srq_states {
> @@ -192,7 +192,7 @@ struct res_srq {
>         struct res_common       com;
>         struct res_mtt         *mtt;
>         struct res_cq          *cq;
> -       atomic_t                ref_count;
> +       refcount_t              ref_count;
>  };
>
>  enum res_counter_states {
> @@ -1050,7 +1050,7 @@ static struct res_common *alloc_qp_tr(int id)
>         ret->local_qpn = id;
>         INIT_LIST_HEAD(&ret->mcg_list);
>         spin_lock_init(&ret->mcg_spl);
> -       atomic_set(&ret->ref_count, 0);
> +       refcount_set(&ret->ref_count, 1);
>
>         return &ret->com;
>  }
> @@ -1066,7 +1066,7 @@ static struct res_common *alloc_mtt_tr(int id, int order)
>         ret->com.res_id = id;
>         ret->order = order;
>         ret->com.state = RES_MTT_ALLOCATED;
> -       atomic_set(&ret->ref_count, 0);
> +       refcount_set(&ret->ref_count, 1);
>
>         return &ret->com;
>  }
> @@ -1110,7 +1110,7 @@ static struct res_common *alloc_cq_tr(int id)
>
>         ret->com.res_id = id;
>         ret->com.state = RES_CQ_ALLOCATED;
> -       atomic_set(&ret->ref_count, 0);
> +       refcount_set(&ret->ref_count, 1);
>
>         return &ret->com;
>  }
> @@ -1125,7 +1125,7 @@ static struct res_common *alloc_srq_tr(int id)
>
>         ret->com.res_id = id;
>         ret->com.state = RES_SRQ_ALLOCATED;
> -       atomic_set(&ret->ref_count, 0);
> +       refcount_set(&ret->ref_count, 1);
>
>         return &ret->com;
>  }
> @@ -1325,10 +1325,10 @@ static int add_res_range(struct mlx4_dev *dev, int slave, u64 base, int count,
>
>  static int remove_qp_ok(struct res_qp *res)
>  {
> -       if (res->com.state == RES_QP_BUSY || atomic_read(&res->ref_count) ||
> +       if (res->com.state == RES_QP_BUSY || refcount_read(&res->ref_count) != 1 ||
>             !list_empty(&res->mcg_list)) {
>                 pr_err("resource tracker: fail to remove qp, state %d, ref_count %d\n",
> -                      res->com.state, atomic_read(&res->ref_count));
> +                      res->com.state, refcount_read(&res->ref_count));
>                 return -EBUSY;
>         } else if (res->com.state != RES_QP_RESERVED) {
>                 return -EPERM;
> @@ -1340,11 +1340,11 @@ static int remove_qp_ok(struct res_qp *res)
>  static int remove_mtt_ok(struct res_mtt *res, int order)
>  {
>         if (res->com.state == RES_MTT_BUSY ||
> -           atomic_read(&res->ref_count)) {
> +           refcount_read(&res->ref_count) != 1) {
>                 pr_devel("%s-%d: state %s, ref_count %d\n",
>                          __func__, __LINE__,
>                          mtt_states_str(res->com.state),
> -                        atomic_read(&res->ref_count));
> +                        refcount_read(&res->ref_count));
>                 return -EBUSY;
>         } else if (res->com.state != RES_MTT_ALLOCATED)
>                 return -EPERM;
> @@ -1675,7 +1675,7 @@ static int cq_res_start_move_to(struct mlx4_dev *dev, int slave, int cqn,
>         } else if (state == RES_CQ_ALLOCATED) {
>                 if (r->com.state != RES_CQ_HW)
>                         err = -EINVAL;
> -               else if (atomic_read(&r->ref_count))
> +               else if (refcount_read(&r->ref_count) != 1)
>                         err = -EBUSY;
>                 else
>                         err = 0;
> @@ -1715,7 +1715,7 @@ static int srq_res_start_move_to(struct mlx4_dev *dev, int slave, int index,
>         } else if (state == RES_SRQ_ALLOCATED) {
>                 if (r->com.state != RES_SRQ_HW)
>                         err = -EINVAL;
> -               else if (atomic_read(&r->ref_count))
> +               else if (refcount_read(&r->ref_count) != 1)
>                         err = -EBUSY;
>         } else if (state != RES_SRQ_HW || r->com.state != RES_SRQ_ALLOCATED) {
>                 err = -EINVAL;
> @@ -2808,7 +2808,7 @@ int mlx4_SW2HW_MPT_wrapper(struct mlx4_dev *dev, int slave,
>                 goto ex_put;
>
>         if (!phys) {
> -               atomic_inc(&mtt->ref_count);
> +               refcount_inc(&mtt->ref_count);
>                 put_res(dev, slave, mtt->com.res_id, RES_MTT);
>         }
>
> @@ -2845,7 +2845,7 @@ int mlx4_HW2SW_MPT_wrapper(struct mlx4_dev *dev, int slave,
>                 goto ex_abort;
>
>         if (mpt->mtt)
> -               atomic_dec(&mpt->mtt->ref_count);
> +               refcount_dec(&mpt->mtt->ref_count);
>
>         res_end_move(dev, slave, RES_MPT, id);
>         return 0;
> @@ -3007,18 +3007,18 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4_dev *dev, int slave,
>         err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>         if (err)
>                 goto ex_put_srq;
> -       atomic_inc(&mtt->ref_count);
> +       refcount_inc(&mtt->ref_count);
>         qp->mtt = mtt;
> -       atomic_inc(&rcq->ref_count);
> +       refcount_inc(&rcq->ref_count);
>         qp->rcq = rcq;
> -       atomic_inc(&scq->ref_count);
> +       refcount_inc(&scq->ref_count);
>         qp->scq = scq;
>
>         if (scqn != rcqn)
>                 put_res(dev, slave, scqn, RES_CQ);
>
>         if (use_srq) {
> -               atomic_inc(&srq->ref_count);
> +               refcount_inc(&srq->ref_count);
>                 put_res(dev, slave, srqn, RES_SRQ);
>                 qp->srq = srq;
>         }
> @@ -3113,7 +3113,7 @@ int mlx4_SW2HW_EQ_wrapper(struct mlx4_dev *dev, int slave,
>         if (err)
>                 goto out_put;
>
> -       atomic_inc(&mtt->ref_count);
> +       refcount_inc(&mtt->ref_count);
>         eq->mtt = mtt;
>         put_res(dev, slave, mtt->com.res_id, RES_MTT);
>         res_end_move(dev, slave, RES_EQ, res_id);
> @@ -3310,7 +3310,7 @@ int mlx4_HW2SW_EQ_wrapper(struct mlx4_dev *dev, int slave,
>         if (err)
>                 goto ex_put;
>
> -       atomic_dec(&eq->mtt->ref_count);
> +       refcount_dec(&eq->mtt->ref_count);
>         put_res(dev, slave, eq->mtt->com.res_id, RES_MTT);
>         res_end_move(dev, slave, RES_EQ, res_id);
>         rem_res_range(dev, slave, res_id, 1, RES_EQ, 0);
> @@ -3445,7 +3445,7 @@ int mlx4_SW2HW_CQ_wrapper(struct mlx4_dev *dev, int slave,
>         err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>         if (err)
>                 goto out_put;
> -       atomic_inc(&mtt->ref_count);
> +       refcount_inc(&mtt->ref_count);
>         cq->mtt = mtt;
>         put_res(dev, slave, mtt->com.res_id, RES_MTT);
>         res_end_move(dev, slave, RES_CQ, cqn);
> @@ -3474,7 +3474,7 @@ int mlx4_HW2SW_CQ_wrapper(struct mlx4_dev *dev, int slave,
>         err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>         if (err)
>                 goto out_move;
> -       atomic_dec(&cq->mtt->ref_count);
> +       refcount_dec(&cq->mtt->ref_count);
>         res_end_move(dev, slave, RES_CQ, cqn);
>         return 0;
>
> @@ -3539,9 +3539,9 @@ static int handle_resize(struct mlx4_dev *dev, int slave,
>         err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>         if (err)
>                 goto ex_put1;
> -       atomic_dec(&orig_mtt->ref_count);
> +       refcount_dec(&orig_mtt->ref_count);
>         put_res(dev, slave, orig_mtt->com.res_id, RES_MTT);
> -       atomic_inc(&mtt->ref_count);
> +       refcount_inc(&mtt->ref_count);
>         cq->mtt = mtt;
>         put_res(dev, slave, mtt->com.res_id, RES_MTT);
>         return 0;
> @@ -3627,7 +3627,7 @@ int mlx4_SW2HW_SRQ_wrapper(struct mlx4_dev *dev, int slave,
>         if (err)
>                 goto ex_put_mtt;
>
> -       atomic_inc(&mtt->ref_count);
> +       refcount_inc(&mtt->ref_count);
>         srq->mtt = mtt;
>         put_res(dev, slave, mtt->com.res_id, RES_MTT);
>         res_end_move(dev, slave, RES_SRQ, srqn);
> @@ -3657,9 +3657,9 @@ int mlx4_HW2SW_SRQ_wrapper(struct mlx4_dev *dev, int slave,
>         err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
>         if (err)
>                 goto ex_abort;
> -       atomic_dec(&srq->mtt->ref_count);
> +       refcount_dec(&srq->mtt->ref_count);
>         if (srq->cq)
> -               atomic_dec(&srq->cq->ref_count);
> +               refcount_dec(&srq->cq->ref_count);
>         res_end_move(dev, slave, RES_SRQ, srqn);
>
>         return 0;
> @@ -3988,11 +3988,11 @@ int mlx4_2RST_QP_wrapper(struct mlx4_dev *dev, int slave,
>         if (err)
>                 goto ex_abort;
>
> -       atomic_dec(&qp->mtt->ref_count);
> -       atomic_dec(&qp->rcq->ref_count);
> -       atomic_dec(&qp->scq->ref_count);
> +       refcount_dec(&qp->mtt->ref_count);
> +       refcount_dec(&qp->rcq->ref_count);
> +       refcount_dec(&qp->scq->ref_count);
>         if (qp->srq)
> -               atomic_dec(&qp->srq->ref_count);
> +               refcount_dec(&qp->srq->ref_count);
>         res_end_move(dev, slave, RES_QP, qpn);
>         return 0;
>
> @@ -4456,7 +4456,7 @@ int mlx4_QP_FLOW_STEERING_ATTACH_wrapper(struct mlx4_dev *dev, int slave,
>         if (mlx4_is_bonded(dev))
>                 mlx4_do_mirror_rule(dev, rrule);
>
> -       atomic_inc(&rqp->ref_count);
> +       refcount_inc(&rqp->ref_count);
>
>  err_put_rule:
>         put_res(dev, slave, vhcr->out_param, RES_FS_RULE);
> @@ -4540,7 +4540,7 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
>                        MLX4_QP_FLOW_STEERING_DETACH, MLX4_CMD_TIME_CLASS_A,
>                        MLX4_CMD_NATIVE);
>         if (!err)
> -               atomic_dec(&rqp->ref_count);
> +               refcount_dec(&rqp->ref_count);
>  out:
>         put_res(dev, slave, qpn, RES_QP);
>         return err;
> @@ -4702,11 +4702,11 @@ static void rem_slave_qps(struct mlx4_dev *dev, int slave)
>                                         if (err)
>                                                 mlx4_dbg(dev, "rem_slave_qps: failed to move slave %d qpn %d to reset\n",
>                                                          slave, qp->local_qpn);
> -                                       atomic_dec(&qp->rcq->ref_count);
> -                                       atomic_dec(&qp->scq->ref_count);
> -                                       atomic_dec(&qp->mtt->ref_count);
> +                                       refcount_dec(&qp->rcq->ref_count);
> +                                       refcount_dec(&qp->scq->ref_count);
> +                                       refcount_dec(&qp->mtt->ref_count);
>                                         if (qp->srq)
> -                                               atomic_dec(&qp->srq->ref_count);
> +                                               refcount_dec(&qp->srq->ref_count);
>                                         state = RES_QP_MAPPED;
>                                         break;
>                                 default:
> @@ -4768,9 +4768,9 @@ static void rem_slave_srqs(struct mlx4_dev *dev, int slave)
>                                                 mlx4_dbg(dev, "rem_slave_srqs: failed to move slave %d srq %d to SW ownership\n",
>                                                          slave, srqn);
>
> -                                       atomic_dec(&srq->mtt->ref_count);
> +                                       refcount_dec(&srq->mtt->ref_count);
>                                         if (srq->cq)
> -                                               atomic_dec(&srq->cq->ref_count);
> +                                               refcount_dec(&srq->cq->ref_count);
>                                         state = RES_SRQ_ALLOCATED;
>                                         break;
>
> @@ -4805,7 +4805,7 @@ static void rem_slave_cqs(struct mlx4_dev *dev, int slave)
>         spin_lock_irq(mlx4_tlock(dev));
>         list_for_each_entry_safe(cq, tmp, cq_list, com.list) {
>                 spin_unlock_irq(mlx4_tlock(dev));
> -               if (cq->com.owner == slave && !atomic_read(&cq->ref_count)) {
> +               if (cq->com.owner == slave && refcount_read(&cq->ref_count) == 1) {
>                         cqn = cq->com.res_id;
>                         state = cq->com.from_state;
>                         while (state != 0) {
> @@ -4832,7 +4832,7 @@ static void rem_slave_cqs(struct mlx4_dev *dev, int slave)
>                                         if (err)
>                                                 mlx4_dbg(dev, "rem_slave_cqs: failed to move slave %d cq %d to SW ownership\n",
>                                                          slave, cqn);
> -                                       atomic_dec(&cq->mtt->ref_count);
> +                                       refcount_dec(&cq->mtt->ref_count);
>                                         state = RES_CQ_ALLOCATED;
>                                         break;
>
> @@ -4900,7 +4900,7 @@ static void rem_slave_mrs(struct mlx4_dev *dev, int slave)
>                                                 mlx4_dbg(dev, "rem_slave_mrs: failed to move slave %d mpt %d to SW ownership\n",
>                                                          slave, mptn);
>                                         if (mpt->mtt)
> -                                               atomic_dec(&mpt->mtt->ref_count);
> +                                               refcount_dec(&mpt->mtt->ref_count);
>                                         state = RES_MPT_MAPPED;
>                                         break;
>                                 default:
> @@ -5144,7 +5144,7 @@ static void rem_slave_eqs(struct mlx4_dev *dev, int slave)
>                                         if (err)
>                                                 mlx4_dbg(dev, "rem_slave_eqs: failed to move slave %d eqs %d to SW ownership\n",
>                                                          slave, eqn & 0x3ff);
> -                                       atomic_dec(&eq->mtt->ref_count);
> +                                       refcount_dec(&eq->mtt->ref_count);
>                                         state = RES_EQ_RESERVED;
>                                         break;
>
> --
> 2.20.1
>

^ permalink raw reply

* RE: [PATCH 20/34] xen: convert put_page() to put_user_page*()
From: Weiny, Ira @ 2019-08-02 16:09 UTC (permalink / raw)
  To: Juergen Gross, John Hubbard, john.hubbard@gmail.com,
	Andrew Morton
  Cc: devel@driverdev.osuosl.org, Dave Chinner, Christoph Hellwig,
	Williams, Dan J, x86@kernel.org, linux-mm@kvack.org, Dave Hansen,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org,
	linux-arm-kernel@lists.infradead.org,
	linux-rpi-kernel@lists.infradead.org, devel@lists.orangefs.org,
	xen-devel@lists.xenproject.org, Boris Ostrovsky,
	rds-devel@oss.oracle.com, Jérôme Glisse, Jan Kara,
	ceph-devel@vger.kernel.org, kvm@vger.kernel.org,
	linux-block@vger.kernel.org, linux-crypto@vger.kernel.org,
	linux-fbdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML,
	linux-media@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-xfs@vger.kernel.org,
	netdev@vger.kernel.org, sparclinux@vger.kernel.org,
	Jason Gunthorpe
In-Reply-To: <d4931311-db01-e8c3-0f8c-d64685dc2143@suse.com>

> 
> On 02.08.19 07:48, John Hubbard wrote:
> > On 8/1/19 9:36 PM, Juergen Gross wrote:
> >> On 02.08.19 04:19, john.hubbard@gmail.com wrote:
> >>> From: John Hubbard <jhubbard@nvidia.com>
> > ...
> >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index
> >>> 2f5ce7230a43..29e461dbee2d 100644
> >>> --- a/drivers/xen/privcmd.c
> >>> +++ b/drivers/xen/privcmd.c
> >>> @@ -611,15 +611,10 @@ static int lock_pages(
> >>>   static void unlock_pages(struct page *pages[], unsigned int
> >>> nr_pages)
> >>>   {
> >>> -    unsigned int i;
> >>> -
> >>>       if (!pages)
> >>>           return;
> >>> -    for (i = 0; i < nr_pages; i++) {
> >>> -        if (pages[i])
> >>> -            put_page(pages[i]);
> >>> -    }
> >>> +    put_user_pages(pages, nr_pages);
> >>
> >> You are not handling the case where pages[i] is NULL here. Or am I
> >> missing a pending patch to put_user_pages() here?
> >>
> >
> > Hi Juergen,
> >
> > You are correct--this no longer handles the cases where pages[i] is
> > NULL. It's intentional, though possibly wrong. :)
> >
> > I see that I should have added my standard blurb to this commit
> > description. I missed this one, but some of the other patches have it.
> > It makes the following, possibly incorrect claim:
> >
> > "This changes the release code slightly, because each page slot in the
> > page_list[] array is no longer checked for NULL. However, that check
> > was wrong anyway, because the get_user_pages() pattern of usage here
> > never allowed for NULL entries within a range of pinned pages."
> >
> > The way I've seen these page arrays used with get_user_pages(), things
> > are either done single page, or with a contiguous range. So unless I'm
> > missing a case where someone is either
> >
> > a) releasing individual pages within a range (and thus likely messing
> > up their count of pages they have), or
> >
> > b) allocating two gup ranges within the same pages[] array, with a gap
> > between the allocations,
> >
> > ...then it should be correct. If so, then I'll add the above blurb to
> > this patch's commit description.
> >
> > If that's not the case (both here, and in 3 or 4 other patches in this
> > series, then as you said, I should add NULL checks to put_user_pages()
> > and put_user_pages_dirty_lock().
> 
> In this case it is not correct, but can easily be handled. The NULL case can
> occur only in an error case with the pages array filled partially or not at all.
> 
> I'd prefer something like the attached patch here.

I'm not an expert in this code and have not looked at it carefully but that patch does seem to be the better fix than forcing NULL checks on everyone.

Ira


^ permalink raw reply

* Re: [PATCH net-next 00/15] net: Add functional tests for L3 and L4
From: David Ahern @ 2019-08-02 16:00 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Ahern, David S. Miller, Network Development
In-Reply-To: <CAADnVQJ7GUX=EWP0RWxpe71cGx2cTMyKHsA+6RRX0P05FPMg3w@mail.gmail.com>

On 8/2/19 9:15 AM, Alexei Starovoitov wrote:
> On Thu, Aug 1, 2019 at 9:11 PM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 8/1/19 6:19 PM, Alexei Starovoitov wrote:
>>> Do you really need 'sleep 1' everywhere?
>>> It makes them so slow to run...
>>> What happens if you just remove it ? Tests will fail? Why?
>>
>> yes, the sleep 1 is needed. A server process is getting launched into
>> the background. It's status is not relevant; the client's is the key.
>> The server needs time to initialize. And, yes I tried forking and it
>> gets hung up with bash and capturing the output for verbose mode.
> 
> That means that the tests are not suitable for CI servers
> where cpus are pegged to 100% and multiple tests run in parallel.
> The test will be flaky :(
> 

once the tests exist, they can be improved - by me or anyone else that
has the time and interest.

^ permalink raw reply

* Re: [PATCH net-next 00/15] net: Add functional tests for L3 and L4
From: David Ahern @ 2019-08-02 15:59 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Ahern, David S. Miller, Network Development
In-Reply-To: <CAADnVQLLKziv+3oOEijh=woyBZ7KsxoJ8=BB9ax+XJT9wxTuYQ@mail.gmail.com>

On 8/2/19 9:14 AM, Alexei Starovoitov wrote:
> On Thu, Aug 1, 2019 at 9:04 PM David Ahern <dsahern@gmail.com> wrote:
>> ...
>>
>>>
>>> with -v I see:
>>> COMMAND: ip netns exec ns-A ping -c1 -w1 -I 172.16.2.1 172.16.1.2
>>> ping: unknown iface 172.16.2.1
>>> TEST: ping out, address bind - ns-B IP                                        [FAIL]
>>
>> With ping from iputils-ping -I can be an address or a device.
> 
> the ping, I have installed, supports -I.
> The issue is somewhere else. Ideas?
> 

make sure ping supports the overloading of -I (both dev and address).

check your kernel version. No results guaranteed on kernel prior to 5.3

This is Fedora 29 with 5.1 kernel

TEST: ping out - ns-B IP                                          [ OK ]
TEST: ping out, device bind - ns-B IP                             [ OK ]
TEST: ping out, address bind - ns-B IP                            [ OK ]
TEST: ping out - ns-B loopback IP                                 [ OK ]
TEST: ping out, device bind - ns-B loopback IP                    [ OK ]
TEST: ping out, address bind - ns-B loopback IP                   [ OK ]
TEST: ping in - ns-A IP                                           [ OK ]
TEST: ping in - ns-A loopback IP                                  [ OK ]
TEST: ping local - ns-A IP                                        [ OK ]
TEST: ping local - ns-A loopback IP                               [ OK ]
TEST: ping local - loopback                                       [ OK ]
TEST: ping local, device bind - ns-A IP                           [ OK ]
TEST: ping local, device bind - ns-A loopback IP                  [ OK ]
TEST: ping local, device bind - loopback                          [ OK ]
TEST: ping out, blocked by rule - ns-B loopback IP                [ OK ]
TEST: ping in, blocked by rule - ns-A loopback IP                 [ OK ]
TEST: ping out, blocked by route - ns-B loopback IP               [ OK ]
TEST: ping in, blocked by route - ns-A loopback IP                [ OK ]
TEST: ping out, unreachable default route - ns-B loopback IP      [ OK ]

Tests are known to work on Debian stretch and buster. Appears to work on
Fedora 29.

^ permalink raw reply

* [PATCH][net-next] ice: fix potential infinite loop
From: Colin King @ 2019-08-02 15:52 UTC (permalink / raw)
  To: Jeff Kirsher, David S . Miller, intel-wired-lan, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The loop counter of a for-loop is a u8 however this is being compared
to an int upper bound and this can lead to an infinite loop if the
upper bound is greater than 255 since the loop counter will wrap back
to zero. Fix this potential issue by making the loop counter an int.

Addresses-Coverity: ("Infinite loop")
Fixes: c7aeb4d1b9bf ("ice: Disable VFs until reset is completed")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index c26e6a102dac..088543d50095 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -488,7 +488,7 @@ static void
 ice_prepare_for_reset(struct ice_pf *pf)
 {
 	struct ice_hw *hw = &pf->hw;
-	u8 i;
+	int i;

 	/* already prepared for reset */
 	if (test_bit(__ICE_PREPARED_FOR_RESET, pf->state))
-- 
2.20.1

^ permalink raw reply related

* Re: [PATCH iproute2-next] ip tunnel: add json output
From: Stephen Hemminger @ 2019-08-02 15:49 UTC (permalink / raw)
  To: Andrea Claudi; +Cc: linux-netdev, David Ahern
In-Reply-To: <CAPpH65wSxeXGQc+r+7W_LRZR=vjnL2bXfub1U10vt-Gni67+kQ@mail.gmail.com>

On Fri, 2 Aug 2019 13:14:15 +0200
Andrea Claudi <aclaudi@redhat.com> wrote:

> On Thu, Aug 1, 2019 at 5:16 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Thu,  1 Aug 2019 12:12:58 +0200
> > Andrea Claudi <aclaudi@redhat.com> wrote:
> >  
> > > Add json support on iptunnel and ip6tunnel.
> > > The plain text output format should remain the same.
> > >
> > > Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
> > > ---
> > >  ip/ip6tunnel.c | 82 +++++++++++++++++++++++++++++++--------------
> > >  ip/iptunnel.c  | 90 +++++++++++++++++++++++++++++++++-----------------
> > >  ip/tunnel.c    | 42 +++++++++++++++++------
> > >  3 files changed, 148 insertions(+), 66 deletions(-)
> > >
> > > diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
> > > index d7684a673fdc4..f2b9710c1320f 100644
> > > --- a/ip/ip6tunnel.c
> > > +++ b/ip/ip6tunnel.c
> > > @@ -71,57 +71,90 @@ static void usage(void)
> > >  static void print_tunnel(const void *t)
> > >  {
> > >       const struct ip6_tnl_parm2 *p = t;
> > > -     char s1[1024];
> > > -     char s2[1024];
> > > +     SPRINT_BUF(b1);
> > >
> > >       /* Do not use format_host() for local addr,
> > >        * symbolic name will not be useful.
> > >        */
> > > -     printf("%s: %s/ipv6 remote %s local %s",
> > > -            p->name,
> > > -            tnl_strproto(p->proto),
> > > -            format_host_r(AF_INET6, 16, &p->raddr, s1, sizeof(s1)),
> > > -            rt_addr_n2a_r(AF_INET6, 16, &p->laddr, s2, sizeof(s2)));
> > > +     open_json_object(NULL);
> > > +     print_string(PRINT_ANY, "ifname", "%s: ", p->name);  
> >
> > Print this using color for interface name?  
> 
> Thanks for the suggestion, I can do the same also for remote and local
> addresses?
> 
> >
> >  
> > > +     snprintf(b1, sizeof(b1), "%s/ipv6", tnl_strproto(p->proto));
> > > +     print_string(PRINT_ANY, "mode", "%s ", b1);
> > > +     print_string(PRINT_ANY,
> > > +                  "remote",
> > > +                  "remote %s ",
> > > +                  format_host_r(AF_INET6, 16, &p->raddr, b1, sizeof(b1)));
> > > +     print_string(PRINT_ANY,
> > > +                  "local",
> > > +                  "local %s",
> > > +                  rt_addr_n2a_r(AF_INET6, 16, &p->laddr, b1, sizeof(b1)));
> > > +
> > >       if (p->link) {
> > >               const char *n = ll_index_to_name(p->link);
> > >
> > >               if (n)
> > > -                     printf(" dev %s", n);
> > > +                     print_string(PRINT_ANY, "link", " dev %s", n);
> > >       }
> > >
> > >       if (p->flags & IP6_TNL_F_IGN_ENCAP_LIMIT)
> > > -             printf(" encaplimit none");
> > > +             print_bool(PRINT_ANY,
> > > +                        "ip6_tnl_f_ign_encap_limit",
> > > +                        " encaplimit none",
> > > +                        true);  
> >
> > For flags like this, print_null is more typical JSON than a boolean
> > value. Null is better for presence flag. Bool is better if both true and
> > false are printed.  
> 
> Using print_null json JSON output becomes:
> 
>   {
>     "ifname": "gre2",
>     "mode": "gre/ipv6",
>     "remote": "fd::1",
>     "local": "::",
>     "ip6_tnl_f_ign_encap_limit": null,
>     "hoplimit": 64,
>     "tclass": "0x00",
>     "flowlabel": "0x00000",
>     "flowinfo": "0x00000000",
>     "flags": []
>   }
> 
> Which seems a bit confusing to me (what does the "null" value means?).
> Using print_null we also introduce an inconsistency with the JSON
> output of ip/link_gre6.c and ip/link_ip6tnl.c.
> I would prefer to use print_bool and print out
> ip6_tnl_f_ign_encap_limit both when true and false, patching both
> files above to do the same. WDYT?

JSON has several ways to represent the same type of flag value.
Since JSON always has key/value. Null is used to indicate something is present and
in that case the value is unnecessary, which is what the null field was meant for.



^ permalink raw reply

* Re: [patch net-next v2 1/3] net: devlink: allow to change namespaces
From: David Ahern @ 2019-08-02 15:45 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Jakub Kicinski, netdev, davem, sthemmin, mlxsw
In-Reply-To: <20190802074838.GC2203@nanopsycho>

On 8/2/19 1:48 AM, Jiri Pirko wrote:
> Wed, Jul 31, 2019 at 09:58:10PM CEST, dsahern@gmail.com wrote:
>> On 7/31/19 1:46 PM, David Ahern wrote:
>>> On 7/31/19 1:45 PM, Jiri Pirko wrote:
>>>>> check. e.g., what happens if a resource controller has been configured
>>>>> for the devlink instance and it is moved to a namespace whose existing
>>>>> config exceeds those limits?
>>>>
>>>> It's moved with all the values. The whole instance is moved.
>>>>
>>>
>>> The values are moved, but the FIB in a namespace could already contain
>>> more routes than the devlink instance allows.
>>>
>>
>>From a quick test your recent refactoring to netdevsim broke the
>> resource controller. It was, and is intended to be, per network namespace.
> 
> unifying devlink instances with network namespace in netdevsim was
> really odd. Netdevsim is also a device, like any other. With other
> devices, you do not do this so I don't see why to do this with netdevsim.
> 
> Now you create netdevsim instance in sysfs, there is proper bus probe
> mechanism done, there is a devlink instance created for this device,
> there are netdevices and devlink ports created. Same as for the real
> hardware.
> 
> Honestly, creating a devlink instance per-network namespace
> automagically, no relation to netdevsim devices, that is simply wrong.
> There should be always 1:1 relationshin between a device and devlink
> instance.
> 

Jiri: prior to your recent change netdevsim had a fib resource
controller per network namespace. Please return that behavior or revert
the change.

^ permalink raw reply

* Re: [PATCH net v4] net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
From: Nikolay Aleksandrov @ 2019-08-02 15:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, davem, bridge, michael-dev
In-Reply-To: <20190802083519.71cb4fa2@hermes.lan>

On 02/08/2019 18:35, Stephen Hemminger wrote:
> On Fri,  2 Aug 2019 13:57:36 +0300
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> 
>> +int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
>>  {
>>  	struct netdev_notifier_changeupper_info *info;
>> -	struct net_bridge *br;
>> +	struct net_bridge *br = netdev_priv(dev);
>> +	bool changed;
>> +	int ret = 0;
>>  
>>  	switch (event) {
>> +	case NETDEV_REGISTER:
>> +		ret = br_vlan_add(br, br->default_pvid,
>> +				  BRIDGE_VLAN_INFO_PVID |
>> +				  BRIDGE_VLAN_INFO_UNTAGGED |
>> +				  BRIDGE_VLAN_INFO_BRENTRY, &changed, NULL);
>> +		break;
> 
> Looks good.
> 
> As minor optimization br_vlan_add could ignore changed pointer if NULL.
> This would save places where you don't care.
> 
> 
> Something like:
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 021cc9f66804..bacd3543b215 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -626,10 +626,11 @@ static int br_vlan_add_existing(struct net_bridge *br,
>  		refcount_inc(&vlan->refcnt);
>  		vlan->flags |= BRIDGE_VLAN_INFO_BRENTRY;
>  		vg->num_vlans++;
> -		*changed = true;
> +		if (changed)
> +			*changed = true;
>  	}
>  
> -	if (__vlan_add_flags(vlan, flags))
> +	if (__vlan_add_flags(vlan, flags) && changed)
>  		*changed = true;
>  
>  	return 0;
> @@ -653,7 +654,8 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, bool *changed,
>  
>  	ASSERT_RTNL();
>  
> -	*changed = false;
> +	if (changed)
> +		*changed = false;
>  	vg = br_vlan_group(br);
>  	vlan = br_vlan_find(vg, vid);
>  	if (vlan)
> @@ -679,7 +681,7 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, bool *changed,
>  	if (ret) {
>  		free_percpu(vlan->stats);
>  		kfree(vlan);
> -	} else {
> +	} else if (changed) {
>  		*changed = true;
>  	}
>  
> 

sure, this is ok for net-next


^ permalink raw reply

* Re: [PATCH net v4] net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
From: Stephen Hemminger @ 2019-08-02 15:35 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, davem, bridge, michael-dev
In-Reply-To: <20190802105736.26767-1-nikolay@cumulusnetworks.com>

On Fri,  2 Aug 2019 13:57:36 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> +int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
>  {
>  	struct netdev_notifier_changeupper_info *info;
> -	struct net_bridge *br;
> +	struct net_bridge *br = netdev_priv(dev);
> +	bool changed;
> +	int ret = 0;
>  
>  	switch (event) {
> +	case NETDEV_REGISTER:
> +		ret = br_vlan_add(br, br->default_pvid,
> +				  BRIDGE_VLAN_INFO_PVID |
> +				  BRIDGE_VLAN_INFO_UNTAGGED |
> +				  BRIDGE_VLAN_INFO_BRENTRY, &changed, NULL);
> +		break;

Looks good.

As minor optimization br_vlan_add could ignore changed pointer if NULL.
This would save places where you don't care.


Something like:
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 021cc9f66804..bacd3543b215 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -626,10 +626,11 @@ static int br_vlan_add_existing(struct net_bridge *br,
 		refcount_inc(&vlan->refcnt);
 		vlan->flags |= BRIDGE_VLAN_INFO_BRENTRY;
 		vg->num_vlans++;
-		*changed = true;
+		if (changed)
+			*changed = true;
 	}
 
-	if (__vlan_add_flags(vlan, flags))
+	if (__vlan_add_flags(vlan, flags) && changed)
 		*changed = true;
 
 	return 0;
@@ -653,7 +654,8 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, bool *changed,
 
 	ASSERT_RTNL();
 
-	*changed = false;
+	if (changed)
+		*changed = false;
 	vg = br_vlan_group(br);
 	vlan = br_vlan_find(vg, vid);
 	if (vlan)
@@ -679,7 +681,7 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, bool *changed,
 	if (ret) {
 		free_percpu(vlan->stats);
 		kfree(vlan);
-	} else {
+	} else if (changed) {
 		*changed = true;
 	}
 


^ permalink raw reply related

* Re: [PATCH 2/2] net: mvpp2: support multiple comphy lanes
From: Maxime Chevallier @ 2019-08-02 15:22 UTC (permalink / raw)
  To: Matt Pelland; +Cc: netdev, davem, antoine.tenart
In-Reply-To: <20190801204523.26454-3-mpelland@starry.com>

Hello Matt,

On Thu,  1 Aug 2019 16:45:23 -0400
Matt Pelland <mpelland@starry.com> wrote:

>mvpp 2.2 supports RXAUI which requires a pair of serdes lanes instead of
>the usual single lane required by other interface modes. This patch
>expands the number of lanes that can be associated to a port so that
>both lanes are correctly configured at the appropriate times when RXAUI
>is in use.

Thanks for the patch, it's nice to see RXAUI support moving forward.

I'll give your patches a test on my setup, although I never really got
something stable, it might simply be due to the particular hardware I
have access to.

I do have some comments below :

>Signed-off-by: Matt Pelland <mpelland@starry.com>
>---
> drivers/net/ethernet/marvell/mvpp2/mvpp2.h    |  7 +-
> .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 66 +++++++++++++------
> 2 files changed, 53 insertions(+), 20 deletions(-)
>
>diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
>index 256e7c796631..9ae2b3d9d0c7 100644
>--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
>+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
>@@ -655,6 +655,11 @@
> #define MVPP2_F_LOOPBACK		BIT(0)
> #define MVPP2_F_DT_COMPAT		BIT(1)
> 
>+/* MVPP22 supports RXAUI which requires two comphy lanes, all other modes
>+ * require one.
>+ */
>+#define MVPP22_MAX_COMPHYS		2

This driver is "supposed" to support XAUI too, at least it appears in
the list of modes we handle. XAUI uses 4 lanes, maybe we could bump the
max number of lanes to 4.

> /* Marvell tag types */
> enum mvpp2_tag_type {
> 	MVPP2_TAG_TYPE_NONE = 0,
>@@ -935,7 +940,7 @@ struct mvpp2_port {
> 	phy_interface_t phy_interface;
> 	struct phylink *phylink;
> 	struct phylink_config phylink_config;
>-	struct phy *comphy;
>+	struct phy *comphys[MVPP22_MAX_COMPHYS];
> 
> 	struct mvpp2_bm_pool *pool_long;
> 	struct mvpp2_bm_pool *pool_short;
>diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
>index 8b633af4a684..97dfe2e71b03 100644
>--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
>+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
>@@ -1186,17 +1186,40 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
>  */
> static int mvpp22_comphy_init(struct mvpp2_port *port)
> {
>-	int ret;
>+	int i, ret;
> 
>-	if (!port->comphy)
>-		return 0;
>+	for (i = 0; i < ARRAY_SIZE(port->comphys); i++) {
>+		if (!port->comphys[i])
>+			return 0;
> 
>-	ret = phy_set_mode_ext(port->comphy, PHY_MODE_ETHERNET,
>-			       port->phy_interface);
>-	if (ret)
>-		return ret;
>+		ret = phy_set_mode_ext(port->comphys[i],
>+				       PHY_MODE_ETHERNET,
>+				       port->phy_interface);
>+		if (ret)
>+			return ret;
> 
>-	return phy_power_on(port->comphy);
>+		ret = phy_power_on(port->comphys[i]);
>+		if (ret)
>+			return ret;
>+	}
>+
>+	return 0;
>+}

It could be good to add some sanity check, to make sure we do have 2
comphy lanes available when configuring RXAUI mode (and 4 for XAUI).

>+static int mvpp22_comphy_deinit(struct mvpp2_port *port)
>+{
>+	int i, ret;
>+
>+	for (i = 0; i < ARRAY_SIZE(port->comphys); i++) {
>+		if (!port->comphys[i])
>+			return 0;
>+
>+		ret = phy_power_off(port->comphys[i]);
>+		if (ret)
>+			return ret;
>+	}
>+
>+	return 0;
> }
> 
> static void mvpp2_port_enable(struct mvpp2_port *port)
>@@ -3375,7 +3398,9 @@ static void mvpp2_stop_dev(struct mvpp2_port *port)
> 
> 	if (port->phylink)
> 		phylink_stop(port->phylink);
>-	phy_power_off(port->comphy);
>+
>+	if (port->priv->hw_version == MVPP22)
>+		mvpp22_comphy_deinit(port);
> }
> 
> static int mvpp2_check_ringparam_valid(struct net_device *dev,
>@@ -4947,7 +4972,7 @@ static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode,
> 		port->phy_interface = state->interface;
> 
> 		/* Reconfigure the serdes lanes */
>-		phy_power_off(port->comphy);
>+		mvpp22_comphy_deinit(port);
> 		mvpp22_mode_reconfigure(port);
> 	}
> 
>@@ -5038,7 +5063,6 @@ static int mvpp2_port_probe(struct platform_device *pdev,
> 			    struct fwnode_handle *port_fwnode,
> 			    struct mvpp2 *priv)
> {
>-	struct phy *comphy = NULL;
> 	struct mvpp2_port *port;
> 	struct mvpp2_port_pcpu *port_pcpu;
> 	struct device_node *port_node = to_of_node(port_fwnode);
>@@ -5085,14 +5109,20 @@ static int mvpp2_port_probe(struct platform_device *pdev,
> 		goto err_free_netdev;
> 	}
> 
>+	port = netdev_priv(dev);
>+
> 	if (port_node) {
>-		comphy = devm_of_phy_get(&pdev->dev, port_node, NULL);
>-		if (IS_ERR(comphy)) {
>-			if (PTR_ERR(comphy) == -EPROBE_DEFER) {
>-				err = -EPROBE_DEFER;
>-				goto err_free_netdev;
>+		for (i = 0; i < ARRAY_SIZE(port->comphys); i++) {
>+			port->comphys[i] = devm_of_phy_get_by_index(&pdev->dev,
>+								    port_node,
>+								    i);
>+			if (IS_ERR(port->comphys[i])) {
>+				err = PTR_ERR(port->comphys[i]);
>+				port->comphys[i] = NULL;
>+				if (err == -EPROBE_DEFER)
>+					goto err_free_netdev;
>+				err = 0;
> 			}
>-			comphy = NULL;
> 		}
> 	}
> 
>@@ -5107,7 +5137,6 @@ static int mvpp2_port_probe(struct platform_device *pdev,
> 	dev->netdev_ops = &mvpp2_netdev_ops;
> 	dev->ethtool_ops = &mvpp2_eth_tool_ops;
> 
>-	port = netdev_priv(dev);
> 	port->dev = dev;
> 	port->fwnode = port_fwnode;
> 	port->has_phy = !!of_find_property(port_node, "phy", NULL);
>@@ -5144,7 +5173,6 @@ static int mvpp2_port_probe(struct platform_device *pdev,
> 
> 	port->of_node = port_node;
> 	port->phy_interface = phy_mode;
>-	port->comphy = comphy;
> 
> 	if (priv->hw_version == MVPP21) {
> 		port->base = devm_platform_ioremap_resource(pdev, 2 + id);



-- 
Maxime Chevallier, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next 00/15] net: Add functional tests for L3 and L4
From: Alexei Starovoitov @ 2019-08-02 15:15 UTC (permalink / raw)
  To: David Ahern; +Cc: David Ahern, David S. Miller, Network Development
In-Reply-To: <7b95042e-e586-2ca2-2a26-f5aea5a8184b@gmail.com>

On Thu, Aug 1, 2019 at 9:11 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 8/1/19 6:19 PM, Alexei Starovoitov wrote:
> > Do you really need 'sleep 1' everywhere?
> > It makes them so slow to run...
> > What happens if you just remove it ? Tests will fail? Why?
>
> yes, the sleep 1 is needed. A server process is getting launched into
> the background. It's status is not relevant; the client's is the key.
> The server needs time to initialize. And, yes I tried forking and it
> gets hung up with bash and capturing the output for verbose mode.

That means that the tests are not suitable for CI servers
where cpus are pegged to 100% and multiple tests run in parallel.
The test will be flaky :(

^ permalink raw reply

* [PATCH net] inet: frags: re-introduce skb coalescing for local delivery
From: Guillaume Nault @ 2019-08-02 15:15 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, Eric Dumazet, Peter Oskolkov, Alexander Aring

Before commit d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6
defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
generating many fragments) and running over an IPsec tunnel, reported
more than 6Gbps throughput. After that patch, the same test gets only
9Mbps when receiving on a be2net nic (driver can make a big difference
here, for example, ixgbe doesn't seem to be affected).

By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
(IPv4 fragment coalescing was dropped by commit 14fe22e33462 ("Revert
"ipv4: use skb coalescing in defragmentation"")).

Without fragment coalescing, be2net runs out of Rx ring entries and
starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
the netperf traffic is only composed of UDP fragments, any lost packet
prevents reassembly of the full datagram. Therefore, fragments which
have no possibility to ever get reassembled pile up in the reassembly
queue, until the memory accounting exeeds the threshold. At that point
no fragment is accepted anymore, which effectively discards all
netperf traffic.

When reassembly timeout expires, some stale fragments are removed from
the reassembly queue, so a few packets can be received, reassembled
and delivered to the netperf receiver. But the nic still drops frames
and soon the reassembly queue gets filled again with stale fragments.
These long time frames where no datagram can be received explain why
the performance drop is so significant.

Re-introducing fragment coalescing is enough to get the initial
performances again (6.6Gbps with be2net): driver doesn't drop frames
anymore (no more rx_drops_no_frags errors) and the reassembly engine
works at full speed.

This patch is quite conservative and only coalesces skbs for local
IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
forwarding). Coalescing could be extended in the future if need be, as
more scenarios would probably benefit from it.

[0]: Test configuration
Sender:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
netserver -D -L fc00:2::1

Receiver:
ip xfrm policy flush
ip xfrm state flush
ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 include/net/inet_frag.h                 |  2 +-
 net/ieee802154/6lowpan/reassembly.c     |  2 +-
 net/ipv4/inet_fragment.c                | 39 ++++++++++++++++++-------
 net/ipv4/ip_fragment.c                  |  8 ++++-
 net/ipv6/netfilter/nf_conntrack_reasm.c |  2 +-
 net/ipv6/reassembly.c                   |  2 +-
 6 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 010f26b31c89..bac79e817776 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -171,7 +171,7 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
 void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
 			      struct sk_buff *parent);
 void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
-			    void *reasm_data);
+			    void *reasm_data, bool try_coalesce);
 struct sk_buff *inet_frag_pull_head(struct inet_frag_queue *q);
 
 #endif
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index e4aba5d485be..bbe9b3b2d395 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -170,7 +170,7 @@ static int lowpan_frag_reasm(struct lowpan_frag_queue *fq, struct sk_buff *skb,
 	reasm_data = inet_frag_reasm_prepare(&fq->q, skb, prev_tail);
 	if (!reasm_data)
 		goto out_oom;
-	inet_frag_reasm_finish(&fq->q, skb, reasm_data);
+	inet_frag_reasm_finish(&fq->q, skb, reasm_data, false);
 
 	skb->dev = ldev;
 	skb->tstamp = fq->q.stamp;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index a999451345f9..10d31733297d 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -475,11 +475,12 @@ void *inet_frag_reasm_prepare(struct inet_frag_queue *q, struct sk_buff *skb,
 EXPORT_SYMBOL(inet_frag_reasm_prepare);
 
 void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
-			    void *reasm_data)
+			    void *reasm_data, bool try_coalesce)
 {
 	struct sk_buff **nextp = (struct sk_buff **)reasm_data;
 	struct rb_node *rbn;
 	struct sk_buff *fp;
+	int sum_truesize;
 
 	skb_push(head, head->data - skb_network_header(head));
 
@@ -487,25 +488,41 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
 	fp = FRAG_CB(head)->next_frag;
 	rbn = rb_next(&head->rbnode);
 	rb_erase(&head->rbnode, &q->rb_fragments);
+
+	sum_truesize = head->truesize;
 	while (rbn || fp) {
 		/* fp points to the next sk_buff in the current run;
 		 * rbn points to the next run.
 		 */
 		/* Go through the current run. */
 		while (fp) {
-			*nextp = fp;
-			nextp = &fp->next;
-			fp->prev = NULL;
-			memset(&fp->rbnode, 0, sizeof(fp->rbnode));
-			fp->sk = NULL;
-			head->data_len += fp->len;
-			head->len += fp->len;
+			struct sk_buff *next_frag = FRAG_CB(fp)->next_frag;
+			bool stolen;
+			int delta;
+
+			sum_truesize += fp->truesize;
 			if (head->ip_summed != fp->ip_summed)
 				head->ip_summed = CHECKSUM_NONE;
 			else if (head->ip_summed == CHECKSUM_COMPLETE)
 				head->csum = csum_add(head->csum, fp->csum);
-			head->truesize += fp->truesize;
-			fp = FRAG_CB(fp)->next_frag;
+
+			if (try_coalesce && skb_try_coalesce(head, fp, &stolen,
+							     &delta)) {
+				kfree_skb_partial(fp, stolen);
+			} else {
+				fp->prev = NULL;
+				memset(&fp->rbnode, 0, sizeof(fp->rbnode));
+				fp->sk = NULL;
+
+				head->data_len += fp->len;
+				head->len += fp->len;
+				head->truesize += fp->truesize;
+
+				*nextp = fp;
+				nextp = &fp->next;
+			}
+
+			fp = next_frag;
 		}
 		/* Move to the next run. */
 		if (rbn) {
@@ -516,7 +533,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
 			rbn = rbnext;
 		}
 	}
-	sub_frag_mem_limit(q->fqdir, head->truesize);
+	sub_frag_mem_limit(q->fqdir, sum_truesize);
 
 	*nextp = NULL;
 	skb_mark_not_on_list(head);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 4385eb9e781f..cfeb8890f94e 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -393,6 +393,11 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 	return err;
 }
 
+static bool ip_frag_coalesce_ok(const struct ipq *qp)
+{
+	return qp->q.key.v4.user == IP_DEFRAG_LOCAL_DELIVER;
+}
+
 /* Build a new IP datagram from all its fragments. */
 static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 			 struct sk_buff *prev_tail, struct net_device *dev)
@@ -421,7 +426,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb,
 	if (len > 65535)
 		goto out_oversize;
 
-	inet_frag_reasm_finish(&qp->q, skb, reasm_data);
+	inet_frag_reasm_finish(&qp->q, skb, reasm_data,
+			       ip_frag_coalesce_ok(qp));
 
 	skb->dev = dev;
 	IPCB(skb)->frag_max_size = max(qp->max_df_size, qp->q.max_size);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 0f82c150543b..fed9666a2f7d 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -348,7 +348,7 @@ static int nf_ct_frag6_reasm(struct frag_queue *fq, struct sk_buff *skb,
 
 	skb_reset_transport_header(skb);
 
-	inet_frag_reasm_finish(&fq->q, skb, reasm_data);
+	inet_frag_reasm_finish(&fq->q, skb, reasm_data, false);
 
 	skb->ignore_df = 1;
 	skb->dev = dev;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index ca05b16f1bb9..1f5d4d196dcc 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -282,7 +282,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb,
 
 	skb_reset_transport_header(skb);
 
-	inet_frag_reasm_finish(&fq->q, skb, reasm_data);
+	inet_frag_reasm_finish(&fq->q, skb, reasm_data, true);
 
 	skb->dev = dev;
 	ipv6_hdr(skb)->payload_len = htons(payload_len);
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Willem de Bruijn @ 2019-08-02 15:14 UTC (permalink / raw)
  To: Chuhong Yuan
  Cc: Vishal Kulkarni, David S . Miller, Network Development,
	linux-kernel
In-Reply-To: <CANhBUQ2C3OfkC6qDL9=hhXq=C-OMHUwaL7EaMbagVRTt=rc00A@mail.gmail.com>

On Fri, Aug 2, 2019 at 11:10 AM Chuhong Yuan <hslester96@gmail.com> wrote:
>
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> 于2019年8月2日周五 下午10:53写道：
> >
> > On Fri, Aug 2, 2019 at 10:27 AM Chuhong Yuan <hslester96@gmail.com> wrote:
> > >
> > > Willem de Bruijn <willemdebruijn.kernel@gmail.com> 于2019年8月2日周五 下午9:40写道：
> > > >
> > > > On Fri, Aug 2, 2019 at 4:36 AM Chuhong Yuan <hslester96@gmail.com> wrote:
> > > > >
> > > > > refcount_t is better for reference counters since its
> > > > > implementation can prevent overflows.
> > > > > So convert atomic_t ref counters to refcount_t.
> > > > >
> > > > > Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> > > > > ---
> > > > > Changes in v2:
> > > > >   - Convert refcount from 0-base to 1-base.
> > > >
> > > > This changes the initial value from 0 to 1, but does not change the
> > > > release condition. So this introduces an accounting bug?
> > >
> > > I have noticed this problem and have checked other files which use refcount_t.
> > > I find although the refcounts are 1-based, they still use
> > > refcount_dec_and_test()
> > > to check whether the resource should be released.
> > > One example is drivers/char/mspec.c.
> > > Therefore I think this is okay and do not change the release condition.
> >
> > Indeed it is fine to use refcount_t with a model where the initial
> > allocation already accounts for the first reference and thus
> > initializes with refcount_set(.., 1).
> >
> > But it is not correct to just change a previously zero initialization
> > to one. As now an extra refcount_dec will be needed to release state.
> > But the rest of the code has not changed, so this extra decrement will
> > not happen.
> >
> > For a correct conversion, see for instance commits
> >
> >   commit db5bce32fbe19f0c7482fb5a40a33178bbe7b11b
> >   net: prepare (struct ubuf_info)->refcnt conversion
> >
> > and
> >
> >   commit c1d1b437816f0afa99202be3cb650c9d174667bc
> >   net: convert (struct ubuf_info)->refcnt to refcount_t
> >
> > The second makes a search-and-replace style API change like your
> > patches (though also notice the additional required #include).
> >
>
> Thanks for your examples!
> I will fix the #include in those no base changed patches.
>
> > But the other patch is needed first to change both the initial
> > atomic_set *and* at least one atomic_inc, to maintain the same
> > reference count over the object's lifetime.
> >
> > That change requires understanding of the object's lifecycle, so I
> > suggest only making those changes when aware of that whole data path.
>
> I think I had better focus on the 1-based cases first.

Yes, sounds good. And please try a single driver first and get that
accepted, before moving on to multiple concurrent submissions.

^ permalink raw reply

* Re: [PATCH net-next 00/15] net: Add functional tests for L3 and L4
From: Alexei Starovoitov @ 2019-08-02 15:14 UTC (permalink / raw)
  To: David Ahern; +Cc: David Ahern, David S. Miller, Network Development
In-Reply-To: <4c89b1cd-4dba-9cd8-0f4e-ae0a5d8bc61c@gmail.com>

On Thu, Aug 1, 2019 at 9:04 PM David Ahern <dsahern@gmail.com> wrote:
> ...
>
> >
> > with -v I see:
> > COMMAND: ip netns exec ns-A ping -c1 -w1 -I 172.16.2.1 172.16.1.2
> > ping: unknown iface 172.16.2.1
> > TEST: ping out, address bind - ns-B IP                                        [FAIL]
>
> With ping from iputils-ping -I can be an address or a device.

the ping, I have installed, supports -I.
The issue is somewhere else. Ideas?

^ permalink raw reply

* [PATCH][net-next][V2] net/mlx5: remove self-assignment on esw->dev
From: Colin King @ 2019-08-02 15:13 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, David S . Miller, netdev,
	linux-rdma
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There is a self assignment of esw->dev to itself, clean this up by
removing it. Also make dev a const pointer.

Addresses-Coverity: ("Self assignment")
Fixes: 6cedde451399 ("net/mlx5: E-Switch, Verify support QoS element type")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---

V2: make dev const

---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index f4ace5f8e884..de0894b695e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1413,7 +1413,7 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw,
 
 static bool element_type_supported(struct mlx5_eswitch *esw, int type)
 {
-	struct mlx5_core_dev *dev = esw->dev = esw->dev;
+	const struct mlx5_core_dev *dev = esw->dev;
 
 	switch (type) {
 	case SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR:
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 15:10 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Vishal Kulkarni, David S . Miller, Network Development,
	linux-kernel
In-Reply-To: <CAF=yD-+3tzufyOnK4suJnovrhX_=4sPqWOsjOcETGG3cA9+MdA@mail.gmail.com>

Willem de Bruijn <willemdebruijn.kernel@gmail.com> 于2019年8月2日周五 下午10:53写道：
>
> On Fri, Aug 2, 2019 at 10:27 AM Chuhong Yuan <hslester96@gmail.com> wrote:
> >
> > Willem de Bruijn <willemdebruijn.kernel@gmail.com> 于2019年8月2日周五 下午9:40写道：
> > >
> > > On Fri, Aug 2, 2019 at 4:36 AM Chuhong Yuan <hslester96@gmail.com> wrote:
> > > >
> > > > refcount_t is better for reference counters since its
> > > > implementation can prevent overflows.
> > > > So convert atomic_t ref counters to refcount_t.
> > > >
> > > > Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> > > > ---
> > > > Changes in v2:
> > > >   - Convert refcount from 0-base to 1-base.
> > >
> > > This changes the initial value from 0 to 1, but does not change the
> > > release condition. So this introduces an accounting bug?
> >
> > I have noticed this problem and have checked other files which use refcount_t.
> > I find although the refcounts are 1-based, they still use
> > refcount_dec_and_test()
> > to check whether the resource should be released.
> > One example is drivers/char/mspec.c.
> > Therefore I think this is okay and do not change the release condition.
>
> Indeed it is fine to use refcount_t with a model where the initial
> allocation already accounts for the first reference and thus
> initializes with refcount_set(.., 1).
>
> But it is not correct to just change a previously zero initialization
> to one. As now an extra refcount_dec will be needed to release state.
> But the rest of the code has not changed, so this extra decrement will
> not happen.
>
> For a correct conversion, see for instance commits
>
>   commit db5bce32fbe19f0c7482fb5a40a33178bbe7b11b
>   net: prepare (struct ubuf_info)->refcnt conversion
>
> and
>
>   commit c1d1b437816f0afa99202be3cb650c9d174667bc
>   net: convert (struct ubuf_info)->refcnt to refcount_t
>
> The second makes a search-and-replace style API change like your
> patches (though also notice the additional required #include).
>

Thanks for your examples!
I will fix the #include in those no base changed patches.

> But the other patch is needed first to change both the initial
> atomic_set *and* at least one atomic_inc, to maintain the same
> reference count over the object's lifetime.
>
> That change requires understanding of the object's lifecycle, so I
> suggest only making those changes when aware of that whole data path.

I think I had better focus on the 1-based cases first.

^ permalink raw reply

* Re: [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Willem de Bruijn @ 2019-08-02 14:52 UTC (permalink / raw)
  To: Chuhong Yuan
  Cc: Vishal Kulkarni, David S . Miller, Network Development,
	linux-kernel
In-Reply-To: <CANhBUQ2TRr4RuSmjaRYPXHZpVw_-2awXvWNjjdvV_z1yoGdkXA@mail.gmail.com>

On Fri, Aug 2, 2019 at 10:27 AM Chuhong Yuan <hslester96@gmail.com> wrote:
>
> Willem de Bruijn <willemdebruijn.kernel@gmail.com> 于2019年8月2日周五 下午9:40写道：
> >
> > On Fri, Aug 2, 2019 at 4:36 AM Chuhong Yuan <hslester96@gmail.com> wrote:
> > >
> > > refcount_t is better for reference counters since its
> > > implementation can prevent overflows.
> > > So convert atomic_t ref counters to refcount_t.
> > >
> > > Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
> > > ---
> > > Changes in v2:
> > >   - Convert refcount from 0-base to 1-base.
> >
> > This changes the initial value from 0 to 1, but does not change the
> > release condition. So this introduces an accounting bug?
>
> I have noticed this problem and have checked other files which use refcount_t.
> I find although the refcounts are 1-based, they still use
> refcount_dec_and_test()
> to check whether the resource should be released.
> One example is drivers/char/mspec.c.
> Therefore I think this is okay and do not change the release condition.

Indeed it is fine to use refcount_t with a model where the initial
allocation already accounts for the first reference and thus
initializes with refcount_set(.., 1).

But it is not correct to just change a previously zero initialization
to one. As now an extra refcount_dec will be needed to release state.
But the rest of the code has not changed, so this extra decrement will
not happen.

For a correct conversion, see for instance commits

  commit db5bce32fbe19f0c7482fb5a40a33178bbe7b11b
  net: prepare (struct ubuf_info)->refcnt conversion

and

  commit c1d1b437816f0afa99202be3cb650c9d174667bc
  net: convert (struct ubuf_info)->refcnt to refcount_t

The second makes a search-and-replace style API change like your
patches (though also notice the additional required #include).

But the other patch is needed first to change both the initial
atomic_set *and* at least one atomic_inc, to maintain the same
reference count over the object's lifetime.

That change requires understanding of the object's lifecycle, so I
suggest only making those changes when aware of that whole data path.

^ permalink raw reply

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: Jan Kara @ 2019-08-02 14:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, Michal Hocko, john.hubbard, Andrew Morton,
	Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jason Gunthorpe, Jérôme Glisse, LKML,
	amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx, kvm,
	linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard
In-Reply-To: <20190802142443.GB5597@bombadil.infradead.org>

On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> > On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > > On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
> > > [...]
> > > > 2) Convert all of the call sites for get_user_pages*(), to
> > > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > > call sites, and will take some time.
> > > 
> > > How do we make sure this is the case and it will remain the case in the
> > > future? There must be some automagic to enforce/check that. It is simply
> > > not manageable to do it every now and then because then 3) will simply
> > > be never safe.
> > > 
> > > Have you considered coccinele or some other scripted way to do the
> > > transition? I have no idea how to deal with future changes that would
> > > break the balance though.
> > 
> > Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> > a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
> > references got converted by using this wrapper instead of gup. The
> > counterpart would then be more logically named as unpin_page() or whatever
> > instead of put_user_page().  Sure this is not completely foolproof (you can
> > create new callsite using vaddr_pin_pages() and then just drop refs using
> > put_page()) but I suppose it would be a high enough barrier for missed
> > conversions... Thoughts?
> 
> I think the API we really need is get_user_bvec() / put_user_bvec(),
> and I know Christoph has been putting some work into that.  That avoids
> doing refcount operations on hundreds of pages if the page in question is
> a huge page.  Once people are switched over to that, they won't be tempted
> to manually call put_page() on the individual constituent pages of a bvec.

Well, get_user_bvec() is certainly a good API for one class of users but
just looking at the above series, you'll see there are *many* places that
just don't work with bvecs at all and you need something for those.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH net-next v2] net: phy: broadcom: add 1000Base-X support for BCM54616S
From: Andrew Lunn @ 2019-08-02 14:50 UTC (permalink / raw)
  To: Tao Ren
  Cc: Florian Fainelli, Heiner Kallweit, David S . Miller,
	Arun Parameswaran, Justin Chen, Vladimir Oltean, netdev,
	linux-kernel, openbmc
In-Reply-To: <20190801235839.290689-1-taoren@fb.com>

> +static int bcm54616s_read_status(struct phy_device *phydev)
> +{
> +	int err;
> +
> +	err = genphy_read_status(phydev);
> +
> +	/* 1000Base-X register set doesn't provide speed fields: the
> +	 * link speed is always 1000 Mb/s as long as link is up.
> +	 */
> +	if (phydev->dev_flags & PHY_BCM_FLAGS_MODE_1000BX &&
> +	    phydev->link)
> +		phydev->speed = SPEED_1000;
> +
> +	return err;
> +}

This function is equivalent to bcm5482_read_status(). You should use
it, rather than add a new function.

    Andrew

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox