Netdev List
 help / color / mirror / Atom feed
* [PATCH 3/3] net: davinci_emac: match the mdio device against its compatible if possible
From: Bartosz Golaszewski @ 2018-06-19 16:09 UTC (permalink / raw)
  To: Grygorii Strashko, David S . Miller, Florian Fainelli,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
	Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
  Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski
In-Reply-To: <20180619160950.6283-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

Device tree based systems without of_dev_auxdata will have the mdio
device named differently than "davinci_mdio(.0)". In this case use the
device's compatible string for matching.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index a1a6445b5a7e..c28a35bb852f 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1387,6 +1387,10 @@ static int emac_devioctl(struct net_device *ndev, struct ifreq *ifrq, int cmd)
 
 static int match_first_device(struct device *dev, void *data)
 {
+	if (dev->of_node)
+		return of_device_is_compatible(dev->of_node,
+					       "ti,davinci_mdio");
+
 	return !strncmp(dev_name(dev), "davinci_mdio", 12);
 }
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH 2/3] net: phy: set the of_node in the mdiodev's struct device
From: Bartosz Golaszewski @ 2018-06-19 16:09 UTC (permalink / raw)
  To: Grygorii Strashko, David S . Miller, Florian Fainelli,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
	Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
  Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski
In-Reply-To: <20180619160950.6283-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

Copy the of_node over from mii_bus's struct device. This is needed
for device-tree systems to be able to check the mdio device's
compatible string.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 drivers/net/phy/phy_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index bd0f339f69fd..a92d5ee61813 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -411,6 +411,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
 	mdiodev->dev.parent = &bus->dev;
 	mdiodev->dev.bus = &mdio_bus_type;
 	mdiodev->dev.type = &mdio_bus_phy_type;
+	mdiodev->dev.of_node = bus->dev.of_node;
 	mdiodev->bus = bus;
 	mdiodev->bus_match = phy_bus_match;
 	mdiodev->addr = addr;
-- 
2.17.1

^ permalink raw reply related

* [PATCH 1/3] net: ethernet: fix suspend/resume in davinci_emac
From: Bartosz Golaszewski @ 2018-06-19 16:09 UTC (permalink / raw)
  To: Grygorii Strashko, David S . Miller, Florian Fainelli,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
	Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
  Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski, stable
In-Reply-To: <20180619160950.6283-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

This patch reverts commit 3243ff2a05ec ("net: ethernet: davinci_emac:
Deduplicate bus_find_device() by name matching") and adds a comment
which should stop anyone from reintroducing the same "fix" in the future.

We can't use bus_find_device_by_name() here because the device name is
not guaranteed to be 'davinci_mdio'. On some systems it can be
'davinci_mdio.0' so we need to use strncmp() against the first part of
the string to correctly match it.

Fixes: 3243ff2a05ec ("net: ethernet: davinci_emac: Deduplicate bus_find_device() by name matching")
Cc: stable@vger.kernel.org
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 drivers/net/ethernet/ti/davinci_emac.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index 06d7c9e4dcda..a1a6445b5a7e 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -1385,6 +1385,11 @@ static int emac_devioctl(struct net_device *ndev, struct ifreq *ifrq, int cmd)
 		return -EOPNOTSUPP;
 }
 
+static int match_first_device(struct device *dev, void *data)
+{
+	return !strncmp(dev_name(dev), "davinci_mdio", 12);
+}
+
 /**
  * emac_dev_open - EMAC device open
  * @ndev: The DaVinci EMAC network adapter
@@ -1484,8 +1489,14 @@ static int emac_dev_open(struct net_device *ndev)
 
 	/* use the first phy on the bus if pdata did not give us a phy id */
 	if (!phydev && !priv->phy_id) {
-		phy = bus_find_device_by_name(&mdio_bus_type, NULL,
-					      "davinci_mdio");
+		/* NOTE: we can't use bus_find_device_by_name() here because
+		 * the device name is not guaranteed to be 'davinci_mdio'. On
+		 * some systems it can be 'davinci_mdio.0' so we need to use
+		 * strncmp() against the first part of the string to correctly
+		 * match it.
+		 */
+		phy = bus_find_device(&mdio_bus_type, NULL, NULL,
+				      match_first_device);
 		if (phy) {
 			priv->phy_id = dev_name(phy);
 			if (!priv->phy_id || !*priv->phy_id)
-- 
2.17.1

^ permalink raw reply related

* [PATCH 0/3] net: davinci_emac: fix suspend/resume (both a regression and a common clk problem)
From: Bartosz Golaszewski @ 2018-06-19 16:09 UTC (permalink / raw)
  To: Grygorii Strashko, David S . Miller, Florian Fainelli,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Lukas Wunner,
	Kevin Hilman, David Lechner, Sekhar Nori, Andrew Lunn
  Cc: linux-omap, netdev, linux-kernel, Bartosz Golaszewski

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

Earlier today I sent the first patch as a solution to a regression
introduced during the v4.16 merge window, but after testing David's
common clock series on top of 4.18-rc1 + this patch it turned out that
the problem persisted.

This is a follow-up containing the regression fix and two additional
patches that make suspend/resume work with David's changes.

Bartosz Golaszewski (3):
  net: ethernet: fix suspend/resume in davinci_emac
  net: phy: set the of_node in the mdiodev's struct device
  net: davinci_emac: match the mdio device against its compatible if
    possible

 drivers/net/ethernet/ti/davinci_emac.c | 19 +++++++++++++++++--
 drivers/net/phy/phy_device.c           |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

-- 
2.17.1

^ permalink raw reply

* Re: iproute2 won't compile without AF_VSOCK
From: Stephen Hemminger @ 2018-06-19 15:47 UTC (permalink / raw)
  To: Steve Wise; +Cc: David Ahern, netdev
In-Reply-To: <c06b38c7-1680-806d-5b93-a7e04313183d@opengridcomputing.com>

On Tue, 19 Jun 2018 10:17:45 -0500
Steve Wise <swise@opengridcomputing.com> wrote:

> Hey David,
> 
> I'm trying to compile the latest iproute2 on an RHEL-7.3 distro, and it
> fails to compile because AF_VSOCK is not defined.  Should this
> functionality be a configure option to disable it on older distros?
> 
> 
> Thanks,
> 
> Steve.
> 
> ----
> 
> misc
>     CC       ss.o
> ss.c:301:27: error: ‘AF_VSOCK’ undeclared here (not in a function)
>    .families = FAMILY_MASK(AF_VSOCK),
>                            ^
> ss.c:252:46: note: in definition of macro ‘FAMILY_MASK’
>  #define FAMILY_MASK(family) ((uint64_t)1 << (family))
>                                               ^
> ss.c:334:2: error: array index in initializer not of integer type
>   [AF_VSOCK] = {
>   ^
> ss.c:334:2: error: (near initialization for ‘default_afs’)
> make[1]: *** [ss.o] Error 1
> make: *** [all] Error 2
> 

Probably should just add an #ifdef to takeout that if not present

^ permalink raw reply

* [PATCH] net: stmmac: socfpga: add additional ocp reset line for Stratix10
From: Dinh Nguyen @ 2018-06-19 15:35 UTC (permalink / raw)
  To: netdev
  Cc: dinguyen, davem, joabreu, alexandre.torgue, peppe.cavallaro,
	linux-kernel

The Stratix10 platform has an additional reset line, OCP(Open Core Protocol),
that also needs to get deasserted for the stmmac ethernet controller to work.
Thus we need to update the Kconfig to include ARCH_STRATIX10 in order to build
dwmac-socfpga.

Also, remove the redundant check for the reset controller pointer. The
reset driver already checks for the pointer and returns 0 if the pointer
is NULL.

Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/Kconfig         |  2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index cb5b0f5..edf2036 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -111,7 +111,7 @@ config DWMAC_ROCKCHIP
 config DWMAC_SOCFPGA
 	tristate "SOCFPGA dwmac support"
 	default ARCH_SOCFPGA
-	depends on OF && (ARCH_SOCFPGA || COMPILE_TEST)
+	depends on OF && (ARCH_SOCFPGA || ARCH_STRATIX10 || COMPILE_TEST)
 	select MFD_SYSCON
 	help
 	  Support for ethernet controller on Altera SOCFPGA
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
index 6e35957..5b3b06a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -55,6 +55,7 @@ struct socfpga_dwmac {
 	struct	device *dev;
 	struct regmap *sys_mgr_base_addr;
 	struct reset_control *stmmac_rst;
+	struct reset_control *stmmac_ocp_rst;
 	void __iomem *splitter_base;
 	bool f2h_ptp_ref_clk;
 	struct tse_pcs pcs;
@@ -262,8 +263,8 @@ static int socfpga_dwmac_set_phy_mode(struct socfpga_dwmac *dwmac)
 		val = SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII;
 
 	/* Assert reset to the enet controller before changing the phy mode */
-	if (dwmac->stmmac_rst)
-		reset_control_assert(dwmac->stmmac_rst);
+	reset_control_assert(dwmac->stmmac_ocp_rst);
+	reset_control_assert(dwmac->stmmac_rst);
 
 	regmap_read(sys_mgr_base_addr, reg_offset, &ctrl);
 	ctrl &= ~(SYSMGR_EMACGRP_CTRL_PHYSEL_MASK << reg_shift);
@@ -288,8 +289,8 @@ static int socfpga_dwmac_set_phy_mode(struct socfpga_dwmac *dwmac)
 	/* Deassert reset for the phy configuration to be sampled by
 	 * the enet controller, and operation to start in requested mode
 	 */
-	if (dwmac->stmmac_rst)
-		reset_control_deassert(dwmac->stmmac_rst);
+	reset_control_deassert(dwmac->stmmac_ocp_rst);
+	reset_control_deassert(dwmac->stmmac_rst);
 	if (phymode == PHY_INTERFACE_MODE_SGMII) {
 		if (tse_pcs_init(dwmac->pcs.tse_pcs_base, &dwmac->pcs) != 0) {
 			dev_err(dwmac->dev, "Unable to initialize TSE PCS");
@@ -324,6 +325,15 @@ static int socfpga_dwmac_probe(struct platform_device *pdev)
 		goto err_remove_config_dt;
 	}
 
+	dwmac->stmmac_ocp_rst = devm_reset_control_get_optional(dev, "stmmaceth-ocp");
+	if (IS_ERR(dwmac->stmmac_ocp_rst)) {
+		ret = PTR_ERR(dwmac->stmmac_ocp_rst);
+		dev_err(dev, "error getting reset control of ocp %d\n", ret);
+		goto err_remove_config_dt;
+	}
+
+	reset_control_deassert(dwmac->stmmac_ocp_rst);
+
 	ret = socfpga_dwmac_parse_data(dwmac, dev);
 	if (ret) {
 		dev_err(dev, "Unable to parse OF data\n");
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: David Ahern @ 2018-06-19 15:34 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: dsahern, netdev, borkmann, ast, davem
In-Reply-To: <20180619152529.rkzeyyqgmiwsvjp6@kafai-mbp.dhcp.thefacebook.com>

On 6/19/18 9:25 AM, Martin KaFai Lau wrote:
> On Mon, Jun 18, 2018 at 03:35:25PM -0600, David Ahern wrote:
>> On 6/18/18 2:55 PM, Martin KaFai Lau wrote:
>>>> 	/* rc > 0 case */
>>>> 	switch(rc) {
>>>> 	case BPF_FIB_LKUP_RET_BLACKHOLE:
>>>> 	case BPF_FIB_LKUP_RET_UNREACHABLE:
>>>> 	case BPF_FIB_LKUP_RET_PROHIBIT:
>>>> 		return XDP_DROP;
>>>> 	}
>>>>
>>>> For the others it becomes a question of do we share why the stack needs
>>>> to be involved? Maybe the program wants to collect stats to show traffic
>>>> patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support
>>>> in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an
>>>> interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED).
>>> Thanks for the explanation.
>>>
>>> Agree on the bpf able to collect stats will be useful.
>>>
>>> I am wondering, if a new BPF_FIB_LKUP_RET_XYZ is added later,
>>> how may the old xdp_prog work/not-work?  As of now, the return value
>>> is straight forward, FWD, PASS (to stack) or DROP (error).
>>> With this change, the xdp_prog needs to match/switch() the
>>> BPF_FIB_LKUP_RET_* to at least PASS and DROP.
>>
>> IMO, programs should only call XDP_DROP for known reasons - like the 3
>> above. Anything else punt to the stack.
>>
>> If a new RET_XYZ comes along:
>> 1. the new XYZ is a new ACL response where the packet is to be dropped.
>> If the program does not understand XYZ and punts to the stack
>> (recommendation), then a second lookup is done during normal packet
>> processing and the stack drops it.
>>
>> 2. the new XYZ is a new path in the kernel that is unsupported with
>> respect to XDP forwarding, nothing new for the program to do.
>>
>> Either way I would expect stats on BPF_FIB_LKUP_RET_* to give a hint to
>> the program writer.
>>
>> Worst case of punting packets to the stack for any rc != 0 means the
>> stack is doing 2 lookups - 1 in XDP based on its lookup parameters and 1
>> in normal stack processing - to handle the packet.
> Instead of having the xdp_prog to follow the meaning of what RET_SYZ is,
> should the bpf_*_fib_lookup() return value be kept as is such that
> the xdp_prog is clear what to do.  The reason can be returned in
> the 'struct bpf_fib_lookup'.  The number of reasons can be extended.
> If the xdp_prog does not understand a reason, it still will not
> affect its decision because the return value is clear.
> I think the situation here is similar to regular syscall which usually
> uses -1 to clearly states error and errno to spells out the reason.
> 

I did consider returning the status in struct bpf_fib_lookup. However,
it is 64 bytes and can not be extended without a big performance
penalty, so the only option there is to make an existing entry a union
the most logical of which is the ifindex. It seemed odd to me to have
the result by hidden in the struct as a union on ifindex and returning
the egress index from the function:

@@ -2625,7 +2636,11 @@ struct bpf_fib_lookup {

        /* total length of packet from network header - used for MTU
check */
        __u16   tot_len;
-       __u32   ifindex;  /* L3 device index for lookup */
+
+       union {
+               __u32   ifindex;  /* input: L3 device index for lookup */
+               __u32   result;   /* output: one of BPF_FIB_LKUP_RET_* */
+       };


It seemed more natural to have ifindex stay ifindex and only change
value on return:

@@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {

 	/* total length of packet from network header - used for MTU check */
 	__u16	tot_len;
-	__u32	ifindex;  /* L3 device index for lookup */
+
+	/* input: L3 device index for lookup
+	 * output: nexthop device index from FIB lookup
+	 */
+	__u32	ifindex;

 	union {
 		/* inputs to lookup */


>From a program's perspective:

rc < 0  -- program is passing incorrect data
rc == 0 -- packet can be forwarded
rc > 0  -- packet can not be forwarded.

BPF programs are not required to track the LKUP_RET values any more than
a function returning multiple negative values - the caller just checks
rc < 0 means failure. If the program cares it can look at specific
values of rc to see the specific value.

The same applies with the LKUP_RET values - they are there to provide
insight into why the packet is not forwarded directly if the program
cares to know why.

^ permalink raw reply

* Re: [PATCH] net/phy: Micrel KSZ8061 PHY link failure after cable connect
From: Andrew Lunn @ 2018-06-19 15:28 UTC (permalink / raw)
  To: Onnasch, Alexander (EXT)
  Cc: Florian Fainelli, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <AM6PR01MB4262F8BB6004D246DFF5C3FEF0700@AM6PR01MB4262.eurprd01.prod.exchangelabs.com>

On Tue, Jun 19, 2018 at 02:23:41PM +0000, Onnasch, Alexander (EXT) wrote:
> Hi Andrew
> thanks for the hint. But actually I cannot confirm - or I don't see it yet. 
> 
> Without having tested, just from the code, the struct phy_driver instance for PHY_ID_KSZ8061 in micrel.c does not have a .write_mmd function assigned, thus phy_write_mmd should evaluate to its else-clause (see below) and not to mdiobus_write (as in phy_write).
> 
> Also the ksz8061_extended_write() function which I have added uses the same principle as already existing HW-specific functions in micrel.c for simular reasons (kszphy_extended_write and ksz9031_extended_write).
> They use phy_write all over the place in that file and never phy_write_mmd - for whatever reason they had.
> Thus I thought it would be a good idea ...

Hi Alexander

Please don't top post. And wrap your lines at around 75 characters 

> 		struct mii_bus *bus = phydev->mdio.bus;
> 		int phy_addr = phydev->mdio.addr;
> 
> 		mutex_lock(&bus->mdio_lock);
> 		mmd_phy_indirect(bus, phy_addr, devad, regnum);
> 
> 		/* Write the data into MMD's selected register */
> 		bus->write(bus, phy_addr, MII_MMD_DATA, val);
> 		mutex_unlock(&bus->mdio_lock);


> > +static int ksz8061_extended_write(struct phy_device *phydev,
> > +				  u8 mode, u32 dev_addr, u32 regnum, u16 val) {
> > +	phy_write(phydev, MII_KSZ8061RN_MMD_CTRL_REG, dev_addr);
> > +	phy_write(phydev, MII_KSZ8061RN_MMD_REGDATA_REG, regnum);
> > +	phy_write(phydev, MII_KSZ8061RN_MMD_CTRL_REG, (mode << 14) | dev_addr);
> > +	return phy_write(phydev, MII_KSZ8061RN_MMD_REGDATA_REG, val); }
> 
> Hi Alexander
> 
> This looks a lot like phy_write_mmd().

Look closely at the two implementations. Look at what
mmd_phy_indirect() does. I _think_ these are identical. So don't add
your own helper, please use the core code.

     Andrew

^ permalink raw reply

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: Martin KaFai Lau @ 2018-06-19 15:25 UTC (permalink / raw)
  To: David Ahern; +Cc: dsahern, netdev, borkmann, ast, davem
In-Reply-To: <1339f6f2-9dd3-886c-2178-7088b0ae4746@gmail.com>

On Mon, Jun 18, 2018 at 03:35:25PM -0600, David Ahern wrote:
> On 6/18/18 2:55 PM, Martin KaFai Lau wrote:
> >> 	/* rc > 0 case */
> >> 	switch(rc) {
> >> 	case BPF_FIB_LKUP_RET_BLACKHOLE:
> >> 	case BPF_FIB_LKUP_RET_UNREACHABLE:
> >> 	case BPF_FIB_LKUP_RET_PROHIBIT:
> >> 		return XDP_DROP;
> >> 	}
> >>
> >> For the others it becomes a question of do we share why the stack needs
> >> to be involved? Maybe the program wants to collect stats to show traffic
> >> patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support
> >> in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an
> >> interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED).
> > Thanks for the explanation.
> > 
> > Agree on the bpf able to collect stats will be useful.
> > 
> > I am wondering, if a new BPF_FIB_LKUP_RET_XYZ is added later,
> > how may the old xdp_prog work/not-work?  As of now, the return value
> > is straight forward, FWD, PASS (to stack) or DROP (error).
> > With this change, the xdp_prog needs to match/switch() the
> > BPF_FIB_LKUP_RET_* to at least PASS and DROP.
> 
> IMO, programs should only call XDP_DROP for known reasons - like the 3
> above. Anything else punt to the stack.
> 
> If a new RET_XYZ comes along:
> 1. the new XYZ is a new ACL response where the packet is to be dropped.
> If the program does not understand XYZ and punts to the stack
> (recommendation), then a second lookup is done during normal packet
> processing and the stack drops it.
> 
> 2. the new XYZ is a new path in the kernel that is unsupported with
> respect to XDP forwarding, nothing new for the program to do.
> 
> Either way I would expect stats on BPF_FIB_LKUP_RET_* to give a hint to
> the program writer.
> 
> Worst case of punting packets to the stack for any rc != 0 means the
> stack is doing 2 lookups - 1 in XDP based on its lookup parameters and 1
> in normal stack processing - to handle the packet.
Instead of having the xdp_prog to follow the meaning of what RET_SYZ is,
should the bpf_*_fib_lookup() return value be kept as is such that
the xdp_prog is clear what to do.  The reason can be returned in
the 'struct bpf_fib_lookup'.  The number of reasons can be extended.
If the xdp_prog does not understand a reason, it still will not
affect its decision because the return value is clear.
I think the situation here is similar to regular syscall which usually
uses -1 to clearly states error and errno to spells out the reason.

> 
> > 
> >>
> >> Arguably BPF_FIB_LKUP_RET_NO_NHDEV is not needed. See below.
> >>
> >>>> @@ -2612,6 +2613,19 @@ struct bpf_raw_tracepoint_args {
> >>>>  #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
> >>>>  #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
> >>>>  
> >>>> +enum {
> >>>> +	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
> >>>> +	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed */
> >>>> +	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable */
> >>>> +	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed */
> >>>> +	BPF_FIB_LKUP_RET_NOT_FWDED,    /* pkt is not forwardded */
> >>> BPF_FIB_LKUP_RET_NOT_FWDED is a catch all?
> >>>
> >>
> >> Destination is local. More precisely, the FIB lookup is not unicast so
> >> not forwarded. It could be RTN_LOCAL, RTN_BROADCAST, RTN_ANYCAST, or
> >> RTN_MULTICAST. The next ones -- blackhole, reachable, prohibit -- are
> >> called out.
> > I think it also includes the tbid not found case.
> 
> Another one of those "should never happen scenarios". The user does not
> specify the table; it is retrieved based on device association. Table
> defaults to the main table - which always exists - and any VRF
> enslavement of a device happens after the VRF device creates the table.
> 
> > 
> >>
> >>>> @@ -4252,16 +4277,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> >>>>  	if (check_mtu) {
> >>>>  		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
> >>>>  		if (params->tot_len > mtu)
> >>>> -			return 0;
> >>>> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
> >>>>  	}
> >>>>  
> >>>>  	if (f6i->fib6_nh.nh_lwtstate)
> >>>> -		return 0;
> >>>> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
> >>>>  
> >>>>  	if (f6i->fib6_flags & RTF_GATEWAY)
> >>>>  		*dst = f6i->fib6_nh.nh_gw;
> >>>>  
> >>>>  	dev = f6i->fib6_nh.nh_dev;
> >>>> +	if (unlikely(!dev))
> >>>> +		return BPF_FIB_LKUP_RET_NO_NHDEV;
> >>> Is this a bug fix?
> >>>
> >>
> >> Difference between IPv4 and IPv6. Making them consistent.
> >>
> >> It is a major BUG in the kernel to reach this point in either protocol
> >> to have a unicast route not tied to a device. IPv4 has checks; v6 does
> >> not. I figured this being new code, why not make bpf_ipv{4,6}_fib_lookup
> >> as close to the same as possible.
> > Make sense.  A comment in the commit log will be useful if there is a
> > re-spin.
> > 
> 
> ok.

^ permalink raw reply

* Re: Link modes representation in phylib
From: Andrew Lunn @ 2018-06-19 15:21 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, Russell King - ARM Linux, Florian Fainelli, netdev,
	Antoine Tenart, thomas.petazzoni@bootlin.com, Gregory CLEMENT,
	Miquel Raynal
In-Reply-To: <20180619113053.11df78a2@bootlin.com>

> What I propose is that we add 3 link_mode fields in phy_device, and keep
> the legacy fields for now. It would be up to the driver to fill the new
> "supported" field in config_init, kind of like what's done in the
> marvell10g driver.

Hi Maxime

You can do this conversion in the core. If features == 0, and some
bits are set in the features link_mode, do the conversion at probe
time. The same can be done for lp_advertising, when the call into the
drivers read_status() has completed.

> Would that be acceptable ?

It sounds reasonable. Lets see what the code looks like.

   Andrew

^ permalink raw reply

* [PATCH] bpfilter: ignore binary files
From: Matteo Croce @ 2018-06-19 15:21 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov

net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
enabled, add it to .gitignore to avoid committing it.

Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 net/bpfilter/.gitignore | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 net/bpfilter/.gitignore

diff --git a/net/bpfilter/.gitignore b/net/bpfilter/.gitignore
new file mode 100644
index 000000000000..e97084e3eea2
--- /dev/null
+++ b/net/bpfilter/.gitignore
@@ -0,0 +1 @@
+bpfilter_umh
-- 
2.17.1

^ permalink raw reply related

* iproute2 won't compile without AF_VSOCK
From: Steve Wise @ 2018-06-19 15:17 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev

Hey David,

I'm trying to compile the latest iproute2 on an RHEL-7.3 distro, and it
fails to compile because AF_VSOCK is not defined.  Should this
functionality be a configure option to disable it on older distros?


Thanks,

Steve.

----

misc
    CC       ss.o
ss.c:301:27: error: ‘AF_VSOCK’ undeclared here (not in a function)
   .families = FAMILY_MASK(AF_VSOCK),
                           ^
ss.c:252:46: note: in definition of macro ‘FAMILY_MASK’
 #define FAMILY_MASK(family) ((uint64_t)1 << (family))
                                              ^
ss.c:334:2: error: array index in initializer not of integer type
  [AF_VSOCK] = {
  ^
ss.c:334:2: error: (near initialization for ‘default_afs’)
make[1]: *** [ss.o] Error 1
make: *** [all] Error 2


^ permalink raw reply

* [PATCH] bpfilter: fix build error
From: Matteo Croce @ 2018-06-19 15:16 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov

bpfilter Makefile assumes that the system locale is en_US, and the
parsing of objdump output fails.
Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
only 2 processes instead of 7.

Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 net/bpfilter/Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index e0bbe7583e58..dd86b022eff0 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -21,8 +21,10 @@ endif
 # which bpfilter_kern.c passes further into umh blob loader at run-time
 quiet_cmd_copy_umh = GEN $@
       cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
-      $(OBJCOPY) -I binary -O `$(OBJDUMP) -f $<|grep format|cut -d' ' -f8` \
-      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
+      $(OBJCOPY) -I binary \
+          `LC_ALL=C objdump -f net/bpfilter/bpfilter_umh \
+          |awk -F' |,' '/file format/{print "-O",$$NF} \
+          /^architecture:/{print "-B",$$2}'` \
       --rename-section .data=.init.rodata $< $@
 
 $(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs
From: Andrew Lunn @ 2018-06-19 15:15 UTC (permalink / raw)
  To: Don Bollinger
  Cc: Tom Lendacky, Arnd Bergmann, Greg Kroah-Hartman, linux-kernel,
	brandon_chuang, wally_wang, roy_lee, rick_burchett, quentin.chang,
	steven.noble, jeffrey.townsend, scotte, roopa, David Ahern,
	luke.williams, Guohan Lu, Russell King, netdev@vger.kernel.org
In-Reply-To: <20180618194127.uaeqlo3dy35qs3ip@thebollingers.org>

> > If you are using Linux as a boot loader, i doubt you will find any
> > network kernel developers who are willing to consider this driver. The
> 
> It isn't a boot loader.  It is the kernel that is running on the switch
> when it is doing its switch thing.  The kernel hosts the drivers and the
> switch SDK and all the apps that configure and manage the networking.

That is exactly using it as a boot loader. Linux network stack itself
has nothing to do with the switches. It just loads an application
which controls the switch from user space.

> This is real code, that fits a real need, and would like the
> benefits of being maintained as part of the mainline kernel.

Have you ever attended one of the netdev conferences?

https://www.netdevconf.org/2.1/submit-proposal.html

Netdev 2.1 is a community-driven conference geared towards Linux
netheads. Linux kernel networking and user space utilization of the
interfaces to the Linux kernel networking subsystem are the focus. If
you are using Linux as a boot system for proprietary networking, then
this conference _may not be for you_.

That gives you an idea of what the kernel community feels about this
sort of thing. Why should we help maintain code which brings no
benefit to the netdev community?

If you make use of the existing SFP code, extend it so it both
benefits the netdev kernel community and your own, then we might
consider merging your code. But optoe as it is brings no benefit to
the netdev community, so is unlikely to get merged.

    Andrew

^ permalink raw reply

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: David Ahern @ 2018-06-19 15:11 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: dsahern, netdev, borkmann, ast, davem
In-Reply-To: <20180618205537.2j645mfujdsqxf2b@kafai-mbp.dhcp.thefacebook.com>

On 6/18/18 2:55 PM, Martin KaFai Lau wrote:
>>
>> Arguably BPF_FIB_LKUP_RET_NO_NHDEV is not needed. See below.
>>

...

>>>> @@ -4252,16 +4277,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>>>  	if (check_mtu) {
>>>>  		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
>>>>  		if (params->tot_len > mtu)
>>>> -			return 0;
>>>> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
>>>>  	}
>>>>  
>>>>  	if (f6i->fib6_nh.nh_lwtstate)
>>>> -		return 0;
>>>> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>>>>  
>>>>  	if (f6i->fib6_flags & RTF_GATEWAY)
>>>>  		*dst = f6i->fib6_nh.nh_gw;
>>>>  
>>>>  	dev = f6i->fib6_nh.nh_dev;
>>>> +	if (unlikely(!dev))
>>>> +		return BPF_FIB_LKUP_RET_NO_NHDEV;
>>> Is this a bug fix?
>>>
>>
>> Difference between IPv4 and IPv6. Making them consistent.
>>
>> It is a major BUG in the kernel to reach this point in either protocol
>> to have a unicast route not tied to a device. IPv4 has checks; v6 does
>> not. I figured this being new code, why not make bpf_ipv{4,6}_fib_lookup
>> as close to the same as possible.
> Make sense.  A comment in the commit log will be useful if there is a
> re-spin.
> 

Upon further review, I will remove BPF_FIB_LKUP_RET_NO_NHDEV. The dev
check is not needed in either ipv4 or ipv6. For IPv4 after fib_lookup
calls both __mkroute_input and ip_route_output_key_hash_rcu expect dev
to be set for unicast as it should be.

^ permalink raw reply

* Re: [RFC PATCH ghak86 V1] audit: eliminate audit_enabled magic number comparison
From: Paul Moore @ 2018-06-19 15:10 UTC (permalink / raw)
  To: rgb
  Cc: linux-audit, linux-kernel, netdev, netfilter-devel,
	linux-security-module, Eric Paris, sgrubb
In-Reply-To: <490a00a7902582823fe8c532f5dd995a1da61fb1.1528214962.git.rgb@redhat.com>

On Tue, Jun 5, 2018 at 7:22 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Remove comparison of audit_enabled to magic numbers outside of audit.
>
> Related: https://github.com/linux-audit/audit-kernel/issues/86
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  drivers/tty/tty_audit.c      | 2 +-
>  include/linux/audit.h        | 5 ++++-
>  include/net/xfrm.h           | 2 +-
>  kernel/audit.c               | 3 ---
>  net/netfilter/xt_AUDIT.c     | 2 +-
>  net/netlabel/netlabel_user.c | 2 +-
>  6 files changed, 8 insertions(+), 8 deletions(-)

Merged, thanks.

> diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
> index e30aa6b..50f567b 100644
> --- a/drivers/tty/tty_audit.c
> +++ b/drivers/tty/tty_audit.c
> @@ -92,7 +92,7 @@ static void tty_audit_buf_push(struct tty_audit_buf *buf)
>  {
>         if (buf->valid == 0)
>                 return;
> -       if (audit_enabled == 0) {
> +       if (audit_enabled == AUDIT_OFF) {
>                 buf->valid = 0;
>                 return;
>         }
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 69c7847..9334fbe 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -117,6 +117,9 @@ struct audit_field {
>
>  extern void audit_log_session_info(struct audit_buffer *ab);
>
> +#define AUDIT_OFF      0
> +#define AUDIT_ON       1
> +#define AUDIT_LOCKED   2
>  #ifdef CONFIG_AUDIT
>  /* These are defined in audit.c */
>                                 /* Public API */
> @@ -202,7 +205,7 @@ static inline int audit_log_task_context(struct audit_buffer *ab)
>  static inline void audit_log_task_info(struct audit_buffer *ab,
>                                        struct task_struct *tsk)
>  { }
> -#define audit_enabled 0
> +#define audit_enabled AUDIT_OFF
>  #endif /* CONFIG_AUDIT */
>
>  #ifdef CONFIG_AUDIT_COMPAT_GENERIC
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index 7f2e31a..ce995a1 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -734,7 +734,7 @@ static inline struct audit_buffer *xfrm_audit_start(const char *op)
>  {
>         struct audit_buffer *audit_buf = NULL;
>
> -       if (audit_enabled == 0)
> +       if (audit_enabled == AUDIT_OFF)
>                 return NULL;
>         audit_buf = audit_log_start(audit_context(), GFP_ATOMIC,
>                                     AUDIT_MAC_IPSEC_EVENT);
> diff --git a/kernel/audit.c b/kernel/audit.c
> index e7478cb..8442c65 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -83,9 +83,6 @@
>  #define AUDIT_INITIALIZED      1
>  static int     audit_initialized;
>
> -#define AUDIT_OFF      0
> -#define AUDIT_ON       1
> -#define AUDIT_LOCKED   2
>  u32            audit_enabled = AUDIT_OFF;
>  bool           audit_ever_enabled = !!AUDIT_OFF;
>
> diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
> index f368ee6..af883f1 100644
> --- a/net/netfilter/xt_AUDIT.c
> +++ b/net/netfilter/xt_AUDIT.c
> @@ -72,7 +72,7 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
>         struct audit_buffer *ab;
>         int fam = -1;
>
> -       if (audit_enabled == 0)
> +       if (audit_enabled == AUDIT_OFF)
>                 goto errout;
>         ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
>         if (ab == NULL)
> diff --git a/net/netlabel/netlabel_user.c b/net/netlabel/netlabel_user.c
> index 2f328af..4676f5b 100644
> --- a/net/netlabel/netlabel_user.c
> +++ b/net/netlabel/netlabel_user.c
> @@ -101,7 +101,7 @@ struct audit_buffer *netlbl_audit_start_common(int type,
>         char *secctx;
>         u32 secctx_len;
>
> -       if (audit_enabled == 0)
> +       if (audit_enabled == AUDIT_OFF)
>                 return NULL;
>
>         audit_buf = audit_log_start(audit_context(), GFP_ATOMIC, type);
> --
> 1.8.3.1
>


-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: David Ahern @ 2018-06-19 15:02 UTC (permalink / raw)
  To: Quentin Monnet, dsahern, netdev, borkmann, ast; +Cc: davem
In-Reply-To: <a856c7da-da18-9460-3eff-4d25ed6852ea@netronome.com>

On 6/19/18 3:36 AM, Quentin Monnet wrote:
> Since you are about to respin (I think?), could you please also fix the
> formatting in your change to the doc? The "BPF_FIB_LKUP_RET_" is not
> emphasized (and will even cause an error message when producing the man
> page, because of the trailing underscore that gets interpreted in RST),
> and the three cases for the return value are not formatted properly for
> the conversion.
> 
> Something like the following would work:
> 
> ---
>  *     Return
>  *		* < 0 if any input argument is invalid.
>  *		*   0 on success (packet is forwarded and nexthop neighbor exists).
>  *		* > 0: one of **BPF_FIB_LKUP_RET_** codes on FIB lookup response.
> ---
> 

Will do. thanks for the review.

^ permalink raw reply

* Re: [PATCH rdma-next 0/3] Dump and fill MKEY
From: Leon Romanovsky @ 2018-06-19 14:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Guy Levi, Yishai Hadas,
	Yonatan Cohen, Saeed Mahameed, linux-netdev
In-Reply-To: <20180619142106.GA27457@mellanox.com>

On Tue, Jun 19, 2018 at 08:21:06AM -0600, Jason Gunthorpe wrote:
> On Tue, Jun 19, 2018 at 08:47:21AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > MLX5 IB HCA offers the memory key, dump_fill_mkey to increase
> > performance, when used in a send or receive operations.
> >
> > It is used to force local HCA operations to skip the PCI bus access,
> > while keeping track of the processed length in the ibv_sge handling.
> >
> > In this three patch series, we expose various bits in our HW
> > spec file (mlx5_ifc.h), move unneeded for mlx5_core FW command and
> > export such memory key to user space thought our mlx5-abi header file.
>
> Where is the user space for this?

It will be published in the near future.

Thanks

>
> Jason

^ permalink raw reply

* RE: [PATCH] net/phy: Micrel KSZ8061 PHY link failure after cable connect
From: Onnasch, Alexander (EXT) @ 2018-06-19 14:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20180606123947.GB14898@lunn.ch>

Hi Andrew
thanks for the hint. But actually I cannot confirm - or I don't see it yet. 

Without having tested, just from the code, the struct phy_driver instance for PHY_ID_KSZ8061 in micrel.c does not have a .write_mmd function assigned, thus phy_write_mmd should evaluate to its else-clause (see below) and not to mdiobus_write (as in phy_write).

Also the ksz8061_extended_write() function which I have added uses the same principle as already existing HW-specific functions in micrel.c for simular reasons (kszphy_extended_write and ksz9031_extended_write).
They use phy_write all over the place in that file and never phy_write_mmd - for whatever reason they had.
Thus I thought it would be a good idea ...

best regards, Alex


static inline int phy_write(struct phy_device *phydev, u32 regnum, u16 val)
{
	return mdiobus_write(phydev->mdio.bus, phydev->mdio.addr, regnum, val);
}



int phy_write_mmd(struct phy_device *phydev, int devad, u32 regnum, u16 val)
{
	int ret;

	if (regnum > (u16)~0 || devad > 32)
		return -EINVAL;

	if (phydev->drv->write_mmd) {
		ret = phydev->drv->write_mmd(phydev, devad, regnum, val);
	} else if (phydev->is_c45) {
		u32 addr = MII_ADDR_C45 | (devad << 16) | (regnum & 0xffff);

		ret = mdiobus_write(phydev->mdio.bus, phydev->mdio.addr,
				    addr, val);
	} else {
		struct mii_bus *bus = phydev->mdio.bus;
		int phy_addr = phydev->mdio.addr;

		mutex_lock(&bus->mdio_lock);
		mmd_phy_indirect(bus, phy_addr, devad, regnum);

		/* Write the data into MMD's selected register */
		bus->write(bus, phy_addr, MII_MMD_DATA, val);
		mutex_unlock(&bus->mdio_lock);

		ret = 0;
	}
	return ret;
}

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: Mittwoch, 6. Juni 2018 14:40
To: Onnasch, Alexander (EXT) <Alexander.Onnasch@landisgyr.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] net/phy: Micrel KSZ8061 PHY link failure after cable connect

On Wed, Jun 06, 2018 at 10:03:18AM +0200, Alexander Onnasch wrote:
> With Micrel KSZ8061 PHY, the link may occasionally not come up after 
> Ethernet cable connect. The vendor's (Microchip, former Micrel) errata 
> sheet 80000688A.pdf describes the problem and possible workarounds in 
> detail, see below.
> The patch implements workaround 1, which permanently fixes the issue.
> 
> DESCRIPTION
> Link-up may not occur properly when the Ethernet cable is initially 
> connected. This issue occurs more commonly when the cable is connected 
> slowly, but it may occur any time a cable is connected. This issue 
> occurs in the auto-negotiation circuit, and will not occur if 
> auto-negotiation is disabled (which requires that the two link 
> partners be set to the same speed and duplex).
> 
> END USER IMPLICATIONS
> When this issue occurs, link is not established. Subsequent cable 
> plug/unplug cycles will not correct the issue.
> 
> WORK AROUND
> There are four approaches to work around this issue:
> 1.  This issue can be prevented by setting bit 15 in MMD device address 1,
>     register 2, prior to connecting the  cable or prior to setting the
>     Restart Auto-Negotiation bit in register 0h.The MMD registers are
>     accessed via the indirect access registers Dh and Eh, or via the Micrel
>     EthUtil utility as shown here:
>     •  If using the EthUtil utility (usually with a Micrel KSZ8061
>        Evaluation Board), type the following commands:
>        > address 1
>        > mmd 1
>        > iw 2 b61a
>     •  Alternatively, write the following registers to write to the
>        indirect MMD register:
>        Write register Dh, data 0001h
>        Write register Eh, data 0002h
>        Write register Dh, data 4001h
>        Write register Eh, data B61Ah
> 2.  The issue can be avoided by disabling auto-negotiation in the KSZ8061,
>     either by the strapping option, or by clearing bit 12 in register 0h.
>     Care must be taken to ensure that the KSZ8061 and the link partner
>     will link with the same speed and duplex. Note that the KSZ8061
>     defaults to full-duplex when auto-negotiation is off, but other
>     devices may default to half-duplex in the event of failed
>     auto-negotiation.
> 3.  The issue can be avoided by connecting the cable prior to powering-up
>     or resetting the KSZ8061, and leaving it plugged in thereafter.
> 4.  If the above measures are not taken and the problem occurs, link can
>     be recovered by setting the Restart Auto-Negotiation bit in
>     register 0h, or by resetting or power cycling the device. Reset may
>     be either hardware reset or software reset (register 0h, bit 15).
> 
> PLAN
> This errata will not be corrected in a future revision.
> 
> Signed-off-by: Alexander Onnasch <alexander.onnasch@landisgyr.com>
> ---
>  drivers/net/phy/micrel.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 
> 6c45ff6..7d80a00 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -339,6 +339,28 @@ static int ksz8041_config_aneg(struct phy_device *phydev)
>  	return genphy_config_aneg(phydev);
>  }
>  
> +#define MII_KSZ8061RN_MMD_CTRL_REG	    0x0d
> +#define MII_KSZ8061RN_MMD_REGDATA_REG	0x0e
> +
> +static int ksz8061_extended_write(struct phy_device *phydev,
> +				  u8 mode, u32 dev_addr, u32 regnum, u16 val) {
> +	phy_write(phydev, MII_KSZ8061RN_MMD_CTRL_REG, dev_addr);
> +	phy_write(phydev, MII_KSZ8061RN_MMD_REGDATA_REG, regnum);
> +	phy_write(phydev, MII_KSZ8061RN_MMD_CTRL_REG, (mode << 14) | dev_addr);
> +	return phy_write(phydev, MII_KSZ8061RN_MMD_REGDATA_REG, val); }

Hi Alexander

This looks a lot like phy_write_mmd().

     Andrew

^ permalink raw reply

* Re: [PATCH rdma-next 0/3] Dump and fill MKEY
From: Jason Gunthorpe @ 2018-06-19 14:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Guy Levi,
	Yishai Hadas, Yonatan Cohen, Saeed Mahameed, linux-netdev
In-Reply-To: <20180619054724.32677-1-leon@kernel.org>

On Tue, Jun 19, 2018 at 08:47:21AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> MLX5 IB HCA offers the memory key, dump_fill_mkey to increase
> performance, when used in a send or receive operations.
> 
> It is used to force local HCA operations to skip the PCI bus access,
> while keeping track of the processed length in the ibv_sge handling.
> 
> In this three patch series, we expose various bits in our HW
> spec file (mlx5_ifc.h), move unneeded for mlx5_core FW command and
> export such memory key to user space thought our mlx5-abi header file.

Where is the user space for this?

Jason

^ permalink raw reply

* NETLINK_ROUTE policy routing questions
From: Cunningham, Joel @ 2018-06-19 14:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Daniel Wagner, connman@lists.01.org

Hi,

I've been working through an issue with Connman and NETLINK_ROUTE messages as it relates to policy routing.  Background can be seen here: https://lists.01.org/pipermail/connman/2018-June/022846.html

I'm using kernel 4.9.27 from ASOP releases and had a couple of questions of how NETLINK_ROUTE is intended to work and whether or not we are seeing a kernel bug.

Connman has long-running NETLINK_ROUTE socket which binds with:

memset(&addr, 0, sizeof(addr));
addr.nl_family = AF_NETLINK;
addr.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV4_ROUTE |
		RTMGRP_IPV6_IFADDR | RTMGRP_IPV6_ROUTE |
		(1<<(RTNLGRP_ND_USEROPT-1));

Connman also creates other short-lived NETLINK_ROUTE sockets to perform setters, in particular, we have RTM_NEWROUTE and RTM_DELROUTE with a custom routing table.  Connman uses policy routing to create a session based routing table.  When a new interface comes online and has a gateway, Connman adds a default route to a new custom routing table.  When this happens, we get a RTM_NEWROUTE message for the main table (254), but we never receive any RTM_NEWROUTE/RTM_DELROUTE messages for our custom table.  Should NETLINK_ROUTE sockets bound to RTMGRP_IPV4_ROUTE be receiving updates for custom tables or only table ID < 256?

The other behavior which I ran into was originally my kernel didn't have CONFIG_IP_MULTIPLE_TABLES enabled and when Connman sent RTM_NEWROUTE/DELROUTE with the custom table, we got NETLINK_ROUTE messages for these actions being applied to the main table (254).  This was corrected by enabling CONFIG_IP_MULTIPLE_TABLES in the kernel, but I was still just curious if using table 254 was a fallback mechanism rather than the setter returning an error.

Thanks,

Joel

^ permalink raw reply

* [PATCH] net/usb/drivers: Remove useless hrtimer_active check
From: Daniel Lezcano @ 2018-06-19 14:14 UTC (permalink / raw)
  To: oliver
  Cc: David S. Miller, open list:USB CDC ETHERNET DRIVER,
	open list:NETWORKING DRIVERS, open list

The code does:

 if (hrtimer_active(&t))
    hrtimer_cancel(&t);

However, hrtimer_cancel() checks if the timer is active, so the
test above is pointless.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/net/usb/cdc_ncm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 90d07ed..732ceff 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -967,8 +967,7 @@ void cdc_ncm_unbind(struct usbnet *dev, struct usb_interface *intf)
 
 	atomic_set(&ctx->stop, 1);
 
-	if (hrtimer_active(&ctx->tx_timer))
-		hrtimer_cancel(&ctx->tx_timer);
+	hrtimer_cancel(&ctx->tx_timer);
 
 	tasklet_kill(&ctx->bh);
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] net: ethernet: fix suspend/resume in davinci_emac
From: Lukas Wunner @ 2018-06-19 13:52 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Grygorii Strashko, David S . Miller, Florian Fainelli,
	Dan Carpenter, Ivan Khoronzhuk, Rob Herring, Kevin Hilman,
	David Lechner, Sekhar Nori, linux-omap, netdev, linux-kernel,
	Bartosz Golaszewski, stable
In-Reply-To: <20180619124400.11878-1-brgl@bgdev.pl>

On Tue, Jun 19, 2018 at 02:44:00PM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> This patch reverts commit 3243ff2a05ec ("net: ethernet: davinci_emac:
> Deduplicate bus_find_device() by name matching") and adds a comment
> which should stop anyone from reintroducing the same "fix" in the future.
> 
> We can't use bus_find_device_by_name() here because the device name is
> not guaranteed to be 'davinci_mdio'. On some systems it can be
> 'davinci_mdio.0' so we need to use strncmp() against the first part of
> the string to correctly match it.
> 
> Fixes: 3243ff2a05ec ("net: ethernet: davinci_emac: Deduplicate bus_find_device() by name matching")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

Acked-by: Lukas Wunner <lukas@wunner.de>

My apologies Bartosz, it wasn't clear to me that the driver deliberately
only matched against the prefix of the name.  Sorry for the breakage.


> ---
>  drivers/net/ethernet/ti/davinci_emac.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
> index 06d7c9e4dcda..a1a6445b5a7e 100644
> --- a/drivers/net/ethernet/ti/davinci_emac.c
> +++ b/drivers/net/ethernet/ti/davinci_emac.c
> @@ -1385,6 +1385,11 @@ static int emac_devioctl(struct net_device *ndev, struct ifreq *ifrq, int cmd)
>  		return -EOPNOTSUPP;
>  }
>  
> +static int match_first_device(struct device *dev, void *data)
> +{
> +	return !strncmp(dev_name(dev), "davinci_mdio", 12);
> +}
> +
>  /**
>   * emac_dev_open - EMAC device open
>   * @ndev: The DaVinci EMAC network adapter
> @@ -1484,8 +1489,14 @@ static int emac_dev_open(struct net_device *ndev)
>  
>  	/* use the first phy on the bus if pdata did not give us a phy id */
>  	if (!phydev && !priv->phy_id) {
> -		phy = bus_find_device_by_name(&mdio_bus_type, NULL,
> -					      "davinci_mdio");
> +		/* NOTE: we can't use bus_find_device_by_name() here because
> +		 * the device name is not guaranteed to be 'davinci_mdio'. On
> +		 * some systems it can be 'davinci_mdio.0' so we need to use
> +		 * strncmp() against the first part of the string to correctly
> +		 * match it.
> +		 */
> +		phy = bus_find_device(&mdio_bus_type, NULL, NULL,
> +				      match_first_device);
>  		if (phy) {
>  			priv->phy_id = dev_name(phy);
>  			if (!priv->phy_id || !*priv->phy_id)
> -- 
> 2.17.1
> 

^ permalink raw reply

* [PATCH net] net/sched: act_ife: preserve the action control in case of error
From: Davide Caratti @ 2018-06-19 13:45 UTC (permalink / raw)
  To: Jamal Hadi Salim, Cong Wang; +Cc: David S. Miller, netdev

in the following script

 # tc actions add action ife encode allow prio pass index 42
 # tc actions replace action ife encode allow tcindex drop index 42

the action control should remain equal to 'pass', if the kernel failed
to replace the TC action. Pospone the assignment of the action control,
to ensure it is not overwritten in the error path of tcf_ife_init().

Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/act_ife.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 078d52212172..20d7d36b2fc9 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -517,8 +517,6 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 			saddr = nla_data(tb[TCA_IFE_SMAC]);
 	}
 
-	ife->tcf_action = parm->action;
-
 	if (parm->flags & IFE_ENCODE) {
 		if (daddr)
 			ether_addr_copy(p->eth_dst, daddr);
@@ -575,6 +573,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 		}
 	}
 
+	ife->tcf_action = parm->action;
 	if (exists)
 		spin_unlock_bh(&ife->tcf_lock);
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH net] net/sched: act_ife: fix recursive lock and idr leak
From: Davide Caratti @ 2018-06-19 13:39 UTC (permalink / raw)
  To: Jamal Hadi Salim, Cong Wang; +Cc: David S. Miller, netdev

a recursive lock warning [1] can be observed with the following script,

 # $TC actions add action ife encode allow prio pass index 42
 IFE type 0xED3E
 # $TC actions replace action ife encode allow tcindex pass index 42

in case the kernel was unable to run the last command (e.g. because of
the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
the kernel can leak idr in the error path of tcf_ife_init(), because
tcf_idr_release() is not called after successful idr reservation:

 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel
 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
 IFE type 0xFEFE
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

Since tcfa_lock is already taken when the action is being edited, a call
to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
again. On the other hand, tcf_idr_release() needs to be called in the
error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
Fix both problems in tcf_ife_init().
Since the cleanup() routine can now be called when ife->params is NULL,
also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).

 [1]
 ============================================
 WARNING: possible recursive locking detected
 4.17.0-rc4.kasan+ #417 Tainted: G            E
 --------------------------------------------
 tc/3932 is trying to acquire lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]

 but task is already holding lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&p->tcfa_lock)->rlock);
   lock(&(&p->tcfa_lock)->rlock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by tc/3932:
  #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
  #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 stack backtrace:
 CPU: 3 PID: 3932 Comm: tc Tainted: G            E     4.17.0-rc4.kasan+ #417
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Call Trace:
  dump_stack+0x9a/0xeb
  __lock_acquire+0xf43/0x34a0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? __mutex_lock+0x62f/0x1240
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_acquire+0x10b/0x330
  lock_acquire+0x10b/0x330
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  _raw_spin_lock_bh+0x38/0x70
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  tcf_ife_cleanup+0x19/0x80 [act_ife]
  __tcf_idr_release+0xff/0x350
  tcf_ife_init+0xdde/0x13c0 [act_ife]
  ? ife_exit_net+0x290/0x290 [act_ife]
  ? __lock_is_held+0xb4/0x140
  tcf_action_init_1+0x67b/0xad0
  ? tcf_action_dump_old+0xa0/0xa0
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? memset+0x1f/0x40
  tcf_action_init+0x30f/0x590
  ? tcf_action_init_1+0xad0/0xad0
  ? memset+0x1f/0x40
  tc_ctl_action+0x48e/0x5e0
  ? mutex_lock_io_nested+0x1160/0x1160
  ? tca_action_gd+0x990/0x990
  ? sched_clock+0x5/0x10
  ? find_held_lock+0x39/0x1d0
  rtnetlink_rcv_msg+0x4da/0x990
  ? validate_linkmsg+0x680/0x680
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  netlink_rcv_skb+0x127/0x350
  ? validate_linkmsg+0x680/0x680
  ? netlink_ack+0x970/0x970
  ? __kmalloc_node_track_caller+0x304/0x3a0
  netlink_unicast+0x40f/0x5d0
  ? netlink_attachskb+0x580/0x580
  ? _copy_from_iter_full+0x187/0x760
  ? import_iovec+0x90/0x390
  netlink_sendmsg+0x67f/0xb50
  ? netlink_unicast+0x5d0/0x5d0
  ? copy_msghdr_from_user+0x206/0x340
  ? netlink_unicast+0x5d0/0x5d0
  sock_sendmsg+0xb3/0xf0
  ___sys_sendmsg+0x60a/0x8b0
  ? copy_msghdr_from_user+0x340/0x340
  ? lock_downgrade+0x5e0/0x5e0
  ? tty_write_lock+0x18/0x50
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_downgrade+0x5e0/0x5e0
  ? lock_acquire+0x10b/0x330
  ? __audit_syscall_entry+0x316/0x690
  ? current_kernel_time64+0x6b/0xd0
  ? __fget_light+0x55/0x1f0
  ? __sys_sendmsg+0xd2/0x170
  __sys_sendmsg+0xd2/0x170
  ? __ia32_sys_shutdown+0x70/0x70
  ? syscall_trace_enter+0x57a/0xd60
  ? rcu_read_lock_sched_held+0xdc/0x110
  ? __bpf_trace_sys_enter+0x10/0x10
  ? do_syscall_64+0x22/0x480
  do_syscall_64+0xa5/0x480
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fd646988ba0
 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100

Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding")
Fixes: ef6980b6becb ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/sched/act_ife.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 8527cfdc446d..078d52212172 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -415,7 +415,8 @@ static void tcf_ife_cleanup(struct tc_action *a)
 	spin_unlock_bh(&ife->tcf_lock);
 
 	p = rcu_dereference_protected(ife->params, 1);
-	kfree_rcu(p, rcu);
+	if (p)
+		kfree_rcu(p, rcu);
 }
 
 /* under ife->tcf_lock for existing action */
@@ -543,10 +544,8 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 				       NULL, NULL);
 		if (err) {
 metadata_parse_err:
-			if (exists)
-				tcf_idr_release(*a, bind);
 			if (ret == ACT_P_CREATED)
-				_tcf_ife_cleanup(*a);
+				tcf_idr_release(*a, bind);
 
 			if (exists)
 				spin_unlock_bh(&ife->tcf_lock);
@@ -567,7 +566,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 		err = use_all_metadata(ife);
 		if (err) {
 			if (ret == ACT_P_CREATED)
-				_tcf_ife_cleanup(*a);
+				tcf_idr_release(*a, bind);
 
 			if (exists)
 				spin_unlock_bh(&ife->tcf_lock);
-- 
2.17.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox