Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v4 02/17] net: sched: protect chain->explicitly_created with block->lock
From: Jiri Pirko @ 2019-02-11 14:15 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel
In-Reply-To: <20190211085548.7190-3-vladbu@mellanox.com>

Mon, Feb 11, 2019 at 09:55:33AM CET, vladbu@mellanox.com wrote:
>In order to remove dependency on rtnl lock, protect
>tcf_chain->explicitly_created flag with block->lock. Consolidate code that
>checks and resets 'explicitly_created' flag into __tcf_chain_put() to
>execute it atomically with rest of code that puts chain reference.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next v4 01/17] net: sched: protect block state with mutex
From: Jiri Pirko @ 2019-02-11 14:15 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel
In-Reply-To: <20190211085548.7190-2-vladbu@mellanox.com>

Mon, Feb 11, 2019 at 09:55:32AM CET, vladbu@mellanox.com wrote:
>Currently, tcf_block doesn't use any synchronization mechanisms to protect
>critical sections that manage lifetime of its chains. block->chain_list and
>multiple variables in tcf_chain that control its lifetime assume external
>synchronization provided by global rtnl lock. Converting chain reference
>counting to atomic reference counters is not possible because cls API uses
>multiple counters and flags to control chain lifetime, so all of them must
>be synchronized in chain get/put code.
>
>Use single per-block lock to protect block data and manage lifetime of all
>chains on the block. Always take block->lock when accessing chain_list.
>Chain get and put modify chain lifetime-management data and parent block's
>chain_list, so take the lock in these functions. Verify block->lock state
>with assertions in functions that expect to be called with the lock taken
>and are called from multiple places. Take block->lock when accessing
>filter_chain_list.
>
>In order to allow parallel update of rules on single block, move all calls
>to classifiers outside of critical sections protected by new block->lock.
>Rearrange chain get and put functions code to only access protected chain
>data while holding block lock:
>- Rearrange code to only access chain reference counter and chain action
>  reference counter while holding block lock.
>- Extract code that requires block->lock from tcf_chain_destroy() into
>  standalone tcf_chain_destroy() function that is called by
>  __tcf_chain_put() in same critical section that changes chain reference
>  counters.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* [PATCH net-next 1/4] net: phy: Mask-out non-compatible modes when setting the max-speed
From: Maxime Chevallier @ 2019-02-11 14:25 UTC (permalink / raw)
  To: davem
  Cc: Maxime Chevallier, netdev, linux-kernel, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King, linux-arm-kernel,
	Antoine Tenart, thomas.petazzoni, gregory.clement, miquel.raynal,
	nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-1-maxime.chevallier@bootlin.com>

When setting a PHY's max speed using either the max-speed DT property
or ethtool, we should mask-out all non-compatible modes according to the
settings table, instead of just the 10/100BASET modes.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/phy-core.c   | 45 ++++++++++++++++++++++++++++++
 drivers/net/phy/phy_device.c | 53 ------------------------------------
 include/linux/phy.h          |  1 +
 3 files changed, 46 insertions(+), 53 deletions(-)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index cdea028d1328..855abf487279 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -4,6 +4,7 @@
  */
 #include <linux/export.h>
 #include <linux/phy.h>
+#include <linux/of.h>
 
 const char *phy_speed_to_str(int speed)
 {
@@ -338,6 +339,50 @@ size_t phy_speeds(unsigned int *speeds, size_t size,
 	return count;
 }
 
+static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
+{
+	const struct phy_setting *p;
+	int i;
+
+	for (i = 0, p = settings; i < ARRAY_SIZE(settings); i++, p++) {
+		if (p->speed > max_speed)
+			linkmode_clear_bit(p->bit, phydev->supported);
+		else
+			break;
+	}
+
+	return 0;
+}
+
+int phy_set_max_speed(struct phy_device *phydev, u32 max_speed)
+{
+	int err;
+
+	err = __set_phy_supported(phydev, max_speed);
+	if (err)
+		return err;
+
+	linkmode_copy(phydev->advertising, phydev->supported);
+
+	return 0;
+}
+EXPORT_SYMBOL(phy_set_max_speed);
+
+void of_set_phy_supported(struct phy_device *phydev)
+{
+	struct device_node *node = phydev->mdio.dev.of_node;
+	u32 max_speed;
+
+	if (!IS_ENABLED(CONFIG_OF_MDIO))
+		return;
+
+	if (!node)
+		return;
+
+	if (!of_property_read_u32(node, "max-speed", &max_speed))
+		__set_phy_supported(phydev, max_speed);
+}
+
 /**
  * phy_resolve_aneg_linkmode - resolve the advertisements into phy settings
  * @phydev: The phy_device struct
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 3d14e48aebc5..65e18293d8e5 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1964,44 +1964,6 @@ int genphy_loopback(struct phy_device *phydev, bool enable)
 }
 EXPORT_SYMBOL(genphy_loopback);
 
-static int __set_phy_supported(struct phy_device *phydev, u32 max_speed)
-{
-	switch (max_speed) {
-	case SPEED_10:
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT,
-				   phydev->supported);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT,
-				   phydev->supported);
-		/* fall through */
-	case SPEED_100:
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
-				   phydev->supported);
-		linkmode_clear_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
-				   phydev->supported);
-		break;
-	case SPEED_1000:
-		break;
-	default:
-		return -ENOTSUPP;
-	}
-
-	return 0;
-}
-
-int phy_set_max_speed(struct phy_device *phydev, u32 max_speed)
-{
-	int err;
-
-	err = __set_phy_supported(phydev, max_speed);
-	if (err)
-		return err;
-
-	linkmode_copy(phydev->advertising, phydev->supported);
-
-	return 0;
-}
-EXPORT_SYMBOL(phy_set_max_speed);
-
 /**
  * phy_remove_link_mode - Remove a supported link mode
  * @phydev: phy_device structure to remove link mode from
@@ -2132,21 +2094,6 @@ bool phy_validate_pause(struct phy_device *phydev,
 }
 EXPORT_SYMBOL(phy_validate_pause);
 
-static void of_set_phy_supported(struct phy_device *phydev)
-{
-	struct device_node *node = phydev->mdio.dev.of_node;
-	u32 max_speed;
-
-	if (!IS_ENABLED(CONFIG_OF_MDIO))
-		return;
-
-	if (!node)
-		return;
-
-	if (!of_property_read_u32(node, "max-speed", &max_speed))
-		__set_phy_supported(phydev, max_speed);
-}
-
 static void of_set_phy_eee_broken(struct phy_device *phydev)
 {
 	struct device_node *node = phydev->mdio.dev.of_node;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 378da9a6165e..20344c7744d8 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -673,6 +673,7 @@ phy_lookup_setting(int speed, int duplex, const unsigned long *mask,
 		   bool exact);
 size_t phy_speeds(unsigned int *speeds, size_t size,
 		  unsigned long *mask);
+void of_set_phy_supported(struct phy_device *phydev);
 
 static inline bool __phy_is_started(struct phy_device *phydev)
 {
-- 
2.19.2


^ permalink raw reply related

* [PATCH net-next 2/4] net: phy: Move of_set_phy_eee_broken to phy-core.c
From: Maxime Chevallier @ 2019-02-11 14:25 UTC (permalink / raw)
  To: davem
  Cc: Maxime Chevallier, netdev, linux-kernel, Andrew Lunn,
	Florian Fainelli, Heiner Kallweit, Russell King, linux-arm-kernel,
	Antoine Tenart, thomas.petazzoni, gregory.clement, miquel.raynal,
	nadavh, stefanc, mw
In-Reply-To: <20190211142529.22885-1-maxime.chevallier@bootlin.com>

Since of_set_phy_supported was moved to phy-core.c, we can also move
of_set_phy_eee_broken to the same location, so that we have all OF
functions in the same place.

This patch doesn't intend to introduce any change in behaviour.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/phy-core.c   | 27 +++++++++++++++++++++++++++
 drivers/net/phy/phy_device.c | 28 ----------------------------
 include/linux/phy.h          |  1 +
 3 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 855abf487279..de58a59815d5 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -383,6 +383,33 @@ void of_set_phy_supported(struct phy_device *phydev)
 		__set_phy_supported(phydev, max_speed);
 }
 
+void of_set_phy_eee_broken(struct phy_device *phydev)
+{
+	struct device_node *node = phydev->mdio.dev.of_node;
+	u32 broken = 0;
+
+	if (!IS_ENABLED(CONFIG_OF_MDIO))
+		return;
+
+	if (!node)
+		return;
+
+	if (of_property_read_bool(node, "eee-broken-100tx"))
+		broken |= MDIO_EEE_100TX;
+	if (of_property_read_bool(node, "eee-broken-1000t"))
+		broken |= MDIO_EEE_1000T;
+	if (of_property_read_bool(node, "eee-broken-10gt"))
+		broken |= MDIO_EEE_10GT;
+	if (of_property_read_bool(node, "eee-broken-1000kx"))
+		broken |= MDIO_EEE_1000KX;
+	if (of_property_read_bool(node, "eee-broken-10gkx4"))
+		broken |= MDIO_EEE_10GKX4;
+	if (of_property_read_bool(node, "eee-broken-10gkr"))
+		broken |= MDIO_EEE_10GKR;
+
+	phydev->eee_broken_modes = broken;
+}
+
 /**
  * phy_resolve_aneg_linkmode - resolve the advertisements into phy settings
  * @phydev: The phy_device struct
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 65e18293d8e5..626f0527f36a 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -30,7 +30,6 @@
 #include <linux/mdio.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
-#include <linux/of.h>
 
 MODULE_DESCRIPTION("PHY library");
 MODULE_AUTHOR("Andy Fleming");
@@ -2094,33 +2093,6 @@ bool phy_validate_pause(struct phy_device *phydev,
 }
 EXPORT_SYMBOL(phy_validate_pause);
 
-static void of_set_phy_eee_broken(struct phy_device *phydev)
-{
-	struct device_node *node = phydev->mdio.dev.of_node;
-	u32 broken = 0;
-
-	if (!IS_ENABLED(CONFIG_OF_MDIO))
-		return;
-
-	if (!node)
-		return;
-
-	if (of_property_read_bool(node, "eee-broken-100tx"))
-		broken |= MDIO_EEE_100TX;
-	if (of_property_read_bool(node, "eee-broken-1000t"))
-		broken |= MDIO_EEE_1000T;
-	if (of_property_read_bool(node, "eee-broken-10gt"))
-		broken |= MDIO_EEE_10GT;
-	if (of_property_read_bool(node, "eee-broken-1000kx"))
-		broken |= MDIO_EEE_1000KX;
-	if (of_property_read_bool(node, "eee-broken-10gkx4"))
-		broken |= MDIO_EEE_10GKX4;
-	if (of_property_read_bool(node, "eee-broken-10gkr"))
-		broken |= MDIO_EEE_10GKR;
-
-	phydev->eee_broken_modes = broken;
-}
-
 static bool phy_drv_supports_irq(struct phy_driver *phydrv)
 {
 	return phydrv->config_intr && phydrv->ack_interrupt;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 20344c7744d8..1a1d93a2a906 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -674,6 +674,7 @@ phy_lookup_setting(int speed, int duplex, const unsigned long *mask,
 size_t phy_speeds(unsigned int *speeds, size_t size,
 		  unsigned long *mask);
 void of_set_phy_supported(struct phy_device *phydev);
+void of_set_phy_eee_broken(struct phy_device *phydev);
 
 static inline bool __phy_is_started(struct phy_device *phydev)
 {
-- 
2.19.2


^ permalink raw reply related

* [PATCH v2] xsk: share the mmap_sem for page pinning
From: Davidlohr Bueso @ 2019-02-11 16:15 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, David S . Miller, Bjorn Topel,
	Magnus Karlsson, netdev, Davidlohr Bueso
In-Reply-To: <20190207053740.26915-2-dave@stgolabs.net>

Holding mmap_sem exclusively for a gup() is an overkill.
Lets share the lock and replace the gup call for gup_longterm(),
as it is better suited for the lifetime of the pinning.

Cc: David S. Miller <davem@davemloft.net>
Cc: Bjorn Topel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
CC: netdev@vger.kernel.org
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 net/xdp/xdp_umem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 5ab236c5c9a5..e7fa8d0d7090 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -265,10 +265,10 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
 	if (!umem->pgs)
 		return -ENOMEM;
 
-	down_write(&current->mm->mmap_sem);
-	npgs = get_user_pages(umem->address, umem->npgs,
+	down_read(&current->mm->mmap_sem);
+	npgs = get_user_pages_longterm(umem->address, umem->npgs,
 			      gup_flags, &umem->pgs[0], NULL);
-	up_write(&current->mm->mmap_sem);
+	up_read(&current->mm->mmap_sem);
 
 	if (npgs != umem->npgs) {
 		if (npgs >= 0) {
-- 
2.16.4


^ permalink raw reply related

* Re: [PATCH v2] xsk: share the mmap_sem for page pinning
From: Björn Töpel @ 2019-02-11 16:21 UTC (permalink / raw)
  To: akpm, linux-mm, LKML, David S . Miller, Bjorn Topel,
	Magnus Karlsson, Netdev, Davidlohr Bueso
In-Reply-To: <20190211161529.uskq5ca7y3j5522i@linux-r8p5>

Den mån 11 feb. 2019 kl 17:15 skrev Davidlohr Bueso <dave@stgolabs.net>:
>
> Holding mmap_sem exclusively for a gup() is an overkill.
> Lets share the lock and replace the gup call for gup_longterm(),
> as it is better suited for the lifetime of the pinning.
>

Thanks for the cleanup!

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> Cc: David S. Miller <davem@davemloft.net>
> Cc: Bjorn Topel <bjorn.topel@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  net/xdp/xdp_umem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 5ab236c5c9a5..e7fa8d0d7090 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -265,10 +265,10 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
>         if (!umem->pgs)
>                 return -ENOMEM;
>
> -       down_write(&current->mm->mmap_sem);
> -       npgs = get_user_pages(umem->address, umem->npgs,
> +       down_read(&current->mm->mmap_sem);
> +       npgs = get_user_pages_longterm(umem->address, umem->npgs,
>                               gup_flags, &umem->pgs[0], NULL);
> -       up_write(&current->mm->mmap_sem);
> +       up_read(&current->mm->mmap_sem);
>
>         if (npgs != umem->npgs) {
>                 if (npgs >= 0) {
> --
> 2.16.4
>

^ permalink raw reply

* Re: [PATCH v2 bpf] bpf: fix lockdep false positive in stackmap
From: Eric Dumazet @ 2019-02-11 16:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David Miller, Peter Zijlstra, longman,
	Jann Horn, netdev, kernel-team
In-Reply-To: <d6d3bf01-4c8f-8b4d-073c-a8222f5ba677@iogearbox.net>

On Mon, Feb 11, 2019 at 7:39 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 02/10/2019 09:52 PM, Alexei Starovoitov wrote:
> > Lockdep warns about false positive:
> > [   11.211460] ------------[ cut here ]------------
> > [   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
> > [   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
> > [   11.213134] Modules linked in:
> > [   11.214954] RIP: 0010:lock_release+0x1ad/0x280
> > [   11.223508] Call Trace:
> > [   11.223705]  <IRQ>
> > [   11.223874]  ? __local_bh_enable+0x7a/0x80
> > [   11.224199]  up_read+0x1c/0xa0
> > [   11.224446]  do_up_read+0x12/0x20
> > [   11.224713]  irq_work_run_list+0x43/0x70
> > [   11.225030]  irq_work_run+0x26/0x50
> > [   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
> > [   11.225662]  irq_work_interrupt+0xf/0x20
> >
> > since rw_semaphore is released in a different task vs task that locked the sema.
> > It is expected behavior.
> > Fix the warning with up_read_non_owner() and rwsem_release() annotation.
> >
> > Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>
> Applied, thanks!

Thanks everyone !

^ permalink raw reply

* Re: [PATCH 1/2] xsk: do not use mmap_sem
From: Dan Williams @ 2019-02-11 16:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Björn Töpel, Davidlohr Bueso, Andrew Morton, Linux MM,
	LKML, David S . Miller, Bjorn Topel, Magnus Karlsson, Netdev,
	Davidlohr Bueso, Weiny, Ira
In-Reply-To: <d92b7b49-81e6-1ac5-4ae4-4909f87bbea8@iogearbox.net>

[ add Ira ]

On Mon, Feb 11, 2019 at 7:33 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> [ +Dan ]
>
> On 02/07/2019 08:43 AM, Björn Töpel wrote:
> > Den tors 7 feb. 2019 kl 06:38 skrev Davidlohr Bueso <dave@stgolabs.net>:
> >>
> >> Holding mmap_sem exclusively for a gup() is an overkill.
> >> Lets replace the call for gup_fast() and let the mm take
> >> it if necessary.
> >>
> >> Cc: David S. Miller <davem@davemloft.net>
> >> Cc: Bjorn Topel <bjorn.topel@intel.com>
> >> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> >> CC: netdev@vger.kernel.org
> >> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> >> ---
> >>  net/xdp/xdp_umem.c | 6 ++----
> >>  1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> >> index 5ab236c5c9a5..25e1e76654a8 100644
> >> --- a/net/xdp/xdp_umem.c
> >> +++ b/net/xdp/xdp_umem.c
> >> @@ -265,10 +265,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
> >>         if (!umem->pgs)
> >>                 return -ENOMEM;
> >>
> >> -       down_write(&current->mm->mmap_sem);
> >> -       npgs = get_user_pages(umem->address, umem->npgs,
> >> -                             gup_flags, &umem->pgs[0], NULL);
> >> -       up_write(&current->mm->mmap_sem);
> >> +       npgs = get_user_pages_fast(umem->address, umem->npgs,
> >> +                                  gup_flags, &umem->pgs[0]);
> >>
> >
> > Thanks for the patch!
> >
> > The lifetime of the pinning is similar to RDMA umem mapping, so isn't
> > gup_longterm preferred?
>
> Seems reasonable from reading what gup_longterm seems to do. Davidlohr
> or Dan, any thoughts on the above?

Yes, any gup where the lifetime of the corresponding put_page() is
under direct control of userspace should be using the _longterm flavor
to coordinate be careful in the presence of dax. In the dax case there
is no page cache indirection to coordinate truncate vs page pins. Ira
is looking to add a "fast + longterm" flavor for cases like this.

^ permalink raw reply

* Re: bpf: test_tunnel.sh: BUG: unable to handle kernel NULL pointer dereference
From: Alan Maguire @ 2019-02-11 16:39 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Netdev, open list, open list:KERNEL SELFTEST FRAMEWORK,
	Linux-Next Mailing List, David S. Miller, Valdis Kletnieks,
	Song Liu, Rafael Tinoco, ast, Daniel Borkmann
In-Reply-To: <CA+G9fYu-RXRUPyTeAfSQjXXbtGQeTkbhns9-L5ZVhm12G3xhmQ@mail.gmail.com>

On Fri, 1 Feb 2019, Naresh Kamboju wrote:

> Kernel panic while running bpf: test_tunnel.sh on linux -next
> 5.0.0-rc4-next-20190129.
> 
> selftests: bpf: test_tunnel.sh
> [  273.997647] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> [  274.054068] ip (11458) used greatest stack depth: 11448 bytes left
> [  274.120445] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000000
> [  274.128285] #PF error: [INSTR]
> [  274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0
> [  274.138241] Oops: 0010 [#1] SMP PTI
> [  274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted
> 5.0.0-rc4-next-20190129 #1
> [  274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [  274.156526] RIP: 0010:          (null)
> [  274.160280] Code: Bad RIP value.
> [  274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
> [  274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
> [  274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
> [  274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
> [  274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
> [  274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
> [  274.204346] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
> knlGS:0000000000000000
> [  274.212424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
> [  274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  274.239541] Call Trace:
> [  274.241988]  ? tnl_update_pmtu+0x296/0x3b0
> [  274.246085]  ip_md_tunnel_xmit+0x1bc/0x520
> [  274.250176]  gre_fb_xmit+0x330/0x390
> [  274.253754]  gre_tap_xmit+0x128/0x180
> [  274.257414]  dev_hard_start_xmit+0xb7/0x300
> [  274.261598]  sch_direct_xmit+0xf6/0x290
> [  274.265430]  __qdisc_run+0x15d/0x5e0
> [  274.269007]  __dev_queue_xmit+0x2c5/0xc00
> [  274.273011]  ? dev_queue_xmit+0x10/0x20
> [  274.276842]  ? eth_header+0x2b/0xc0
> [  274.280326]  dev_queue_xmit+0x10/0x20
> [  274.283984]  ? dev_queue_xmit+0x10/0x20
> [  274.287813]  arp_xmit+0x1a/0xf0
> [  274.290952]  arp_send_dst.part.19+0x46/0x60
> [  274.295138]  arp_solicit+0x177/0x6b0
> [  274.298708]  ? mod_timer+0x18e/0x440
> [  274.302281]  neigh_probe+0x57/0x70
> [  274.305684]  __neigh_event_send+0x197/0x2d0
> [  274.309862]  neigh_resolve_output+0x18c/0x210
> [  274.314212]  ip_finish_output2+0x257/0x690
> [  274.318304]  ip_finish_output+0x219/0x340
> [  274.322314]  ? ip_finish_output+0x219/0x340
> [  274.326493]  ip_output+0x76/0x240
> [  274.329805]  ? ip_fragment.constprop.53+0x80/0x80
> [  274.334510]  ip_local_out+0x3f/0x70
> [  274.337992]  ip_send_skb+0x19/0x40
> [  274.341391]  ip_push_pending_frames+0x33/0x40
> [  274.345740]  raw_sendmsg+0xc15/0x11d0
> [  274.349403]  ? __might_fault+0x85/0x90
> [  274.353151]  ? _copy_from_user+0x6b/0xa0
> [  274.357070]  ? rw_copy_check_uvector+0x54/0x130
> [  274.361604]  inet_sendmsg+0x42/0x1c0
> [  274.365179]  ? inet_sendmsg+0x42/0x1c0
> [  274.368937]  sock_sendmsg+0x3e/0x50
> [  274.372460]  ___sys_sendmsg+0x26f/0x2d0
> [  274.376293]  ? lock_acquire+0x95/0x190
> [  274.380043]  ? __handle_mm_fault+0x7ce/0xb70
> [  274.384307]  ? lock_acquire+0x95/0x190
> [  274.388053]  ? __audit_syscall_entry+0xdd/0x130
> [  274.392586]  ? ktime_get_coarse_real_ts64+0x64/0xc0
> [  274.397461]  ? __audit_syscall_entry+0xdd/0x130
> [  274.401989]  ? trace_hardirqs_on+0x4c/0x100
> [  274.406173]  __sys_sendmsg+0x63/0xa0
> [  274.409744]  ? __sys_sendmsg+0x63/0xa0
> [  274.413488]  __x64_sys_sendmsg+0x1f/0x30
> [  274.417405]  do_syscall_64+0x55/0x190
> [  274.421064]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  274.426113] RIP: 0033:0x7ff4ae0e6e87
> [  274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00
> 00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c
> 24 08
> [  274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002e
> [  274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87
> [  274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003
> [  274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080
> [  274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003
> [  274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4
> [  274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter
> ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded:
> test_bpf]
> [  274.504634] CR2: 0000000000000000
> [  274.507976] ---[ end trace 196d18386545eae1 ]---
> [  274.512588] RIP: 0010:          (null)
> [  274.516334] Code: Bad RIP value.
> [  274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286
> [  274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000
> [  274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0
> [  274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c
> [  274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000
> [  274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400
> [  274.560456] FS:  00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000)
> knlGS:0000000000000000
> [  274.568541] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0
> [  274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  274.595658] Kernel panic - not syncing: Fatal exception in interrupt
> [  274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [  274.612827] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---
> [  274.620387] ------------[ cut here ]------------
> 
> The above log is from x86_64 device.
> 

 
I'm also seeing the same panic during test_tunnel.sh execution
on x86_64.

From poking around it looks like the skb's dst entry is being used
to calculate the mtu in:

mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;

...but because that dst_entry  has an "ops" value set to md_dst_ops,
the various ops (including mtu) are not set:

crash> struct sk_buff._skb_refdst ffff928f87447700 -x
      _skb_refdst = 0xffffcd6fbf5ea590
crash> struct dst_entry.ops 0xffffcd6fbf5ea590
  ops = 0xffffffffa0193800
crash> struct dst_ops.mtu 0xffffffffa0193800
  mtu = 0x0
crash>

I confirmed that the dst entry also has dst->input set to
dst_md_discard, so it looks like it's an entry that's been
initialized via __metadata_dst_init alright.

I think the fix here is to use skb_valid_dst(skb) - it checks
for  DST_METADATA also, and with that fix in place, the
problem - which was previously 100% reproducible - disappears.

However what we should do in terms of path MTU setting for
such cases (use ip*_update_pmtu() perhaps?), I'm not sure.

The below patch resolves the panic and all tunnel tests pass
without incident.

Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 net/ipv4/ip_tunnel.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 893f013..5dcf50c 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -515,9 +515,10 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 		mtu = dst_mtu(&rt->dst) - dev->hard_header_len
 					- sizeof(struct iphdr) - tunnel_hlen;
 	else
-		mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
+		mtu = skb_valid_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
 
-	skb_dst_update_pmtu(skb, mtu);
+	if (skb_valid_dst(skb))
+		skb_dst_update_pmtu(skb, mtu);
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		if (!skb_is_gso(skb) &&
@@ -530,9 +531,11 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 	}
 #if IS_ENABLED(CONFIG_IPV6)
 	else if (skb->protocol == htons(ETH_P_IPV6)) {
-		struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
+		struct rt6_info *rt6;
 		__be32 daddr;
 
+		rt6 = skb_valid_dst(skb) ? (struct rt6_info *)skb_dst(skb) :
+					   NULL;
 		daddr = md ? dst : tunnel->parms.iph.daddr;
 
 		if (rt6 && mtu < dst_mtu(skb_dst(skb)) &&
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 0/2] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2019-02-11 16:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains Netfilter fixes for net:

1) Out-of-bound access to packet data from the snmp nat helper,
   from Jann Horn.

2) ICMP(v6) error packets are set as related traffic by conntrack,
   update protocol number before calling nf_nat_ipv4_manip_pkt()
   to use ICMP(v6) rather than the original protocol number,
   from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 31b58ad0c3279817cd246eab27eaf53b626dfcde:

  Merge branch 'r8169-revert-two-commits-due-to-a-regression' (2019-02-10 12:54:49 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 8303b7e8f018724a2cd7752eb29c2801fa8c4067:

  netfilter: nat: fix spurious connection timeouts (2019-02-11 17:43:17 +0100)

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: nat: fix spurious connection timeouts

Jann Horn (1):
      netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs

 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c    | 1 +
 net/ipv4/netfilter/nf_nat_snmp_basic_main.c | 7 ++++++-
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c    | 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH 1/2] netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs
From: Pablo Neira Ayuso @ 2019-02-11 16:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190211165319.17965-1-pablo@netfilter.org>

From: Jann Horn <jannh@google.com>

The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.

(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)

Fixes: cc2d58634e0f ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_nat_snmp_basic_main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
index a0aa13bcabda..0a8a60c1bf9a 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
@@ -105,6 +105,8 @@ static void fast_csum(struct snmp_ctx *ctx, unsigned char offset)
 int snmp_version(void *context, size_t hdrlen, unsigned char tag,
 		 const void *data, size_t datalen)
 {
+	if (datalen != 1)
+		return -EINVAL;
 	if (*(unsigned char *)data > 1)
 		return -ENOTSUPP;
 	return 1;
@@ -114,8 +116,11 @@ int snmp_helper(void *context, size_t hdrlen, unsigned char tag,
 		const void *data, size_t datalen)
 {
 	struct snmp_ctx *ctx = (struct snmp_ctx *)context;
-	__be32 *pdata = (__be32 *)data;
+	__be32 *pdata;
 
+	if (datalen != 4)
+		return -EINVAL;
+	pdata = (__be32 *)data;
 	if (*pdata == ctx->from) {
 		pr_debug("%s: %pI4 to %pI4\n", __func__,
 			 (void *)&ctx->from, (void *)&ctx->to);
-- 
2.11.0


^ permalink raw reply related

* [PATCH 2/2] netfilter: nat: fix spurious connection timeouts
From: Pablo Neira Ayuso @ 2019-02-11 16:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190211165319.17965-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.

I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.

Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).

Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: faec18dbb0405 ("netfilter: nat: remove l4proto->manip_pkt")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 1 +
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
index 2687db015b6f..fa2ba7c500e4 100644
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
 
 	/* Change outer to look like the reply to an incoming packet */
 	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+	target.dst.protonum = IPPROTO_ICMP;
 	if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
 		return 0;
 
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index 23022447eb49..7a41ee3c11b4 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
 	}
 
 	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+	target.dst.protonum = IPPROTO_ICMPV6;
 	if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
 		return 0;
 
-- 
2.11.0


^ permalink raw reply related

* Re: [RFC 1/3] devlink: add flash update command
From: Jiri Pirko @ 2019-02-11 16:45 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, oss-drivers, mkubecek, andrew
In-Reply-To: <20190211065923.22670-2-jakub.kicinski@netronome.com>

Mon, Feb 11, 2019 at 07:59:20AM CET, jakub.kicinski@netronome.com wrote:
>Add devlink flash update command. Advanced NICs have firmware
>stored in flash and often cryptographically secured. Updating
>that flash is handled by management firmware. Ethtool has a
>flash update command which served us well, however, it has two
>shortcomings:
> - it takes rtnl_lock unnecessarily - really flash update has
>   nothing to do with networking, so using a networking device
>   as a handle is suboptimal, which leads us to the second one:
> - it requires a functioning netdev - in case device enters an
>   error state and can't spawn a netdev (e.g. communication
>   with the device fails) there is no netdev to use as a handle
>   for flashing.
>
>Devlink already has the ability to report the firmware versions,
>now with the ability to update the firmware/flash we will be
>able to recover devices in bad state.
>
>To enable easy interoperability with ethtool add the target
>partition ID. We may or may not add a different method of
>identification, but there is no such immediate need.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/devlink.h        |  2 ++
> include/uapi/linux/devlink.h |  6 ++++++
> net/core/devlink.c           | 30 ++++++++++++++++++++++++++++++
> 3 files changed, 38 insertions(+)
>
>diff --git a/include/net/devlink.h b/include/net/devlink.h
>index 07660fe4c0e3..55b3478b1291 100644
>--- a/include/net/devlink.h
>+++ b/include/net/devlink.h
>@@ -529,6 +529,8 @@ struct devlink_ops {
> 				      struct netlink_ext_ack *extack);
> 	int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
> 			struct netlink_ext_ack *extack);
>+	int (*flash_update)(struct devlink *devlink, const char *path,
>+			    u32 target, struct netlink_ext_ack *extack);
> };
> 
> static inline void *devlink_priv(struct devlink *devlink)
>diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
>index 72d9f7c89190..f4417283fd1b 100644
>--- a/include/uapi/linux/devlink.h
>+++ b/include/uapi/linux/devlink.h
>@@ -103,6 +103,8 @@ enum devlink_command {
> 	DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
> 	DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
> 
>+	DEVLINK_CMD_FLASH_UPDATE,
>+
> 	/* add new commands above here */
> 	__DEVLINK_CMD_MAX,
> 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>@@ -326,6 +328,10 @@ enum devlink_attr {
> 	DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,		/* u64 */
> 	DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,	/* u64 */
> 	DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,	/* u8 */
>+
>+	DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME,	/* string */
>+	DEVLINK_ATTR_FLASH_UPDATE_TARGET_ID,	/* u32 */

Do we need to carry this on? I mean, the ef->region is only checked in 4
drivers against ETHTOOL_FLASH_ALL_REGIONS, which is default.
There is this bnxt driver which is the only one working with this value.
I think that for compat, there should be some id-region mapping
provided by driver which the compat layer would use to translate.

I also think that this should be in sync with what is returned in
DEVLINK_ATTR_INFO_VERSION_NAME.

For example:
$ devlink dev info pci/0000:82:00.0
pci/0000:82:00.0:
  driver nfp
  serial_number 16240145
  versions:
      fixed:
        board.id AMDA0081-0001
        board.rev 15
        board.vendor SMA
        board.model hydrogen
      running:
        fw.mgmt 010181.010181.0101d4
        fw.cpld 0x1030000
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536
      stored:
        fw.mgmt 010181.010181.0101d4
        fw.app abm-d372b6
        fw.undi 0.0.2
        chip.init AMDA-0081-0001  20160318164536

Now user should be able to use one of the identifiers to flash relevant
fw, like:

devlink dev flash pci/0000:82:00.0 XXX fw.mgmt file flash-boot.bin

I'm not sure about the name of "xxx" attribute. Maybe "id":

devlink dev flash pci/0000:82:00.0 id fw.mgmt file flash-boot.bin
devlink dev flash pci/0000:82:00.0 id fw.cpld file some-other.bin



[...]

^ permalink raw reply

* Re: [net-next PATCH 1/2] mm: add dma_addr_t to struct page
From: Matthew Wilcox @ 2019-02-11 16:55 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, linux-mm, Toke Høiland-Jørgensen,
	Ilias Apalodimas, Saeed Mahameed, Andrew Morton, mgorman,
	David S. Miller, Tariq Toukan
In-Reply-To: <154990120685.24530.15350136329514629029.stgit@firesoul>

On Mon, Feb 11, 2019 at 05:06:46PM +0100, Jesper Dangaard Brouer wrote:
> The page_pool API is using page->private to store DMA addresses.
> As pointed out by David Miller we can't use that on 32-bit architectures
> with 64-bit DMA
> 
> This patch adds a new dma_addr_t struct to allow storing DMA addresses
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Reviewed-by: Matthew Wilcox <willy@infradead.org>

> +		struct {	/* page_pool used by netstack */
> +			/**
> +			 * @dma_addr: Page_pool need to store DMA-addr, and

s/need/needs/

> +			 * cannot use @private, as DMA-mappings can be 64-bit

s/DMA-mappings/DMA addresses/

> +			 * even on 32-bit Architectures.

s/A/a/

> +			 */
> +			dma_addr_t dma_addr; /* Shares area with @lru */

It also shares with @slab_list, @next, @compound_head, @pgmap and
@rcu_head.  I think it's pointless to try to document which other fields
something shares space with; the places which do it are a legacy from
before I rearranged struct page last year.  Anyone looking at this should
now be able to see "Oh, this is a union, only use the fields which are
in the union for the type of struct page I have here".

Are the pages allocated from this API ever supposed to be mapped to
userspace?

You also say in the documentation:

 * If no DMA mapping is done, then it can act as shim-layer that
 * fall-through to alloc_page.  As no state is kept on the page, the
 * regular put_page() call is sufficient.

I think this is probably a dangerous precedent to set.  Better to require
exactly one call to page_pool_put_page() (with the understanding that the
refcount may be elevated, so this may not be the final free of the page,
but the page will no longer be usable for its page_pool purpose).

^ permalink raw reply

* Re: [PATCH net] sctp: make sctp_setsockopt_events() less strict about the option length
From: Marcelo Ricardo Leitner @ 2019-02-11 17:05 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, julien, netdev, linux-sctp, linux-kernel, vyasevich,
	lucien.xin
In-Reply-To: <20190211150432.GA13525@hmswarspite.think-freely.org>

On Mon, Feb 11, 2019 at 10:04:32AM -0500, Neil Horman wrote:
> On Sun, Feb 10, 2019 at 10:46:16AM -0200, Marcelo Ricardo Leitner wrote:
> > On Sat, Feb 09, 2019 at 03:12:17PM -0800, David Miller wrote:
> > > From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > > Date: Wed, 6 Feb 2019 18:37:54 -0200
> > > 
> > > > On Wed, Feb 06, 2019 at 12:14:30PM -0800, Julien Gomes wrote:
> > > >> Make sctp_setsockopt_events() able to accept sctp_event_subscribe
> > > >> structures longer than the current definitions.
> > > >> 
> > > >> This should prevent unjustified setsockopt() failures due to struct
> > > >> sctp_event_subscribe extensions (as in 4.11 and 4.12) when using
> > > >> binaries that should be compatible, but were built with later kernel
> > > >> uapi headers.
> > > > 
> > > > Not sure if we support backwards compatibility like this?
> > > 
> > > What a complete mess we have here.
> > > 
> > > Use new socket option numbers next time, do not change the size and/or
> > > layout of existing socket options.
> > 
> > What about reusing the same socket option, but defining a new struct?
> > Say, MYSOCKOPT supports struct mysockopt, struct mysockopt2, struct
> > mysockopt3...
> > 
> > That way we have a clear definition of the user's intent.
> > 
> Thats possible, but I think thats pretty equivalaent to what daves saying, in
> that he wants us to identify all the sizes of this struct and the git history
> and act on them accordingly.  Having internal versions of the struct seems like
> a fine way to get there, but I think we need to consider how we got to this
> situations before we go down the implementation path.

I was more referring to future stuff, but yes. I find it a bit easier
to handle than having to switch the sockopt too and so far I couldn't
find drawbacks to it.

That is, when using a new sockopt, we could accept a buffer larger
than the needed, but I'm not considering that as a valid point
anymore. Putting this compatibility aside for a moment, that pretty
much means the user doesn't know what it wants and so we also don't.

> 
> > > 
> > > This whole thread, if you read it, is basically "if we compatability
> > > this way, that breaks, and if we do compatability this other way oh
> > > shit this other thing doesn't work."
> > > 
> > > I think we really need to specifically check for the difference sizes
> > > that existed one by one, clear out the part not given by the user, and
> > > backport this as far back as possible in a way that in the older kernels
> > > we see if the user is actually trying to use the new features and if so
> > > error out.
> > 
> > I'm afraid clearing out may not be enough, though seems it's the best
> > we can do so far. If the struct is allocated but not fully initialized
> > via a memset, but by setting its fields one by one, the remaining new
> > fields will be left uninitinialized.
> > 
> 
> I'm not sure this even makes sense.  Currently (as I understood it), the issue
> we are facing is the one in which an application is built against a newer kernel
> and run on an older one, the implication there being that the application will
> pass in a buffer that is larger than what the kernel expects.  In that
> situation, clearing isn't needed, all thats needed (I think), is a memcmp of the
> space between the sizeof(kernel struct version), and sizeof(userspace struct
> version) to see if any bits are non-zero.  If they are, we error out, otherwise,
> we ignore the space and move forward as though that overage doesn't exist.

That's exactly what I tried to mean. :-)

> 
> Mind you, I'm not (yet) advocating for that approach, just trying to clarify
> whats needed.

Ok.

> > > 
> > > Which, btw, is terrible behavior.  Newly compiled apps should work on
> > > older kernels if they don't try to use the new features, and if they
> > 
> > One use case here is: a given distro is using kernel X and app Foo is
> > built against it. Then upgrades to X+1, Foo is patched to fix an issue
> > and is rebuilt against X+1. The user upgrades Foo package but for
> > whatever reason, doesn't upgrade kernel or reboot the system. Here,
> > Foo doesn't work anymore until the new kernel is also running.
> > 
> Yes, thats the use case that we're trying to address.
> 
> > > can the ones that want to try to use the new features should be able
> > > to fall back when that feature isn't available in a non-ambiguous
> > > and precisely defined way.
> > > 
> > > The fact that the use of the new feature is hidden in the new
> > > structure elements is really rotten.
> > > 
> > > This patch, at best, needs some work and definitely a longer and more
> > > detailed commit message.
> > 
> FWIW, before we decide on a course of action, I think I need to point out that,
> over the last 10 years, we've extended this structure 6 times, in the following
> commits:
> 0f3fffd8ab1db
> 7e8616d8e7731
> e1cdd553d482c
> 35ea82d611da5
> c95129d127c6d
> b444153fb5a64
> 
> The first two I believe were modifications during a period when sctp was
> actually getting integrated to the kernel, but the last 4 were definately done
> during more recent development periods and wen't in without any commentary about
> the impact to UAPI compatibility.  The check for optlen > sizeof(struct
> sctp_event_subscribe) was made back in 2008, and while not spelled out, seems
> pretty clearly directed at enforcing compatibility with older appliations, not
> compatibility with newer applications running on older kernels.
> 
> I really worry about situations in which we need to support applications
> expecting features that the running kernel doesn't have.  In this particular
> situation it seems like a fixable thing, but I could envision situations in
> which we just can't do it, and I don't want to set that expectation when we
> can't consistently meet it.
> 
> So, if the consensus is that we need to support applications built on newer
> kernels, but run on older kernels (and I'd like to get verbal consensus on

Yes from my side.

> that), then we need to identify a method to fix this.  I'm still hesitant to
> do anything that involves us accepting any size buffer over the kernel expected
> size, as that puts us in a position to have to read large amounts of user data
> (i.e. possible DOS), and just picking an arbitrary large number to limit the
> buffer size seems wrong.  What if, on receipt of a structure from a newer kernel
> (implying a size larger than what the kernel expects), we clamp optlen to the
> kernel size, and put_user it back to the application?  i.e. we don't check any

We can't do that on setsockopt calls, as optlen is R/O there.
Returning > 0 is not specified on setsockopt(2).

> data above and beyond what the the kernel knows about, but we use the optlen as
> an indicator to user space that not all the data was processed?  That allows the
> kernel to ignore the overage safely, and while its not in the socket api
> extension RFC, its not violating anything, and is something we can document in
> the sctp(7) man page as a linux only behavior.
> 
> Thoughts?
> Neil

I also need to dig deeper on this, but in general what if we draw
a line based on the current implementation:
- Current struct is X bytes long
- Patch current and older kernels to accept up to X bytes, as long as
  the trailing bytes are zeroed. Otherwise, EINVAL.
  X may be a magic number for old kernel, but this way we avoid
  unbounded buffers and the limit is not random.
- On further changes, create a new, explicitly versioned struct.
  Older kernels will EINVAL if this new struct is used, which is
  expected.
  Newer kernels will then have to cope with the different
  sizes/structs accordingly.


  Marcelo

^ permalink raw reply

* Re: linux-next: Tree for Feb 11 (wireless/80211)
From: Randy Dunlap @ 2019-02-11 17:07 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, linux-wireless, netdev@vger.kernel.org,
	Johannes Berg
In-Reply-To: <20190211183951.3fe1dad9@canb.auug.org.au>

On 2/10/19 11:39 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20190208:
> 

on i386:

ERROR: "__umoddi3" [net/wireless/cfg80211.ko] undefined!
ERROR: "__umoddi3" [net/mac80211/mac80211.ko] undefined!


-- 
~Randy

^ permalink raw reply

* Re: [RFC, PATCH] net: page_pool: Don't use page->private to store dma_addr_t
From: Eric Dumazet @ 2019-02-11 17:14 UTC (permalink / raw)
  To: Tariq Toukan, Ilias Apalodimas, Matthew Wilcox
  Cc: David Miller, brouer@redhat.com, toke@redhat.com,
	netdev@vger.kernel.org, mgorman@techsingularity.net,
	linux-mm@kvack.org
In-Reply-To: <bfd83487-7073-18c8-6d89-e50fe9a83313@mellanox.com>



On 02/11/2019 12:53 AM, Tariq Toukan wrote:
> 

> Hi,
> 
> It's great to use the struct page to store its dma mapping, but I am 
> worried about extensibility.
> page_pool is evolving, and it would need several more per-page fields. 
> One of them would be pageref_bias, a planned optimization to reduce the 
> number of the costly atomic pageref operations (and replace existing 
> code in several drivers).
> 

But the point about pageref_bias is to place it in a different cache line than "struct page"

The major cost is having a cache line bouncing between producer and consumer.

pageref_bias means the producer only have to read the "struct page" and not dirty it
in the case the page can be recycled.



> I would replace this dma field with a pointer to an extensible struct, 
> that would contain the dma mapping (and other stuff in the near future).
> This pointer fits perfectly with the existing unsigned long private; 
> they can share the memory, for both 32- and 64-bits systems.
> 
> The only downside is one more pointer de-reference. This should be perf 
> tested.
> However, when introducing the page refcnt bias optimization into 
> page_pool, I believe the perf gain would be guaranteed.

Only in some cases perhaps (when the cache line can be dirtied without performance hit)


^ permalink raw reply

* RE: [PATCH 1/2] xsk: do not use mmap_sem
From: Weiny, Ira @ 2019-02-11 17:16 UTC (permalink / raw)
  To: Williams, Dan J, Daniel Borkmann
  Cc: Björn Töpel, Davidlohr Bueso, Andrew Morton, Linux MM,
	LKML, David S . Miller, Topel, Bjorn, Karlsson, Magnus, Netdev,
	Davidlohr Bueso
In-Reply-To: <CAPcyv4gjUmRdV1jZegLecocj155m7dpQLxQSnF_HQQErD8Gtag@mail.gmail.com>

> > >> ---
> > >>  net/xdp/xdp_umem.c | 6 ++----
> > >>  1 file changed, 2 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index
> > >> 5ab236c5c9a5..25e1e76654a8 100644
> > >> --- a/net/xdp/xdp_umem.c
> > >> +++ b/net/xdp/xdp_umem.c
> > >> @@ -265,10 +265,8 @@ static int xdp_umem_pin_pages(struct
> xdp_umem *umem)
> > >>         if (!umem->pgs)
> > >>                 return -ENOMEM;
> > >>
> > >> -       down_write(&current->mm->mmap_sem);
> > >> -       npgs = get_user_pages(umem->address, umem->npgs,
> > >> -                             gup_flags, &umem->pgs[0], NULL);
> > >> -       up_write(&current->mm->mmap_sem);
> > >> +       npgs = get_user_pages_fast(umem->address, umem->npgs,
> > >> +                                  gup_flags, &umem->pgs[0]);
> > >>
> > >
> > > Thanks for the patch!
> > >
> > > The lifetime of the pinning is similar to RDMA umem mapping, so
> > > isn't gup_longterm preferred?
> >
> > Seems reasonable from reading what gup_longterm seems to do. Davidlohr
> > or Dan, any thoughts on the above?
> 
> Yes, any gup where the lifetime of the corresponding put_page() is under
> direct control of userspace should be using the _longterm flavor to
> coordinate be careful in the presence of dax. In the dax case there is no page
> cache indirection to coordinate truncate vs page pins. Ira is looking to add a
> "fast + longterm" flavor for cases like this.

Yes I'm just about ready with a small patch set to add a GUP fast longterm.

I think this should wait for that series.

Ira
 

^ permalink raw reply

* [PATCH V1 net 0/2] net: ena: race condition bug fix and version update
From: akiyano @ 2019-02-11 17:17 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi

From: Arthur Kiyanovski <akiyano@amazon.com>

This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.

Arthur Kiyanovski (2):
  net: ena: fix race between link up and device initalization
  net: ena: update driver version from 2.0.2 to 2.0.3

 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 +++++-----
 drivers/net/ethernet/amazon/ena/ena_netdev.h |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH V1 net 2/2] net: ena: update driver version from 2.0.2 to 2.0.3
From: akiyano @ 2019-02-11 17:17 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi
In-Reply-To: <1549905464-13758-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Update driver version due to bug fix.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index dc8b6173d8d8..63870072cbbd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -45,7 +45,7 @@
 
 #define DRV_MODULE_VER_MAJOR	2
 #define DRV_MODULE_VER_MINOR	0
-#define DRV_MODULE_VER_SUBMINOR 2
+#define DRV_MODULE_VER_SUBMINOR 3
 
 #define DRV_MODULE_NAME		"ena"
 #ifndef DRV_MODULE_VERSION
-- 
2.17.1


^ permalink raw reply related

* [PATCH V1 net 1/2] net: ena: fix race between link up and device initalization
From: akiyano @ 2019-02-11 17:17 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi
In-Reply-To: <1549905464-13758-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Fix race condition between ena_update_on_link_change() and
ena_restore_device().

This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.

Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.

Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index a70bb1bb90e7..a6eacf2099c3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2663,11 +2663,6 @@ static int ena_restore_device(struct ena_adapter *adapter)
 		goto err_device_destroy;
 	}
 
-	clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
-	/* Make sure we don't have a race with AENQ Links state handler */
-	if (test_bit(ENA_FLAG_LINK_UP, &adapter->flags))
-		netif_carrier_on(adapter->netdev);
-
 	rc = ena_enable_msix_and_set_admin_interrupts(adapter,
 						      adapter->num_queues);
 	if (rc) {
@@ -2684,6 +2679,11 @@ static int ena_restore_device(struct ena_adapter *adapter)
 	}
 
 	set_bit(ENA_FLAG_DEVICE_RUNNING, &adapter->flags);
+
+	clear_bit(ENA_FLAG_ONGOING_RESET, &adapter->flags);
+	if (test_bit(ENA_FLAG_LINK_UP, &adapter->flags))
+		netif_carrier_on(adapter->netdev);
+
 	mod_timer(&adapter->timer_service, round_jiffies(jiffies + HZ));
 	dev_err(&pdev->dev,
 		"Device reset completed successfully, Driver info: %s\n",
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net V2] net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
From: Eric Dumazet @ 2019-02-11 17:24 UTC (permalink / raw)
  To: Tariq Toukan, David S. Miller
  Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Eric Dumazet
In-Reply-To: <1549901057-2614-1-git-send-email-tariqt@mellanox.com>



On 02/11/2019 08:04 AM, Tariq Toukan wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> 
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are usually zero's, which don't affect
> checksum. However, it is not guaranteed. For example, switches might
> choose to make other use of these octets.
> This repeatedly causes kernel hardware checksum fault.
> 
> Prior to the cited commit below, skb checksum was forced to be
> CHECKSUM_NONE when padding is detected. After it, we need to keep
> skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
> verify and parse IP headers, it does not worth the effort as the packets
> are so small that CHECKSUM_COMPLETE has no significant advantage.
> 
> Future work: when reporting checksum complete is not an option for
> IP non-TCP/UDP packets, we can actually fallback to report checksum
> unnecessary, by looking at cqe IPOK bit.
> 
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>

Thanks Tariq and Saeed

Reviewed-by: Eric Dumazet <edumazet@google.com>


^ permalink raw reply

* Re: Possible bug into DSA2 code.
From: Florian Fainelli @ 2019-02-11 17:28 UTC (permalink / raw)
  To: Rodolfo Giometti, Andrew Lunn; +Cc: Vivien Didelot, David S. Miller, netdev
In-Reply-To: <084b3973-88b9-8c71-50d5-9d773999ad04@enneenne.com>

On 2/10/19 3:45 AM, Rodolfo Giometti wrote:
> On 09/02/2019 20:34, Andrew Lunn wrote:
>>> So we I see two possible solutions:
>>>
>>> 1) having both ds->slave_mii_bus and ds->ops->phy_read already
>>> defined is an
>>> error, then it must be signaled to the calling code, or
>>
>> I don't think we can do that. mv88e6xxx optionally instantiates the
>> MDIO busses, depending on what is in device tree. If there is no mdio
>> property, we need the DSA core to create an MDIO bus.
> 
> OK, but using the following check to know if DSA did such allocation is
> not correct because DSA drivers can allocate it by their own:
> 
> static void dsa_switch_teardown(struct dsa_switch *ds)
> {
>         if (ds->slave_mii_bus && ds->ops->phy_read)
>                 mdiobus_unregister(ds->slave_mii_bus);
> 
> Maybe can we add a flag to register ds->slave_mii_bus allocation by DSA?

If drivers allocate the slave_mii_bus, or use it as a pointer to their
bus, then they should not be providing a ds->ops->phy_read() callback
since we assume they would have mii_bus::read and mii_bus::write set to
their driver internal version.

> 
>> Looking at the driver, ds->slave_mii_bus is assigned in
>> mv88e6xxx_setup().
>>
>> We have talked about adding a teardown() to the ops structure. This
>> seems like another argument we should do it. The mv88e6xxx_teardown()
>> can set ds->slave_mii_bus back to NULL, undoing what it did in the
>> setup code.
> 
> This seems reasonable to me, but in this case you have to call
> teardown() operation before calling mdiobus_unregister() into
> dsa_switch_teardown() or we still have the problem...
> 
> Ciao,
> 
> Rodolfo
> 


-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next v3 0/9] net: Remove switchdev_ops
From: Florian Fainelli @ 2019-02-11 17:31 UTC (permalink / raw)
  To: netdev; +Cc: idosch, linux-kernel, devel, bridge, jiri, andrew, vivien.didelot
In-Reply-To: <20190210234007.16173-1-f.fainelli@gmail.com>

On 2/10/19 3:39 PM, Florian Fainelli wrote:
> Hi all,
> 
> This patch series finishes by the removal of switchdev_ops. To get there
> we need to do a few things:
> 
> - get rid of the one and only call to switchdev_port_attr_get() which is
>   used to fetch the device's bridge port flags capabilities, instead we
>   can just check what flags are being programmed during the prepare
>   phase
> 
> - once we get rid of getting switchdev port attributes we convert the
>   setting of such attributes using a blocking notifier
> 
> And then remove switchdev_ops completely.
> 
> Please review and let me know what you think!
> 
> David, I would like to get Ido's feedback on this to make sure I did not
> miss something, thank you!

David, I will be reposting a v4 with Jiri's Acked-by as well as adding
fallthrough switch/case annotations so we don't regress on that front.
Thank you.

> 
> Changes in v3:
> - dropped patches removing te need to get the attribute since we
>   still need that in order to support different sleeping vs.
>   non-sleeping contexts
> 
> Changes in v2:
> - fixed bisectability issues in patch #15
> - added Acked-by from Jiri where necessary
> - fixed a few minor issues according to Jiri's feedback:
> 	- rename port_attr_event -> port_attr_set_event
> 	- moved SWITCHDEV_PORT_ATTR_SET closer to other blocking events
> 
> Florian Fainelli (9):
>   Documentation: networking: switchdev: Update port parent ID section
>   switchdev: Add SWITCHDEV_PORT_ATTR_SET, SWITCHDEV_PORT_ATTR_GET
>   rocker: Handle SWITCHDEV_PORT_ATTR_GET/SET
>   mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_ATTR_GET/SET
>   net: mscc: ocelot: Handle SWITCHDEV_PORT_ATTR_GET/SET
>   staging: fsl-dpaa2: ethsw: Handle SWITCHDEV_PORT_ATTR_GET/SET
>   net: dsa: Handle SWITCHDEV_PORT_ATTR_GET/SET
>   net: switchdev: Replace port attr get/set SDO with a notification
>   net: Remove switchdev_ops
> 
>  Documentation/networking/switchdev.txt        |  10 +-
>  .../net/ethernet/mellanox/mlxsw/spectrum.c    |  12 --
>  .../net/ethernet/mellanox/mlxsw/spectrum.h    |   2 -
>  .../mellanox/mlxsw/spectrum_switchdev.c       |  36 +++---
>  drivers/net/ethernet/mscc/ocelot.c            |  26 ++++-
>  drivers/net/ethernet/rocker/rocker_main.c     |  30 ++++-
>  drivers/staging/fsl-dpaa2/ethsw/ethsw.c       |  30 ++++-
>  include/linux/netdevice.h                     |   3 -
>  include/net/switchdev.h                       |  28 ++---
>  net/dsa/slave.c                               |  30 ++++-
>  net/switchdev/switchdev.c                     | 107 ++++++------------
>  11 files changed, 168 insertions(+), 146 deletions(-)
> 


-- 
Florian

^ permalink raw reply

* Re: Possible bug into DSA2 code.
From: Rodolfo Giometti @ 2019-02-11 17:51 UTC (permalink / raw)
  To: Florian Fainelli, Andrew Lunn; +Cc: Vivien Didelot, David S. Miller, netdev
In-Reply-To: <fe3891e6-4a91-f345-f543-60e19b006be3@gmail.com>

On 11/02/2019 18:28, Florian Fainelli wrote:
> On 2/10/19 3:45 AM, Rodolfo Giometti wrote:
>> On 09/02/2019 20:34, Andrew Lunn wrote:
>>>> So we I see two possible solutions:
>>>>
>>>> 1) having both ds->slave_mii_bus and ds->ops->phy_read already
>>>> defined is an
>>>> error, then it must be signaled to the calling code, or
>>>
>>> I don't think we can do that. mv88e6xxx optionally instantiates the
>>> MDIO busses, depending on what is in device tree. If there is no mdio
>>> property, we need the DSA core to create an MDIO bus.
>>
>> OK, but using the following check to know if DSA did such allocation is
>> not correct because DSA drivers can allocate it by their own:
>>
>> static void dsa_switch_teardown(struct dsa_switch *ds)
>> {
>>          if (ds->slave_mii_bus && ds->ops->phy_read)
>>                  mdiobus_unregister(ds->slave_mii_bus);
>>
>> Maybe can we add a flag to register ds->slave_mii_bus allocation by DSA?
> 
> If drivers allocate the slave_mii_bus, or use it as a pointer to their
> bus, then they should not be providing a ds->ops->phy_read() callback
> since we assume they would have mii_bus::read and mii_bus::write set to
> their driver internal version.

I see, so having ds->slave_mii_bus and ds->ops->phy_read both not NULL into 
dsa_switch_setup() is a potential bug, I suppose... If so why not adding a 
BUG_ON() call to signal it instead of doing nothing? :-o

Ciao,

Rodolfo

-- 
GNU/Linux Solutions                  e-mail: giometti@enneenne.com
Linux Device Driver                          giometti@linux.it
Embedded Systems                     phone:  +39 349 2432127
UNIX programming                     skype:  rodolfo.giometti

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox