Netdev List
 help / color / mirror / Atom feed
* [PATCH 3/8] netfilter: nft_meta_bridge: Eliminate 'out' label
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Phil Sutter <phil@nwl.cc>

The label is used just once and the code it points at is not reused, no
point in keeping it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nft_meta_bridge.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index a98dec2cf0cf..1804e867f715 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -57,13 +57,11 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		return;
 	}
 	default:
-		goto out;
+		return nft_meta_get_eval(expr, regs, pkt);
 	}
 
 	strncpy((char *)dest, br_dev ? br_dev->name : "", IFNAMSIZ);
 	return;
-out:
-	return nft_meta_get_eval(expr, regs, pkt);
 err:
 	regs->verdict.code = NFT_BREAK;
 }
-- 
2.11.0


^ permalink raw reply related

* [PATCH 1/8] netfilter: ebtables: fix a memory leak bug in compat
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Wenwen Wang <wenwen@cs.uga.edu>

In compat_do_replace(), a temporary buffer is allocated through vmalloc()
to hold entries copied from the user space. The buffer address is firstly
saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then
the entries in this temporary buffer is copied to the internal kernel
structure through compat_copy_entries(). If this copy process fails,
compat_do_replace() should be terminated. However, the allocated temporary
buffer is not freed on this path, leading to a memory leak.

To fix the bug, free the buffer before returning from compat_do_replace().

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebtables.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 963dfdc14827..fd84b48e48b5 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -2261,8 +2261,10 @@ static int compat_do_replace(struct net *net, void __user *user,
 	state.buf_kern_len = size64;
 
 	ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state);
-	if (WARN_ON(ret < 0))
+	if (WARN_ON(ret < 0)) {
+		vfree(entries_tmp);
 		goto out_unlock;
+	}
 
 	vfree(entries_tmp);
 	tmp.entries_size = size64;
-- 
2.11.0


^ permalink raw reply related

* [PATCH 0/8] netfilter fixes for net
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi,

The following patchset contains Netfilter fixes for your net tree:

1) memleak in ebtables from the error path for the 32/64 compat layer,
   from Florian Westphal.

2) Fix inverted meta ifname/ifidx matching when no interface is set
   on either from the input/output path, from Phil Sutter.

3) Remove goto label in nft_meta_bridge, also from Phil.

4) Missing include guard in xt_connlabel, from Masahiro Yamada.

5) Two patch to fix ipset destination MAC matching coming from
   Stephano Brivio, via Jozsef Kadlecsik.

6) Fix set rename and listing concurrency problem, from Shijie Luo.
   Patch also coming via Jozsef Kadlecsik.

7) ebtables 32/64 compat missing base chain policy in rule count,
   from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 0cea0e1148fe134a4a3aaf0b1496f09241fb943a:

  net: phy: sfp: hwmon: Fix scaling of RX power (2019-07-21 11:51:50 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 7cdc4412284777c76c919e2ab33b3b8dbed18559:

  Merge branch 'master' of git://blackhole.kfki.hu/nf (2019-07-30 13:39:20 +0200)

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: ebtables: also count base chain policies

Jozsef Kadlecsik (1):
      netfilter: ipset: Fix rename concurrency with listing

Masahiro Yamada (1):
      netfilter: add include guard to xt_connlabel.h

Pablo Neira Ayuso (1):
      Merge branch 'master' of git://blackhole.kfki.hu/nf

Phil Sutter (2):
      netfilter: nf_tables: Make nft_meta expression more robust
      netfilter: nft_meta_bridge: Eliminate 'out' label

Stefano Brivio (2):
      netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
      netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets

Wenwen Wang (1):
      netfilter: ebtables: fix a memory leak bug in compat

 include/uapi/linux/netfilter/xt_connlabel.h |  6 ++++++
 net/bridge/netfilter/ebtables.c             | 32 ++++++++++++++++++-----------
 net/bridge/netfilter/nft_meta_bridge.c      | 10 ++-------
 net/netfilter/ipset/ip_set_bitmap_ipmac.c   |  2 +-
 net/netfilter/ipset/ip_set_core.c           |  2 +-
 net/netfilter/ipset/ip_set_hash_ipmac.c     |  6 +-----
 net/netfilter/nft_meta.c                    | 16 ++++-----------
 7 files changed, 35 insertions(+), 39 deletions(-)

^ permalink raw reply

* Re: [PATCH net] drop_monitor: Add missing uAPI file to MAINTAINERS file
From: Neil Horman @ 2019-07-31 11:36 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, Ido Schimmel
In-Reply-To: <20190731063819.10001-1-idosch@idosch.org>

On Wed, Jul 31, 2019 at 09:38:19AM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> Fixes: 6e43650cee64 ("add maintainer for network drop monitor kernel service")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9f5b8bd4faf9..b540794cbd91 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11137,6 +11137,7 @@ L:	netdev@vger.kernel.org
>  S:	Maintained
>  W:	https://fedorahosted.org/dropwatch/
>  F:	net/core/drop_monitor.c
> +F:	include/uapi/linux/net_dropmon.h
>  
>  NETWORKING DRIVERS
>  M:	"David S. Miller" <davem@davemloft.net>
> -- 
> 2.21.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>


^ permalink raw reply

* Re: next/master build: 221 builds: 11 failed, 210 passed, 13 errors, 1174 warnings (next-20190731)
From: Greg Kroah-Hartman @ 2019-07-31 11:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Matthew Wilcox (Oracle), devel, netdev, linux-next,
	linux-arm-kernel, kernel-build-reports
In-Reply-To: <20190731112441.GB4369@sirena.org.uk>

On Wed, Jul 31, 2019 at 12:24:41PM +0100, Mark Brown wrote:
> On Wed, Jul 31, 2019 at 04:07:41AM -0700, kernelci.org bot wrote:
> 
> Today's -next fails to build an ARM allmodconfig due to:
> 
> > allmodconfig (arm, gcc-8) — FAIL, 1 error, 40 warnings, 0 section mismatches
> > 
> > Errors:
> >     drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]
> 
> as a result of the changes that introduced:
> 
> WARNING: unmet direct dependencies detected for MDIO_OCTEON
>   Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=m] && MDIO_BUS [=m] && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=m]
>   Selected by [m]:
>   - OCTEON_ETHERNET [=m] && STAGING [=y] && (CAVIUM_OCTEON_SOC && NETDEVICES [=y] || COMPILE_TEST [=y])
> 
> which is triggered by the staging OCTEON_ETHERNET driver which misses a
> 64BIT dependency but added COMPILE_TEST in 171a9bae68c72f2
> (staging/octeon: Allow test build on !MIPS).

A patch was posted for this, but it needs to go through the netdev tree
as that's where the offending patches are coming from.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Joe Perches @ 2019-07-31 11:32 UTC (permalink / raw)
  To: Neil Horman
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <20190731111932.GA9823@hmswarspite.think-freely.org>

On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > fallthrough may become a pseudo reserved keyword so this only use of
> > fallthrough is better renamed to allow it.
> > 
> > Signed-off-by: Joe Perches <joe@perches.com>
> Are you referring to the __attribute__((fallthrough)) statement that gcc
> supports?  If so the compiler should by all rights be able to differentiate
> between a null statement attribute and a explicit goto and label without the
> need for renaming here.  Or are you referring to something else?

Hi.

I sent after this a patch that adds

# define fallthrough                    __attribute__((__fallthrough__))

https://lore.kernel.org/patchwork/patch/1108577/

So this rename is a prerequisite to adding this #define.

> > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
[]
> > @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
> >  	case SCTP_PARAM_SET_PRIMARY:
> >  		if (net->sctp.addip_enable)
> >  			break;
> > -		goto fallthrough;
> > +		goto unhandled;

etc...



^ permalink raw reply

* Re: next/master build: 221 builds: 11 failed, 210 passed, 13 errors, 1174 warnings (next-20190731)
From: Mark Brown @ 2019-07-31 11:24 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Greg Kroah-Hartman, Matthew Wilcox (Oracle)
  Cc: kernel-build-reports, linux-arm-kernel, netdev, devel, linux-next
In-Reply-To: <5d41767d.1c69fb81.d6304.4c8c@mx.google.com>

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

On Wed, Jul 31, 2019 at 04:07:41AM -0700, kernelci.org bot wrote:

Today's -next fails to build an ARM allmodconfig due to:

> allmodconfig (arm, gcc-8) — FAIL, 1 error, 40 warnings, 0 section mismatches
> 
> Errors:
>     drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]

as a result of the changes that introduced:

WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=m] && MDIO_BUS [=m] && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=m]
  Selected by [m]:
  - OCTEON_ETHERNET [=m] && STAGING [=y] && (CAVIUM_OCTEON_SOC && NETDEVICES [=y] || COMPILE_TEST [=y])

which is triggered by the staging OCTEON_ETHERNET driver which misses a
64BIT dependency but added COMPILE_TEST in 171a9bae68c72f2
(staging/octeon: Allow test build on !MIPS).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Neil Horman @ 2019-07-31 11:19 UTC (permalink / raw)
  To: Joe Perches
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <e0dd3af448e38e342c1ac6e7c0c802696eb77fd6.1564549413.git.joe@perches.com>

On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> fallthrough may become a pseudo reserved keyword so this only use of
> fallthrough is better renamed to allow it.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
Are you referring to the __attribute__((fallthrough)) statement that gcc
supports?  If so the compiler should by all rights be able to differentiate
between a null statement attribute and a explicit goto and label without the
need for renaming here.  Or are you referring to something else?

Neil

> ---
>  net/sctp/sm_make_chunk.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 36bd8a6e82df..3fdcaa2fbf12 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  	case SCTP_PARAM_SET_PRIMARY:
>  		if (net->sctp.addip_enable)
>  			break;
> -		goto fallthrough;
> +		goto unhandled;
>  
>  	case SCTP_PARAM_HOST_NAME_ADDRESS:
>  		/* Tell the peer, we won't support this param.  */
> @@ -2163,11 +2163,11 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  	case SCTP_PARAM_FWD_TSN_SUPPORT:
>  		if (ep->prsctp_enable)
>  			break;
> -		goto fallthrough;
> +		goto unhandled;
>  
>  	case SCTP_PARAM_RANDOM:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		/* SCTP-AUTH: Secion 6.1
>  		 * If the random number is not 32 byte long the association
> @@ -2184,7 +2184,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  
>  	case SCTP_PARAM_CHUNKS:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		/* SCTP-AUTH: Section 3.2
>  		 * The CHUNKS parameter MUST be included once in the INIT or
> @@ -2200,7 +2200,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  
>  	case SCTP_PARAM_HMAC_ALGO:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		hmacs = (struct sctp_hmac_algo_param *)param.p;
>  		n_elt = (ntohs(param.p->length) -
> @@ -2223,7 +2223,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  			retval = SCTP_IERROR_ABORT;
>  		}
>  		break;
> -fallthrough:
> +unhandled:
>  	default:
>  		pr_debug("%s: unrecognized param:%d for chunk:%d\n",
>  			 __func__, ntohs(param.p->type), cid);
> -- 
> 2.15.0
> 
> 

^ permalink raw reply

* Re: [PATCH net-next] be2net: disable bh with spin_lock in be_process_mcc
From: Denis Kirjanov @ 2019-07-31 10:54 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna,
	Network Development, Denis Kirjanov
In-Reply-To: <CA+FuTSfnqV4zGvW+W0fh+=X-wm8rz1O5ZqGKXpxSVN1vPMD+sw@mail.gmail.com>

On 7/30/19, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> On Tue, Jul 30, 2019 at 7:33 AM Denis Kirjanov <kda@linux-powerpc.org>
> wrote:
>>
>> Signed-off-by: Denis Kirjanov <kdav@linux-powerpc.org>
>
> This is a partial revert of the previous change to these lines in 2012
> in commit 072a9c486004 ("netpoll: revert 6bdb7fe3104 and fix be_poll()
> instead").
>
> The commit message is empty. Can you give some context as to why this
> is needed and correct?

The idea was just to make some cleanup. Now I've checked that be_process_mcc
is invoked in 3 different places and always with BHs disabled except
the be_poll function but since it's invoked from softirq with BHs
disabled it won't hurt.


>

^ permalink raw reply

* [PATCH net-next 2/2] selftests: mlxsw: Add a test for leftover DSCP rule
From: Petr Machata @ 2019-07-31 10:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Petr Machata, Ido Schimmel
In-Reply-To: <cover.1564568595.git.petrm@mellanox.com>

Commit dedfde2fe1c4 ("mlxsw: spectrum_dcb: Configure DSCP map as the last
rule is removed") fixed a problem in mlxsw where last DSCP rule to be
removed remained in effect when DSCP rewrite was applied.

Add a selftest that covers this problem.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 .../drivers/net/mlxsw/qos_dscp_router.sh      | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
index f25e3229e1cc..c745ce3befee 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
@@ -31,6 +31,7 @@ ALL_TESTS="
 	ping_ipv4
 	test_update
 	test_no_update
+	test_dscp_leftover
 "
 
 lib_dir=$(dirname $0)/../../../net/forwarding
@@ -50,6 +51,11 @@ reprioritize()
 	echo ${reprio[$in]}
 }
 
+zero()
+{
+    echo 0
+}
+
 h1_create()
 {
 	simple_if_init $h1 192.0.2.1/28
@@ -225,6 +231,19 @@ test_no_update()
 	__test_update 0 echo
 }
 
+# Test that when the last APP rule is removed, the prio->DSCP map is properly
+# set to zeroes, and that the last APP rule does not stay active in the ASIC.
+test_dscp_leftover()
+{
+	lldptool -T -i $swp2 -V APP -d $(dscp_map 0) >/dev/null
+	lldpad_app_wait_del
+
+	__test_update 0 zero
+
+	lldptool -T -i $swp2 -V APP $(dscp_map 0) >/dev/null
+	lldpad_app_wait_set $swp2
+}
+
 trap cleanup EXIT
 
 setup_prepare
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 1/2] selftests: mlxsw: Fix local variable declarations in DSCP tests
From: Petr Machata @ 2019-07-31 10:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Petr Machata, Ido Schimmel
In-Reply-To: <cover.1564568595.git.petrm@mellanox.com>

These two tests have some problems in the global scope pollution and on
contrary, contain unnecessary local declarations. Fix them.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 .../testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh  | 6 ++++--
 .../testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh  | 5 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
index 40f16f2a3afd..5cbff8038f84 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
@@ -36,8 +36,6 @@ source $lib_dir/lib.sh
 
 h1_create()
 {
-	local dscp;
-
 	simple_if_init $h1 192.0.2.1/28
 	tc qdisc add dev $h1 clsact
 	dscp_capture_install $h1 10
@@ -67,6 +65,7 @@ h2_destroy()
 dscp_map()
 {
 	local base=$1; shift
+	local prio
 
 	for prio in {0..7}; do
 		echo app=$prio,5,$((base + prio))
@@ -138,6 +137,7 @@ dscp_ping_test()
 	local prio=$1; shift
 	local dev_10=$1; shift
 	local dev_20=$1; shift
+	local key
 
 	local dscp_10=$(((prio + 10) << 2))
 	local dscp_20=$(((prio + 20) << 2))
@@ -175,6 +175,8 @@ dscp_ping_test()
 
 test_dscp()
 {
+	local prio
+
 	for prio in {0..7}; do
 		dscp_ping_test v$h1 192.0.2.1 192.0.2.2 $prio $h1 $h2
 	done
diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
index 9faf02e32627..f25e3229e1cc 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
@@ -52,8 +52,6 @@ reprioritize()
 
 h1_create()
 {
-	local dscp;
-
 	simple_if_init $h1 192.0.2.1/28
 	tc qdisc add dev $h1 clsact
 	dscp_capture_install $h1 0
@@ -87,6 +85,7 @@ h2_destroy()
 dscp_map()
 {
 	local base=$1; shift
+	local prio
 
 	for prio in {0..7}; do
 		echo app=$prio,5,$((base + prio))
@@ -156,6 +155,7 @@ dscp_ping_test()
 	local reprio=$1; shift
 	local dev1=$1; shift
 	local dev2=$1; shift
+	local i
 
 	local prio2=$($reprio $prio)   # ICMP Request egress prio
 	local prio3=$($reprio $prio2)  # ICMP Response egress prio
@@ -205,6 +205,7 @@ __test_update()
 {
 	local update=$1; shift
 	local reprio=$1; shift
+	local prio
 
 	sysctl_restore net.ipv4.ip_forward_update_priority
 	sysctl_set net.ipv4.ip_forward_update_priority $update
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 0/2] mlxsw: Test coverage for DSCP leftover fix
From: Petr Machata @ 2019-07-31 10:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Petr Machata, Ido Schimmel

This patch set fixes some global scope pollution issues in the DSCP tests
(in patch #1), and then proceeds (in patch #2) to add a new test for
checking whether, after DSCP prioritization rules are removed from a port,
DSCP rewrites consistently to zero, instead of the last removed rule still
staying in effect.

Petr Machata (2):
  selftests: mlxsw: Fix local variable declarations in DSCP tests
  selftests: mlxsw: Add a test for leftover DSCP rule

 .../drivers/net/mlxsw/qos_dscp_bridge.sh      |  6 +++--
 .../drivers/net/mlxsw/qos_dscp_router.sh      | 24 +++++++++++++++++--
 2 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.20.1


^ permalink raw reply

* Re: [PATCH 1/2] tools: bpftool: add net load command to load XDP on interface
From: Jesper Dangaard Brouer @ 2019-07-31 10:08 UTC (permalink / raw)
  To: Daniel T. Lee
  Cc: brouer, Daniel Borkmann, Alexei Starovoitov, netdev,
	Quentin Monnet
In-Reply-To: <20190730184821.10833-2-danieltimlee@gmail.com>

On Wed, 31 Jul 2019 03:48:20 +0900
"Daniel T. Lee" <danieltimlee@gmail.com> wrote:

> By this commit, using `bpftool net load`, user can load XDP prog on
> interface. New type of enum 'net_load_type' has been made, as stated at
> cover-letter, the meaning of 'load' is, prog will be loaded on interface.

Why the keyword "load" ?
Why not "attach" (and "detach")?

For BPF there is a clear distinction between the "load" and "attach"
steps.  I know this is under subcommand "net", but to follow the
conversion of other subcommands e.g. "prog" there are both "load" and
"attach" commands.


> BPF prog will be loaded through libbpf 'bpf_set_link_xdp_fd'.

Again this is a "set" operation, not a "load" operation.

> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

[...]
>  static int do_show(int argc, char **argv)
>  {
>  	struct bpf_attach_info attach_info = {};
> @@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
>  
>  	fprintf(stderr,
>  		"Usage: %s %s { show | list } [dev <devname>]\n"
> +		"       %s %s load PROG LOAD_TYPE <devname>\n"

The "PROG" here does it correspond to the 'bpftool prog' syntax?:

 PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }

>  		"       %s %s help\n"
> +		"\n"
> +		"       " HELP_SPEC_PROGRAM "\n"
> +		"       LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
>  		"Note: Only xdp and tc attachments are supported now.\n"
>  		"      For progs attached to cgroups, use \"bpftool cgroup\"\n"
>  		"      to dump program attachments. For program types\n"
>  		"      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
>  		"      consult iproute2.\n",


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH V2 9/9] vhost: do not return -EAGIAN for non blocking invalidation too early
From: Jason Wang @ 2019-07-31 10:05 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm, jgg
In-Reply-To: <20190731095950.d6zr472megt7rgkt@steredhat>


On 2019/7/31 下午5:59, Stefano Garzarella wrote:
> A little typo in the title: s/EAGIAN/EAGAIN
>
> Thanks,
> Stefano


Right, will fix if need respin or Michael can help to fix.

Thanks


>
> On Wed, Jul 31, 2019 at 04:46:55AM -0400, Jason Wang wrote:
>> Instead of returning -EAGAIN unconditionally, we'd better do that only
>> we're sure the range is overlapped with the metadata area.
>>
>> Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
>> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/vhost/vhost.c | 32 +++++++++++++++++++-------------
>>   1 file changed, 19 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index fc2da8a0c671..96c6aeb1871f 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -399,16 +399,19 @@ static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
>>   	smp_mb();
>>   }
>>   
>> -static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>> -				      int index,
>> -				      unsigned long start,
>> -				      unsigned long end)
>> +static int vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>> +				     int index,
>> +				     unsigned long start,
>> +				     unsigned long end,
>> +				     bool blockable)
>>   {
>>   	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
>>   	struct vhost_map *map;
>>   
>>   	if (!vhost_map_range_overlap(uaddr, start, end))
>> -		return;
>> +		return 0;
>> +	else if (!blockable)
>> +		return -EAGAIN;
>>   
>>   	spin_lock(&vq->mmu_lock);
>>   	++vq->invalidate_count;
>> @@ -423,6 +426,8 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>>   		vhost_set_map_dirty(vq, map, index);
>>   		vhost_map_unprefetch(map);
>>   	}
>> +
>> +	return 0;
>>   }
>>   
>>   static void vhost_invalidate_vq_end(struct vhost_virtqueue *vq,
>> @@ -443,18 +448,19 @@ static int vhost_invalidate_range_start(struct mmu_notifier *mn,
>>   {
>>   	struct vhost_dev *dev = container_of(mn, struct vhost_dev,
>>   					     mmu_notifier);
>> -	int i, j;
>> -
>> -	if (!mmu_notifier_range_blockable(range))
>> -		return -EAGAIN;
>> +	bool blockable = mmu_notifier_range_blockable(range);
>> +	int i, j, ret;
>>   
>>   	for (i = 0; i < dev->nvqs; i++) {
>>   		struct vhost_virtqueue *vq = dev->vqs[i];
>>   
>> -		for (j = 0; j < VHOST_NUM_ADDRS; j++)
>> -			vhost_invalidate_vq_start(vq, j,
>> -						  range->start,
>> -						  range->end);
>> +		for (j = 0; j < VHOST_NUM_ADDRS; j++) {
>> +			ret = vhost_invalidate_vq_start(vq, j,
>> +							range->start,
>> +							range->end, blockable);
>> +			if (ret)
>> +				return ret;
>> +		}
>>   	}
>>   
>>   	return 0;
>> -- 
>> 2.18.1
>>

^ permalink raw reply

* Re: [PATCH V2 9/9] vhost: do not return -EAGIAN for non blocking invalidation too early
From: Stefano Garzarella @ 2019-07-31  9:59 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm, jgg
In-Reply-To: <20190731084655.7024-10-jasowang@redhat.com>

A little typo in the title: s/EAGIAN/EAGAIN

Thanks,
Stefano

On Wed, Jul 31, 2019 at 04:46:55AM -0400, Jason Wang wrote:
> Instead of returning -EAGAIN unconditionally, we'd better do that only
> we're sure the range is overlapped with the metadata area.
> 
> Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c | 32 +++++++++++++++++++-------------
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index fc2da8a0c671..96c6aeb1871f 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -399,16 +399,19 @@ static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
>  	smp_mb();
>  }
>  
> -static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
> -				      int index,
> -				      unsigned long start,
> -				      unsigned long end)
> +static int vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
> +				     int index,
> +				     unsigned long start,
> +				     unsigned long end,
> +				     bool blockable)
>  {
>  	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
>  	struct vhost_map *map;
>  
>  	if (!vhost_map_range_overlap(uaddr, start, end))
> -		return;
> +		return 0;
> +	else if (!blockable)
> +		return -EAGAIN;
>  
>  	spin_lock(&vq->mmu_lock);
>  	++vq->invalidate_count;
> @@ -423,6 +426,8 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  		vhost_set_map_dirty(vq, map, index);
>  		vhost_map_unprefetch(map);
>  	}
> +
> +	return 0;
>  }
>  
>  static void vhost_invalidate_vq_end(struct vhost_virtqueue *vq,
> @@ -443,18 +448,19 @@ static int vhost_invalidate_range_start(struct mmu_notifier *mn,
>  {
>  	struct vhost_dev *dev = container_of(mn, struct vhost_dev,
>  					     mmu_notifier);
> -	int i, j;
> -
> -	if (!mmu_notifier_range_blockable(range))
> -		return -EAGAIN;
> +	bool blockable = mmu_notifier_range_blockable(range);
> +	int i, j, ret;
>  
>  	for (i = 0; i < dev->nvqs; i++) {
>  		struct vhost_virtqueue *vq = dev->vqs[i];
>  
> -		for (j = 0; j < VHOST_NUM_ADDRS; j++)
> -			vhost_invalidate_vq_start(vq, j,
> -						  range->start,
> -						  range->end);
> +		for (j = 0; j < VHOST_NUM_ADDRS; j++) {
> +			ret = vhost_invalidate_vq_start(vq, j,
> +							range->start,
> +							range->end, blockable);
> +			if (ret)
> +				return ret;
> +		}
>  	}
>  
>  	return 0;
> -- 
> 2.18.1
> 

-- 

^ permalink raw reply

* [RFC PATCH 1/2] dt-bindings: net: macb: Add new property for PS SGMII only
From: Harini Katakam @ 2019-07-31  9:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea, robh+dt, mark.rutland
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, devicetree
In-Reply-To: <1564566033-676-1-git-send-email-harini.katakam@xilinx.com>

Add a new property to indicate when PS SGMII is used with NO
external PHY on board.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
---
 Documentation/devicetree/bindings/net/macb.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 63c73fa..ae1b109 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -38,6 +38,10 @@ Optional properties for PHY child node:
   up via magic packet.
 - phy-handle : see ethernet.txt file in the same directory
 
+Optional properties:
+- is-internal-pcspma: Add when GEM's internal SGMII support is used without
+  any external SGMII PHY.
+
 Examples:
 
 	macb0: ethernet@fffc4000 {
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 2/2] net: macb: Add SGMII poll thread
From: Harini Katakam @ 2019-07-31  9:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea, robh+dt, mark.rutland
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, devicetree
In-Reply-To: <1564566033-676-1-git-send-email-harini.katakam@xilinx.com>

The internal SGMII mode in PS GEM on ZynqMP can be used without any
external PHY on board. In this case, the phy framework doesn't kick
in to monitor the link status. Hence do the same in macb driver.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: Kester Aernoudt <kester.aernoudt@xilinx.com>
---
 drivers/net/ethernet/cadence/macb.h      |  8 ++++
 drivers/net/ethernet/cadence/macb_main.c | 65 ++++++++++++++++++++++++++++++--
 2 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 03983bd..b07284a 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -155,6 +155,7 @@
 #define GEM_PEFTN		0x01f4 /* PTP Peer Event Frame Tx Ns */
 #define GEM_PEFRSL		0x01f8 /* PTP Peer Event Frame Rx Sec Low */
 #define GEM_PEFRN		0x01fc /* PTP Peer Event Frame Rx Ns */
+#define GEM_PCSSTATUS		0x0204 /* PCS Status */
 #define GEM_DCFG1		0x0280 /* Design Config 1 */
 #define GEM_DCFG2		0x0284 /* Design Config 2 */
 #define GEM_DCFG3		0x0288 /* Design Config 3 */
@@ -455,6 +456,10 @@
 #define MACB_REV_OFFSET				0
 #define MACB_REV_SIZE				16
 
+/* Bitfields in PCSSTATUS */
+#define GEM_PCSLINK_OFFSET			2
+#define GEM_PCSLINK_SIZE			1
+
 /* Bitfields in DCFG1. */
 #define GEM_IRQCOR_OFFSET			23
 #define GEM_IRQCOR_SIZE				1
@@ -1232,6 +1237,9 @@ struct macb {
 	u32	rx_intr_mask;
 
 	struct macb_pm_data pm_data;
+
+	int internal_pcspma;
+	struct task_struct *sgmii_poll_task;
 };
 
 #ifdef CONFIG_MACB_USE_HWSTAMP
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 5ca17e6..ae1f18d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -36,6 +36,7 @@
 #include <linux/tcp.h>
 #include <linux/iopoll.h>
 #include <linux/pm_runtime.h>
+#include <linux/kthread.h>
 #include "macb.h"
 
 /* This structure is only used for MACB on SiFive FU540 devices */
@@ -2418,8 +2419,10 @@ static int macb_open(struct net_device *dev)
 	/* carrier starts down */
 	netif_carrier_off(dev);
 
-	/* if the phy is not yet register, retry later*/
-	if (!dev->phydev) {
+	/* if the phy is not yet registered and internal SGMII is not used,
+	 * retry later
+	 */
+	if (!bp->internal_pcspma && !dev->phydev) {
 		err = -EAGAIN;
 		goto pm_exit;
 	}
@@ -2441,7 +2444,8 @@ static int macb_open(struct net_device *dev)
 	macb_init_hw(bp);
 
 	/* schedule a link state check */
-	phy_start(dev->phydev);
+	if (!bp->internal_pcspma)
+		phy_start(dev->phydev);
 
 	netif_tx_start_all_queues(dev);
 
@@ -2468,7 +2472,7 @@ static int macb_close(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
 		napi_disable(&queue->napi);
 
-	if (dev->phydev)
+	if (!bp->internal_pcspma && dev->phydev)
 		phy_stop(dev->phydev);
 
 	spin_lock_irqsave(&bp->lock, flags);
@@ -3187,6 +3191,49 @@ static const struct ethtool_ops gem_ethtool_ops = {
 	.set_rxnfc			= gem_set_rxnfc,
 };
 
+int gem_sgmii_status_poll(void *data)
+{
+	struct net_device *dev = data;
+	struct macb *bp = netdev_priv(dev);
+	int status, prev_status = 0;
+	u32 reg;
+
+	while (!kthread_should_stop()) {
+		status = gem_readl(bp, PCSSTATUS) & GEM_BIT(PCSLINK);
+		reg = macb_readl(bp, NCR);
+		if (!(reg & MACB_BIT(RE)) || !(reg & MACB_BIT(TE)) ||
+		    (!netif_carrier_ok(dev) && prev_status))
+			status = 0;
+
+		if (status != prev_status) {
+			if (status) {
+				reg = macb_readl(bp, NCFGR);
+				reg |= MACB_BIT(FD);
+				reg |= GEM_BIT(GBE);
+
+				macb_or_gem_writel(bp, NCFGR, reg);
+
+				bp->speed = SPEED_1000;
+				bp->duplex = DUPLEX_FULL;
+				bp->link = 1;
+				macb_set_tx_clk(bp->tx_clk, SPEED_1000, dev);
+
+				netif_carrier_on(dev);
+				netdev_info(dev, "link up (%d/%s)\n",
+					    SPEED_1000, "Full");
+			} else {
+				netif_carrier_off(dev);
+				netdev_info(dev, "link down\n");
+			}
+
+			prev_status = status;
+		}
+		schedule_timeout_uninterruptible(HZ);
+	}
+
+	return 0;
+}
+
 static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
 	struct phy_device *phydev = dev->phydev;
@@ -4344,6 +4391,12 @@ static int macb_probe(struct platform_device *pdev)
 		    macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
 		    dev->base_addr, dev->irq, dev->dev_addr);
 
+	bp->internal_pcspma = of_property_read_bool(np, "is-internal-pcspma");
+
+	if (bp->internal_pcspma)
+		bp->sgmii_poll_task = kthread_run(gem_sgmii_status_poll, dev,
+						  "gem_sgmii_status_poll");
+
 	pm_runtime_mark_last_busy(&bp->pdev->dev);
 	pm_runtime_put_autosuspend(&bp->pdev->dev);
 
@@ -4384,6 +4437,10 @@ static int macb_remove(struct platform_device *pdev)
 
 	if (dev) {
 		bp = netdev_priv(dev);
+
+		if (bp->internal_pcspma)
+			kthread_stop(bp->sgmii_poll_task);
+
 		if (dev->phydev)
 			phy_disconnect(dev->phydev);
 		mdiobus_unregister(bp->mii_bus);
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH 0/2] Macb SGMII status poll thread
From: Harini Katakam @ 2019-07-31  9:40 UTC (permalink / raw)
  To: nicolas.ferre, davem, claudiu.beznea, robh+dt, mark.rutland
  Cc: netdev, linux-kernel, michal.simek, harinikatakamlinux,
	harini.katakam, devicetree

When PS GEM is used with SGMII mode without an external PHY
on board, add a link status reporting mechanism.

Harini Katakam (2):
  dt-bindings: net: macb: Add new property for PS SGMII only
  net: macb: Add SGMII poll thread

 Documentation/devicetree/bindings/net/macb.txt |  4 ++
 drivers/net/ethernet/cadence/macb.h            |  8 ++++
 drivers/net/ethernet/cadence/macb_main.c       | 65 ++++++++++++++++++++++++--
 3 files changed, 73 insertions(+), 4 deletions(-)

-- 
2.7.4


^ permalink raw reply

* Re: [PATCH 1/2] tools: bpftool: add net load command to load XDP on interface
From: Jesper Dangaard Brouer @ 2019-07-31  9:38 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: brouer, Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <20190730184821.10833-2-danieltimlee@gmail.com>

On Wed, 31 Jul 2019 03:48:20 +0900
"Daniel T. Lee" <danieltimlee@gmail.com> wrote:

> diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> index 67e99c56bc88..d3a4f18b5b95 100644
> --- a/tools/bpf/bpftool/net.c
> +++ b/tools/bpf/bpftool/net.c
> @@ -55,6 +55,35 @@ struct bpf_attach_info {
>  	__u32 flow_dissector_id;
>  };
>  
> +enum net_load_type {
> +	NET_LOAD_TYPE_XDP,
> +	NET_LOAD_TYPE_XDP_GENERIC,
> +	NET_LOAD_TYPE_XDP_DRIVE,

Why "DRIVE" ?
Why not "DRIVER" ?

> +	NET_LOAD_TYPE_XDP_OFFLOAD,
> +	__MAX_NET_LOAD_TYPE
> +};
> +
> +static const char * const load_type_strings[] = {
> +	[NET_LOAD_TYPE_XDP] = "xdp",
> +	[NET_LOAD_TYPE_XDP_GENERIC] = "xdpgeneric",
> +	[NET_LOAD_TYPE_XDP_DRIVE] = "xdpdrv",
> +	[NET_LOAD_TYPE_XDP_OFFLOAD] = "xdpoffload",
> +	[__MAX_NET_LOAD_TYPE] = NULL,
> +};



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH 1/2] include: mdio: Add private field to mdio structure
From: Harini Katakam @ 2019-07-31  9:36 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem
  Cc: michal.simek, netdev, linux-arm-kernel, linux-kernel,
	harinikatakamlinux, harini.katakam, radhey.shyam.pandey
In-Reply-To: <1564565779-29537-1-git-send-email-harini.katakam@xilinx.com>

Add a priv pointer to mdio structure to be used by mdio devices if
required.

This priv field will be used by gmii2rgmii driver. As this IP has
no capability to read status on the MDIO bus, the driver currently
snoops the same and needs the instance information is some private
field. Since phy device "priv" can be used by external phy drivers,
it is not appropriate. Hence this addition to mdio device. This is
a temporary solution before the IP can be improved. The need for
this priv field can be re-evaluated later based on other mdio devices.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
---
 include/linux/mdio.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad8..3399de7 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -40,6 +40,9 @@ struct mdio_device {
 	struct reset_control *reset_ctrl;
 	unsigned int reset_assert_delay;
 	unsigned int reset_deassert_delay;
+
+	/* private data pointer for use by MDIO devices */
+	void *priv;
 };
 #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
 
-- 
2.7.4


^ permalink raw reply related

* [PATCH 2/2] net: gmii2rgmii: Switch priv field in mdio device structure
From: Harini Katakam @ 2019-07-31  9:36 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem
  Cc: michal.simek, netdev, linux-arm-kernel, linux-kernel,
	harinikatakamlinux, harini.katakam, radhey.shyam.pandey
In-Reply-To: <1564565779-29537-1-git-send-email-harini.katakam@xilinx.com>

Use the priv field in mdio device structure instead of the one in
phy device structure. The phy device priv field may be used by the
external phy driver and should not be overwritten.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
---
 drivers/net/phy/xilinx_gmii2rgmii.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c b/drivers/net/phy/xilinx_gmii2rgmii.c
index 2d14493..ba31b5c3 100644
--- a/drivers/net/phy/xilinx_gmii2rgmii.c
+++ b/drivers/net/phy/xilinx_gmii2rgmii.c
@@ -29,7 +29,7 @@ struct gmii2rgmii {
 
 static int xgmiitorgmii_read_status(struct phy_device *phydev)
 {
-	struct gmii2rgmii *priv = phydev->priv;
+	struct gmii2rgmii *priv = phydev->mdio.priv;
 	struct mii_bus *bus = priv->mdio->bus;
 	int addr = priv->mdio->addr;
 	u16 val = 0;
@@ -90,7 +90,7 @@ static int xgmiitorgmii_probe(struct mdio_device *mdiodev)
 	memcpy(&priv->conv_phy_drv, priv->phy_dev->drv,
 	       sizeof(struct phy_driver));
 	priv->conv_phy_drv.read_status = xgmiitorgmii_read_status;
-	priv->phy_dev->priv = priv;
+	priv->phy_dev->mdio.priv = priv;
 	priv->phy_dev->drv = &priv->conv_phy_drv;
 
 	return 0;
-- 
2.7.4


^ permalink raw reply related

* [PATCH 0/2] Fix GMII2RGMII private field
From: Harini Katakam @ 2019-07-31  9:36 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem
  Cc: michal.simek, netdev, linux-arm-kernel, linux-kernel,
	harinikatakamlinux, harini.katakam, radhey.shyam.pandey

Fix the usage of external phy's priv field by gmii2rgmii driver.

Based on net-next.

Harini Katakam (2):
  include: mdio: Add private field to mdio structure
  net: gmii2rgmii: Switch priv field in mdio device structure

 drivers/net/phy/xilinx_gmii2rgmii.c | 4 ++--
 include/linux/mdio.h                | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

-- 
2.7.4


^ permalink raw reply

* Re: [PATCH 0/2] tools: bpftool: add net (un)load command to load XDP
From: Jesper Dangaard Brouer @ 2019-07-31  9:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jakub Kicinski, Daniel T. Lee, Stephen Hemminger, David Ahern,
	John Fastabend, Daniel Borkmann, Alexei Starovoitov, David Miller,
	netdev, brouer
In-Reply-To: <20190731015238.3kq3r7rlascv7tzs@ast-mbp>

On Tue, 30 Jul 2019 18:52:40 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Jul 30, 2019 at 06:21:44PM -0700, Jakub Kicinski wrote:
> > > > Duplicating the same features in bpftool will only diminish the
> > > > incentive for moving iproute2 to libbpf.     
> > > 
> > > not at all. why do you think so?  
> > 
> > Because iproute2's BPF has fallen behind so the simplest thing is to
> > just contribute to bpftool. But iproute2 is the tool set for Linux
> > networking, we can't let it bit rot :(  
> 
> where were you when a lot of libbpf was copy pasted into iproute2 ?!
> Now it diverged a lot and it's difficult to move iproute2 back to the main
> train which is libbpf.

I hope we all agree that libbpf it the way forward.  I really hope we
can convert iproute2 into using libbpf.  It is challenging because
iproute2 ELF files use another map-layout, and I guess we need to stay
backward compatible.

> Same thing with at least 5 copy-pastes of samples/bpf/bpf_load.c
> that spawned a bunch of different bpf loaders.

I'm also to blame here... as I ran with bpf_load.c and based my
prototype-kernel github project on it.  And it seems that it have
gotten enough Google entropy, that people find that first. E.g. I have
pull requests that add new tools, which I have not merged, as I want to
switch to libbpf first.

Recently I've added a README[1] that warn about this issue, and point
people to xdp-tutorial[2] instead.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/README.rst
[2] https://github.com/xdp-project/xdp-tutorial

 Quote[1]:
   "WARNING: This directory contains its own out-of-date BPF
    ELF-loader (in bpf_load.c). Instead people should use libbpf."

> > IMHO vaguely competent users of Linux networking will know ip link.
> > If they are not vaguely competent, they should not attach XDP programs
> > to interfaces by hand...  
> 
> I'm a prime example of moderately competent linux user who
> doesn't know iproute2. I'm not joking.
> I don't know tc syntax and always use my cheat sheet of ip/tc commands
> to do anything.

I agree... I also need to lookup the TC commands used for attaching BPF
programs every time.  It is a pain, and I've often made mistakes that
cause several tc filter's to get attached, which leads to strange
behavior.  (AFAIK you have to remember to use both prio and handle when
removing the old, before adding the new).

 
> > > 
> > > bpftool must be able to introspect every aspect of bpf programming.
> > > That includes detaching and attaching anywhere.
> > > Anyone doing 'bpftool p s' should be able to switch off particular
> > > prog id without learning ten different other tools.  
> > 
> > I think the fact that we already have an implementation in iproute2,
> > which is at the risk of bit rot is more important to me that the
> > hypothetical scenario where everyone knows to just use bpftool (for
> > XDP, for TC it's still iproute2 unless there's someone crazy enough 
> > to reimplement the TC functionality :))  
> 
> I think you're missing the point that iproute2 is still necessary
> to configure it.
> bpftool should be able to attach/detach from anything.
> But configuring that thing (whether it's cgroup or tc/xdp) is
> a job of corresponding apis and tools.
> 
> > I'm not sure we can settle our differences over email :)
> > I have tremendous respect for all the maintainers I CCed here, 
> > if nobody steps up to agree with me I'll concede the bpftool net
> > battle entirely :)  
> 
> we can keep arguing forever. Respect to ease-of-use only comes
> from the pain of operational experience. I don't think I can
> convey that pain in the email either.

Let me try to convey some pain...

Regarding operational experience.  I have a customer using this:
 https://github.com/xdp-project/xdp-cpumap-tc

The project combines TC and XDP (with CPUMAP redirect).  It has been a
*HUGE* pain that we needed to use iproute2 tc commands to attach and
load TC BPF-programs.

(Issue#1)
The iproute2 tc BPF loader uses another ELF-layout for maps.  This was
worked around, by keeping XDP and TC BPF-progs in different C-file
using different struct's for maps.  And avoiding to use map features,
that iproute2 doesn't know about... although we found a workaround for
using more advanced map features via load-order and pinning see issue#3
sub-problem(2).

(Issue#2)
As this is a C-prog I would really prefer using a library, instead of
having to call cmdline utilities, see C-code doing this[3].  This lead
to several issues.  E.g. they had installed too old version of iproute2
on their systems, after installing newer version (in /usr/local) it was
not first in $PATH for root.  Load order also matters see issue#3.

[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/common_user.c#L381-L493

(Issue#3)
Next to get TC and XDP programs to cooperate, pinned maps via bpffs is
used.  And iproute2 dictates that pinned maps are located in directory
/sys/fs/bpf/tc/globals/.  What could go wrong, it's only a static dir path.

Sub-problem(1): If XDP loads _before_ any tc-bpf cmd, then the subdirs
are not created, leading to replicating mkdir creation in C[4].  Else
the XDP load will fail.  (Troubleshooting this was complicated by 

[4] https://github.com/xdp-project/xdp-cpumap-tc/commit/25e7e56699cd75a4a

Sub-problem(2): We really want to load XDP first, because libbpf
creates maps in a better way, e.g. with "name" (and BTF info).
The "name" part was needed by libbpf to find a given map (to avoid
depending on the order maps are defined in C/ELF file) via helper
bpf_object__find_map_fd_by_name().  To handle TC noname case, I had to
code up this workaround[5], which depend on extracting the name of the
pinned file name.

[5] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L117-L131


Experience/conclusion: Getting XDP and TC to cooperate suck, primarily
because iproute2 tc is based on a separate BPF-ELF loader, which is
features are not in sync / lacking behind.  Calling cmdline binaries
from C also sucks, and I would prefer some libbpf TC attach function,
but most pain comes from the slight differences between ELF-loaders.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH] myri10ge: remove unneeded variable
From: Ding Xiang @ 2019-07-31  8:53 UTC (permalink / raw)
  To: christopher.lee, davem; +Cc: netdev, linux-kernel

"error" is unneeded,just return 0

Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
---
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index d8b7fba..a4165e1 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -3037,7 +3037,6 @@ static int myri10ge_set_mac_address(struct net_device *dev, void *addr)
 static int myri10ge_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct myri10ge_priv *mgp = netdev_priv(dev);
-	int error = 0;
 
 	netdev_info(dev, "changing mtu from %d to %d\n", dev->mtu, new_mtu);
 	if (mgp->running) {
@@ -3049,7 +3048,7 @@ static int myri10ge_change_mtu(struct net_device *dev, int new_mtu)
 	} else
 		dev->mtu = new_mtu;
 
-	return error;
+	return 0;
 }
 
 /*
-- 
1.9.1




^ permalink raw reply related

* Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
From: Jason Wang @ 2019-07-31  8:50 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev; +Cc: linux-kernel, linux-mm, jgg
In-Reply-To: <20190731084655.7024-8-jasowang@redhat.com>


On 2019/7/31 下午4:46, Jason Wang wrote:
> We used to use RCU to synchronize MMU notifier with worker. This leads
> calling synchronize_rcu() in invalidate_range_start(). But on a busy
> system, there would be many factors that may slow down the
> synchronize_rcu() which makes it unsuitable to be called in MMU
> notifier.
>
> A solution is SRCU but its overhead is obvious with the expensive full
> memory barrier. Another choice is to use seqlock, but it doesn't
> provide a synchronization method between readers and writers. The last
> choice is to use vq mutex, but it need to deal with the worst case
> that MMU notifier must be blocked and wait for the finish of swap in.
>
> So this patch switches use a counter to track whether or not the map
> was used. The counter was increased when vq try to start or finish
> uses the map. This means, when it was even, we're sure there's no
> readers and MMU notifier is synchronized. When it was odd, it means
> there's a reader we need to wait it to be even again then we are
> synchronized. To avoid full memory barrier, store_release +
> load_acquire on the counter is used.


For reviewers, I try hard to avoid e.g smp_mb(), please double check 
whether or not this trick work.

Thanks


>
> Consider the read critical section is pretty small the synchronization
> should be done very fast.
>
> Note the patch lead about 3% PPS dropping.
>
> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   drivers/vhost/vhost.c | 145 ++++++++++++++++++++++++++----------------
>   drivers/vhost/vhost.h |   7 +-
>   2 files changed, 94 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index cfc11f9ed9c9..db2c81cb1e90 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>   
>   	spin_lock(&vq->mmu_lock);
>   	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		map[i] = rcu_dereference_protected(vq->maps[i],
> -				  lockdep_is_held(&vq->mmu_lock));
> +		map[i] = vq->maps[i];
>   		if (map[i]) {
>   			vhost_set_map_dirty(vq, map[i], i);
> -			rcu_assign_pointer(vq->maps[i], NULL);
> +			vq->maps[i] = NULL;
>   		}
>   	}
>   	spin_unlock(&vq->mmu_lock);
>   
> -	/* No need for synchronize_rcu() or kfree_rcu() since we are
> -	 * serialized with memory accessors (e.g vq mutex held).
> +	/* No need for synchronization since we are serialized with
> +	 * memory accessors (e.g vq mutex held).
>   	 */
>   
>   	for (i = 0; i < VHOST_NUM_ADDRS; i++)
> @@ -362,6 +361,44 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
>   	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
>   }
>   
> +static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq)
> +{
> +	int ref = READ_ONCE(vq->ref);
> +
> +	smp_store_release(&vq->ref, ref + 1);
> +	/* Make sure ref counter is visible before accessing the map */
> +	smp_load_acquire(&vq->ref);
> +}
> +
> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
> +{
> +	int ref = READ_ONCE(vq->ref);
> +
> +	/* Make sure vq access is done before increasing ref counter */
> +	smp_store_release(&vq->ref, ref + 1);
> +}
> +
> +static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
> +{
> +	int ref;
> +
> +	/* Make sure map change was done before checking ref counter */
> +	smp_mb();
> +
> +	ref = READ_ONCE(vq->ref);
> +	if (ref & 0x1) {
> +		/* When ref change, we are sure no reader can see
> +		 * previous map */
> +		while (READ_ONCE(vq->ref) == ref) {
> +			set_current_state(TASK_RUNNING);
> +			schedule();
> +		}
> +	}
> +	/* Make sure ref counter was checked before any other
> +	 * operations that was dene on map. */
> +	smp_mb();
> +}
> +
>   static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>   				      int index,
>   				      unsigned long start,
> @@ -376,16 +413,15 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>   	spin_lock(&vq->mmu_lock);
>   	++vq->invalidate_count;
>   
> -	map = rcu_dereference_protected(vq->maps[index],
> -					lockdep_is_held(&vq->mmu_lock));
> +	map = vq->maps[index];
>   	if (map) {
>   		vhost_set_map_dirty(vq, map, index);
> -		rcu_assign_pointer(vq->maps[index], NULL);
> +		vq->maps[index] = NULL;
>   	}
>   	spin_unlock(&vq->mmu_lock);
>   
>   	if (map) {
> -		synchronize_rcu();
> +		vhost_vq_sync_access(vq);
>   		vhost_map_unprefetch(map);
>   	}
>   }
> @@ -457,7 +493,7 @@ static void vhost_init_maps(struct vhost_dev *dev)
>   	for (i = 0; i < dev->nvqs; ++i) {
>   		vq = dev->vqs[i];
>   		for (j = 0; j < VHOST_NUM_ADDRS; j++)
> -			RCU_INIT_POINTER(vq->maps[j], NULL);
> +			vq->maps[j] = NULL;
>   	}
>   }
>   #endif
> @@ -655,6 +691,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>   		vq->indirect = NULL;
>   		vq->heads = NULL;
>   		vq->dev = dev;
> +		vq->ref = 0;
>   		mutex_init(&vq->mutex);
>   		spin_lock_init(&vq->mmu_lock);
>   		vhost_vq_reset(dev, vq);
> @@ -921,7 +958,7 @@ static int vhost_map_prefetch(struct vhost_virtqueue *vq,
>   	map->npages = npages;
>   	map->pages = pages;
>   
> -	rcu_assign_pointer(vq->maps[index], map);
> +	vq->maps[index] = map;
>   	/* No need for a synchronize_rcu(). This function should be
>   	 * called by dev->worker so we are serialized with all
>   	 * readers.
> @@ -1216,18 +1253,18 @@ static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
>   	struct vring_used *used;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> +		map = vq->maps[VHOST_ADDR_USED];
>   		if (likely(map)) {
>   			used = map->addr;
>   			*((__virtio16 *)&used->ring[vq->num]) =
>   				cpu_to_vhost16(vq, vq->avail_idx);
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1245,18 +1282,18 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
>   	size_t size;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> +		map = vq->maps[VHOST_ADDR_USED];
>   		if (likely(map)) {
>   			used = map->addr;
>   			size = count * sizeof(*head);
>   			memcpy(used->ring + idx, head, size);
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1272,17 +1309,17 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
>   	struct vring_used *used;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> +		map = vq->maps[VHOST_ADDR_USED];
>   		if (likely(map)) {
>   			used = map->addr;
>   			used->flags = cpu_to_vhost16(vq, vq->used_flags);
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1298,17 +1335,17 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
>   	struct vring_used *used;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> +		map = vq->maps[VHOST_ADDR_USED];
>   		if (likely(map)) {
>   			used = map->addr;
>   			used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1362,17 +1399,17 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
>   	struct vring_avail *avail;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> +		map = vq->maps[VHOST_ADDR_AVAIL];
>   		if (likely(map)) {
>   			avail = map->addr;
>   			*idx = avail->idx;
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1387,17 +1424,17 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
>   	struct vring_avail *avail;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> +		map = vq->maps[VHOST_ADDR_AVAIL];
>   		if (likely(map)) {
>   			avail = map->addr;
>   			*head = avail->ring[idx & (vq->num - 1)];
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1413,17 +1450,17 @@ static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq,
>   	struct vring_avail *avail;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> +		map = vq->maps[VHOST_ADDR_AVAIL];
>   		if (likely(map)) {
>   			avail = map->addr;
>   			*flags = avail->flags;
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1438,15 +1475,15 @@ static inline int vhost_get_used_event(struct vhost_virtqueue *vq,
>   	struct vring_avail *avail;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
> +		vhost_vq_access_map_begin(vq);
> +		map = vq->maps[VHOST_ADDR_AVAIL];
>   		if (likely(map)) {
>   			avail = map->addr;
>   			*event = (__virtio16)avail->ring[vq->num];
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1461,17 +1498,17 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq,
>   	struct vring_used *used;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
> +		map = vq->maps[VHOST_ADDR_USED];
>   		if (likely(map)) {
>   			used = map->addr;
>   			*idx = used->idx;
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1486,17 +1523,17 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq,
>   	struct vring_desc *d;
>   
>   	if (!vq->iotlb) {
> -		rcu_read_lock();
> +		vhost_vq_access_map_begin(vq);
>   
> -		map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]);
> +		map = vq->maps[VHOST_ADDR_DESC];
>   		if (likely(map)) {
>   			d = map->addr;
>   			*desc = *(d + idx);
> -			rcu_read_unlock();
> +			vhost_vq_access_map_end(vq);
>   			return 0;
>   		}
>   
> -		rcu_read_unlock();
> +		vhost_vq_access_map_end(vq);
>   	}
>   #endif
>   
> @@ -1843,13 +1880,11 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>   #if VHOST_ARCH_CAN_ACCEL_UACCESS
>   static void vhost_vq_map_prefetch(struct vhost_virtqueue *vq)
>   {
> -	struct vhost_map __rcu *map;
> +	struct vhost_map *map;
>   	int i;
>   
>   	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		rcu_read_lock();
> -		map = rcu_dereference(vq->maps[i]);
> -		rcu_read_unlock();
> +		map = vq->maps[i];
>   		if (unlikely(!map))
>   			vhost_map_prefetch(vq, i);
>   	}
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index a9a2a93857d2..f9e9558a529d 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -115,16 +115,17 @@ struct vhost_virtqueue {
>   #if VHOST_ARCH_CAN_ACCEL_UACCESS
>   	/* Read by memory accessors, modified by meta data
>   	 * prefetching, MMU notifier and vring ioctl().
> -	 * Synchonrized through mmu_lock (writers) and RCU (writers
> -	 * and readers).
> +	 * Synchonrized through mmu_lock (writers) and ref counters,
> +	 * see vhost_vq_access_map_begin()/vhost_vq_access_map_end().
>   	 */
> -	struct vhost_map __rcu *maps[VHOST_NUM_ADDRS];
> +	struct vhost_map *maps[VHOST_NUM_ADDRS];
>   	/* Read by MMU notifier, modified by vring ioctl(),
>   	 * synchronized through MMU notifier
>   	 * registering/unregistering.
>   	 */
>   	struct vhost_uaddr uaddrs[VHOST_NUM_ADDRS];
>   #endif
> +	int ref;
>   	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
>   
>   	struct file *kick;

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox