Netdev List
 help / color / mirror / Atom feed
* [PATCH] igc: Prefer pcie_capability_read_word()
From: Frederick Lawler @ 2019-07-18  2:07 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: Frederick Lawler, intel-wired-lan, netdev, linux-kernel, bhelgaas
In-Reply-To: <20190718020745.8867-1-fred@fredlawl.com>

Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.

Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 34fa0e60a780..8e8ad07a5776 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3891,13 +3891,11 @@ void igc_write_pci_cfg(struct igc_hw *hw, u32 reg, u16 *value)
 s32 igc_read_pcie_cap_reg(struct igc_hw *hw, u32 reg, u16 *value)
 {
 	struct igc_adapter *adapter = hw->back;
-	u16 cap_offset;
 
-	cap_offset = pci_find_capability(adapter->pdev, PCI_CAP_ID_EXP);
-	if (!cap_offset)
+	if (!pci_is_pcie(adapter->pdev))
 		return -IGC_ERR_CONFIG;
 
-	pci_read_config_word(adapter->pdev, cap_offset + reg, value);
+	pcie_capability_read_word(adapter->pdev, reg, value);
 
 	return IGC_SUCCESS;
 }
@@ -3905,13 +3903,11 @@ s32 igc_read_pcie_cap_reg(struct igc_hw *hw, u32 reg, u16 *value)
 s32 igc_write_pcie_cap_reg(struct igc_hw *hw, u32 reg, u16 *value)
 {
 	struct igc_adapter *adapter = hw->back;
-	u16 cap_offset;
 
-	cap_offset = pci_find_capability(adapter->pdev, PCI_CAP_ID_EXP);
-	if (!cap_offset)
+	if (!pci_is_pcie(adapter->pdev))
 		return -IGC_ERR_CONFIG;
 
-	pci_write_config_word(adapter->pdev, cap_offset + reg, *value);
+	pcie_capability_write_word(adapter->pdev, reg, *value);
 
 	return IGC_SUCCESS;
 }
-- 
2.17.1


^ permalink raw reply related

* [PATCH] cxgb4: Prefer pcie_capability_read_word()
From: Frederick Lawler @ 2019-07-18  2:07 UTC (permalink / raw)
  To: vishal; +Cc: Frederick Lawler, netdev, linux-kernel, bhelgaas

Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.

Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 6 ++----
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      | 9 +++------
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 715e4edcf4a2..98ff71434673 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5441,7 +5441,6 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
 		char name[IFNAMSIZ];
 		u32 devcap2;
 		u16 flags;
-		int pos;
 
 		/* If we want to instantiate Virtual Functions, then our
 		 * parent bridge's PCI-E needs to support Alternative Routing
@@ -5449,9 +5448,8 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
 		 * and above.
 		 */
 		pbridge = pdev->bus->self;
-		pos = pci_find_capability(pbridge, PCI_CAP_ID_EXP);
-		pci_read_config_word(pbridge, pos + PCI_EXP_FLAGS, &flags);
-		pci_read_config_dword(pbridge, pos + PCI_EXP_DEVCAP2, &devcap2);
+		pcie_capability_read_word(pbridge, PCI_EXP_FLAGS, &flags);
+		pcie_capability_read_dword(pbridge, PCI_EXP_DEVCAP2, &devcap2);
 
 		if ((flags & PCI_EXP_FLAGS_VERS) < 2 ||
 		    !(devcap2 & PCI_EXP_DEVCAP2_ARI)) {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index f9b70be59792..346d7b59c50b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -7267,7 +7267,6 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
 	} else {
 		unsigned int pack_align;
 		unsigned int ingpad, ingpack;
-		unsigned int pcie_cap;
 
 		/* T5 introduced the separation of the Free List Padding and
 		 * Packing Boundaries.  Thus, we can select a smaller Padding
@@ -7292,8 +7291,7 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
 		 * multiple of the Maximum Payload Size.
 		 */
 		pack_align = fl_align;
-		pcie_cap = pci_find_capability(adap->pdev, PCI_CAP_ID_EXP);
-		if (pcie_cap) {
+		if (pci_is_pcie(adap->pdev)) {
 			unsigned int mps, mps_log;
 			u16 devctl;
 
@@ -7301,9 +7299,8 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
 			 * [bits 7:5] encodes sizes as powers of 2 starting at
 			 * 128 bytes.
 			 */
-			pci_read_config_word(adap->pdev,
-					     pcie_cap + PCI_EXP_DEVCTL,
-					     &devctl);
+			pcie_capability_read_word(adap->pdev, PCI_EXP_DEVCTL,
+						  &devctl);
 			mps_log = ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5) + 7;
 			mps = 1 << mps_log;
 			if (mps > pack_align)
-- 
2.17.1


^ permalink raw reply related

* [PATCH net] be2net: Synchronize be_update_queues with dev_watchdog
From: Benjamin Poirier @ 2019-07-18  1:42 UTC (permalink / raw)
  To: David Miller
  Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
	Firo Yang, Saeed Mahameed, netdev

As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queues structures. Add some
synchronization between the two.

Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index b7a246b33599..2edb86ec9fe9 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4698,8 +4698,13 @@ int be_update_queues(struct be_adapter *adapter)
 	int status;
 
 	if (netif_running(netdev)) {
+		/* be_tx_timeout() must not run concurrently with this
+		 * function, synchronize with an already-running dev_watchdog
+		 */
+		netif_tx_lock_bh(netdev);
 		/* device cannot transmit now, avoid dev_watchdog timeouts */
 		netif_carrier_off(netdev);
+		netif_tx_unlock_bh(netdev);
 
 		be_close(netdev);
 	}
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH 1/4] dt-bindings: allow up to four clocks for orion-mdio
From: Andrew Lunn @ 2019-07-18  1:31 UTC (permalink / raw)
  To: Rob Herring; +Cc: josua, netdev, stable, David S. Miller, Mark Rutland
In-Reply-To: <CAL_JsqK=qpCi6whqmjW2L8O=3u4oZemH=czm60q9QnC09Gr_ig@mail.gmail.com>

On Tue, Jul 09, 2019 at 04:03:28PM -0600, Rob Herring wrote:
> On Mon, Jul 8, 2019 at 8:41 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > >  Optional properties:
> > > >  - interrupts: interrupt line number for the SMI error/done interrupt
> > > > -- clocks: phandle for up to three required clocks for the MDIO instance
> > > > +- clocks: phandle for up to four required clocks for the MDIO instance
> > >
> > > This needs to enumerate exactly what the clocks are. Shouldn't there
> > > be an additional clock-names value too?
> >
> > Hi Rob
> >
> > The driver does not care what they are called. It just turns them all
> > on, and turns them off again when removed.
> 
> That's fine for the driver to do, but this is the hardware description.
> 
> It's not just what they are called, but how many too. Is 1 clock in
> the DT valid? 0? It would be unusual for a given piece of h/w to
> function with a variable number of clocks.

Hi Rob

The orion5x has 0 clocks. kirkwood, dove, Armada XP, 370 375, 380
has 1 clock. Armada 37xx has 4.

So yes, 1 clock is valid. 0 clocks is also valid. The piece of
hardware itself does not care how many clocks are feeding it, so long
as they are all turned on.

   Andrew 

^ permalink raw reply

* RE: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller
From: Andy Duan @ 2019-07-18  1:24 UTC (permalink / raw)
  To: Sven Van Asbroeck
  Cc: David S . Miller, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAGngYiU7B4uuqSAawNE6RFsjGPzbj5gzK9S299H+Qy+CWFjaAg@mail.gmail.com>

From: Sven Van Asbroeck <thesven73@gmail.com> Sent: Wednesday, July 17, 2019 8:48 PM
> On Tue, Jul 16, 2019 at 9:32 PM Andy Duan <fugang.duan@nxp.com> wrote:
> >
> > Yes, so the old legacy code is kept there. But it is better to clean
> > up all if there have enough boards to verify them.
> 
> Would it make sense to print a warning message to the log whenever
> someone tries to use the legacy phy reset on the fec?

It is feasible, just as reminder to drop such usage.

^ permalink raw reply

* [PATCH iproute2] json: fix backslash escape typo in jsonw_puts
From: Ivan Delalande @ 2019-07-18  1:15 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern; +Cc: netdev

Fixes: fcc16c22 ("provide common json output formatter")
Signed-off-by: Ivan Delalande <colona@arista.com>
---
 lib/json_writer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/json_writer.c b/lib/json_writer.c
index 5004c181..88c5eb88 100644
--- a/lib/json_writer.c
+++ b/lib/json_writer.c
@@ -75,7 +75,7 @@ static void jsonw_puts(json_writer_t *self, const char *str)
 			fputs("\\b", self->out);
 			break;
 		case '\\':
-			fputs("\\n", self->out);
+			fputs("\\\\", self->out);
 			break;
 		case '"':
 			fputs("\\\"", self->out);
-- 
2.22.0

^ permalink raw reply related

* Re: [PATCH ghak90 V6 02/10] audit: add container id
From: Richard Guy Briggs @ 2019-07-18  0:51 UTC (permalink / raw)
  To: Paul Moore
  Cc: Serge E. Hallyn, Tycho Andersen, containers, linux-api,
	Linux-Audit Mailing List, linux-fsdevel, LKML, netdev,
	netfilter-devel, sgrubb, omosnace, dhowells, simo, Eric Paris,
	ebiederm, nhorman
In-Reply-To: <CAHC9VhScHizB2r5q3T5s0P3jkYdvzBPPudDksosYFp_TO7W9-Q@mail.gmail.com>

On 2019-07-16 19:30, Paul Moore wrote:
> On Tue, Jul 16, 2019 at 6:03 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2019-07-15 17:04, Paul Moore wrote:
> > > On Mon, Jul 8, 2019 at 2:06 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> 
> ...
> 
> > > > If we can't trust ns_capable() then why are we passing on
> > > > CAP_AUDIT_CONTROL?  It is being passed down and not stripped purposely
> > > > by the orchestrator/engine.  If ns_capable() isn't inherited how is it
> > > > gained otherwise?  Can it be inserted by cotainer image?  I think the
> > > > answer is "no".  Either we trust ns_capable() or we have audit
> > > > namespaces (recommend based on user namespace) (or both).
> > >
> > > My thinking is that since ns_capable() checks the credentials with
> > > respect to the current user namespace we can't rely on it to control
> > > access since it would be possible for a privileged process running
> > > inside an unprivileged container to manipulate the audit container ID
> > > (containerized process has CAP_AUDIT_CONTROL, e.g. running as root in
> > > the container, while the container itself does not).
> >
> > What makes an unprivileged container unprivileged?  "root", or "CAP_*"?
> 
> My understanding is that when most people refer to an unprivileged
> container they are referring to a container run without capabilities
> or a container run by a user other than root.  I'm sure there are
> better definitions out there, by folks much smarter than me on these
> things, but that's my working definition.

Close enough to my understanding...

> > If CAP_AUDIT_CONTROL is granted, does "root" matter?
> 
> Our discussions here have been about capabilities, not UIDs.  The only
> reason root might matter is that it generally has the full capability
> set.

Good, that's my understanding.

> > Does it matter what user namespace it is in?
> 
> What likely matters is what check is called: capable() or
> ns_capable().  Those can yield very different results.

Ok, I finally found what I was looking for to better understand the
challenge with trusting ns_capable().  Sorry for being so dense and slow
on this one.  I thought I had gone through the code carefully enough,
but this time I finally found it.  set_cred_user_ns() sets a full set of
capabilities rather than inheriting them from the parent user_ns, called
from userns_install() or create_userns().  Even if the container
orchestrator/engine restricts those capabilities on its own containers,
they could easily unshare a userns and get a full set unless it also
restricted CAP_SYS_ADMIN, which is used too many other places to be
practical to restrict.

> > I understand that root is *gained* in an
> > unprivileged user namespace, but capabilities are inherited or permitted
> > and that process either has it or it doesn't and an unprivileged user
> > namespace can't gain a capability that has been rescinded.  Different
> > subsystems use the userid or capabilities or both to determine
> > privileges.
> 
> Once again, I believe the important thing to focus on here is
> capable() vs ns_capable().  We can't safely rely on ns_capable() for
> the audit container ID policy since that is easily met inside the
> container regardless of the process' creds which started the
> container.

Agreed.

> > In this case, is the userid relevant?
> 
> We don't do UID checks, we do capability checks, so yes, the UID is irrelevant.

Agreed.

> > > > At this point I would say we are at an impasse unless we trust
> > > > ns_capable() or we implement audit namespaces.
> > >
> > > I'm not sure how we can trust ns_capable(), but if you can think of a
> > > way I would love to hear it.  I'm also not sure how namespacing audit
> > > is helpful (see my above comments), but if you think it is please
> > > explain.
> >
> > So if we are not namespacing, why do we not trust capabilities?
> 
> We can trust capable(CAP_AUDIT_CONTROL) for enforcing audit container
> ID policy, we can not trust ns_capable(CAP_AUDIT_CONTROL).

Ok.  So does a process in a non-init user namespace have two (or more)
sets of capabilities stored in creds, one in the init_user_ns, and one
in current_user_ns?  Or does it get stripped of all its capabilities in
init_user_ns once it has its own set in current_user_ns?  If the former,
then we can use capable().  If the latter, we need another mechanism, as
you have suggested might be needed.

If some random unprivileged user wants to fire up a container
orchestrator/engine in his own user namespace, then audit needs to be
namespaced.  Can we safely discard this scenario for now?  That user can
use a VM.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH v3 bpf-next 05/12] libbpf: add resizable non-thread safe internal hashmap
From: Arnaldo Carvalho de Melo @ 2019-07-18  0:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii.nakryiko, ast, daniel, netdev, bpf, kernel-team,
	linux-perf-users, Jiri Olsa, Namhyung Kim, Adrian Hunter
In-Reply-To: <20190524185908.3562231-6-andriin@fb.com>

Em Fri, May 24, 2019 at 11:59:00AM -0700, Andrii Nakryiko escreveu:
> There is a need for fast point lookups inside libbpf for multiple use
> cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in
> BTF for upcoming BPF CO-RE relocation support, etc). This patch
> implements simple resizable non-thread safe hashmap using single linked
> list chains.

This is breaking the tools/perf build in some systems where __WORDSIZE use in
tools/lib/bpf/hashmap.h is not finding the definition of __WORDSIZE,
which I think requires using:

#include <limits.h>

Or perhaps

#include <stdint.h>

Non-exhaustive search on a fedora:30 system:

[acme@quaco perf]$ grep "define.*__WORDSIZE" /usr/include/*/*.h
/usr/include/bits/elfclass.h:#define __ELF_NATIVE_CLASS __WORDSIZE
/usr/include/bits/siginfo-arch.h:#if defined __x86_64__ && __WORDSIZE == 32
/usr/include/bits/timesize.h:# define __TIMESIZE	__WORDSIZE
/usr/include/bits/wordsize.h:# define __WORDSIZE	64
/usr/include/bits/wordsize.h:# define __WORDSIZE	32
/usr/include/bits/wordsize.h:#define __WORDSIZE32_SIZE_ULONG		0
/usr/include/bits/wordsize.h:#define __WORDSIZE32_PTRDIFF_LONG	0
/usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32	1
/usr/include/bits/wordsize.h:# define __WORDSIZE_TIME64_COMPAT32	0
[acme@quaco perf]$ grep bits\/wordsize.h /usr/include/*.h
/usr/include/bfd.h:#include <bits/wordsize.h>
/usr/include/limits.h:#include <bits/wordsize.h>
/usr/include/pthread.h:#include <bits/wordsize.h>
/usr/include/stdint.h:#include <bits/wordsize.h>
/usr/include/tiffconf.h:#include <bits/wordsize.h>
[acme@quaco perf]$

On fedora:30 it works, probably because some header included from there
ends up adding the file that has that def, but these fail, for instance:

[perfbuilder@quaco linux-perf-tools-build]$ export PERF_TARBALL=http://192.168.124.1/perf/perf-5.2.0.tar.xz
[perfbuilder@quaco linux-perf-tools-build]$ time dm
   1    13.57 alpine:3.4                    : FAIL gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2    13.78 alpine:3.5                    : FAIL gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3    12.42 alpine:3.6                    : FAIL gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4    15.01 alpine:3.7                    : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5    13.80 alpine:3.8                    : FAIL gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6    15.37 alpine:3.9                    : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7    15.10 alpine:3.10                   : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8    15.26 alpine:edge                   : FAIL gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 7.0.1 (tags/RELEASE_701/final) (based on LLVM 7.0.1)
   9: amazonlinux:1^C

I'm trying to see if adding #include <limits.h> fixes the problems in
all the distros in my container based build test suite.

Lets see with at least alpine:3.4:

Nope, didn't work, will try revisiting this tomorrow...

I'll let it build anyway to see if this fails in other systems/libcs,
BTW, Alpine uses musl libc.

- Arnaldo
 
> Four different insert strategies are supported:
>  - HASHMAP_ADD - only add key/value if key doesn't exist yet;
>  - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
>    update value;
>  - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
>    nothing and return -ENOENT;
>  - HASHMAP_APPEND - always add key/value pair, even if key already exists.
>    This turns hashmap into a multimap by allowing multiple values to be
>    associated with the same key. Most useful read API for such hashmap is
>    hashmap__for_each_key_entry() iteration. If hashmap__find() is still
>    used, it will return last inserted key/value entry (first in a bucket
>    chain).
> 
> For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so
> that calling code can handle proper memory management, if necessary.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/lib/bpf/Build     |   2 +-
>  tools/lib/bpf/hashmap.c | 229 ++++++++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/hashmap.h | 173 ++++++++++++++++++++++++++++++
>  3 files changed, 403 insertions(+), 1 deletion(-)
>  create mode 100644 tools/lib/bpf/hashmap.c
>  create mode 100644 tools/lib/bpf/hashmap.h
> 
> diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
> index ee9d5362f35b..dcf0d36598e0 100644
> --- a/tools/lib/bpf/Build
> +++ b/tools/lib/bpf/Build
> @@ -1 +1 @@
> -libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
> +libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o hashmap.o
> diff --git a/tools/lib/bpf/hashmap.c b/tools/lib/bpf/hashmap.c
> new file mode 100644
> index 000000000000..6122272943e6
> --- /dev/null
> +++ b/tools/lib/bpf/hashmap.c
> @@ -0,0 +1,229 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +
> +/*
> + * Generic non-thread safe hash map implementation.
> + *
> + * Copyright (c) 2019 Facebook
> + */
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <errno.h>
> +#include <linux/err.h>
> +#include "hashmap.h"
> +
> +/* start with 4 buckets */
> +#define HASHMAP_MIN_CAP_BITS 2
> +
> +static void hashmap_add_entry(struct hashmap_entry **pprev,
> +			      struct hashmap_entry *entry)
> +{
> +	entry->next = *pprev;
> +	*pprev = entry;
> +}
> +
> +static void hashmap_del_entry(struct hashmap_entry **pprev,
> +			      struct hashmap_entry *entry)
> +{
> +	*pprev = entry->next;
> +	entry->next = NULL;
> +}
> +
> +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> +		   hashmap_equal_fn equal_fn, void *ctx)
> +{
> +	map->hash_fn = hash_fn;
> +	map->equal_fn = equal_fn;
> +	map->ctx = ctx;
> +
> +	map->buckets = NULL;
> +	map->cap = 0;
> +	map->cap_bits = 0;
> +	map->sz = 0;
> +}
> +
> +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> +			     hashmap_equal_fn equal_fn,
> +			     void *ctx)
> +{
> +	struct hashmap *map = malloc(sizeof(struct hashmap));
> +
> +	if (!map)
> +		return ERR_PTR(-ENOMEM);
> +	hashmap__init(map, hash_fn, equal_fn, ctx);
> +	return map;
> +}
> +
> +void hashmap__clear(struct hashmap *map)
> +{
> +	free(map->buckets);
> +	map->cap = map->cap_bits = map->sz = 0;
> +}
> +
> +void hashmap__free(struct hashmap *map)
> +{
> +	if (!map)
> +		return;
> +
> +	hashmap__clear(map);
> +	free(map);
> +}
> +
> +size_t hashmap__size(const struct hashmap *map)
> +{
> +	return map->sz;
> +}
> +
> +size_t hashmap__capacity(const struct hashmap *map)
> +{
> +	return map->cap;
> +}
> +
> +static bool hashmap_needs_to_grow(struct hashmap *map)
> +{
> +	/* grow if empty or more than 75% filled */
> +	return (map->cap == 0) || ((map->sz + 1) * 4 / 3 > map->cap);
> +}
> +
> +static int hashmap_grow(struct hashmap *map)
> +{
> +	struct hashmap_entry **new_buckets;
> +	struct hashmap_entry *cur, *tmp;
> +	size_t new_cap_bits, new_cap;
> +	size_t h;
> +	int bkt;
> +
> +	new_cap_bits = map->cap_bits + 1;
> +	if (new_cap_bits < HASHMAP_MIN_CAP_BITS)
> +		new_cap_bits = HASHMAP_MIN_CAP_BITS;
> +
> +	new_cap = 1UL << new_cap_bits;
> +	new_buckets = calloc(new_cap, sizeof(new_buckets[0]));
> +	if (!new_buckets)
> +		return -ENOMEM;
> +
> +	hashmap__for_each_entry_safe(map, cur, tmp, bkt) {
> +		h = hash_bits(map->hash_fn(cur->key, map->ctx), new_cap_bits);
> +		hashmap_add_entry(&new_buckets[h], cur);
> +	}
> +
> +	map->cap = new_cap;
> +	map->cap_bits = new_cap_bits;
> +	free(map->buckets);
> +	map->buckets = new_buckets;
> +
> +	return 0;
> +}
> +
> +static bool hashmap_find_entry(const struct hashmap *map,
> +			       const void *key, size_t hash,
> +			       struct hashmap_entry ***pprev,
> +			       struct hashmap_entry **entry)
> +{
> +	struct hashmap_entry *cur, **prev_ptr;
> +
> +	if (!map->buckets)
> +		return false;
> +
> +	for (prev_ptr = &map->buckets[hash], cur = *prev_ptr;
> +	     cur;
> +	     prev_ptr = &cur->next, cur = cur->next) {
> +		if (map->equal_fn(cur->key, key, map->ctx)) {
> +			if (pprev)
> +				*pprev = prev_ptr;
> +			*entry = cur;
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> +		    enum hashmap_insert_strategy strategy,
> +		    const void **old_key, void **old_value)
> +{
> +	struct hashmap_entry *entry;
> +	size_t h;
> +	int err;
> +
> +	if (old_key)
> +		*old_key = NULL;
> +	if (old_value)
> +		*old_value = NULL;
> +
> +	h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> +	if (strategy != HASHMAP_APPEND &&
> +	    hashmap_find_entry(map, key, h, NULL, &entry)) {
> +		if (old_key)
> +			*old_key = entry->key;
> +		if (old_value)
> +			*old_value = entry->value;
> +
> +		if (strategy == HASHMAP_SET || strategy == HASHMAP_UPDATE) {
> +			entry->key = key;
> +			entry->value = value;
> +			return 0;
> +		} else if (strategy == HASHMAP_ADD) {
> +			return -EEXIST;
> +		}
> +	}
> +
> +	if (strategy == HASHMAP_UPDATE)
> +		return -ENOENT;
> +
> +	if (hashmap_needs_to_grow(map)) {
> +		err = hashmap_grow(map);
> +		if (err)
> +			return err;
> +		h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> +	}
> +
> +	entry = malloc(sizeof(struct hashmap_entry));
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	entry->key = key;
> +	entry->value = value;
> +	hashmap_add_entry(&map->buckets[h], entry);
> +	map->sz++;
> +
> +	return 0;
> +}
> +
> +bool hashmap__find(const struct hashmap *map, const void *key, void **value)
> +{
> +	struct hashmap_entry *entry;
> +	size_t h;
> +
> +	h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> +	if (!hashmap_find_entry(map, key, h, NULL, &entry))
> +		return false;
> +
> +	if (value)
> +		*value = entry->value;
> +	return true;
> +}
> +
> +bool hashmap__delete(struct hashmap *map, const void *key,
> +		     const void **old_key, void **old_value)
> +{
> +	struct hashmap_entry **pprev, *entry;
> +	size_t h;
> +
> +	h = hash_bits(map->hash_fn(key, map->ctx), map->cap_bits);
> +	if (!hashmap_find_entry(map, key, h, &pprev, &entry))
> +		return false;
> +
> +	if (old_key)
> +		*old_key = entry->key;
> +	if (old_value)
> +		*old_value = entry->value;
> +
> +	hashmap_del_entry(pprev, entry);
> +	free(entry);
> +	map->sz--;
> +
> +	return true;
> +}
> +
> diff --git a/tools/lib/bpf/hashmap.h b/tools/lib/bpf/hashmap.h
> new file mode 100644
> index 000000000000..03748a742146
> --- /dev/null
> +++ b/tools/lib/bpf/hashmap.h
> @@ -0,0 +1,173 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * Generic non-thread safe hash map implementation.
> + *
> + * Copyright (c) 2019 Facebook
> + */
> +#ifndef __LIBBPF_HASHMAP_H
> +#define __LIBBPF_HASHMAP_H
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include "libbpf_internal.h"
> +
> +static inline size_t hash_bits(size_t h, int bits)
> +{
> +	/* shuffle bits and return requested number of upper bits */
> +	return (h * 11400714819323198485llu) >> (__WORDSIZE - bits);
> +}
> +
> +typedef size_t (*hashmap_hash_fn)(const void *key, void *ctx);
> +typedef bool (*hashmap_equal_fn)(const void *key1, const void *key2, void *ctx);
> +
> +struct hashmap_entry {
> +	const void *key;
> +	void *value;
> +	struct hashmap_entry *next;
> +};
> +
> +struct hashmap {
> +	hashmap_hash_fn hash_fn;
> +	hashmap_equal_fn equal_fn;
> +	void *ctx;
> +
> +	struct hashmap_entry **buckets;
> +	size_t cap;
> +	size_t cap_bits;
> +	size_t sz;
> +};
> +
> +#define HASHMAP_INIT(hash_fn, equal_fn, ctx) {	\
> +	.hash_fn = (hash_fn),			\
> +	.equal_fn = (equal_fn),			\
> +	.ctx = (ctx),				\
> +	.buckets = NULL,			\
> +	.cap = 0,				\
> +	.cap_bits = 0,				\
> +	.sz = 0,				\
> +}
> +
> +void hashmap__init(struct hashmap *map, hashmap_hash_fn hash_fn,
> +		   hashmap_equal_fn equal_fn, void *ctx);
> +struct hashmap *hashmap__new(hashmap_hash_fn hash_fn,
> +			     hashmap_equal_fn equal_fn,
> +			     void *ctx);
> +void hashmap__clear(struct hashmap *map);
> +void hashmap__free(struct hashmap *map);
> +
> +size_t hashmap__size(const struct hashmap *map);
> +size_t hashmap__capacity(const struct hashmap *map);
> +
> +/*
> + * Hashmap insertion strategy:
> + * - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> + * - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> + *   update value;
> + * - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> + *   nothing and return -ENOENT;
> + * - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> + *   This turns hashmap into a multimap by allowing multiple values to be
> + *   associated with the same key. Most useful read API for such hashmap is
> + *   hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> + *   used, it will return last inserted key/value entry (first in a bucket
> + *   chain).
> + */
> +enum hashmap_insert_strategy {
> +	HASHMAP_ADD,
> +	HASHMAP_SET,
> +	HASHMAP_UPDATE,
> +	HASHMAP_APPEND,
> +};
> +
> +/*
> + * hashmap__insert() adds key/value entry w/ various semantics, depending on
> + * provided strategy value. If a given key/value pair replaced already
> + * existing key/value pair, both old key and old value will be returned
> + * through old_key and old_value to allow calling code do proper memory
> + * management.
> + */
> +int hashmap__insert(struct hashmap *map, const void *key, void *value,
> +		    enum hashmap_insert_strategy strategy,
> +		    const void **old_key, void **old_value);
> +
> +static inline int hashmap__add(struct hashmap *map,
> +			       const void *key, void *value)
> +{
> +	return hashmap__insert(map, key, value, HASHMAP_ADD, NULL, NULL);
> +}
> +
> +static inline int hashmap__set(struct hashmap *map,
> +			       const void *key, void *value,
> +			       const void **old_key, void **old_value)
> +{
> +	return hashmap__insert(map, key, value, HASHMAP_SET,
> +			       old_key, old_value);
> +}
> +
> +static inline int hashmap__update(struct hashmap *map,
> +				  const void *key, void *value,
> +				  const void **old_key, void **old_value)
> +{
> +	return hashmap__insert(map, key, value, HASHMAP_UPDATE,
> +			       old_key, old_value);
> +}
> +
> +static inline int hashmap__append(struct hashmap *map,
> +				  const void *key, void *value)
> +{
> +	return hashmap__insert(map, key, value, HASHMAP_APPEND, NULL, NULL);
> +}
> +
> +bool hashmap__delete(struct hashmap *map, const void *key,
> +		     const void **old_key, void **old_value);
> +
> +bool hashmap__find(const struct hashmap *map, const void *key, void **value);
> +
> +/*
> + * hashmap__for_each_entry - iterate over all entries in hashmap
> + * @map: hashmap to iterate
> + * @cur: struct hashmap_entry * used as a loop cursor
> + * @bkt: integer used as a bucket loop cursor
> + */
> +#define hashmap__for_each_entry(map, cur, bkt)				    \
> +	for (bkt = 0; bkt < map->cap; bkt++)				    \
> +		for (cur = map->buckets[bkt]; cur; cur = cur->next)
> +
> +/*
> + * hashmap__for_each_entry_safe - iterate over all entries in hashmap, safe
> + * against removals
> + * @map: hashmap to iterate
> + * @cur: struct hashmap_entry * used as a loop cursor
> + * @tmp: struct hashmap_entry * used as a temporary next cursor storage
> + * @bkt: integer used as a bucket loop cursor
> + */
> +#define hashmap__for_each_entry_safe(map, cur, tmp, bkt)		    \
> +	for (bkt = 0; bkt < map->cap; bkt++)				    \
> +		for (cur = map->buckets[bkt];				    \
> +		     cur && ({tmp = cur->next; true; });		    \
> +		     cur = tmp)
> +
> +/*
> + * hashmap__for_each_key_entry - iterate over entries associated with given key
> + * @map: hashmap to iterate
> + * @cur: struct hashmap_entry * used as a loop cursor
> + * @key: key to iterate entries for
> + */
> +#define hashmap__for_each_key_entry(map, cur, _key)			    \
> +	for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> +					     map->cap_bits);		    \
> +		     map->buckets ? map->buckets[bkt] : NULL; });	    \
> +	     cur;							    \
> +	     cur = cur->next)						    \
> +		if (map->equal_fn(cur->key, (_key), map->ctx))
> +
> +#define hashmap__for_each_key_entry_safe(map, cur, tmp, _key)		    \
> +	for (cur = ({ size_t bkt = hash_bits(map->hash_fn((_key), map->ctx),\
> +					     map->cap_bits);		    \
> +		     cur = map->buckets ? map->buckets[bkt] : NULL; });	    \
> +	     cur && ({ tmp = cur->next; true; });			    \
> +	     cur = tmp)							    \
> +		if (map->equal_fn(cur->key, (_key), map->ctx))
> +
> +#endif /* __LIBBPF_HASHMAP_H */
> -- 
> 2.17.1

-- 

- Arnaldo

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.2 226/249] selftests: bpf: fix inlines in test_lwt_seg6local
From: Sasha Levin @ 2019-07-17 23:47 UTC (permalink / raw)
  To: Jiri Benc
  Cc: linux-kernel, stable, Yonghong Song, Daniel Borkmann,
	linux-kselftest, netdev, bpf, clang-built-linux
In-Reply-To: <20190717114334.5556a14e@redhat.com>

On Wed, Jul 17, 2019 at 11:43:34AM +0200, Jiri Benc wrote:
>On Mon, 15 Jul 2019 09:46:31 -0400, Sasha Levin wrote:
>> From: Jiri Benc <jbenc@redhat.com>
>>
>> [ Upstream commit 11aca65ec4db09527d3e9b6b41a0615b7da4386b ]
>>
>> Selftests are reporting this failure in test_lwt_seg6local.sh:
>
>I don't think this is critical in any way and I don't think this is a
>stable material. How was this selected?

It fixes a bug, right?

--
Thanks,
Sasha

^ permalink raw reply

* Re: phylink: flow control on fixed-link not working.
From: René van Dorst @ 2019-07-17 22:58 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: netdev
In-Reply-To: <20190717215150.tk3gvq7lqc56scac@shell.armlinux.org.uk>

Quoting Russell King - ARM Linux admin <linux@armlinux.org.uk>:

> On Wed, Jul 17, 2019 at 09:31:11PM +0000, René van Dorst wrote:
>> Hi,
>>
>> I am trying to enable flow control/pause on PHYLINK and fixed-link.
>>
>> My setup SOC mac (mt7621) <-> RGMII <-> SWITCH mac (mt7530).
>>
>> It seems that in fixed-link mode all the flow control/pause bits are cleared
>> in
>> phylink_parse_fixedlink(). If I read phylink_parse_fixedlink() [0]  
>> correctly,
>> I see that pl->link_config.advertising is AND with pl->supprted  
>> which has only
>> the PHY_SETTING() modes bits set. pl->link_config.advertising is  
>> losing Pause
>> bits. pl->link_config.advertising is used in phylink_resolve_flow()  
>> to set the
>> MLO_PAUSE_RX/TX BITS.
>>
>> I think this is an error.
>> Because in phylink_start() see this part [1].
>>
>>  /* Apply the link configuration to the MAC when starting. This allows
>>   * a fixed-link to start with the correct parameters, and also
>>   * ensures that we set the appropriate advertisement for Serdes links.
>>   */
>>  phylink_resolve_flow(pl, &pl->link_config);
>>  phylink_mac_config(pl, &pl->link_config);
>>
>>
>> If I add a this hacky patch below, flow control is enabled on the  
>> fixed-link.
>>         if (s) {
>>                 __set_bit(s->bit, pl->supported);
>> +               if (phylink_test(pl->link_config.advertising, Pause))
>> +                       phylink_set(pl->supported, Pause);
>>         } else {
>>
>> So is phylink_parse_fixedlink() broken or should it handled in a other way?
>
> Quite simply, if the MAC says it doesn't support pause modes (i.o.w.
> the validate callback clears them) then pause modes aren't supported.

Hi Russel,

Thanks for your response.

I believe that I am setting pause bits right on both ends see SOC [0] and
SWITCH [1] and also in the DTS [2].

Correct me if it is not the right way.


Maybe I am looking in the wrong part of the code.
But I added many debug lines in phylink_parse_fixedlink() [3] to see what
happens with the Pause bit in the pl->link_config.advertising and  
pl->supported.


This is the dmesg output.
[    1.991245] libphy: Fixed MDIO Bus: probed
[    2.031260] phylink_create: config0: Pause
[    2.039410] phylink_create: supported: Pause
[    2.047904] mtk_validate: mask: Pause
[    2.055186] mtk_validate: supported: Pause
[    2.063332] mtk_validate: advertising: Pause
[    2.071825] phylink_create: config1: Pause
[    2.079966] phylink_create: config2: Pause
[    2.088132] phylink_parse_fixedlink: config: Pause
[    2.097660] phylink_parse_fixedlink: support: Pause
[    2.107366] mtk_validate: mask: Pause
[    2.114647] mtk_validate: supported: Pause
[    2.122792] mtk_validate: advertising: Pause
[    2.131283] phylink_parse_fixedlink: config2: Pause
[    2.140971] phylink_parse_fixedlink: support2: Pause
[    2.150845] phylink_parse_fixedlink: config3: Pause
[    2.160546] phylink_parse_fixedlink: support3: Pause
[    2.170420] phylink_parse_fixedlink: config4: Pause
[    2.180120] phylink_parse_fixedlink: config5: Pause

[    5.854674] mt7530 mdio-bus:1f: configuring for fixed/trgmii link mode
[    5.867665] phylink_resolve_flow: PAUSE_AN: pause: 0, 12, 8dfba630
[    5.867670] phylink_resolve_flow: new_pause: 0
[    5.879980] mt7530 mdio-bus:1f: phylink_mac_config:  
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1  
an=1
[    6.651239] DSA: tree 0 setup
[    6.658192] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[    6.672108] mt7530 mdio-bus:1f: phylink_mac_config:  
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1  
an=1
[   28.937543] mtk_soc_eth 1e100000.ethernet eth0: configuring for  
fixed/trgmii link mode
[   28.965884] mtk_soc_eth 1e100000.ethernet eth0: phylink_mac_config:  
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1  
an=1
[   29.000740] mtk_soc_eth 1e100000.ethernet eth0: phylink_mac_config:  
mode=fixed/trgmii/1Gbps/Full adv=00,00000000,00000220 pause=12 link=1  
an=1
[   29.026392] mtk_soc_eth 1e100000.ethernet eth0: Link is Up -  
1Gbps/Full - flow control off
[   29.373577] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready


I don't see the "config6:" [4] debug.
I think the pause bits are always cleared in  
pl->link_config.advertising by phylink_parse_fixedlink()

Again I may understand the code wrong or I am looking at the wrong place.
So I hope you can point me in the right direction.

Greats,

René


[0]:  
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/ethernet/mediatek/mtk_eth_soc.c#L502
[1]:  
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/dsa/mt7530.c#L1468
[2]:  
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/staging/mt7621-dts/UBNT-ER-e50.dtsi#L122
[3]:  
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/phy/phylink.c#L214
[4]:  
https://github.com/vDorst/linux-1/blob/8538cdefd425592d249a71445c466159b0f27475/drivers/net/phy/phylink.c#L263

>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up




^ permalink raw reply

* Re: [PATCH] qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/
From: Benjamin Poirier @ 2019-07-17 22:52 UTC (permalink / raw)
  To: David Miller; +Cc: gregkh, GR-Linux-NIC-Dev, manishc, netdev
In-Reply-To: <20190717.120208.205802053970227674.davem@davemloft.net>

On 2019/07/17 12:02, David Miller wrote:
> From: Benjamin Poirier <bpoirier@suse.com>
> Date: Tue, 16 Jul 2019 11:34:59 +0900
> 
> > The hardware has been declared EOL by the vendor more than 5 years ago.
> > What's more relevant to the Linux kernel is that the quality of this driver
> > is not on par with many other mainline drivers.
> > 
> > Cc: Manish Chopra <manishc@marvell.com>
> > Message-id: <20190617074858.32467-1-bpoirier@suse.com>
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> 
> Please resubmit this when the net-next tree opens back up.
> 

Sorry, I thought this was gonna go through Greg's tree. Will do.

^ permalink raw reply

* Re: [PATCH net] ipv6: rt6_check should return NULL if 'from' is NULL
From: David Miller @ 2019-07-17 22:26 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, linux-kernel, dsahern
In-Reply-To: <20190717220843.974-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Wed, 17 Jul 2019 15:08:43 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Paul reported that l2tp sessions were broken after the commit referenced
> in the Fixes tag. Prior to this commit rt6_check returned NULL if the
> rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB
> entry. Restore that behavior.
> 
> Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
> Reported-by: Paul Donohue <linux-kernel@PaulSD.com>
> Tested-by: Paul Donohue <linux-kernel@PaulSD.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [net 1/1] tipc: initialize 'validated' field of received packets
From: David Miller @ 2019-07-17 22:24 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion
In-Reply-To: <1563399824-4462-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 17 Jul 2019 23:43:44 +0200

> The tipc_msg_validate() function leaves a boolean flag 'validated' in
> the validated buffer's control block, to avoid performing this action
> more than once. However, at reception of new packets, the position of
> this field may already have been set by lower layer protocols, so
> that the packet is erroneously perceived as already validated by TIPC.
> 
> We fix this by initializing the said field to 'false' before performing
> the initial validation.
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH bpf] bpf: fix narrower loads on s390
From: Y Song @ 2019-07-17 22:23 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: bpf, netdev, gor, heiko.carstens
In-Reply-To: <B91434A8-6056-49E2-852D-6DE5FFD53B29@linux.ibm.com>

On Wed, Jul 17, 2019 at 1:52 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> > Am 17.07.2019 um 18:25 schrieb Y Song <ys114321@gmail.com>:
> >
> > On Wed, Jul 17, 2019 at 3:36 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> >>
> >>
> >> Here is a better one: len=0x11223344 and we would like to do
> >> ((u8 *)&len)[3].
> >>
> >> len is represented as `11 22 33 44` in memory, so the desired result is
> >> 0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
> >> verifier does ((*(u32 *)&len) >> 24) & 0xff instead.
> >
> > What you described above for the memory layout all makes sense.
> > The root cause is for big endian, we should do *((u8 *)&len + 3).
> > This is exactly what macros in test_pkt_md_access.c tries to do.
> >
> > if  __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > #define TEST_FIELD(TYPE, FIELD, MASK)                                   \
> >        {                                                               \
> >                TYPE tmp = *(volatile TYPE *)&skb->FIELD;               \
> >                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
> >                        return TC_ACT_SHOT;                             \
> >        }
> > #else
> > #define TEST_FIELD_OFFSET(a, b) ((sizeof(a) - sizeof(b)) / sizeof(b))
> > #define TEST_FIELD(TYPE, FIELD, MASK)                                   \
> >        {                                                               \
> >                TYPE tmp = *((volatile TYPE *)&skb->FIELD +             \
> >                              TEST_FIELD_OFFSET(skb->FIELD, TYPE));     \
> >                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
> >                        return TC_ACT_SHOT;                             \
> >        }
> > #endif
> >
> > Could you check whether your __BYTE_ORDER__ is set
> > correctly or not for this case? You may need to tweak Makefile
> > if you are doing cross compilation, I am not sure how as I
> > did not have environment.
>
> I’m building natively on s390.
>
> Here is the (formatted) preprocessed C code for the first condition:
>
> {
>         __u8 tmp = *((volatile __u8 *)&skb->len +
>                 ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
>         if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
> };
>
> So I believe the endianness is chosen correctly.
>
> Here is the clang-generated BPF bytecode for the first condition:
>
> # llvm-objdump -d test_pkt_md_access.o
> 0000000000000000 process:
>        0:       71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3)
>        1:       61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0)
>        2:       57 30 00 00 00 00 00 ff r3 &= 255
>        3:       5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10>
>
> This also looks good to me.
>
> Finally, here is the verifier-generated BPF bytecode:
>
> # bpftool prog dump xlated id 14
> ; TEST_FIELD(__u8,  len, 0xFF);
>    0: (61) r2 = *(u32 *)(r1 +104)
>    1: (bc) w2 = w2
>    2: (74) w2 >>= 24
>    3: (bc) w2 = w2
>    4: (54) w2 &= 255
>    5: (bc) w2 = w2
>
> Here we can see the shift that I'm referring to. I believe we should
> translate *(u8 *)(r1 + 3) in this case without this shift on big-endian
> machines.

Thanks for the detailed illustration.
Now, with your detailed output of byte codes and xlated program, it
indeed becomes apparent
that verifier should not do shift at insn 2.

I was correct that after insn 0, register r2 should hold the same
value for big and little endian.
But I missed the fact in the first review that insn->off for original
first insn is different.
r2 = *(u8 *)(r1 + 3), the first insn on big endian, and r2 = *(u8 *)r1
for little endian.
They should really have the same shift amount.

Therefore, indeed, shifting amount is actually different between big
and little endians.
So your code is correct. Could you add a macro in linux/filter.h? Most
narrow load related
macros are there? This way, we maintain verifier.c __BYTE_ORDER__ macro free.

Also, could you put your above analysis in the commit message? This will help
reasoning the change easily later on.

Thanks!

>
> Best regards,
> Ilya

^ permalink raw reply

* Re: [Patch net v3 0/2] ipv4: relax source validation check for loopback packets
From: David Miller @ 2019-07-17 22:23 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, dsahern
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 17 Jul 2019 14:41:57 -0700

> This patchset fixes a corner case when loopback packets get dropped
> by rp_filter when we route them from veth to lo. Patch 1 is the fix
> and patch 2 provides a simplified test case for this scenario.

Series applied, thanks Cong.

^ permalink raw reply

* Re: [PATCH net 0/2] mlxsw: Two fixes
From: David Miller @ 2019-07-17 22:20 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, petrm, mlxsw, idosch
In-Reply-To: <20190717202908.1547-1-idosch@idosch.org>

From: Ido Schimmel <idosch@idosch.org>
Date: Wed, 17 Jul 2019 23:29:06 +0300

> From: Ido Schimmel <idosch@mellanox.com>
> 
> This patchset contains two fixes for mlxsw.
> 
> Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even
> if the egress port was switched to Trust L2 mode where priority mapping
> is based on PCP.
> 
> Patch #2 fixes a problem where packets can be learned on a non-existing
> FID if a tc filter with a redirect action is configured on a bridged
> port. The problem and fix are explained in detail in the commit message.

Series applied.

> Please consider both patches for 5.2.y

I'll queue them up for -stable, thanks.

^ permalink raw reply

* Re: [PATCHv2] net: ag71xx: Add missing header
From: David Miller @ 2019-07-17 22:17 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190717194645.24239-1-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Wed, 17 Jul 2019 12:46:45 -0700

> ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function
> 
> Fixes: d51b6ce441d356369387d20bc1de5f2edb0ab71e
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net,v3 1/4] net: openvswitch: rename flow_stats to sw_flow_stats
From: David Miller @ 2019-07-17 22:16 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, jiri, jakub.kicinski
In-Reply-To: <20190717194248.2522-1-pablo@netfilter.org>


What is this series doing?

Where is your "0/4" cover letter which would tell us this?

Also:

> OVS compilation breaks here after this patchset since flow_stats
> structure is already defined in include/net/flow_offload.h. This patch
> is new in this batch.

You need to explain in more detail and in more context what this
means.  Does the build break when this patch is applies?  If so, why
is that OK?

I'm tossing this series until you submit it properly.

^ permalink raw reply

* [PATCH net] ipv6: rt6_check should return NULL if 'from' is NULL
From: David Ahern @ 2019-07-17 22:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, David Ahern

From: David Ahern <dsahern@gmail.com>

Paul reported that l2tp sessions were broken after the commit referenced
in the Fixes tag. Prior to this commit rt6_check returned NULL if the
rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB
entry. Restore that behavior.

Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Paul Donohue <linux-kernel@PaulSD.com>
Tested-by: Paul Donohue <linux-kernel@PaulSD.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d2e6b31a8d6..6fe3097b9ab7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2563,7 +2563,7 @@ static struct dst_entry *rt6_check(struct rt6_info *rt,
 {
 	u32 rt_cookie = 0;
 
-	if ((from && !fib6_get_cookie_safe(from, &rt_cookie)) ||
+	if (!from || !fib6_get_cookie_safe(from, &rt_cookie) ||
 	    rt_cookie != cookie)
 		return NULL;
 
-- 
2.11.0


^ permalink raw reply related

* [PATCH] net: bcmgenet: use promisc for unsupported filters
From: justinpopo6 @ 2019-07-17 21:58 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, bcm-kernel-feedback-list, davem, f.fainelli,
	opendmb, Justin Chen

From: Justin Chen <justinpopo6@gmail.com>

Currently we silently ignore filters if we cannot meet the filter
requirements. This will lead to the MAC dropping packets that are
expected to pass. A better solution would be to set the NIC to promisc
mode when the required filters cannot be met.

Also correct the number of MDF filters supported. It should be 17,
not 16.

Signed-off-by: Justin Chen <justinpopo6@gmail.com>
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 57 ++++++++++++--------------
 1 file changed, 26 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 34466b8..a2b5780 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -3083,39 +3083,42 @@ static void bcmgenet_timeout(struct net_device *dev)
 	netif_tx_wake_all_queues(dev);
 }
 
-#define MAX_MC_COUNT	16
+#define MAX_MDF_FILTER	17
 
 static inline void bcmgenet_set_mdf_addr(struct bcmgenet_priv *priv,
 					 unsigned char *addr,
-					 int *i,
-					 int *mc)
+					 int *i)
 {
-	u32 reg;
-
 	bcmgenet_umac_writel(priv, addr[0] << 8 | addr[1],
 			     UMAC_MDF_ADDR + (*i * 4));
 	bcmgenet_umac_writel(priv, addr[2] << 24 | addr[3] << 16 |
 			     addr[4] << 8 | addr[5],
 			     UMAC_MDF_ADDR + ((*i + 1) * 4));
-	reg = bcmgenet_umac_readl(priv, UMAC_MDF_CTRL);
-	reg |= (1 << (MAX_MC_COUNT - *mc));
-	bcmgenet_umac_writel(priv, reg, UMAC_MDF_CTRL);
 	*i += 2;
-	(*mc)++;
 }
 
 static void bcmgenet_set_rx_mode(struct net_device *dev)
 {
 	struct bcmgenet_priv *priv = netdev_priv(dev);
 	struct netdev_hw_addr *ha;
-	int i, mc;
+	int i, nfilter;
 	u32 reg;
 
 	netif_dbg(priv, hw, dev, "%s: %08X\n", __func__, dev->flags);
 
-	/* Promiscuous mode */
+	/* Number of filters needed */
+	nfilter = netdev_uc_count(dev) + netdev_mc_count(dev) + 2;
+
+	/*
+	 * Turn on promicuous mode for three scenarios
+	 * 1. IFF_PROMISC flag is set
+	 * 2. IFF_ALLMULTI flag is set
+	 * 3. The number of filters needed exceeds the number filters
+	 *    supported by the hardware.
+	*/
 	reg = bcmgenet_umac_readl(priv, UMAC_CMD);
-	if (dev->flags & IFF_PROMISC) {
+	if ((dev->flags & (IFF_PROMISC | IFF_ALLMULTI)) ||
+	    (nfilter > MAX_MDF_FILTER)) {
 		reg |= CMD_PROMISC;
 		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
 		bcmgenet_umac_writel(priv, 0, UMAC_MDF_CTRL);
@@ -3125,32 +3128,24 @@ static void bcmgenet_set_rx_mode(struct net_device *dev)
 		bcmgenet_umac_writel(priv, reg, UMAC_CMD);
 	}
 
-	/* UniMac doesn't support ALLMULTI */
-	if (dev->flags & IFF_ALLMULTI) {
-		netdev_warn(dev, "ALLMULTI is not supported\n");
-		return;
-	}
-
 	/* update MDF filter */
 	i = 0;
-	mc = 0;
 	/* Broadcast */
-	bcmgenet_set_mdf_addr(priv, dev->broadcast, &i, &mc);
+	bcmgenet_set_mdf_addr(priv, dev->broadcast, &i);
 	/* my own address.*/
-	bcmgenet_set_mdf_addr(priv, dev->dev_addr, &i, &mc);
-	/* Unicast list*/
-	if (netdev_uc_count(dev) > (MAX_MC_COUNT - mc))
-		return;
+	bcmgenet_set_mdf_addr(priv, dev->dev_addr, &i);
 
-	if (!netdev_uc_empty(dev))
-		netdev_for_each_uc_addr(ha, dev)
-			bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
-	/* Multicast */
-	if (netdev_mc_empty(dev) || netdev_mc_count(dev) >= (MAX_MC_COUNT - mc))
-		return;
+	/* Unicast */
+	netdev_for_each_uc_addr(ha, dev)
+		bcmgenet_set_mdf_addr(priv, ha->addr, &i);
 
+	/* Multicast */
 	netdev_for_each_mc_addr(ha, dev)
-		bcmgenet_set_mdf_addr(priv, ha->addr, &i, &mc);
+		bcmgenet_set_mdf_addr(priv, ha->addr, &i);
+
+	/* Enable filters */
+	reg = GENMASK(MAX_MDF_FILTER - 1, MAX_MDF_FILTER - nfilter);
+	bcmgenet_umac_writel(priv, reg, UMAC_MDF_CTRL);
 }
 
 /* Set the hardware MAC address. */
-- 
2.7.4


^ permalink raw reply related

* Re: [Patch net v3 1/2] fib: relax source validation check for loopback packets
From: David Ahern @ 2019-07-17 21:59 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Julian Anastasov
In-Reply-To: <20190717214159.25959-2-xiyou.wangcong@gmail.com>

On 7/17/19 3:41 PM, Cong Wang wrote:
> In a rare case where we redirect local packets from veth to lo,
> these packets fail to pass the source validation when rp_filter
> is turned on, as the tracing shows:
> 
>   <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
>   <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130
> 
> So, the fib table lookup returns eth0 as the nexthop even though
> the packets are local and should be routed to loopback nonetheless,
> but they can't pass the dev match check in fib_info_nh_uses_dev()
> without this patch.
> 
> It should be safe to relax this check for this special case, as
> normally packets coming out of loopback device still have skb_dst
> so they won't even hit this slow path.
> 
> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/ipv4/fib_frontend.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 

Seems ok to me.
Reviewed-by: David Ahern <dsahern@gmail.com>

^ permalink raw reply

* Re: [Patch net v3 2/2] selftests: add a test case for rp_filter
From: David Ahern @ 2019-07-17 21:54 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <20190717214159.25959-3-xiyou.wangcong@gmail.com>

On 7/17/19 3:41 PM, Cong Wang wrote:
> Add a test case to simulate the loopback packet case fixed
> in the previous patch.
> 
> This test gets passed after the fix:
> 
> IPv4 rp_filter tests
>     TEST: rp_filter passes local packets                                [ OK ]
>     TEST: rp_filter passes loopback packets                             [ OK ]
> 
> Cc: David Ahern <dsahern@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  tools/testing/selftests/net/fib_tests.sh | 35 +++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 

Thanks for adding the test

Reviewed-by: David Ahern <dsahern@gmail.com>


^ permalink raw reply

* Re: regression with napi/softirq ?
From: Thomas Gleixner @ 2019-07-17 21:52 UTC (permalink / raw)
  To: Sudip Mukherjee
  Cc: Peter Zijlstra (Intel), David S. Miller, netdev, linux-kernel
In-Reply-To: <CADVatmO_m-NYotb9Htd7gS0d2-o0DeEWeDJ1uYKE+oj_HjoN0Q@mail.gmail.com>

Sudip,

On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > You can hack ksoftirq_running() to return always false to avoid this, but
> > that might cause application starvation and a huge packet buffer backlog
> > when the amount of incoming packets makes the CPU do nothing else than
> > softirq processing.
> 
> I tried that now, it is better but still not as good as v3.8
> Now I am getting 375.9usec as the maximum time between raising the softirq
> and it starting to execute and packet drops still there.
>
> And just a thought, do you think there should be a CONFIG_ option for
> this feature of ksoftirqd_running() so that it can be disabled if needed
> by users like us?

If at all then a sysctl to allow runtime control.
 
> Can you please think of anything else that might have changed which I still need
> to change to make the time comparable to v3.8..

Something with in that small range of:

 63592 files changed, 13783320 insertions(+), 5155492 deletions(-)

:)

Seriously, that can be anything.

Can you please test with Linus' head of tree and add some more
instrumentation, so we can see what holds off softirqs from being
processed. If the ksoftirqd enforcement is disabled, then the only reason
can be a long lasting softirq disabled region. Tracing should tell.

Thanks,

	tglx

^ permalink raw reply

* Re: phylink: flow control on fixed-link not working.
From: Russell King - ARM Linux admin @ 2019-07-17 21:51 UTC (permalink / raw)
  To: René van Dorst; +Cc: netdev
In-Reply-To: <20190717213111.Horde.nir2D5kAJww569fjh8BZgZm@www.vdorst.com>

On Wed, Jul 17, 2019 at 09:31:11PM +0000, René van Dorst wrote:
> Hi,
> 
> I am trying to enable flow control/pause on PHYLINK and fixed-link.
> 
> My setup SOC mac (mt7621) <-> RGMII <-> SWITCH mac (mt7530).
> 
> It seems that in fixed-link mode all the flow control/pause bits are cleared
> in
> phylink_parse_fixedlink(). If I read phylink_parse_fixedlink() [0] correctly,
> I see that pl->link_config.advertising is AND with pl->supprted which has only
> the PHY_SETTING() modes bits set. pl->link_config.advertising is losing Pause
> bits. pl->link_config.advertising is used in phylink_resolve_flow() to set the
> MLO_PAUSE_RX/TX BITS.
> 
> I think this is an error.
> Because in phylink_start() see this part [1].
> 
>  /* Apply the link configuration to the MAC when starting. This allows
>   * a fixed-link to start with the correct parameters, and also
>   * ensures that we set the appropriate advertisement for Serdes links.
>   */
>  phylink_resolve_flow(pl, &pl->link_config);
>  phylink_mac_config(pl, &pl->link_config);
> 
> 
> If I add a this hacky patch below, flow control is enabled on the fixed-link.
>         if (s) {
>                 __set_bit(s->bit, pl->supported);
> +               if (phylink_test(pl->link_config.advertising, Pause))
> +                       phylink_set(pl->supported, Pause);
>         } else {
> 
> So is phylink_parse_fixedlink() broken or should it handled in a other way?

Quite simply, if the MAC says it doesn't support pause modes (i.o.w.
the validate callback clears them) then pause modes aren't supported.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* [net  1/1] tipc: initialize 'validated' field of received packets
From: Jon Maloy @ 2019-07-17 21:43 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

The tipc_msg_validate() function leaves a boolean flag 'validated' in
the validated buffer's control block, to avoid performing this action
more than once. However, at reception of new packets, the position of
this field may already have been set by lower layer protocols, so
that the packet is erroneously perceived as already validated by TIPC.

We fix this by initializing the said field to 'false' before performing
the initial validation.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/node.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 324a1f9..3a5be1d 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1807,6 +1807,7 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 	__skb_queue_head_init(&xmitq);
 
 	/* Ensure message is well-formed before touching the header */
+	TIPC_SKB_CB(skb)->validated = false;
 	if (unlikely(!tipc_msg_validate(&skb)))
 		goto discard;
 	hdr = buf_msg(skb);
-- 
2.1.4


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox