Netdev List


* Re: [PATCH] be2net: make two arrays static const, makes object smaller
From: David Miller @ 2019-09-07 16:02 UTC
  To: colin.king
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	netdev, kernel-janitors, linux-kernel
In-Reply-To: <20190906111943.5285-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Fri,  6 Sep 2019 12:19:43 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Don't populate the arrays on the stack but instead make them
> static const. Makes the object code smaller by 281 bytes.
> 
> Before:
>    text	   data	    bss	    dec	    hex	filename
>   87553	   5672	      0	  93225	  16c29	benet/be_cmds.o
> 
> After:
>    text	   data	    bss	    dec	    hex	filename
>   87112	   5832	      0	  92944	  16b10	benet/be_cmds.o
> 
> (gcc version 9.2.1, amd64)
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied.
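
For readers unfamiliar with the idiom, here is a minimal sketch of the
stack versus static const difference the patch describes. The table
contents and function names are hypothetical and are not taken from
be_cmds.c; they only illustrate the pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup; the opcode values are invented for illustration. */

/* Before: a non-static initialized array. The compiler typically emits
 * code to populate it on the stack on every call.
 */
int is_special_opcode_stack(uint16_t opcode)
{
	const uint16_t table[] = { 0x0001, 0x0017, 0x0030, 0x0042 };
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i] == opcode)
			return 1;
	return 0;
}

/* After: a static const array is emitted once as read-only data, so no
 * per-call initialization code is needed.
 */
int is_special_opcode_static(uint16_t opcode)
{
	static const uint16_t table[] = { 0x0001, 0x0017, 0x0030, 0x0042 };
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i] == opcode)
			return 1;
	return 0;
}
```

Both functions return the same results; only where the table lives
changes. In the size output quoted above, text shrinks by 441 bytes
while data grows by 160 bytes, which gives the net 281 byte saving.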


* Re: [PATCH] net: hns3: make array spec_opcode static const, makes object smaller
From: David Miller @ 2019-09-07 16:03 UTC
  To: colin.king
  Cc: yisen.zhuang, salil.mehta, lipeng321, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20190906112804.7812-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Fri,  6 Sep 2019 12:28:04 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> Don't populate the array spec_opcode on the stack but instead make it
> static const. Makes the object code smaller by 48 bytes.
> 
> Before:
>    text	   data	    bss	    dec	    hex	filename
>    6914	   1040	    128	   8082	   1f92	hns3/hns3vf/hclgevf_cmd.o
> 
> After:
>    text	   data	    bss	    dec	    hex	filename
>    6866	   1040	    128	   8034	   1f62	hns3/hns3vf/hclgevf_cmd.o
> 
> (gcc version 9.2.1, amd64)
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied.


* Re: [PATCH net] nfp: flower: cmsg rtnl locks can timeout reify messages
From: David Miller @ 2019-09-07 16:06 UTC
  To: simon.horman; +Cc: jakub.kicinski, netdev, oss-drivers, frederik.lotter
In-Reply-To: <20190906172941.25136-1-simon.horman@netronome.com>

From: Simon Horman <simon.horman@netronome.com>
Date: Fri,  6 Sep 2019 19:29:41 +0200

> From: Fred Lotter <frederik.lotter@netronome.com>
> 
> Flower control message replies are handled in different locations. The truly
> high priority replies are handled in the BH (tasklet) context, while the
> remaining replies are handled in a predefined Linux work queue. The work
> queue handler orders replies into high and low priority groups, and always
> starts servicing the high priority replies within the received batch first.
> 
> Reply Type:			Rtnl Lock:	Handler:
 ...
> A subset of control messages can block waiting for an rtnl lock (from both
> work queue priority groups). The rtnl lock is heavily contended for by
> external processes such as systemd-udevd, systemd-network and libvirtd,
> especially during netdev creation, such as when flower VFs and representors
> are instantiated.
> 
> Kernel netlink instrumentation shows that external processes (such as
> systemd-udevd) often use successive rtnl_trylock() sequences, which can cause
> a control message blocked in rtnl_lock() to starve for longer periods of time
> during rtnl lock contention, e.g. netdev creation.
> 
> In the current design a single blocked control message will block the entire
> work queue (both priorities), and introduce a latency which is
> nondeterministic and dependent on system wide rtnl lock usage.
> 
> In some extreme cases, one blocked control message at exactly the wrong time,
> just before the maximum number of VFs are instantiated, can block the work
> queue for long enough to prevent VF representor REIFY replies from getting
> handled in time for the 40ms timeout.
> 
> The firmware will deliver the total maximum number of REIFY message replies in
> around 300us.
> 
> Only REIFY and MTU update messages require replies within a timeout period (of
> 40ms). The MTU-only updates are already done directly in the BH (tasklet)
> handler.
> 
> Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts
> caused by a blocked work queue waiting on rtnl locks.
> 
> Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>

Applied.


* Re: pull request: bluetooth-next 2019-09-06
From: David Miller @ 2019-09-07 16:08 UTC
  To: johan.hedberg; +Cc: netdev, linux-bluetooth
In-Reply-To: <20190906172339.GA74057@jmoran1-mobl1.ger.corp.intel.com>

From: Johan Hedberg <johan.hedberg@gmail.com>
Date: Fri, 6 Sep 2019 20:23:39 +0300

> Here's the main bluetooth-next pull request for the 5.4 kernel.
> 
>  - Cleanups & fixes to btrtl driver
>  - Fixes for Realtek devices in btusb, e.g. for suspend handling
>  - Firmware loading support for BCM4345C5
>  - hidp_send_message() return value handling fixes
>  - Added support for utilizing Fast Advertising Interval
>  - Various other minor cleanups & fixes
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks.


* Re: [PATCH net-next 0/4] net/tls: small TX offload optimizations
From: David Miller @ 2019-09-07 16:11 UTC
  To: jakub.kicinski
  Cc: netdev, oss-drivers, davejwatson, borisp, aviadye, john.fastabend,
	daniel
In-Reply-To: <20190907053000.23869-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri,  6 Sep 2019 22:29:56 -0700

> Hi!
> 
> This set brings small TLS TX device optimizations. The biggest
> gain comes from fixing a misuse of non-temporal copy instructions.
> On a synthetic workload modelled after a customer's RFC application
> I see a 3-5% gain.

Series applied.

But if history is any indication I'd watch for how much this actually
helps or hurts universally.  We once tried to use non-temporal stores
for sendmsg/recvmsg copies and had to turn that off because it only
helped in certain situations on certain cpus and hurt in others.


* Re: [PATCH net-next] netfilter: nf_tables: avoid excessive stack usage
From: Pablo Neira Ayuso @ 2019-09-07 18:07 UTC
  To: Arnd Bergmann
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, wenxu, netfilter-devel, coreteam, netdev,
	linux-kernel
In-Reply-To: <20190906151242.1115282-1-arnd@arndb.de>

[-- Attachment #1: Type: text/plain, Size: 998 bytes --]

Hi Arnd,

On Fri, Sep 06, 2019 at 05:12:30PM +0200, Arnd Bergmann wrote:
> The nft_offload_ctx structure is much too large to put on the
> stack:
> 
> net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=]
> 
> Use dynamic allocation here, as we do elsewhere in the same
> function.
>
> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> Since we only really care about two members of the structure, an
> alternative would be a larger rewrite, but that is probably too
> late for v5.4.

Thanks for this patch.

I'm attaching a patch to reduce this structure size a bit. Do you
think this alternative patch is ok until the larger rewrite
happens? Anyway, I agree we should get this structure off the
stack; even after this it is still large, so your patch (or a variant of
it) will be useful sooner rather than later, I think.

[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 485 bytes --]

diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h
index db104665a9e4..cc44d29e9fd7 100644
--- a/include/net/netfilter/nf_tables_offload.h
+++ b/include/net/netfilter/nf_tables_offload.h
@@ -5,10 +5,10 @@
 #include <net/netfilter/nf_tables.h>
 
 struct nft_offload_reg {
-	u32		key;
-	u32		len;
-	u32		base_offset;
-	u32		offset;
+	u8		key;
+	u8		len;
+	u8		base_offset;
+	u8		offset;
 	struct nft_data data;
 	struct nft_data	mask;
 };


* Re: [PATCH net-next] netfilter: nf_tables: avoid excessive stack usage
From: Arnd Bergmann @ 2019-09-07 18:41 UTC
  To: Pablo Neira Ayuso
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, wenxu, netfilter-devel, coreteam, Networking,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190907180754.dz7gstqfj7djlbrs@salvia>

On Sat, Sep 7, 2019 at 8:07 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> Hi Arnd,
>
> On Fri, Sep 06, 2019 at 05:12:30PM +0200, Arnd Bergmann wrote:
> > The nft_offload_ctx structure is much too large to put on the
> > stack:
> >
> > net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=]
> >
> > Use dynamic allocation here, as we do elsewhere in the same
> > function.
> >
> > Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > ---
> > Since we only really care about two members of the structure, an
> > alternative would be a larger rewrite, but that is probably too
> > late for v5.4.
>
> Thanks for this patch.
>
> I'm attaching a patch to reduce this structure size a bit. Do you
> think this alternative patch is ok until the larger rewrite
> happens?

I haven't tried it yet, but it looks like that would save 8 of the
48 bytes for each of the 24 registers (12 bytes on m68k
or i386, which only use 4-byte alignment for nft_data), so
this wouldn't make too much difference.

> Anyway, I agree we should get this structure off the
> stack; even after this it is still large, so your patch (or a variant of
> it) will be useful sooner rather than later, I think.

What I was thinking for a possible smaller fix would be to not
pass the ctx into the expr->ops->offload callback but
only pass the 'dep' member. Since I've never seen this code
before, I have no idea if that would be an improvement
in the end.

       Arnd
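
The size estimate above can be checked with a small sketch. The
`sketch_nft_data` type below is a stand-in, assumed to be 16 bytes with
8-byte alignment as on a typical 64-bit build; it is not the kernel's
definition, and the two register structs only mirror the field layout
shown in the attached x.patch. The count of 24 registers is taken from
the message above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: stands in for struct nft_data (16 bytes, aligned like a u64). */
struct sketch_nft_data {
	_Alignas(8) uint32_t data[4];
};

/* Layout before x.patch: four u32 fields ahead of the two data members. */
struct sketch_reg_u32 {
	uint32_t key, len, base_offset, offset;
	struct sketch_nft_data data;
	struct sketch_nft_data mask;
};

/* Layout after x.patch: four u8 fields, then padding up to 8-byte alignment. */
struct sketch_reg_u8 {
	uint8_t key, len, base_offset, offset;
	struct sketch_nft_data data;
	struct sketch_nft_data mask;
};

#define SKETCH_NUM_REGS 24	/* register count quoted in the thread */

/* Bytes saved per register by the narrower fields. */
size_t sketch_saved_per_reg(void)
{
	return sizeof(struct sketch_reg_u32) - sizeof(struct sketch_reg_u8);
}

/* Bytes saved across all registers. */
size_t sketch_saved_total(void)
{
	return SKETCH_NUM_REGS * sketch_saved_per_reg();
}
```

Under these assumptions the per-register size drops from 48 to 40
bytes, so the total saving is 24 * 8 = 192 bytes, which is small next
to the 1200 byte stack frame in the warning.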


* Re: [PATCH net-next] netfilter: nf_tables: avoid excessive stack usage
From: Pablo Neira Ayuso @ 2019-09-07 18:52 UTC
  To: Arnd Bergmann
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	Jakub Kicinski, wenxu, netfilter-devel, coreteam, Networking,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAK8P3a04ic_VP6L_=N5P7vfQG1VDV25g3KvUpuCVdX483hx_cA@mail.gmail.com>

On Sat, Sep 07, 2019 at 08:41:22PM +0200, Arnd Bergmann wrote:
> On Sat, Sep 7, 2019 at 8:07 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> >
> > Hi Arnd,
> >
> > On Fri, Sep 06, 2019 at 05:12:30PM +0200, Arnd Bergmann wrote:
> > > The nft_offload_ctx structure is much too large to put on the
> > > stack:
> > >
> > > net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=]
> > >
> > > Use dynamic allocation here, as we do elsewhere in the same
> > > function.
> > >
> > > Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > > ---
> > > Since we only really care about two members of the structure, an
> > > alternative would be a larger rewrite, but that is probably too
> > > late for v5.4.
> >
> > Thanks for this patch.
> >
> > I'm attaching a patch to reduce this structure size a bit. Do you
> > think this alternative patch is ok until the larger rewrite
> > happens?
> 
> I haven't tried it yet, but it looks like that would save 8 of the
> 48 bytes for each of the 24 registers (12 bytes on m68k
> or i386, which only use 4-byte alignment for nft_data), so
> this wouldn't make too much difference.

I'll take your patch as is.

> > Anyway, I agree we should get this structure off the
> > stack; even after this it is still large, so your patch (or a variant of
> > it) will be useful sooner rather than later, I think.
> 
> What I was thinking for a possible smaller fix would be to not
> pass the ctx into the expr->ops->offload callback but
> only pass the 'dep' member. Since I've never seen this code
> before, I have no idea if that would be an improvement
> in the end.

We might need more fields of this context structure. This code is
very new and still under development, so let's revisit this later.

Thanks.


* [PATCH net-next 0/3] net: dsa: mv88e6xxx: add PCL support
From: Vivien Didelot @ 2019-09-07 20:00 UTC
  To: netdev; +Cc: davem, f.fainelli, andrew, Vivien Didelot

This small series implements the ethtool RXNFC operations in the
mv88e6xxx DSA driver to configure a port's Layer 2 Policy Control List
(PCL), which is supported by models such as the 88E6352, the 88E6390
and equivalents.

This allows configuring a port to discard frames based on a configured
destination or source MAC address and an optional VLAN, with e.g.:

    # ethtool --config-nfc lan1 flow-type ether src 00:11:22:33:44:55 action -1

Vivien Didelot (3):
  net: dsa: mv88e6xxx: complete ATU state definitions
  net: dsa: mv88e6xxx: introduce .port_set_policy
  net: dsa: mv88e6xxx: add RXNFC support

 drivers/net/dsa/mv88e6xxx/chip.c        | 241 ++++++++++++++++++++++--
 drivers/net/dsa/mv88e6xxx/chip.h        |  35 ++++
 drivers/net/dsa/mv88e6xxx/global1.h     |  43 +++--
 drivers/net/dsa/mv88e6xxx/global1_atu.c |   6 +-
 drivers/net/dsa/mv88e6xxx/port.c        |  74 ++++++++
 drivers/net/dsa/mv88e6xxx/port.h        |  17 +-
 6 files changed, 388 insertions(+), 28 deletions(-)

-- 
2.23.0
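
The `action -1` in the ethtool example of the cover letter is the
conventional way to request a drop: the ethtool uAPI header defines
RX_CLS_FLOW_DISC as an all-ones 64-bit ring cookie. A minimal sketch of
that mapping follows. The constant is redefined locally under a
hypothetical name, and the parsing helper is an illustration only, not
ethtool's or the driver's actual code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in with the same value as RX_CLS_FLOW_DISC in the uAPI header. */
#define SKETCH_RX_CLS_FLOW_DISC 0xffffffffffffffffULL

/* Hypothetical helper: turn the textual action into a 64-bit ring cookie
 * and report whether it requests a discard.
 */
int sketch_action_is_discard(const char *action)
{
	uint64_t ring_cookie = (uint64_t)strtoll(action, NULL, 0);

	return ring_cookie == SKETCH_RX_CLS_FLOW_DISC;
}
```

With this helper, the string "-1" converts to the all-ones cookie and
is treated as a discard, while a queue number such as "2" is not.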



* [PATCH net-next 1/3] net: dsa: mv88e6xxx: complete ATU state definitions
From: Vivien Didelot @ 2019-09-07 20:00 UTC
  To: netdev; +Cc: davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190907200049.25273-1-vivien.didelot@gmail.com>

Marvell has different values for the state of a MAC address,
depending on its multicast bit. This patch completes the definitions
for these states.

At the same time, use a plain 0 instead of the UC or MC unused value;
it is intuitive enough and simplifies the code a bit.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c        | 19 +++++------
 drivers/net/dsa/mv88e6xxx/global1.h     | 43 +++++++++++++++++--------
 drivers/net/dsa/mv88e6xxx/global1_atu.c |  6 ++--
 3 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 30365a54c31b..0d54a69f3622 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1497,7 +1497,7 @@ static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
 		fid = vlan.fid;
 	}
 
-	entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
+	entry.state = 0;
 	ether_addr_copy(entry.mac, addr);
 	eth_addr_dec(entry.mac);
 
@@ -1506,17 +1506,16 @@ static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
 		return err;
 
 	/* Initialize a fresh ATU entry if it isn't found */
-	if (entry.state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED ||
-	    !ether_addr_equal(entry.mac, addr)) {
+	if (!entry.state || !ether_addr_equal(entry.mac, addr)) {
 		memset(&entry, 0, sizeof(entry));
 		ether_addr_copy(entry.mac, addr);
 	}
 
 	/* Purge the ATU entry only if no port is using it anymore */
-	if (state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED) {
+	if (!state) {
 		entry.portvec &= ~BIT(port);
 		if (!entry.portvec)
-			entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
+			entry.state = 0;
 	} else {
 		entry.portvec |= BIT(port);
 		entry.state = state;
@@ -1732,8 +1731,7 @@ static int mv88e6xxx_port_fdb_del(struct dsa_switch *ds, int port,
 	int err;
 
 	mv88e6xxx_reg_lock(chip);
-	err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid,
-					   MV88E6XXX_G1_ATU_DATA_STATE_UNUSED);
+	err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid, 0);
 	mv88e6xxx_reg_unlock(chip);
 
 	return err;
@@ -1747,7 +1745,7 @@ static int mv88e6xxx_port_db_dump_fid(struct mv88e6xxx_chip *chip,
 	bool is_static;
 	int err;
 
-	addr.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
+	addr.state = 0;
 	eth_broadcast_addr(addr.mac);
 
 	do {
@@ -1755,7 +1753,7 @@ static int mv88e6xxx_port_db_dump_fid(struct mv88e6xxx_chip *chip,
 		if (err)
 			return err;
 
-		if (addr.state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED)
+		if (!addr.state)
 			break;
 
 		if (addr.trunk || (addr.portvec & BIT(port)) == 0)
@@ -4690,8 +4688,7 @@ static int mv88e6xxx_port_mdb_del(struct dsa_switch *ds, int port,
 	int err;
 
 	mv88e6xxx_reg_lock(chip);
-	err = mv88e6xxx_port_db_load_purge(chip, port, mdb->addr, mdb->vid,
-					   MV88E6XXX_G1_ATU_DATA_STATE_UNUSED);
+	err = mv88e6xxx_port_db_load_purge(chip, port, mdb->addr, mdb->vid, 0);
 	mv88e6xxx_reg_unlock(chip);
 
 	return err;
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index 78b9ae22d18c..0870fcc8bfc8 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -128,19 +128,36 @@
 #define MV88E6XXX_G1_ATU_OP_FULL_VIOLATION		BIT(4)
 
 /* Offset 0x0C: ATU Data Register */
-#define MV88E6XXX_G1_ATU_DATA				0x0c
-#define MV88E6XXX_G1_ATU_DATA_TRUNK			0x8000
-#define MV88E6XXX_G1_ATU_DATA_TRUNK_ID_MASK		0x00f0
-#define MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_MASK		0x3ff0
-#define MV88E6XXX_G1_ATU_DATA_STATE_MASK		0x000f
-#define MV88E6XXX_G1_ATU_DATA_STATE_UNUSED		0x0000
-#define MV88E6XXX_G1_ATU_DATA_STATE_UC_MGMT		0x000d
-#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC		0x000e
-#define MV88E6XXX_G1_ATU_DATA_STATE_UC_PRIO_OVER	0x000f
-#define MV88E6XXX_G1_ATU_DATA_STATE_MC_NONE_RATE	0x0005
-#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC		0x0007
-#define MV88E6XXX_G1_ATU_DATA_STATE_MC_MGMT		0x000e
-#define MV88E6XXX_G1_ATU_DATA_STATE_MC_PRIO_OVER	0x000f
+#define MV88E6XXX_G1_ATU_DATA					0x0c
+#define MV88E6XXX_G1_ATU_DATA_TRUNK				0x8000
+#define MV88E6XXX_G1_ATU_DATA_TRUNK_ID_MASK			0x00f0
+#define MV88E6XXX_G1_ATU_DATA_PORT_VECTOR_MASK			0x3ff0
+#define MV88E6XXX_G1_ATU_DATA_STATE_MASK			0x000f
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_UNUSED			0x0000
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_1_OLDEST		0x0001
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_2			0x0002
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_3			0x0003
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_4			0x0004
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_5			0x0005
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_6			0x0006
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_AGE_7_NEWEST		0x0007
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_POLICY		0x0008
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_POLICY_PO		0x0009
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_AVB_NRL		0x000a
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_AVB_NRL_PO	0x000b
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_DA_MGMT		0x000c
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_DA_MGMT_PO	0x000d
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC			0x000e
+#define MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_PO		0x000f
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_UNUSED			0x0000
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_POLICY		0x0004
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_AVB_NRL		0x0005
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT		0x0006
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC			0x0007
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_POLICY_PO		0x000c
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_AVB_NRL_PO	0x000d
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT_PO	0x000e
+#define MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_PO		0x000f
 
 /* Offset 0x0D: ATU MAC Address Register Bytes 0 & 1
  * Offset 0x0E: ATU MAC Address Register Bytes 2 & 3
diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c
index 18b86515b6bc..792a96ef418f 100644
--- a/drivers/net/dsa/mv88e6xxx/global1_atu.c
+++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c
@@ -135,7 +135,7 @@ static int mv88e6xxx_g1_atu_data_read(struct mv88e6xxx_chip *chip,
 		return err;
 
 	entry->state = val & 0xf;
-	if (entry->state != MV88E6XXX_G1_ATU_DATA_STATE_UNUSED) {
+	if (entry->state) {
 		entry->trunk = !!(val & MV88E6XXX_G1_ATU_DATA_TRUNK);
 		entry->portvec = (val >> 4) & mv88e6xxx_port_mask(chip);
 	}
@@ -148,7 +148,7 @@ static int mv88e6xxx_g1_atu_data_write(struct mv88e6xxx_chip *chip,
 {
 	u16 data = entry->state & 0xf;
 
-	if (entry->state != MV88E6XXX_G1_ATU_DATA_STATE_UNUSED) {
+	if (entry->state) {
 		if (entry->trunk)
 			data |= MV88E6XXX_G1_ATU_DATA_TRUNK;
 
@@ -209,7 +209,7 @@ int mv88e6xxx_g1_atu_getnext(struct mv88e6xxx_chip *chip, u16 fid,
 		return err;
 
 	/* Write the MAC address to iterate from only once */
-	if (entry->state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED) {
+	if (!entry->state) {
 		err = mv88e6xxx_g1_atu_mac_write(chip, entry);
 		if (err)
 			return err;
-- 
2.23.0
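
Because the same 4-bit state field is interpreted differently for
unicast and multicast addresses, a caller has to look at the multicast
bit of the MAC address before picking a state value. The sketch below
illustrates this for the static policy states, using the values from
the global1.h hunk above. The helper names are hypothetical, and the
multicast test (least significant bit of the first address byte) is
written out by hand rather than using the kernel's helpers.

```c
#include <stdint.h>

/* Values copied from the new definitions in the patch above. */
#define SKETCH_ATU_STATE_UC_STATIC_POLICY 0x0008
#define SKETCH_ATU_STATE_MC_STATIC_POLICY 0x0004

/* A MAC address is multicast when the low bit of its first byte is set. */
int sketch_is_multicast(const uint8_t addr[6])
{
	return addr[0] & 0x01;
}

/* Pick the static policy state that matches the address type. */
uint8_t sketch_policy_state(const uint8_t addr[6])
{
	if (sketch_is_multicast(addr))
		return SKETCH_ATU_STATE_MC_STATIC_POLICY;

	return SKETCH_ATU_STATE_UC_STATIC_POLICY;
}
```

For example, the unicast address 00:11:22:33:44:55 maps to state 0x8,
while a multicast address such as 01:00:5e:00:00:01 maps to state 0x4.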



* [PATCH net-next 2/3] net: dsa: mv88e6xxx: introduce .port_set_policy
From: Vivien Didelot @ 2019-09-07 20:00 UTC
  To: netdev; +Cc: davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190907200049.25273-1-vivien.didelot@gmail.com>

Introduce a new .port_set_policy operation to configure a port's
Policy Control List, based on mappings such as DA, SA, Etype and so on.

Models similar to 88E6352 and 88E6390 are supported at the moment.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c |  9 ++++
 drivers/net/dsa/mv88e6xxx/chip.h | 22 ++++++++++
 drivers/net/dsa/mv88e6xxx/port.c | 74 ++++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/port.h | 17 +++++++-
 4 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 0d54a69f3622..6f4d5303a1f3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3132,6 +3132,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed = mv88e6352_port_set_speed,
 	.port_tag_remap = mv88e6095_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3218,6 +3219,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed = mv88e6352_port_set_speed,
 	.port_tag_remap = mv88e6095_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3303,6 +3305,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = {
 	.port_set_speed = mv88e6390_port_set_speed,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
 	.port_tag_remap = mv88e6390_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3351,6 +3354,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = {
 	.port_set_speed = mv88e6390x_port_set_speed,
 	.port_max_speed_mode = mv88e6390x_port_max_speed_mode,
 	.port_tag_remap = mv88e6390_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3448,6 +3452,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed = mv88e6352_port_set_speed,
 	.port_tag_remap = mv88e6095_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3539,6 +3544,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
 	.port_set_speed = mv88e6390_port_set_speed,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
 	.port_tag_remap = mv88e6390_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3809,6 +3815,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
 	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
 	.port_set_speed = mv88e6352_port_set_speed,
 	.port_tag_remap = mv88e6095_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3863,6 +3870,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
 	.port_set_speed = mv88e6390_port_set_speed,
 	.port_max_speed_mode = mv88e6390_port_max_speed_mode,
 	.port_tag_remap = mv88e6390_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
@@ -3915,6 +3923,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
 	.port_set_speed = mv88e6390x_port_set_speed,
 	.port_max_speed_mode = mv88e6390x_port_max_speed_mode,
 	.port_tag_remap = mv88e6390_port_tag_remap,
+	.port_set_policy = mv88e6352_port_set_policy,
 	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
 	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
 	.port_set_ether_type = mv88e6351_port_set_ether_type,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 6bc0a4e4fe7b..04a329a98158 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -189,6 +189,24 @@ struct mv88e6xxx_port_hwtstamp {
 	struct hwtstamp_config tstamp_config;
 };
 
+enum mv88e6xxx_policy_mapping {
+	MV88E6XXX_POLICY_MAPPING_DA,
+	MV88E6XXX_POLICY_MAPPING_SA,
+	MV88E6XXX_POLICY_MAPPING_VTU,
+	MV88E6XXX_POLICY_MAPPING_ETYPE,
+	MV88E6XXX_POLICY_MAPPING_PPPOE,
+	MV88E6XXX_POLICY_MAPPING_VBAS,
+	MV88E6XXX_POLICY_MAPPING_OPT82,
+	MV88E6XXX_POLICY_MAPPING_UDP,
+};
+
+enum mv88e6xxx_policy_action {
+	MV88E6XXX_POLICY_ACTION_NORMAL,
+	MV88E6XXX_POLICY_ACTION_MIRROR,
+	MV88E6XXX_POLICY_ACTION_TRAP,
+	MV88E6XXX_POLICY_ACTION_DISCARD,
+};
+
 struct mv88e6xxx_port {
 	struct mv88e6xxx_chip *chip;
 	int port;
@@ -381,6 +399,10 @@ struct mv88e6xxx_ops {
 
 	int (*port_tag_remap)(struct mv88e6xxx_chip *chip, int port);
 
+	int (*port_set_policy)(struct mv88e6xxx_chip *chip, int port,
+			       enum mv88e6xxx_policy_mapping mapping,
+			       enum mv88e6xxx_policy_action action);
+
 	int (*port_set_frame_mode)(struct mv88e6xxx_chip *chip, int port,
 				   enum mv88e6xxx_frame_mode mode);
 	int (*port_set_egress_floods)(struct mv88e6xxx_chip *chip, int port,
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 04006344adb2..15ef81654b67 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -1341,3 +1341,77 @@ int mv88e6390_port_tag_remap(struct mv88e6xxx_chip *chip, int port)
 
 	return 0;
 }
+
+/* Offset 0x0E: Policy Control Register */
+
+int mv88e6352_port_set_policy(struct mv88e6xxx_chip *chip, int port,
+			      enum mv88e6xxx_policy_mapping mapping,
+			      enum mv88e6xxx_policy_action action)
+{
+	u16 reg, mask, val;
+	int shift;
+	int err;
+
+	switch (mapping) {
+	case MV88E6XXX_POLICY_MAPPING_DA:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_DA_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_DA_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_SA:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_SA_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_SA_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_VTU:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_VTU_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_VTU_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_ETYPE:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_ETYPE_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_ETYPE_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_PPPOE:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_PPPOE_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_PPPOE_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_VBAS:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_VBAS_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_VBAS_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_OPT82:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_OPT82_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_OPT82_MASK;
+		break;
+	case MV88E6XXX_POLICY_MAPPING_UDP:
+		shift = __bf_shf(MV88E6XXX_PORT_POLICY_CTL_UDP_MASK);
+		mask = MV88E6XXX_PORT_POLICY_CTL_UDP_MASK;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	switch (action) {
+	case MV88E6XXX_POLICY_ACTION_NORMAL:
+		val = MV88E6XXX_PORT_POLICY_CTL_NORMAL;
+		break;
+	case MV88E6XXX_POLICY_ACTION_MIRROR:
+		val = MV88E6XXX_PORT_POLICY_CTL_MIRROR;
+		break;
+	case MV88E6XXX_POLICY_ACTION_TRAP:
+		val = MV88E6XXX_PORT_POLICY_CTL_TRAP;
+		break;
+	case MV88E6XXX_POLICY_ACTION_DISCARD:
+		val = MV88E6XXX_PORT_POLICY_CTL_DISCARD;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_POLICY_CTL, &reg);
+	if (err)
+		return err;
+
+	reg &= ~mask;
+	reg |= (val << shift) & mask;
+
+	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_POLICY_CTL, reg);
+}
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index d4e9bea6e82f..03a480cd71b9 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -222,7 +222,19 @@
 #define MV88E6XXX_PORT_PRI_OVERRIDE	0x0d
 
 /* Offset 0x0E: Policy Control Register */
-#define MV88E6XXX_PORT_POLICY_CTL	0x0e
+#define MV88E6XXX_PORT_POLICY_CTL		0x0e
+#define MV88E6XXX_PORT_POLICY_CTL_DA_MASK	0xc000
+#define MV88E6XXX_PORT_POLICY_CTL_SA_MASK	0x3000
+#define MV88E6XXX_PORT_POLICY_CTL_VTU_MASK	0x0c00
+#define MV88E6XXX_PORT_POLICY_CTL_ETYPE_MASK	0x0300
+#define MV88E6XXX_PORT_POLICY_CTL_PPPOE_MASK	0x00c0
+#define MV88E6XXX_PORT_POLICY_CTL_VBAS_MASK	0x0030
+#define MV88E6XXX_PORT_POLICY_CTL_OPT82_MASK	0x000c
+#define MV88E6XXX_PORT_POLICY_CTL_UDP_MASK	0x0003
+#define MV88E6XXX_PORT_POLICY_CTL_NORMAL	0x0000
+#define MV88E6XXX_PORT_POLICY_CTL_MIRROR	0x0001
+#define MV88E6XXX_PORT_POLICY_CTL_TRAP		0x0002
+#define MV88E6XXX_PORT_POLICY_CTL_DISCARD	0x0003
 
 /* Offset 0x0F: Port Special Ether Type */
 #define MV88E6XXX_PORT_ETH_TYPE		0x0f
@@ -324,6 +336,9 @@ int mv88e6185_port_set_egress_floods(struct mv88e6xxx_chip *chip, int port,
 				     bool unicast, bool multicast);
 int mv88e6352_port_set_egress_floods(struct mv88e6xxx_chip *chip, int port,
 				     bool unicast, bool multicast);
+int mv88e6352_port_set_policy(struct mv88e6xxx_chip *chip, int port,
+			      enum mv88e6xxx_policy_mapping mapping,
+			      enum mv88e6xxx_policy_action action);
 int mv88e6351_port_set_ether_type(struct mv88e6xxx_chip *chip, int port,
 				  u16 etype);
 int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
-- 
2.23.0
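
The register update at the end of mv88e6352_port_set_policy() is a
read-modify-write of one 2-bit field in the 16-bit Policy Control
register. The sketch below reproduces only that arithmetic on a plain
integer, without any hardware access. The mask and action values are
copied from the port.h hunk above; the function names are hypothetical,
and the shift is derived with a simple loop as a stand-in for the
kernel's __bf_shf() helper.

```c
#include <stdint.h>

/* Values copied from the port.h definitions in the patch above. */
#define SKETCH_POLICY_CTL_DA_MASK	0xc000
#define SKETCH_POLICY_CTL_SA_MASK	0x3000
#define SKETCH_POLICY_CTL_ETYPE_MASK	0x0300
#define SKETCH_POLICY_CTL_TRAP		0x0002
#define SKETCH_POLICY_CTL_DISCARD	0x0003

/* Position of the lowest set bit in the mask (0 for an empty mask). */
int sketch_mask_shift(uint16_t mask)
{
	int shift = 0;

	while (mask && !(mask & 1)) {
		mask >>= 1;
		shift++;
	}

	return shift;
}

/* Clear the field selected by mask, then store val into it. */
uint16_t sketch_policy_update(uint16_t reg, uint16_t mask, uint16_t val)
{
	reg &= ~mask;
	reg |= (val << sketch_mask_shift(mask)) & mask;

	return reg;
}
```

Starting from a register value of 0, setting the SA field to DISCARD
yields 0x3000, and the other seven fields are left untouched because
only the bits under the mask are modified.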



* [PATCH net-next 3/3] net: dsa: mv88e6xxx: add RXNFC support
From: Vivien Didelot @ 2019-09-07 20:00 UTC
  To: netdev; +Cc: davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190907200049.25273-1-vivien.didelot@gmail.com>

Implement the .get_rxnfc and .set_rxnfc DSA operations to configure
a port's Layer 2 Policy Control List (PCL) via ethtool.

Currently only dropping frames based on MAC Destination or Source
Address (including the optional VLAN parameter) is supported.
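
For illustration only, here is a minimal user-space sketch of how a 2-bit
action could be written into the field that a mapping mask selects in the
Policy Control register. The mask and action values are copied from the
port.h hunk of this series; the helper names mask_shift() and
policy_ctl_update() are hypothetical and are not the driver's functions,
and the shift-and-mask layout is an assumption drawn from the mask values.

```c
#include <stdint.h>

/* Values copied from the port.h hunk (Offset 0x0E: Policy Control). */
#define POLICY_CTL_DA_MASK	0xc000
#define POLICY_CTL_SA_MASK	0x3000
#define POLICY_CTL_NORMAL	0x0000
#define POLICY_CTL_DISCARD	0x0003

/* Hypothetical helper: bit position of the lowest set bit.
 * The mask must be non-zero, otherwise this loops forever.
 */
static unsigned int mask_shift(uint16_t mask)
{
	unsigned int shift = 0;

	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Hypothetical helper: clear the field selected by mask, then store the
 * 2-bit action there. Bits outside the mask are preserved.
 */
static uint16_t policy_ctl_update(uint16_t reg, uint16_t mask, uint16_t action)
{
	uint16_t field = (uint16_t)(action << mask_shift(mask)) & mask;

	reg = (uint16_t)(reg & ~mask);
	return (uint16_t)(reg | field);
}
```

Under these assumptions, discarding on DA sets the top two bits, and
resetting a field to NORMAL leaves the other mappings untouched.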

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 213 +++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/chip.h |  13 ++
 2 files changed, 226 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6f4d5303a1f3..6787d560e9e3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1524,6 +1524,216 @@ static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
 	return mv88e6xxx_g1_atu_loadpurge(chip, fid, &entry);
 }
 
+static int mv88e6xxx_policy_apply(struct mv88e6xxx_chip *chip, int port,
+				  const struct mv88e6xxx_policy *policy)
+{
+	enum mv88e6xxx_policy_mapping mapping = policy->mapping;
+	enum mv88e6xxx_policy_action action = policy->action;
+	const u8 *addr = policy->addr;
+	u16 vid = policy->vid;
+	u8 state;
+	int err;
+	int id;
+
+	if (!chip->info->ops->port_set_policy)
+		return -EOPNOTSUPP;
+
+	switch (mapping) {
+	case MV88E6XXX_POLICY_MAPPING_DA:
+	case MV88E6XXX_POLICY_MAPPING_SA:
+		if (action == MV88E6XXX_POLICY_ACTION_NORMAL)
+			state = 0; /* Dissociate the port and address */
+		else if (action == MV88E6XXX_POLICY_ACTION_DISCARD &&
+			 is_multicast_ether_addr(addr))
+			state = MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_POLICY;
+		else if (action == MV88E6XXX_POLICY_ACTION_DISCARD &&
+			 is_unicast_ether_addr(addr))
+			state = MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC_POLICY;
+		else
+			return -EOPNOTSUPP;
+
+		err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid,
+						   state);
+		if (err)
+			return err;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	/* Skip the port's policy clearing if the mapping is still in use */
+	if (action == MV88E6XXX_POLICY_ACTION_NORMAL)
+		idr_for_each_entry(&chip->policies, policy, id)
+			if (policy->port == port &&
+			    policy->mapping == mapping &&
+			    policy->action != action)
+				return 0;
+
+	return chip->info->ops->port_set_policy(chip, port, mapping, action);
+}
+
+static int mv88e6xxx_policy_insert(struct mv88e6xxx_chip *chip, int port,
+				   struct ethtool_rx_flow_spec *fs)
+{
+	struct ethhdr *mac_entry = &fs->h_u.ether_spec;
+	struct ethhdr *mac_mask = &fs->m_u.ether_spec;
+	enum mv88e6xxx_policy_mapping mapping;
+	enum mv88e6xxx_policy_action action;
+	struct mv88e6xxx_policy *policy;
+	u16 vid = 0;
+	u8 *addr;
+	int err;
+	int id;
+
+	if (fs->location != RX_CLS_LOC_ANY)
+		return -EINVAL;
+
+	if (fs->ring_cookie == RX_CLS_FLOW_DISC)
+		action = MV88E6XXX_POLICY_ACTION_DISCARD;
+	else
+		return -EOPNOTSUPP;
+
+	switch (fs->flow_type & ~FLOW_EXT) {
+	case ETHER_FLOW:
+		if (!is_zero_ether_addr(mac_mask->h_dest) &&
+		    is_zero_ether_addr(mac_mask->h_source)) {
+			mapping = MV88E6XXX_POLICY_MAPPING_DA;
+			addr = mac_entry->h_dest;
+		} else if (is_zero_ether_addr(mac_mask->h_dest) &&
+		    !is_zero_ether_addr(mac_mask->h_source)) {
+			mapping = MV88E6XXX_POLICY_MAPPING_SA;
+			addr = mac_entry->h_source;
+		} else {
+			/* Cannot support DA and SA mapping in the same rule */
+			return -EOPNOTSUPP;
+		}
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	if ((fs->flow_type & FLOW_EXT) && fs->m_ext.vlan_tci) {
+		if (fs->m_ext.vlan_tci != 0xffff)
+			return -EOPNOTSUPP;
+		vid = be16_to_cpu(fs->h_ext.vlan_tci) & VLAN_VID_MASK;
+	}
+
+	idr_for_each_entry(&chip->policies, policy, id) {
+		if (policy->port == port && policy->mapping == mapping &&
+		    policy->action == action && policy->vid == vid &&
+		    ether_addr_equal(policy->addr, addr))
+			return -EEXIST;
+	}
+
+	policy = devm_kzalloc(chip->dev, sizeof(*policy), GFP_KERNEL);
+	if (!policy)
+		return -ENOMEM;
+
+	fs->location = 0;
+	err = idr_alloc_u32(&chip->policies, policy, &fs->location, 0xffffffff,
+			    GFP_KERNEL);
+	if (err) {
+		devm_kfree(chip->dev, policy);
+		return err;
+	}
+
+	memcpy(&policy->fs, fs, sizeof(*fs));
+	ether_addr_copy(policy->addr, addr);
+	policy->mapping = mapping;
+	policy->action = action;
+	policy->port = port;
+	policy->vid = vid;
+
+	err = mv88e6xxx_policy_apply(chip, port, policy);
+	if (err) {
+		idr_remove(&chip->policies, fs->location);
+		devm_kfree(chip->dev, policy);
+		return err;
+	}
+
+	return 0;
+}
+
+static int mv88e6xxx_get_rxnfc(struct dsa_switch *ds, int port,
+			       struct ethtool_rxnfc *rxnfc, u32 *rule_locs)
+{
+	struct ethtool_rx_flow_spec *fs = &rxnfc->fs;
+	struct mv88e6xxx_chip *chip = ds->priv;
+	struct mv88e6xxx_policy *policy;
+	int err;
+	int id;
+
+	mv88e6xxx_reg_lock(chip);
+
+	switch (rxnfc->cmd) {
+	case ETHTOOL_GRXCLSRLCNT:
+		rxnfc->data = 0;
+		rxnfc->data |= RX_CLS_LOC_SPECIAL;
+		rxnfc->rule_cnt = 0;
+		idr_for_each_entry(&chip->policies, policy, id)
+			if (policy->port == port)
+				rxnfc->rule_cnt++;
+		err = 0;
+		break;
+	case ETHTOOL_GRXCLSRULE:
+		err = -ENOENT;
+		policy = idr_find(&chip->policies, fs->location);
+		if (policy) {
+			memcpy(fs, &policy->fs, sizeof(*fs));
+			err = 0;
+		}
+		break;
+	case ETHTOOL_GRXCLSRLALL:
+		rxnfc->data = 0;
+		rxnfc->rule_cnt = 0;
+		idr_for_each_entry(&chip->policies, policy, id)
+			if (policy->port == port)
+				rule_locs[rxnfc->rule_cnt++] = id;
+		err = 0;
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	mv88e6xxx_reg_unlock(chip);
+
+	return err;
+}
+
+static int mv88e6xxx_set_rxnfc(struct dsa_switch *ds, int port,
+			       struct ethtool_rxnfc *rxnfc)
+{
+	struct ethtool_rx_flow_spec *fs = &rxnfc->fs;
+	struct mv88e6xxx_chip *chip = ds->priv;
+	struct mv88e6xxx_policy *policy;
+	int err;
+
+	mv88e6xxx_reg_lock(chip);
+
+	switch (rxnfc->cmd) {
+	case ETHTOOL_SRXCLSRLINS:
+		err = mv88e6xxx_policy_insert(chip, port, fs);
+		break;
+	case ETHTOOL_SRXCLSRLDEL:
+		err = -ENOENT;
+		policy = idr_remove(&chip->policies, fs->location);
+		if (policy) {
+			policy->action = MV88E6XXX_POLICY_ACTION_NORMAL;
+			err = mv88e6xxx_policy_apply(chip, port, policy);
+			devm_kfree(chip->dev, policy);
+		}
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	mv88e6xxx_reg_unlock(chip);
+
+	return err;
+}
+
 static int mv88e6xxx_port_add_broadcast(struct mv88e6xxx_chip *chip, int port,
 					u16 vid)
 {
@@ -4655,6 +4865,7 @@ static struct mv88e6xxx_chip *mv88e6xxx_alloc_chip(struct device *dev)
 
 	mutex_init(&chip->reg_lock);
 	INIT_LIST_HEAD(&chip->mdios);
+	idr_init(&chip->policies);
 
 	return chip;
 }
@@ -4739,6 +4950,8 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.set_eeprom		= mv88e6xxx_set_eeprom,
 	.get_regs_len		= mv88e6xxx_get_regs_len,
 	.get_regs		= mv88e6xxx_get_regs,
+	.get_rxnfc		= mv88e6xxx_get_rxnfc,
+	.set_rxnfc		= mv88e6xxx_set_rxnfc,
 	.set_ageing_time	= mv88e6xxx_set_ageing_time,
 	.port_bridge_join	= mv88e6xxx_port_bridge_join,
 	.port_bridge_leave	= mv88e6xxx_port_bridge_leave,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 04a329a98158..e9b1a1ac9a8e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -8,6 +8,7 @@
 #ifndef _MV88E6XXX_CHIP_H
 #define _MV88E6XXX_CHIP_H
 
+#include <linux/idr.h>
 #include <linux/if_vlan.h>
 #include <linux/irq.h>
 #include <linux/gpio/consumer.h>
@@ -207,6 +208,15 @@ enum mv88e6xxx_policy_action {
 	MV88E6XXX_POLICY_ACTION_DISCARD,
 };
 
+struct mv88e6xxx_policy {
+	enum mv88e6xxx_policy_mapping mapping;
+	enum mv88e6xxx_policy_action action;
+	struct ethtool_rx_flow_spec fs;
+	u8 addr[ETH_ALEN];
+	int port;
+	u16 vid;
+};
+
 struct mv88e6xxx_port {
 	struct mv88e6xxx_chip *chip;
 	int port;
@@ -265,6 +275,9 @@ struct mv88e6xxx_chip {
 	/* List of mdio busses */
 	struct list_head mdios;
 
+	/* Policy Control List IDs and rules */
+	struct idr policies;
+
 	/* There can be two interrupt controllers, which are chained
 	 * off a GPIO as interrupt source
 	 */
-- 
2.23.0



* Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: add RXNFC support
From: Andrew Lunn @ 2019-09-07 20:32 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, davem, f.fainelli
In-Reply-To: <20190907200049.25273-4-vivien.didelot@gmail.com>

> +static int mv88e6xxx_policy_insert(struct mv88e6xxx_chip *chip, int port,
> +				   struct ethtool_rx_flow_spec *fs)
> +{
> +	struct ethhdr *mac_entry = &fs->h_u.ether_spec;
> +	struct ethhdr *mac_mask = &fs->m_u.ether_spec;
> +	enum mv88e6xxx_policy_mapping mapping;
> +	enum mv88e6xxx_policy_action action;
> +	struct mv88e6xxx_policy *policy;
> +	u16 vid = 0;
> +	u8 *addr;
> +	int err;
> +	int id;
> +
> +	if (fs->location != RX_CLS_LOC_ANY)
> +		return -EINVAL;
> +
> +	if (fs->ring_cookie == RX_CLS_FLOW_DISC)
> +		action = MV88E6XXX_POLICY_ACTION_DISCARD;
> +	else
> +		return -EOPNOTSUPP;
> +
> +	switch (fs->flow_type & ~FLOW_EXT) {
> +	case ETHER_FLOW:
> +		if (!is_zero_ether_addr(mac_mask->h_dest) &&
> +		    is_zero_ether_addr(mac_mask->h_source)) {
> +			mapping = MV88E6XXX_POLICY_MAPPING_DA;
> +			addr = mac_entry->h_dest;
> +		} else if (is_zero_ether_addr(mac_mask->h_dest) &&
> +		    !is_zero_ether_addr(mac_mask->h_source)) {
> +			mapping = MV88E6XXX_POLICY_MAPPING_SA;
> +			addr = mac_entry->h_source;
> +		} else {
> +			/* Cannot support DA and SA mapping in the same rule */
> +			return -EOPNOTSUPP;
> +		}
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if ((fs->flow_type & FLOW_EXT) && fs->m_ext.vlan_tci) {
> +		if (fs->m_ext.vlan_tci != 0xffff)
> +			return -EOPNOTSUPP;
> +		vid = be16_to_cpu(fs->h_ext.vlan_tci) & VLAN_VID_MASK;
> +	}
> +
> +	idr_for_each_entry(&chip->policies, policy, id) {
> +		if (policy->port == port && policy->mapping == mapping &&
> +		    policy->action == action && policy->vid == vid &&
> +		    ether_addr_equal(policy->addr, addr))
> +			return -EEXIST;
> +	}
> +
> +	policy = devm_kzalloc(chip->dev, sizeof(*policy), GFP_KERNEL);
> +	if (!policy)
> +		return -ENOMEM;

Hi Vivien

I think this might be the first time we have done dynamic memory
allocation in the mv88e6xxx driver. It might even be a first for a DSA
driver?

I'm not saying it is wrong, but maybe we should discuss it. 

I assume you are doing this because the ATU entry itself is not
sufficient?

How much memory is involved here, worst case? I assume one struct
mv88e6xxx_policy per ATU entry? Which you think is too much to
allocate as part of chip? I guess most users will never use this
feature, so for most users it would be wasted memory. So I do see the
point for dynamically allocating it.

Thanks
	Andrew


* Re: [PATCH net-next 1/3] net: dsa: mv88e6xxx: complete ATU state definitions
From: Andrew Lunn @ 2019-09-07 20:33 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, davem, f.fainelli
In-Reply-To: <20190907200049.25273-2-vivien.didelot@gmail.com>

On Sat, Sep 07, 2019 at 04:00:47PM -0400, Vivien Didelot wrote:
> Marvell has different values for the state of a MAC address,
> depending on its multicast bit. This patch completes the definitions
> for these states.
> 
> At the same time, use 0 which is intuitive enough and simplifies the
> code a bit, instead of the UC or MC unused value.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* Re: [PATCH net-next 2/3] net: dsa: mv88e6xxx: introduce .port_set_policy
From: Andrew Lunn @ 2019-09-07 20:33 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, davem, f.fainelli
In-Reply-To: <20190907200049.25273-3-vivien.didelot@gmail.com>

On Sat, Sep 07, 2019 at 04:00:48PM -0400, Vivien Didelot wrote:
> Introduce a new .port_set_policy operation to configure a port's
> Policy Control List, based on mapping such as DA, SA, Etype and so on.
> 
> Models similar to 88E6352 and 88E6390 are supported at the moment.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew


* Re: [patch net-next 3/3] net: devlink: move reload fail indication to devlink core and expose to user
From: Jiri Pirko @ 2019-09-07 20:38 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, idosch, jakub.kicinski, tariqt, mlxsw
In-Reply-To: <6ff0726a-f910-8107-883e-83476f80b9de@gmail.com>

Sat, Sep 07, 2019 at 05:08:59PM CEST, dsahern@gmail.com wrote:
>On 9/6/19 7:44 PM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Currently the fact that devlink failed is stored in drivers. Move this
>> flag into devlink core. Also, expose it to the user.
>
>you mean 'reload failed', not 'devlink failed'?

Yeah, "reload failed".

>


* [patch net-next v2 0/3] net: devlink: move reload fail indication to devlink core and expose to user
From: Jiri Pirko @ 2019-09-07 20:53 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, dsahern, jakub.kicinski, tariqt, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

The first two patches are dependencies of the last one, which moves the
devlink reload failure indication into the devlink core so that drivers
do not have to track it themselves. Currently only mlxsw tracks it, but
I will send a follow-up patchset that introduces this in netdevsim too.
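
As a rough illustration of where the series ends up, here is a minimal
user-space sketch of the two-phase reload with the failure flag kept in
the core. All model_* names are invented for this example and the error
values are placeholders; the real code is in net/core/devlink.c and also
sends a notification when the flag changes, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the reworked reload flow. */
struct model_devlink;

struct model_devlink_ops {
	int (*reload_down)(struct model_devlink *dl);
	int (*reload_up)(struct model_devlink *dl);
};

struct model_devlink {
	const struct model_devlink_ops *ops;
	bool reload_failed;	/* tracked by the core, not by the driver */
};

/* Reload is supported only when both phases are implemented. */
static bool model_reload_supported(const struct model_devlink *dl)
{
	return dl->ops->reload_down != NULL && dl->ops->reload_up != NULL;
}

/* Returns 0 on success, a negative value on failure. */
static int model_reload(struct model_devlink *dl)
{
	int err;

	if (!model_reload_supported(dl))
		return -1;	/* stands in for -EOPNOTSUPP */

	err = dl->ops->reload_down(dl);
	if (err)
		return err;	/* flag is left untouched, as in patch 3/3 */

	err = dl->ops->reload_up(dl);
	dl->reload_failed = !!err;
	return err;
}

/* Two toy drivers: one whose up phase succeeds, one whose up phase fails. */
static int model_down_ok(struct model_devlink *dl) { (void)dl; return 0; }
static int model_up_ok(struct model_devlink *dl) { (void)dl; return 0; }
static int model_up_fail(struct model_devlink *dl) { (void)dl; return -5; }

static const struct model_devlink_ops model_good_ops = {
	.reload_down = model_down_ok,
	.reload_up = model_up_ok,
};

static const struct model_devlink_ops model_bad_ops = {
	.reload_down = model_down_ok,
	.reload_up = model_up_fail,
};
```

In this model a failed up phase sets the flag, and a later successful
reload clears it again, which is the behaviour patch 3/3 aims for.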

Jiri Pirko (3):
  mlx4: Split restart_one into two functions
  net: devlink: split reload op into two
  net: devlink: move reload fail indication to devlink core and expose
    to user

 drivers/net/ethernet/mellanox/mlx4/catas.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  | 44 ++++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |  3 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c | 30 +++++++++------
 drivers/net/netdevsim/dev.c                | 13 +++++--
 include/net/devlink.h                      |  8 +++-
 include/uapi/linux/devlink.h               |  2 +
 net/core/devlink.c                         | 35 +++++++++++++++--
 8 files changed, 106 insertions(+), 31 deletions(-)

-- 
2.21.0



* [patch net-next v2 1/3] mlx4: Split restart_one into two functions
From: Jiri Pirko @ 2019-09-07 20:53 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, dsahern, jakub.kicinski, tariqt, mlxsw
In-Reply-To: <20190907205400.14589-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Split the function restart_one into two functions and separate teardown
and buildup.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/catas.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  | 25 ++++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |  3 +--
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 87e90b5d4d7d..5b11557f1ae4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -210,7 +210,7 @@ static void mlx4_handle_error_state(struct mlx4_dev_persistent *persist)
 	mutex_lock(&persist->interface_state_mutex);
 	if (persist->interface_state & MLX4_INTERFACE_STATE_UP &&
 	    !(persist->interface_state & MLX4_INTERFACE_STATE_DELETION)) {
-		err = mlx4_restart_one(persist->pdev, false, NULL);
+		err = mlx4_restart_one(persist->pdev);
 		mlx4_info(persist->dev, "mlx4_restart_one was ended, ret=%d\n",
 			  err);
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 07c204bd3fc4..a39c647c12dc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3931,6 +3931,10 @@ static void mlx4_devlink_param_load_driverinit_values(struct devlink *devlink)
 	}
 }
 
+static void mlx4_restart_one_down(struct pci_dev *pdev);
+static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload,
+			       struct devlink *devlink);
+
 static int mlx4_devlink_reload(struct devlink *devlink,
 			       struct netlink_ext_ack *extack)
 {
@@ -3941,9 +3945,11 @@ static int mlx4_devlink_reload(struct devlink *devlink,
 
 	if (persist->num_vfs)
 		mlx4_warn(persist->dev, "Reload performed on PF, will cause reset on operating Virtual Functions\n");
-	err = mlx4_restart_one(persist->pdev, true, devlink);
+	mlx4_restart_one_down(persist->pdev);
+	err = mlx4_restart_one_up(persist->pdev, true, devlink);
 	if (err)
-		mlx4_err(persist->dev, "mlx4_restart_one failed, ret=%d\n", err);
+		mlx4_err(persist->dev, "mlx4_restart_one_up failed, ret=%d\n",
+			 err);
 
 	return err;
 }
@@ -4163,7 +4169,13 @@ static int restore_current_port_types(struct mlx4_dev *dev,
 	return err;
 }
 
-int mlx4_restart_one(struct pci_dev *pdev, bool reload, struct devlink *devlink)
+static void mlx4_restart_one_down(struct pci_dev *pdev)
+{
+	mlx4_unload_one(pdev);
+}
+
+static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload,
+			       struct devlink *devlink)
 {
 	struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
 	struct mlx4_dev	 *dev  = persist->dev;
@@ -4175,7 +4187,6 @@ int mlx4_restart_one(struct pci_dev *pdev, bool reload, struct devlink *devlink)
 	total_vfs = dev->persist->num_vfs;
 	memcpy(nvfs, dev->persist->nvfs, sizeof(dev->persist->nvfs));
 
-	mlx4_unload_one(pdev);
 	if (reload)
 		mlx4_devlink_param_load_driverinit_values(devlink);
 	err = mlx4_load_one(pdev, pci_dev_data, total_vfs, nvfs, priv, 1);
@@ -4194,6 +4205,12 @@ int mlx4_restart_one(struct pci_dev *pdev, bool reload, struct devlink *devlink)
 	return err;
 }
 
+int mlx4_restart_one(struct pci_dev *pdev)
+{
+	mlx4_restart_one_down(pdev);
+	return mlx4_restart_one_up(pdev, false, NULL);
+}
+
 #define MLX_SP(id) { PCI_VDEVICE(MELLANOX, id), MLX4_PCI_DEV_FORCE_SENSE_PORT }
 #define MLX_VF(id) { PCI_VDEVICE(MELLANOX, id), MLX4_PCI_DEV_IS_VF }
 #define MLX_GN(id) { PCI_VDEVICE(MELLANOX, id), 0 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 23f1b5b512c2..527b52e48276 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -1043,8 +1043,7 @@ int mlx4_catas_init(struct mlx4_dev *dev);
 void mlx4_catas_end(struct mlx4_dev *dev);
 int mlx4_crdump_init(struct mlx4_dev *dev);
 void mlx4_crdump_end(struct mlx4_dev *dev);
-int mlx4_restart_one(struct pci_dev *pdev, bool reload,
-		     struct devlink *devlink);
+int mlx4_restart_one(struct pci_dev *pdev);
 int mlx4_register_device(struct mlx4_dev *dev);
 void mlx4_unregister_device(struct mlx4_dev *dev);
 void mlx4_dispatch_event(struct mlx4_dev *dev, enum mlx4_dev_event type,
-- 
2.21.0



* [patch net-next v2 2/3] net: devlink: split reload op into two
From: Jiri Pirko @ 2019-09-07 20:53 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, dsahern, jakub.kicinski, tariqt, mlxsw
In-Reply-To: <20190907205400.14589-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

In order to properly implement failure indication during reload,
split the reload op into two ops, one for down phase and one for
up phase.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c  | 19 +++++++++++++++----
 drivers/net/ethernet/mellanox/mlxsw/core.c | 19 +++++++++++++++----
 drivers/net/netdevsim/dev.c                | 13 ++++++++++---
 include/net/devlink.h                      |  5 ++++-
 net/core/devlink.c                         | 16 ++++++++++++----
 5 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index a39c647c12dc..ef3f3d06ff1e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3935,17 +3935,27 @@ static void mlx4_restart_one_down(struct pci_dev *pdev);
 static int mlx4_restart_one_up(struct pci_dev *pdev, bool reload,
 			       struct devlink *devlink);
 
-static int mlx4_devlink_reload(struct devlink *devlink,
-			       struct netlink_ext_ack *extack)
+static int mlx4_devlink_reload_down(struct devlink *devlink,
+				    struct netlink_ext_ack *extack)
 {
 	struct mlx4_priv *priv = devlink_priv(devlink);
 	struct mlx4_dev *dev = &priv->dev;
 	struct mlx4_dev_persistent *persist = dev->persist;
-	int err;
 
 	if (persist->num_vfs)
 		mlx4_warn(persist->dev, "Reload performed on PF, will cause reset on operating Virtual Functions\n");
 	mlx4_restart_one_down(persist->pdev);
+	return 0;
+}
+
+static int mlx4_devlink_reload_up(struct devlink *devlink,
+				  struct netlink_ext_ack *extack)
+{
+	struct mlx4_priv *priv = devlink_priv(devlink);
+	struct mlx4_dev *dev = &priv->dev;
+	struct mlx4_dev_persistent *persist = dev->persist;
+	int err;
+
 	err = mlx4_restart_one_up(persist->pdev, true, devlink);
 	if (err)
 		mlx4_err(persist->dev, "mlx4_restart_one_up failed, ret=%d\n",
@@ -3956,7 +3966,8 @@ static int mlx4_devlink_reload(struct devlink *devlink,
 
 static const struct devlink_ops mlx4_devlink_ops = {
 	.port_type_set	= mlx4_devlink_port_type_set,
-	.reload		= mlx4_devlink_reload,
+	.reload_down	= mlx4_devlink_reload_down,
+	.reload_up	= mlx4_devlink_reload_up,
 };
 
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 963a2b4b61b1..c71a1d9ea17b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -984,16 +984,26 @@ mlxsw_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
 	return 0;
 }
 
-static int mlxsw_devlink_core_bus_device_reload(struct devlink *devlink,
-						struct netlink_ext_ack *extack)
+static int
+mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
+					  struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
-	int err;
 
 	if (!(mlxsw_core->bus->features & MLXSW_BUS_F_RESET))
 		return -EOPNOTSUPP;
 
 	mlxsw_core_bus_device_unregister(mlxsw_core, true);
+	return 0;
+}
+
+static int
+mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink,
+					struct netlink_ext_ack *extack)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	int err;
+
 	err = mlxsw_core_bus_device_register(mlxsw_core->bus_info,
 					     mlxsw_core->bus,
 					     mlxsw_core->bus_priv, true,
@@ -1066,7 +1076,8 @@ mlxsw_devlink_trap_group_init(struct devlink *devlink,
 }
 
 static const struct devlink_ops mlxsw_devlink_ops = {
-	.reload				= mlxsw_devlink_core_bus_device_reload,
+	.reload_down		= mlxsw_devlink_core_bus_device_reload_down,
+	.reload_up		= mlxsw_devlink_core_bus_device_reload_up,
 	.port_type_set			= mlxsw_devlink_port_type_set,
 	.port_split			= mlxsw_devlink_port_split,
 	.port_unsplit			= mlxsw_devlink_port_unsplit,
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 39cdb6c18ec0..7fba7b271a57 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -521,8 +521,14 @@ static void nsim_dev_traps_exit(struct devlink *devlink)
 	kfree(nsim_dev->trap_data);
 }
 
-static int nsim_dev_reload(struct devlink *devlink,
-			   struct netlink_ext_ack *extack)
+static int nsim_dev_reload_down(struct devlink *devlink,
+				struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static int nsim_dev_reload_up(struct devlink *devlink,
+			      struct netlink_ext_ack *extack)
 {
 	enum nsim_resource_id res_ids[] = {
 		NSIM_RESOURCE_IPV4_FIB, NSIM_RESOURCE_IPV4_FIB_RULES,
@@ -638,7 +644,8 @@ nsim_dev_devlink_trap_action_set(struct devlink *devlink,
 }
 
 static const struct devlink_ops nsim_dev_devlink_ops = {
-	.reload = nsim_dev_reload,
+	.reload_down = nsim_dev_reload_down,
+	.reload_up = nsim_dev_reload_up,
 	.flash_update = nsim_dev_flash_update,
 	.trap_init = nsim_dev_devlink_trap_init,
 	.trap_action_set = nsim_dev_devlink_trap_action_set,
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 460bc629d1a4..c17709c0d0ec 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -637,7 +637,10 @@ enum devlink_trap_group_generic_id {
 	}
 
 struct devlink_ops {
-	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
+	int (*reload_down)(struct devlink *devlink,
+			   struct netlink_ext_ack *extack);
+	int (*reload_up)(struct devlink *devlink,
+			 struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
 			     enum devlink_port_type port_type);
 	int (*port_split)(struct devlink *devlink, unsigned int port_index,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 6e52d639dac6..1e3a2288b0b2 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2672,12 +2672,17 @@ devlink_resources_validate(struct devlink *devlink,
 	return err;
 }
 
+static bool devlink_reload_supported(struct devlink *devlink)
+{
+	return devlink->ops->reload_down && devlink->ops->reload_up;
+}
+
 static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 {
 	struct devlink *devlink = info->user_ptr[0];
 	int err;
 
-	if (!devlink->ops->reload)
+	if (!devlink_reload_supported(devlink))
 		return -EOPNOTSUPP;
 
 	err = devlink_resources_validate(devlink, NULL, info);
@@ -2685,7 +2690,10 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 		NL_SET_ERR_MSG_MOD(info->extack, "resources size validation failed");
 		return err;
 	}
-	return devlink->ops->reload(devlink, info->extack);
+	err = devlink->ops->reload_down(devlink, info->extack);
+	if (err)
+		return err;
+	return devlink->ops->reload_up(devlink, info->extack);
 }
 
 static int devlink_nl_flash_update_fill(struct sk_buff *msg,
@@ -7145,7 +7153,7 @@ __devlink_param_driverinit_value_set(struct devlink *devlink,
 int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
 				       union devlink_param_value *init_val)
 {
-	if (!devlink->ops->reload)
+	if (!devlink_reload_supported(devlink))
 		return -EOPNOTSUPP;
 
 	return __devlink_param_driverinit_value_get(&devlink->param_list,
@@ -7192,7 +7200,7 @@ int devlink_port_param_driverinit_value_get(struct devlink_port *devlink_port,
 {
 	struct devlink *devlink = devlink_port->devlink;
 
-	if (!devlink->ops->reload)
+	if (!devlink_reload_supported(devlink))
 		return -EOPNOTSUPP;
 
 	return __devlink_param_driverinit_value_get(&devlink_port->param_list,
-- 
2.21.0



* [patch net-next v2 3/3] net: devlink: move reload fail indication to devlink core and expose to user
From: Jiri Pirko @ 2019-09-07 20:54 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, dsahern, jakub.kicinski, tariqt, mlxsw
In-Reply-To: <20190907205400.14589-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Currently the fact that devlink reload failed is stored in drivers.
Move this flag into devlink core. Also, expose it to the user.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2:
- s/devlink failed/devlink reload failed/ in description
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 15 +++++----------
 include/net/devlink.h                      |  3 +++
 include/uapi/linux/devlink.h               |  2 ++
 net/core/devlink.c                         | 21 ++++++++++++++++++++-
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index c71a1d9ea17b..3fa96076e8a5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -80,7 +80,6 @@ struct mlxsw_core {
 	struct mlxsw_thermal *thermal;
 	struct mlxsw_core_port *ports;
 	unsigned int max_ports;
-	bool reload_fail;
 	bool fw_flash_in_progress;
 	unsigned long driver_priv[0];
 	/* driver_priv has to be always the last item */
@@ -1002,15 +1001,11 @@ mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink,
 					struct netlink_ext_ack *extack)
 {
 	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
-	int err;
-
-	err = mlxsw_core_bus_device_register(mlxsw_core->bus_info,
-					     mlxsw_core->bus,
-					     mlxsw_core->bus_priv, true,
-					     devlink);
-	mlxsw_core->reload_fail = !!err;
 
-	return err;
+	return mlxsw_core_bus_device_register(mlxsw_core->bus_info,
+					      mlxsw_core->bus,
+					      mlxsw_core->bus_priv, true,
+					      devlink);
 }
 
 static int mlxsw_devlink_flash_update(struct devlink *devlink,
@@ -1254,7 +1249,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
-	if (mlxsw_core->reload_fail) {
+	if (devlink_is_reload_failed(devlink)) {
 		if (!reload)
 			/* Only the parts that were not de-initialized in the
 			 * failed reload attempt need to be de-initialized.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index c17709c0d0ec..9c881dc25273 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -38,6 +38,7 @@ struct devlink {
 	struct device *dev;
 	possible_net_t _net;
 	struct mutex lock;
+	bool reload_failed;
 	char priv[0] __aligned(NETDEV_ALIGN);
 };
 
@@ -940,6 +941,8 @@ void
 devlink_health_reporter_state_update(struct devlink_health_reporter *reporter,
 				     enum devlink_health_reporter_state state);
 
+bool devlink_is_reload_failed(struct devlink *devlink);
+
 void devlink_flash_update_begin_notify(struct devlink *devlink);
 void devlink_flash_update_end_notify(struct devlink *devlink);
 void devlink_flash_update_status_notify(struct devlink *devlink,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 546e75dd74ac..7cb5e8c5ae0d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -410,6 +410,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_TRAP_METADATA,			/* nested */
 	DEVLINK_ATTR_TRAP_GROUP_NAME,			/* string */
 
+	DEVLINK_ATTR_RELOAD_FAILED,			/* u8 0 or 1 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 1e3a2288b0b2..e00a4a643d17 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -471,6 +471,8 @@ static int devlink_nl_fill(struct sk_buff *msg, struct devlink *devlink,
 
 	if (devlink_nl_put_handle(msg, devlink))
 		goto nla_put_failure;
+	if (nla_put_u8(msg, DEVLINK_ATTR_RELOAD_FAILED, devlink->reload_failed))
+		goto nla_put_failure;
 
 	genlmsg_end(msg, hdr);
 	return 0;
@@ -2677,6 +2679,21 @@ static bool devlink_reload_supported(struct devlink *devlink)
 	return devlink->ops->reload_down && devlink->ops->reload_up;
 }
 
+static void devlink_reload_failed_set(struct devlink *devlink,
+				      bool reload_failed)
+{
+	if (devlink->reload_failed == reload_failed)
+		return;
+	devlink->reload_failed = reload_failed;
+	devlink_notify(devlink, DEVLINK_CMD_NEW);
+}
+
+bool devlink_is_reload_failed(struct devlink *devlink)
+{
+	return devlink->reload_failed;
+}
+EXPORT_SYMBOL_GPL(devlink_is_reload_failed);
+
 static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 {
 	struct devlink *devlink = info->user_ptr[0];
@@ -2693,7 +2710,9 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
 	err = devlink->ops->reload_down(devlink, info->extack);
 	if (err)
 		return err;
-	return devlink->ops->reload_up(devlink, info->extack);
+	err = devlink->ops->reload_up(devlink, info->extack);
+	devlink_reload_failed_set(devlink, !!err);
+	return err;
 }
 
 static int devlink_nl_flash_update_fill(struct sk_buff *msg,
-- 
2.21.0



* Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: add RXNFC support
From: Vivien Didelot @ 2019-09-07 21:25 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, davem, f.fainelli
In-Reply-To: <20190907203256.GA18741@lunn.ch>

Hi Andrew,

On Sat, 7 Sep 2019 22:32:56 +0200, Andrew Lunn <andrew@lunn.ch> wrote:
> > +	policy = devm_kzalloc(chip->dev, sizeof(*policy), GFP_KERNEL);
> > +	if (!policy)
> > +		return -ENOMEM;
> 
> I think this might be the first time we have done dynamic memory
> allocation in the mv88e6xxx driver. It might even be a first for a DSA
> driver?
> 
> I'm not saying it is wrong, but maybe we should discuss it. 
> 
> I assume you are doing this because the ATU entry itself is not
> sufficient?
> 
> How much memory is involved here, worst case? I assume one struct
> mv88e6xxx_policy per ATU entry? Which you think is too much to
> allocate as part of chip? I guess most users will never use this
> feature, so for most users it would be wasted memory. So I do see the
> point for dynamically allocating it.

A layer 2 policy is not limited to the ATU. It can also be based on a VTU
entry, on the port's Etype, or frame's Etype. We can have 0, 1 or literally
thousands of policies programmed by the user. The ethtool API does not
store the entries and requires the driver to dump them on get operations,
hence the allocation for simplicity. But we may accommodate the DSA layer in
the future if there are more RXNFC users than just bcm_sf2 and mv88e6xxx.


Thanks,

	Vivien

^ permalink raw reply

* [RFC bpf-next 0/7] bpf: packet capture helpers, bpftool support
From: Alan Maguire @ 2019-09-07 21:40 UTC (permalink / raw)
  To: ast, daniel, kafai, songliubraving, yhs, davem, jakub.kicinski,
	hawk, john.fastabend, rostedt, mingo, quentin.monnet, rdna, joe,
	acme, jolsa, alexey.budankov, gregkh, namhyung, sdf, f.fainelli,
	shuah, peter, ivan, andriin, bhole_prashant_q7, david.calavera,
	danieltimlee, ctakshak, netdev, bpf, linux-kselftest
  Cc: Alan Maguire

Packet capture is useful from a general debugging standpoint, and is
useful in particular in debugging BPF programs that do packet processing.
For general debugging, being able to initiate arbitrary packet capture
from kprobes and tracepoints is highly valuable; e.g. what do the packets
that reach kfree_skb() - representing error codepaths - look like?
Arbitrary packet capture is distinct from the traditional concept of
pre-defined hooks, and gives much more flexibility in probing system
behaviour. For packet-processing BPF programs, packet capture can be useful
for doing things such as debugging checksum errors.

The intent of this RFC patchset is to initiate discussion on if and how to
work packet capture-specific capabilities into BPF.  It is possible -
and indeed projects like xdpcap [1] have demonstrated how - to carry out
packet capture in BPF today via perf events, but the aim here is to
simplify both the in-BPF capture and the userspace collection.

The suggested approach is to add a new bpf helper - bpf_pcap() - to
simplify packet capture within BPF programs, and to enhance bpftool
to add a "pcap" subcommand to aid in retrieving packets.  The helper
is for the most part a wrapper around perf event sending, using
data relevant for packet capture as metadata.
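
As a rough illustration of the userspace side, a collector has to turn the
metadata of each perf event into a pcap record header.  The sketch below is
only an assumption about how that could look: the metadata layout is copied
from struct bpf_pcap_hdr in patch 3/7, while struct record_hdr (a stand-in
for libpcap's struct pcap_pkthdr), meta_to_record() and the boot_epoch_ns
parameter are hypothetical names used for illustration.

```c
#include <stdint.h>
#include <sys/time.h>

/* Layout copied from struct bpf_pcap_hdr in patch 3/7 of this series,
 * minus the trailing data[] member, which is not needed here.
 */
struct pcap_meta {
	uint32_t magic;
	int32_t  protocol;
	uint64_t flags;
	uint64_t ktime_ns;	/* nanoseconds since boot */
	uint32_t tot_len;
	uint32_t cap_len;
};

/* Hypothetical stand-in for libpcap's struct pcap_pkthdr. */
struct record_hdr {
	struct timeval ts;
	uint32_t caplen;
	uint32_t len;
};

#define PCAP_META_MAGIC	0xb7fca7	/* BPF_PCAP_MAGIC in patch 3/7 */
#define NSEC_PER_SEC	1000000000ULL

/* Convert event metadata into a record header.  boot_epoch_ns is the
 * wall-clock time of boot in nanoseconds since the epoch; how it is
 * obtained is left to the caller (assumption).
 * Returns 0 on success, -1 if the event is not a pcap event.
 */
static int meta_to_record(const struct pcap_meta *m, uint64_t boot_epoch_ns,
			  struct record_hdr *out)
{
	uint64_t ns;

	if (m->magic != PCAP_META_MAGIC)
		return -1;

	ns = boot_epoch_ns + m->ktime_ns;
	out->ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	out->ts.tv_usec = (suseconds_t)((ns % NSEC_PER_SEC) / 1000);
	out->caplen = m->cap_len;
	out->len = m->tot_len;
	return 0;
}
```

Checking the magic first lets a collector skip perf events in the same map
that did not come from bpf_pcap().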

The end result is being able to capture packet data in the following
manner.  For example if we add an iptables drop rule, we can observe
TCP SYN segments being freed at kfree_skb:

$ iptables -A INPUT -p tcp --dport 6666 -j DROP
$ bpftool pcap trace kprobe:kfree_skb proto ip data_out /tmp/cap &
$ nc 127.0.0.1 6666
Ncat: Connection timed out.
$ fg
^C
$ tshark -r /tmp/cap
Running as user "root" and group "root". This could be dangerous.
...
  3          7    127.0.0.1 -> 127.0.0.1    TCP 60 54732 > ircu [SYN] Seq=0 Win=65495 Len=0 MSS=65495 SACK_PERM=1 TSval=696475539 TSecr=0 WS=128
...

Tracepoints are also supported, and by default data is sent to
stdout, so we can pipe to tcpdump:

$ bpftool pcap trace tracepoint:net_dev_xmit:arg1 proto eth | tcpdump -r -
reading from file -, link-type EN10MB (Ethernet)
00:16:49.150880 IP 10.11.12.13 > 10.11.12.14: ICMP echo reply, id 10519, seq 1, length 64
...

Patch 1 adds support for bpf_pcap() in skb and XDP programs.  In those cases,
the argument is the relevant context (struct __sk_buff or xdp metadata)
from which we capture.
Patch 2 extends the helper to allow it to work for tracing programs, and in
that case the data argument is a pointer to an skb, derived from raw
tracepoint or kprobe arguments.
Patch 3 syncs uapi and tools headers for the new helper, flags and associated
pcap header type.
Patch 4 adds a feature test for libpcap which will be used in the next patch.
Patch 5 adds a "pcap" subcommand to bpftool to collect packet data from
BPF-driven perf event maps in existing programs.  Also supplied are simple
tracepoint and kprobe programs which can be used to attach to a kprobe
or raw tracepoint to retrieve arguments and capture the associated skb.
Patch 6 adds documentation for the new pcap subcommand.
Patch 7 tests the pcap subcommand for tracing, skb and xdp programs.

Alan Maguire (7):
  bpf: add bpf_pcap() helper to simplify packet capture
  bpf: extend bpf_pcap support to tracing programs
  bpf: sync tools/include/uapi/linux/bpf.h for pcap support
  bpf: add libpcap feature test
  bpf: add pcap support to bpftool
  bpf: add documentation for bpftool pcap subcommand
  bpf: add tests for bpftool packet capture

 include/linux/bpf.h                                |  20 +
 include/uapi/linux/bpf.h                           |  92 +++-
 kernel/bpf/verifier.c                              |   4 +-
 kernel/trace/bpf_trace.c                           | 214 +++++++++
 net/core/filter.c                                  |  67 +++
 tools/bpf/bpftool/Documentation/bpftool-btf.rst    |   1 +
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |   1 +
 .../bpf/bpftool/Documentation/bpftool-feature.rst  |   1 +
 tools/bpf/bpftool/Documentation/bpftool-map.rst    |   1 +
 tools/bpf/bpftool/Documentation/bpftool-net.rst    |   1 +
 tools/bpf/bpftool/Documentation/bpftool-pcap.rst   | 119 +++++
 tools/bpf/bpftool/Documentation/bpftool-perf.rst   |   1 +
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |   1 +
 tools/bpf/bpftool/Documentation/bpftool.rst        |   1 +
 tools/bpf/bpftool/Makefile                         |  39 +-
 tools/bpf/bpftool/main.c                           |   3 +-
 tools/bpf/bpftool/main.h                           |   1 +
 tools/bpf/bpftool/pcap.c                           | 496 +++++++++++++++++++++
 tools/bpf/bpftool/progs/bpftool_pcap_kprobe.c      |  80 ++++
 tools/bpf/bpftool/progs/bpftool_pcap_tracepoint.c  |  68 +++
 tools/build/Makefile.feature                       |   2 +
 tools/build/feature/Makefile                       |   4 +
 tools/build/feature/test-libpcap.c                 |  26 ++
 tools/include/uapi/linux/bpf.h                     |  92 +++-
 tools/testing/selftests/bpf/Makefile               |   3 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |  11 +
 .../testing/selftests/bpf/progs/bpftool_pcap_tc.c  |  41 ++
 .../testing/selftests/bpf/progs/bpftool_pcap_xdp.c |  39 ++
 tools/testing/selftests/bpf/test_bpftool_pcap.sh   | 132 ++++++
 29 files changed, 1549 insertions(+), 12 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-pcap.rst
 create mode 100644 tools/bpf/bpftool/pcap.c
 create mode 100644 tools/bpf/bpftool/progs/bpftool_pcap_kprobe.c
 create mode 100644 tools/bpf/bpftool/progs/bpftool_pcap_tracepoint.c
 create mode 100644 tools/build/feature/test-libpcap.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpftool_pcap_tc.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpftool_pcap_xdp.c
 create mode 100755 tools/testing/selftests/bpf/test_bpftool_pcap.sh

-- 
1.8.3.1

^ permalink raw reply

* [RFC bpf-next 2/7] bpf: extend bpf_pcap support to tracing programs
From: Alan Maguire @ 2019-09-07 21:40 UTC (permalink / raw)
  To: ast, daniel, kafai, songliubraving, yhs, davem, jakub.kicinski,
	hawk, john.fastabend, rostedt, mingo, quentin.monnet, rdna, joe,
	acme, jolsa, alexey.budankov, gregkh, namhyung, sdf, f.fainelli,
	shuah, peter, ivan, andriin, bhole_prashant_q7, david.calavera,
	danieltimlee, ctakshak, netdev, bpf, linux-kselftest
  Cc: Alan Maguire
In-Reply-To: <1567892444-16344-1-git-send-email-alan.maguire@oracle.com>

Packet capture is especially valuable in tracing contexts, so
extend bpf_pcap helper to take a tracing-derived skb pointer
as an argument.

In the case of tracing programs, the starting protocol
(corresponding to libpcap DLT_* values; 1 for Ethernet, 12 for
IP, etc) needs to be specified and should reflect the protocol
type which is pointed to by the skb->data pointer; i.e. the
start of the packet.  This can be derived in a limited set of cases,
but should be specified where possible.  For skb and xdp programs
this protocol will nearly always be 1 (BPF_PCAP_TYPE_ETH).
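
The way the requested and the derived protocol interact can be sketched as
a small pure function.  This is a simplified model of the checks in
bpf_trace_pcap() in the diff below, not the kernel code itself:
resolve_protocol() and the PCAP_* names are hypothetical, with values
taken from the BPF_PCAP_TYPE_* and BPF_F_PCAP_STRICT_TYPE definitions in
this series.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror BPF_PCAP_TYPE_* and BPF_F_PCAP_STRICT_TYPE in this series. */
#define PCAP_TYPE_UNSET		(-1)
#define PCAP_TYPE_ETH		1
#define PCAP_TYPE_IP		12
#define PCAP_STRICT_TYPE	(1ULL << 56)

/* Return the protocol to record in the capture header, or -EPROTO.
 *
 * wanted:   protocol requested by the BPF program (may be UNSET)
 * detected: protocol derived from the skb (may be UNSET)
 */
static int resolve_protocol(int wanted, int detected, uint64_t flags)
{
	if (wanted == PCAP_TYPE_UNSET) {
		/* Nothing requested: rely on detection, or give up. */
		return detected == PCAP_TYPE_UNSET ? -EPROTO : detected;
	}

	/* In strict mode a positive mismatch rejects the packet. */
	if ((flags & PCAP_STRICT_TYPE) &&
	    detected != PCAP_TYPE_UNSET && detected != wanted)
		return -EPROTO;

	/* Otherwise trust the caller. */
	return wanted;
}
```

Note that without the strict flag the requested protocol always wins, even
when detection disagrees, which is why specifying it correctly matters.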

Example usage for a tracing program, where we use a
struct bpf_pcap_hdr array map to pass in preferences for
protocol and max len:

struct bpf_map_def SEC("maps") pcap_conf_map = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(struct bpf_pcap_hdr),
	.max_entries = 1,
};

struct bpf_map_def SEC("maps") pcap_map = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 1024,
};

SEC("kprobe/kfree_skb")
int probe_kfree_skb(struct pt_regs *ctx)
{
	struct bpf_pcap_hdr *conf;
	int key = 0;

	conf = bpf_map_lookup_elem(&pcap_conf_map, &key);
	if (!conf)
		return 0;

	bpf_pcap((void *)PT_REGS_PARM1(ctx), conf->cap_len, &pcap_map,
		 conf->protocol, BPF_F_CURRENT_CPU);
	return 0;
}
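
The flags argument packs an id into the low 48 bits alongside the two
tracing-only flag bits added by this patch.  A minimal sketch of how a
caller might build and check such a value follows; the helper names are
hypothetical and the constants simply mirror the BPF_F_PCAP_* definitions
in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror BPF_F_PCAP_* in this series. */
#define PCAP_ID_MASK		0xffffffffffffULL
#define PCAP_ID_IIFINDEX	(1ULL << 48)
#define PCAP_STRICT_TYPE	(1ULL << 56)

/* Same validity rule as the check at the top of bpf_trace_pcap(). */
static bool pcap_flags_valid(uint64_t flags)
{
	return !(flags & ~(PCAP_ID_IIFINDEX | PCAP_ID_MASK | PCAP_STRICT_TYPE));
}

/* Build flags that restrict capture to one incoming interface. */
static uint64_t pcap_flags_for_iif(int ifindex, bool strict)
{
	uint64_t flags = ((uint64_t)ifindex & PCAP_ID_MASK) | PCAP_ID_IIFINDEX;

	if (strict)
		flags |= PCAP_STRICT_TYPE;
	return flags;
}

/* Would an skb with incoming ifindex skb_iif pass the filter? */
static bool pcap_iif_matches(uint64_t flags, int skb_iif)
{
	int iif;

	if (!(flags & PCAP_ID_IIFINDEX))
		return true;	/* id is only a correlation tag */
	iif = (int)(flags & PCAP_ID_MASK);
	return iif == 0 || skb_iif == iif;
}
```

Without the IIFINDEX bit the id portion is only stored in the header for
correlation and does not filter anything.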

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/uapi/linux/bpf.h |  21 ++++-
 kernel/trace/bpf_trace.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 233 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a27e58e..13f86d3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2758,7 +2758,9 @@ struct bpf_stack_build_id {
  *              held by *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This
  *		perf event has the same attributes as perf events generated
  *		by bpf_perf_event_output.  For skb and xdp programs, *data*
- *		is the relevant context.
+ *		is the relevant context, while for tracing programs,
+ *		*data* must be a pointer to a **struct sk_buff** derived
+ *		from kprobe or tracepoint arguments.
  *
  *		Metadata for this event is a **struct bpf_pcap_hdr**; this
  *		contains the capture length, actual packet length and
@@ -2771,6 +2773,14 @@ struct bpf_stack_build_id {
  *		to 48 bits; the id can be used to correlate captured packets
  *		with other trace data, since the passed-in flags value is stored
  *		stored in the **struct bpf_pcap_hdr** in the **flags** field.
+ *		Specifying **BPF_F_PCAP_ID_IIFINDEX** and a non-zero value in
+ *		the id portion of the flags limits capture events to skbs
+ *		with the specified incoming ifindex, allowing limiting of
+ *		tracing to the associated interface.  Specifying
+ *		**BPF_F_PCAP_STRICT_TYPE** will cause *bpf_pcap* to return
+ *		-EPROTO and skip capture if a specific protocol is specified
+ *		and it does not match the current skb.  These additional flags
+ *		are only valid (and useful) for tracing programs.
  *
  *		The *protocol* value specifies the protocol type of the start
  *		of the packet so that packet capture can carry out
@@ -2780,7 +2790,12 @@ struct bpf_stack_build_id {
  *	Return
  *		0 on success, or a negative error in case of failure.
  *		-ENOENT will be returned if the associated perf event
- *		map entry is empty, or the skb is zero-length.
+ *		map entry is empty, the skb is zero-length, or the incoming
+ *		ifindex was specified and we failed to match.
+ *		-EPROTO will be returned if **BPF_PCAP_TYPE_UNSET** is specified
+ *		and no protocol can be determined, or if we specify a protocol
+ *		along with **BPF_F_PCAP_STRICT_TYPE** and the skb protocol does
+ *		not match.
  *		-EINVAL will be returned if the flags value is invalid.
  *
  */
@@ -2977,6 +2992,8 @@ enum bpf_func_id {
 
 /* BPF_FUNC_pcap flags */
 #define	BPF_F_PCAP_ID_MASK		0xffffffffffff
+#define BPF_F_PCAP_ID_IIFINDEX		(1ULL << 48)
+#define BPF_F_PCAP_STRICT_TYPE         (1ULL << 56)
 
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d..311883b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -13,6 +13,8 @@
 #include <linux/kprobes.h>
 #include <linux/syscalls.h>
 #include <linux/error-injection.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
 
 #include <asm/tlb.h>
 
@@ -530,6 +532,216 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 	return __bpf_perf_event_output(regs, map, flags, sd);
 }
 
+/* Essentially just skb_copy_bits() using probe_kernel_read() where needed. */
+static unsigned long bpf_trace_skb_copy(void *tobuf, const void *from,
+					unsigned long offset,
+					unsigned long len)
+{
+	const struct sk_buff *frag_iterp, *skb = from;
+	struct skb_shared_info *shinfop, shinfo;
+	struct sk_buff frag_iter;
+	unsigned long copy, start;
+	void *to = tobuf;
+	int i, ret;
+
+	start = skb_headlen(skb);
+
+	copy = start - offset;
+	if (copy > 0) {
+		if (copy > len)
+			copy = len;
+		ret = probe_kernel_read(to, skb->data, copy);
+		if (unlikely(ret < 0))
+			goto out;
+		len -= copy;
+		if (len == 0)
+			return 0;
+		offset += copy;
+		to += copy;
+	}
+
+	if (skb->data_len == 0)
+		goto out;
+
+	shinfop = skb_shinfo(skb);
+
+	ret = probe_kernel_read(&shinfo, shinfop, sizeof(shinfo));
+	if (unlikely(ret < 0))
+		goto out;
+
+	if (shinfo.nr_frags > MAX_SKB_FRAGS) {
+		ret = -EINVAL;
+		goto out;
+	}
+	for (i = 0; i < shinfo.nr_frags; i++) {
+		skb_frag_t *f = &shinfo.frags[i];
+		int end;
+
+		if (start > offset + len) {
+			ret = -E2BIG;
+			goto out;
+		}
+
+		end = start + skb_frag_size(f);
+		copy = end - offset;
+		if (copy > 0) {
+			u32 poff, p_len, copied;
+			struct page *p;
+			u8 *vaddr;
+
+			if (copy > len)
+				copy = len;
+
+			skb_frag_foreach_page(f,
+					      skb_frag_off(f) + offset - start,
+					      copy, p, poff, p_len, copied) {
+
+				vaddr = kmap_atomic(p);
+				ret = probe_kernel_read(to + copied,
+							vaddr + poff, p_len);
+				kunmap_atomic(vaddr);
+
+				if (unlikely(ret < 0))
+					goto out;
+			}
+			len -= copy;
+			if (len == 0)
+				return 0;
+			offset += copy;
+			to += copy;
+		}
+		start = end;
+	}
+
+	for (frag_iterp = shinfo.frag_list; frag_iterp;
+	     frag_iterp = frag_iter.next) {
+		int end;
+
+		if (start > offset + len) {
+			ret = -E2BIG;
+			goto out;
+		}
+		ret = probe_kernel_read(&frag_iter, frag_iterp,
+					sizeof(frag_iter));
+		if (ret)
+			goto out;
+
+		end = start + frag_iter.len;
+		copy = end - offset;
+		if (copy > 0) {
+			if (copy > len)
+				copy = len;
+			ret = bpf_trace_skb_copy(to, &frag_iter,
+						offset - start,
+						copy);
+			if (ret)
+				goto out;
+
+			len -= copy;
+			if (len == 0)
+				return 0;
+			offset += copy;
+			to += copy;
+		}
+		start = end;
+	}
+out:
+	if (ret)
+		memset(tobuf, 0, len);
+
+	return ret;
+}
+
+/* Derive protocol for some of the easier cases.  For tracing, a probe point
+ * may be dealing with packets in various states. Common cases are IP
+ * packets prior to adding MAC header (_PCAP_TYPE_IP) and a full packet
+ * (_PCAP_TYPE_ETH).  For other cases the caller must specify the
+ * protocol they expect.  Other heuristics for packet identification
+ * should be added here as needed, since determining the packet type
+ * ensures we do not capture packets that fail to match the desired
+ * pcap type in BPF_F_PCAP_STRICT_TYPE mode.
+ */
+static inline int bpf_skb_protocol_get(struct sk_buff *skb)
+{
+	switch (htons(skb->protocol)) {
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+		if (skb_network_header(skb) == skb->data)
+			return BPF_PCAP_TYPE_IP;
+		else
+			return BPF_PCAP_TYPE_ETH;
+	default:
+		return BPF_PCAP_TYPE_UNSET;
+	}
+}
+
+BPF_CALL_5(bpf_trace_pcap, void *, data, u32, size, struct bpf_map *, map,
+	   int, protocol_wanted, u64, flags)
+{
+	struct bpf_pcap_hdr pcap;
+	struct sk_buff skb;
+	int protocol;
+	int ret;
+
+	if (unlikely(flags & ~(BPF_F_PCAP_ID_IIFINDEX | BPF_F_PCAP_ID_MASK |
+			       BPF_F_PCAP_STRICT_TYPE)))
+		return -EINVAL;
+
+	ret = probe_kernel_read(&skb, data, sizeof(skb));
+	if (unlikely(ret < 0))
+		return ret;
+
+	/* Sanity check skb len in case we get bogus data. */
+	if (unlikely(!skb.len))
+		return -ENOENT;
+	if (unlikely(skb.len > GSO_MAX_SIZE || skb.data_len > skb.len))
+		return -E2BIG;
+
+	protocol = bpf_skb_protocol_get(&skb);
+
+	if (protocol_wanted == BPF_PCAP_TYPE_UNSET) {
+		/* If we cannot determine protocol type, bail. */
+		if (protocol == BPF_PCAP_TYPE_UNSET)
+			return -EPROTO;
+	} else {
+		/* if we determine protocol type, and it's not what we asked
+		 * for _and_ we are in strict mode, bail.  Otherwise we assume
+		 * the packet is the requested protocol type and drive on.
+		 */
+		if (flags & BPF_F_PCAP_STRICT_TYPE &&
+		    protocol != BPF_PCAP_TYPE_UNSET &&
+		    protocol != protocol_wanted)
+			return -EPROTO;
+		protocol = protocol_wanted;
+	}
+
+	/* If we specified a matching incoming ifindex, bail if not a match. */
+	if (flags & BPF_F_PCAP_ID_IIFINDEX) {
+		int iif = flags & BPF_F_PCAP_ID_MASK;
+
+		if (iif && skb.skb_iif != iif)
+			return -ENOENT;
+	}
+
+	ret = bpf_pcap_prepare(protocol, size, skb.len, flags, &pcap);
+	if (ret)
+		return ret;
+
+	return bpf_event_output(map, BPF_F_CURRENT_CPU, &pcap, sizeof(pcap),
+				&skb, pcap.cap_len, bpf_trace_skb_copy);
+}
+
+static const struct bpf_func_proto bpf_trace_pcap_proto = {
+	.func		= bpf_trace_pcap,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_CONST_MAP_PTR,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_0(bpf_get_current_task)
 {
 	return (long) current;
@@ -709,6 +921,8 @@ static void do_bpf_send_signal(struct irq_work *entry)
 #endif
 	case BPF_FUNC_send_signal:
 		return &bpf_send_signal_proto;
+	case BPF_FUNC_pcap:
+		return &bpf_trace_pcap_proto;
 	default:
 		return NULL;
 	}
-- 
1.8.3.1


^ permalink raw reply related

* [RFC bpf-next 4/7] bpf: add libpcap feature test
From: Alan Maguire @ 2019-09-07 21:40 UTC (permalink / raw)
  To: ast, daniel, kafai, songliubraving, yhs, davem, jakub.kicinski,
	hawk, john.fastabend, rostedt, mingo, quentin.monnet, rdna, joe,
	acme, jolsa, alexey.budankov, gregkh, namhyung, sdf, f.fainelli,
	shuah, peter, ivan, andriin, bhole_prashant_q7, david.calavera,
	danieltimlee, ctakshak, netdev, bpf, linux-kselftest
  Cc: Alan Maguire
In-Reply-To: <1567892444-16344-1-git-send-email-alan.maguire@oracle.com>

This test will be used when deciding whether to add the pcap
support features in the following patch.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/build/Makefile.feature       |  2 ++
 tools/build/feature/Makefile       |  4 ++++
 tools/build/feature/test-libpcap.c | 26 ++++++++++++++++++++++++++
 3 files changed, 32 insertions(+)
 create mode 100644 tools/build/feature/test-libpcap.c

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 86b793d..35e65418 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -85,6 +85,7 @@ FEATURE_TESTS_EXTRA :=                  \
          libbfd-liberty                 \
          libbfd-liberty-z               \
          libopencsd                     \
+         libpcap                        \
          libunwind-x86                  \
          libunwind-x86_64               \
          libunwind-arm                  \
@@ -113,6 +114,7 @@ FEATURE_DISPLAY ?=              \
          libelf                 \
          libnuma                \
          numa_num_possible_cpus \
+         libpcap                \
          libperl                \
          libpython              \
          libcrypto              \
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 0658b8c..c7585a1 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -27,6 +27,7 @@ FILES=                                          \
          test-libelf-mmap.bin                   \
          test-libnuma.bin                       \
          test-numa_num_possible_cpus.bin        \
+         test-libpcap.bin                       \
          test-libperl.bin                       \
          test-libpython.bin                     \
          test-libpython-version.bin             \
@@ -209,6 +210,9 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
 $(OUTPUT)test-libperl.bin:
 	$(BUILD) $(FLAGS_PERL_EMBED)
 
+$(OUTPUT)test-libpcap.bin:
+	$(BUILD) -lpcap
+
 $(OUTPUT)test-libpython.bin:
 	$(BUILD) $(FLAGS_PYTHON_EMBED)
 
diff --git a/tools/build/feature/test-libpcap.c b/tools/build/feature/test-libpcap.c
new file mode 100644
index 0000000..7f60eb9
--- /dev/null
+++ b/tools/build/feature/test-libpcap.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <pcap.h>
+
+#define PKTLEN 100
+
+int main(void)
+{
+	char dummy_data[PKTLEN] = { 0 };
+	pcap_dumper_t *pcap_dumper;
+	struct pcap_pkthdr hdr;
+	int proto = 1;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(proto, PKTLEN);
+	pcap_dumper = pcap_dump_open(pcap, "-");
+	hdr.caplen = PKTLEN;
+	hdr.len = PKTLEN;
+	hdr.ts.tv_sec = 0;
+	hdr.ts.tv_usec = 0;
+	pcap_dump((u_char *)pcap_dumper, &hdr, (const u_char *)dummy_data);
+	pcap_dump_flush(pcap_dumper);
+	pcap_dump_close(pcap_dumper);
+	pcap_close(pcap);
+
+	return 0;
+}
-- 
1.8.3.1


^ permalink raw reply related

* [RFC bpf-next 3/7] bpf: sync tools/include/uapi/linux/bpf.h for pcap support
From: Alan Maguire @ 2019-09-07 21:40 UTC (permalink / raw)
  To: ast, daniel, kafai, songliubraving, yhs, davem, jakub.kicinski,
	hawk, john.fastabend, rostedt, mingo, quentin.monnet, rdna, joe,
	acme, jolsa, alexey.budankov, gregkh, namhyung, sdf, f.fainelli,
	shuah, peter, ivan, andriin, bhole_prashant_q7, david.calavera,
	danieltimlee, ctakshak, netdev, bpf, linux-kselftest
  Cc: Alan Maguire
In-Reply-To: <1567892444-16344-1-git-send-email-alan.maguire@oracle.com>

Sync bpf.h updates for the bpf_pcap helper and associated definitions.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/include/uapi/linux/bpf.h | 92 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 91 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 77c6be9..13f86d3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2750,6 +2750,54 @@ struct bpf_stack_build_id {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_pcap(void *data, u32 size, struct bpf_map *map, int protocol,
+ *		u64 flags)
+ *	Description
+ *		Write packet data from *data* into a special BPF perf event
+ *              held by *map* of type **BPF_MAP_TYPE_PERF_EVENT_ARRAY**. This
+ *		perf event has the same attributes as perf events generated
+ *		by bpf_perf_event_output.  For skb and xdp programs, *data*
+ *		is the relevant context, while for tracing programs,
+ *		*data* must be a pointer to a **struct sk_buff** derived
+ *		from kprobe or tracepoint arguments.
+ *
+ *		Metadata for this event is a **struct bpf_pcap_hdr**; this
+ *		contains the capture length, actual packet length and
+ *		the starting protocol.
+ *
+ *		The max number of bytes of context to store is specified via
+ *		*size*.
+ *
+ *		The flags value can be used to specify an id value of up
+ *		to 48 bits; the id can be used to correlate captured packets
+ *		with other trace data, since the passed-in flags value is
+ *		stored in the **struct bpf_pcap_hdr** in the **flags** field.
+ *		Specifying **BPF_F_PCAP_ID_IIFINDEX** and a non-zero value in
+ *		the id portion of the flags limits capture events to skbs
+ *		with the specified incoming ifindex, allowing limiting of
+ *		tracing to the associated interface.  Specifying
+ *		**BPF_F_PCAP_STRICT_TYPE** will cause *bpf_pcap* to return
+ *		-EPROTO and skip capture if a specific protocol is specified
+ *		and it does not match the current skb.  These additional flags
+ *		are only valid (and useful) for tracing programs.
+ *
+ *		The *protocol* value specifies the protocol type of the start
+ *		of the packet so that packet capture can carry out
+ *		interpretation.  See **pcap-linktype** (7) for details on
+ *		the supported values.
+ *
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *		-ENOENT will be returned if the associated perf event
+ *		map entry is empty, the skb is zero-length, or the incoming
+ *		ifindex was specified and we failed to match.
+ *		-EPROTO will be returned if **BPF_PCAP_TYPE_UNSET** is specified
+ *		and no protocol can be determined, or if we specify a protocol
+ *		along with **BPF_F_PCAP_STRICT_TYPE** and the skb protocol does
+ *		not match.
+ *		-EINVAL will be returned if the flags value is invalid.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2862,7 +2910,8 @@ struct bpf_stack_build_id {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(pcap),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2941,6 +2990,11 @@ enum bpf_func_id {
 /* BPF_FUNC_sk_storage_get flags */
 #define BPF_SK_STORAGE_GET_F_CREATE	(1ULL << 0)
 
+/* BPF_FUNC_pcap flags */
+#define	BPF_F_PCAP_ID_MASK		0xffffffffffff
+#define BPF_F_PCAP_ID_IIFINDEX		(1ULL << 48)
+#define BPF_F_PCAP_STRICT_TYPE         (1ULL << 56)
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
@@ -3613,4 +3667,40 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+/* bpf_pcap_hdr contains information related to a particular packet capture
+ * flow.  It specifies
+ *
+ * - a magic number BPF_PCAP_MAGIC which identifies the perf event as
+ *   a pcap-related event.
+ * - a starting protocol is the protocol associated with the header
+ * - a flags value, copied from the flags value passed into bpf_pcap().
+ *   IDs can be used to correlate packet capture data and other tracing data.
+ *
+ * bpf_pcap_hdr also contains the information relating to the to-be-captured
+ * packet, and closely corresponds to the struct pcap_pkthdr used by
+ * pcap_dump (3PCAP).  The bpf_pcap helper sets ktime_ns (nanoseconds since
+ * boot) to the ktime_ns value; to get sensible pcap times this value should
+ * be converted to a struct timeval time since epoch in the struct pcap_pkthdr.
+ *
+ * When bpf_pcap() is used, a "struct bpf_pcap_hdr" is stored as we
+ * need both information about the particular packet and the protocol
+ * we are capturing.
+ */
+
+#define BPF_PCAP_MAGIC		0xb7fca7
+
+struct bpf_pcap_hdr {
+	__u32			magic;
+	int			protocol;
+	__u64			flags;
+	__u64			ktime_ns;
+	__u32			tot_len;
+	__u32			cap_len;
+	__u8			data[0];
+};
+
+#define BPF_PCAP_TYPE_UNSET	-1
+#define BPF_PCAP_TYPE_ETH	1
+#define	BPF_PCAP_TYPE_IP	12
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
1.8.3.1


^ permalink raw reply related
