Netdev List
 help / color / mirror / Atom feed
* [PATCH 1/4] net: lan78xx: Allow for VLAN headers in timeout calcs
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

The frame abort timeout being set by lan78xx_set_rx_max_frame_length
didn't account for any VLAN headers, resulting in very low
throughput if used with tagged VLANs.

Use VLAN_ETH_HLEN instead of ETH_HLEN to correct for this.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index a89570f..2f793d4 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2298,7 +2298,7 @@ static int lan78xx_change_mtu(struct net_device *netdev, int new_mtu)
 	if ((ll_mtu % dev->maxpacket) == 0)
 		return -EDOM;
 
-	ret = lan78xx_set_rx_max_frame_length(dev, new_mtu + ETH_HLEN);
+	ret = lan78xx_set_rx_max_frame_length(dev, new_mtu + VLAN_ETH_HLEN);
 
 	netdev->mtu = new_mtu;
 
@@ -2587,7 +2587,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 	buf |= FCT_TX_CTL_EN_;
 	ret = lan78xx_write_reg(dev, FCT_TX_CTL, buf);
 
-	ret = lan78xx_set_rx_max_frame_length(dev, dev->net->mtu + ETH_HLEN);
+	ret = lan78xx_set_rx_max_frame_length(dev,
+					      dev->net->mtu + VLAN_ETH_HLEN);
 
 	ret = lan78xx_read_reg(dev, MAC_RX, &buf);
 	buf |= MAC_RX_RXEN_;
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/4] lan78xx minor fixes
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson

This is a small set of patches for the Microchip LAN78xx chip,
as used in the Raspberry Pi 3B+.
The main debug/discussion was on
https://github.com/raspberrypi/linux/issues/2458

Initial symptoms were that VLANs were very unreliable.
A couple of things were found:
- firstly that the hardware timeout value set failed to 
  take into account the VLAN tag, so a full MTU packet
  would be timed out.
- second was that regular checksum failures were being
  reported. Disabling checksum offload confirmed that
  the checksums were valid, and further experimentation
  identified that it was only if the VLAN tags were being
  passed through to the kernel that there were issues.
  The hardware supports VLAN filtering and tag stripping,
  therefore those have been implemented (much of the work
  was already done), and the driver drops back to s/w
  checksums should the choice be made not to use the h/w
  VLAN stripping.

Dave Stevenson (4):
  net: lan78xx: Allow for VLAN headers in timeout calcs
  net: lan78xx: Add support for VLAN filtering.
  net: lan78xx: Add support for VLAN tag stripping.
  net: lan78xx: Use s/w csum check on VLANs without tag stripping

 drivers/net/usb/lan78xx.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH 00/14] ARM: davinci: step towards removing at24_platform_data
From: Andrew Lunn @ 2018-06-25 18:02 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman, Linux ARM, Linux Kernel Mailing List,
	linux-omap, netdev, Bartosz Golaszewski
In-Reply-To: <CAMRc=MdEn74G5SJAaGLAfNSS7Zy7wqH04aN5CMZ45UXqhu_58g@mail.gmail.com>

> With my patch 1/14 you'll get -EPROBE_DEFER from nvmem_cell_get() if
> the nvmem provider is not yet registered. Will that help in your case?

I don't think so. My driver instantiates the AT24 device. So if i get
-EPROBE_DEFER, i need to cleanup the probe, and return -EPROBDE_DEFER
to the code. Which means i need to remove the AT24 device...

       Andrew

^ permalink raw reply

* Re: [PATCH] net/nfp: cast sizeof() to int when comparing with error code
From: Jakub Kicinski @ 2018-06-25 18:11 UTC (permalink / raw)
  To: Chengguang Xu; +Cc: davem, oss-drivers, netdev
In-Reply-To: <20180625073318.6828-1-cgxu519@gmx.com>

On Mon, 25 Jun 2018 15:33:18 +0800, Chengguang Xu wrote:
> Negative error code will be larger than sizeof().
> 
> Signed-off-by: Chengguang Xu <cgxu519@gmx.com>

You should include the name of the tree in the tag, e.g. [PATCH net] or
[PATCH net-next] and if it's a fix, i.e. directed at the net tree, you
should add a fixes tag, in this case it would most likely be:

Fixes: a0d8e02c35ff ("nfp: add support for reading nffw info")

You can find this and more information about the development process of
the networking subsystem in:

https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Thank you!

^ permalink raw reply

* [PATCH net-next] r8169: reject unsupported WoL options
From: Heiner Kallweit @ 2018-06-25 18:34 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

So far unsupported WoL options are silently ignored. Change this and
reject attempts to set unsupported options. This prevents situations
where a user tries to set an unsupported WoL option and is under the
impression it was successful because ethtool doesn't complain.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 1d33672c..70c13cc2 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1712,11 +1712,14 @@ static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 	struct rtl8169_private *tp = netdev_priv(dev);
 	struct device *d = tp_to_dev(tp);
 
+	if (wol->wolopts & ~WAKE_ANY)
+		return -EINVAL;
+
 	pm_runtime_get_noresume(d);
 
 	rtl_lock_work(tp);
 
-	tp->saved_wolopts = wol->wolopts & WAKE_ANY;
+	tp->saved_wolopts = wol->wolopts;
 
 	if (pm_runtime_active(d))
 		__rtl8169_set_wol(tp, tp->saved_wolopts);
-- 
2.18.0

^ permalink raw reply related

* Re: [net regression] "fib_rules: move common handling of newrule delrule msgs into fib_nl2rule" breaks suppress_prefixlength
From: Roopa Prabhu @ 2018-06-25 18:36 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev
In-Reply-To: <CAJieiUiDOQF0NTUGAYmQEtwnrsXoqTAt-cW+dueeRXVP8rhaAw@mail.gmail.com>

On Mon, Jun 25, 2018 at 8:23 AM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> On Sat, Jun 23, 2018 at 8:46 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> Hey Roopa,
>>
>> On a kernel with a minimal networking config,
>> CONFIG_IP_MULTIPLE_TABLES appears to be broken for certain rules after
>> f9d4b0c1e9695e3de7af3768205bacc27312320c.
>>
>> Try, for example, running:
>>
>> $ ip -4 rule add table main suppress_prefixlength 0
>>
>> It returns with EEXIST.
>>
>> Perhaps the reason is that the new rule_find function does not match
>> on suppress_prefixlength? However, rule_exist from before didn't do
>> that either. I'll keep playing and see if I can track it down myself,
>> but thought I should let you know first.
>
> I am surprised at that also. I cannot find prior rule_exist looking at
> suppress_prefixlength.
> I will dig deeper also today. But your patch LGTM with a small change
> I commented on it.
>

So the previous rule_exists code did not check for attribute matches correctly.
It would ignore a rule at the first non-existent attribute mis-match.
eg in your case, it would
return at pref mismatch.


$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0

$ip rule show
0:      from all lookup local

32763:  from all lookup main suppress_prefixlength 0

32764:  from all lookup main suppress_prefixlength 0

32765:  from all lookup main suppress_prefixlength 0

32766:  from all lookup main

32767:  from all lookup default


With your patch (with my proposed fixes), you should get proper EXISTS check

$ git diff HEAD

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c

index 126ffc5..de4c171 100644

--- a/net/core/fib_rules.c

+++ b/net/core/fib_rules.c

@@ -428,6 +428,14 @@ static struct fib_rule *rule_find(struct
fib_rules_ops *ops,

                if (rule->l3mdev && r->l3mdev != rule->l3mdev)

                        continue;



+               if (rule->suppress_ifgroup != -1 &&

+                   (r->suppress_ifgroup != rule->suppress_ifgroup))

+                       continue;

+

+               if (rule->suppress_prefixlen != -1 &&

+                   (r->suppress_prefixlen != rule->suppress_prefixlen))

+                       continue;

+

                if (uid_range_set(&rule->uid_range) &&

                    (!uid_eq(r->uid_range.start, rule->uid_range.start) ||

                    !uid_eq(r->uid_range.end, rule->uid_range.end)))



$ ip -4 rule add table main suppress_prefixlength 0

$ ip -4 rule add table main suppress_prefixlength 0

RTNETLINK answers: File exists

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Santosh Shilimkar @ 2018-06-25 18:44 UTC (permalink / raw)
  To: Sowmini Varadhan, Ka-Cheong Poon; +Cc: netdev, davem, rds-devel
In-Reply-To: <20180625175006.GI14823@oracle.com>

On 6/25/2018 10:50 AM, Sowmini Varadhan wrote:
> On (06/26/18 01:43), Ka-Cheong Poon wrote:
>>
>> Yes, I think if the socket is bound, it should check the scope_id
>> in msg_name (if not NULL) to make sure that they match.  A bound
>> RDS socket can send to multiple peers.  But if the bound local
>> address is link local, it should only be allowed to send to peers
>> on the same link.
> 
> agree.
Yep. Its inline with RDS bind behavior.

> 
> 
>> If a socket is bound, I guess the scope_id should be used.  So
>> if a socket is not bound to a link local address and the socket
>> is used to sent to a link local peer, it should fail.
> 
> PF_RDS sockets *MUST* alwasy be bound.  See
> Documentation/networking/rds.txt:
> "   Sockets must be bound before you can send or receive data.
>      This is needed because binding also selects a transport and
>      attaches it to the socket. Once bound, the transport assignment
>      does not change."
> 
In any case link local or not, the socket needs to be bound before
any data can be sent as documented. Send path already enforces
it.

>>> Also, why is there no IPv6 support in rds_connect?
>>
>>
>> Oops, I missed this when I ported the internal version to the
>> net-next version.  Will add it back.
> 
So the net-next wasn't tested? IPv6 connections
itself wouldn't be formed with this missing. As mentioned
already, please test v2 before posting on list.

Regards,
Santosh

^ permalink raw reply

* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-25 18:50 UTC (permalink / raw)
  To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <544875a1-68e6-a8f3-4eb7-44f053605c3e@spamtrap.tnetconsulting.net>


	Hello,

On Thu, 21 Jun 2018, Grant Taylor wrote:

> On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> > Hello,
> 
> > http://ja.ssi.bg/dgd-usage.txt
> 
> "DGD" or "Dead Gateway Detection" sounds very familiar.  I referenced it in an
> earlier reply.
> 
> I distinctly remember DGD not behaving satisfactorily years ago.  Where
> unsatisfactorily was something like 90 seconds (or more) to recover. Which
> actually matches what I was getting without the ignore_routes_with_linkdown=1
> setting that David A. mentioned.

	Yes, ARP state for unreachable GWs may be updated slowly,
there is in-time feedback only for reachable state.

> With ignore_routes_with_linkdown=1 things behaved much better.
> 
> > Not true. net/ipv4/fib_semantics.c:fib_select_path() calls
> > fib_select_default() only when prefixlen = 0 (default route).
> 
> Okay....  My testing last night disagrees with you.  Specifically, I was able
> to add a alternate routes to the same prefix, 192.0.2.128/26. There was not
> any default gateway configured on any of the NetNSs.  So everything was using
> routes for locally attacked or the two added via "ip route append".
> 
> What am I misinterpreting?  Or where are we otherwise talking past each other?

	You can create the two routes, of course. But only the
default routes are alternative.

> 
> > Otherwise, only the first route will be considered.
> 
> "only the first route" almost sounds like something akin to Equal Cost Multi
> Path.
> 
> I was not expecting "alternative routes" to use more than one route at a time,
> equally or otherwise.  I was wanting for the kernel to fall back to an
> alternate route / gateway / path in the event that the one that was being used
> became unusable / unreachable.
> 
> So what should "Alternative Routes" do?  How does this compare / contract to
> E.C.M.P. or D.G.D.

	The alternative routes work in this way:

- on lookup, routes are walked in order - as listed in table

- as long as route contains reachable gateway (ARP state), only this route 
is used

- if some gateway becomes unreachable (ARP state), next alternative routes 
are tried

- if ARP entry is expired (missing), this gateway can be probed if the 
route is before the currently used route. This is what happens initially
when no ARP state is present for the GWs. It is bad luck if the probed
GW is actually unreachable.

- active probing by user space (ping GWs) can only help to keep the
ARP state present for the used gateways. By this way, if ARP entry 
for GW is missing, the kernel will not risk to select unavailable route 
with the goal to probe the GW.

> > fib_select_default() is the function that decides which nexthop is reachable
> > and whether to contact it. It uses the ARP state via fib_detect_death().
> > That is all code that is behind this feature called "alternative routes":
> > the kernel selects one based on nexthop's ARP state.
> 
> Please confirm that you aren't entering / referring to E.C.M.P. territory when
> you say "nexthop".  I think that you are not, but I want to ask and be sure,
> particularly seeing as how things are very closely related.

	nexthop is the GW in the route

> It sounds like you're referring to literally the router that is the next hop
> in the path.  I.e. the device on the other end of the wire.

	Yes, the kernel avoids alternative routes with unreachable GWs

> I want to do some testing to see if fib_multipath_use_neigh alters this
> behavior at all.  I'm hoping that it will invalidate an alternate route if the
> MAC is not resolvable even if the physical link stays up.

	The multipath route uses all its alive nexthops at the same 
time... But you may need in the same way active probing by user space,
otherwise unavailable GW can be selected.

> Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering this
> behavior.  But having that timeout and starting to use an alternative route is
> considerably better than not using an alternative route.

	Yes, if you prefer, you may run PING every second to avoid such 
delays...

Regards

^ permalink raw reply

* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-25 19:02 UTC (permalink / raw)
  To: Erik Auerswald; +Cc: Grant Taylor, Akshat Kakkar, netdev, cronolog+lartc, lartc
In-Reply-To: <20180624134524.GA9925@unix-ag.uni-kl.de>


	Hello,

On Sun, 24 Jun 2018, Erik Auerswald wrote:

> Hello Julien,
> 
> > http://ja.ssi.bg/dgd-usage.txt
> 
> Thanks for that info!
> 
> Can you tell us what parts from the above text is actually implemented
> in the upstream Linux kernel, and starting with which version(s)
> (approximately)? The text describes ideas and patches from nearly two
> decades ago, is more recent documentation available somewhere?

	Nothing is included in kernel. The idea is that user space
has more control. It is best done with CONNMARK: stick NATed
connection to some path (via alive ISP), use route lookup just
to select alive path for the first packet in connection. So, what
we balance are connections, not packets (which does not work with
different ISPs). Probe GWs to keep only alive routes in the table.

Regards

^ permalink raw reply

* Re: Route fallback issue
From: Grant Taylor @ 2018-06-25 20:07 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <alpine.LFD.2.20.1806252058210.2593@ja.home.ssi.bg>

[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]

On 06/25/2018 12:50 PM, Julian Anastasov wrote:
> Hello,

Hi Julian,

> Yes, ARP state for unreachable GWs may be updated slowly, there is 
> in-time feedback only for reachable state.

Fair.

Most of the installations where I needed D.G.D. to work would be okay 
with a < 5 minute timeout.  Obviously they would like faster, but 
automation is a LOT better than waiting on manual intervention.

IMHO < 30 seconds is great.  < 90 seconds is acceptable.  < 300 seconds 
leaves some room for improvement.

> You can create the two routes, of course. But only the default routes 
> are alternative.

Are you saying that the functionality I'm describing only works for 
default gateways or that the term "alternative route" only applies to 
default gateways?

The testing that I did indicated that alternative routes worked for 
specific prefixes too.

I tested multiple NetNSs with only directly attached routes and appended 
routes to a destination prefix, no default gateway / route of last resort.

The behavior seemed to be different when ignore_routes_with_linkdown was 
set verses unset.  Specifically, ignore_routes_with_linkdown seemed to 
help considerably.

Hence why I question the requirement for the "default" route verses a 
route to a specific prefix.

Can you explain why I saw the behavior difference with 
ignore_routes_with_linkdown if it only applies to the default route?

> The alternative routes work in this way:
> 
> - on lookup, routes are walked in order - as listed in table
> 
> - as long as route contains reachable gateway (ARP state), only this 
> route is used
> 
> - if some gateway becomes unreachable (ARP state), next alternative 
> routes are tried
> 
> - if ARP entry is expired (missing), this gateway can be probed if the 
> route is before the currently used route. This is what happens initially 
> when no ARP state is present for the GWs. It is bad luck if the probed 
> GW is actually unreachable.
> 
> - active probing by user space (ping GWs) can only help to keep the ARP 
> state present for the used gateways. By this way, if ARP entry for GW 
> is missing, the kernel will not risk to select unavailable route with 
> the goal to probe the GW.

This all makes sense.

Please confirm if "gateway" in this context is the "/default/ gateway" 
or not.  I ask because arguably "gateway" can be used as a term to 
describe the next hop for a route, or gateway, to a prefix.  Further, 
the "/default/ (gateway,router)" is the gateway or route of last resort. 
  Which to me means that "gateway" can be any route in this context.

> nexthop is the GW in the route

Thank you for confirming.

> Yes, the kernel avoids alternative routes with unreachable GWs

Fair enough.

> The multipath route uses all its alive nexthops at the same time... But 
> you may need in the same way active probing by user space, otherwise 
> unavailable GW can be selected.

I assume that the dead ECMP NEXTHOP is also subject to similar timeouts 
as alternative routes.  Correct?

> Yes, if you prefer, you may run PING every second to avoid such delays...

Agreed.

I'm trying to make sure I understand basic functionality before I do 
things to modify it.



-- 
Grant. . . .
unix || die


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3982 bytes --]

^ permalink raw reply

* [PATCH net 0/2] nfp: MPLS and shared blocks TC offload fixes
From: Jakub Kicinski @ 2018-06-25 20:22 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Jakub Kicinski

Hi!

This series brings two fixes to TC filter/action offload code.
Pieter fixes matching MPLS packets when the match is purely on
the MPLS ethertype and none of the MPLS fields are used.
John provides a fix for offload of shared blocks.  Unfortunately,
with shared blocks there is currently no guarantee that filters
which were added by the core will be removed before block unbind.
Our simple fix is to not support offload of rules on shared blocks
at all, a revert of this fix will be send for -next once the
reoffload infrastructure lands.  The shared blocks became important
as we are trying to use them for bonding offload (managed from user
space) and lack of remove calls leads to resource leaks.


John Hurley (1):
  nfp: reject binding to shared blocks

Pieter Jansen van Vuuren (1):
  nfp: flower: fix mpls ether type detection

 drivers/net/ethernet/netronome/nfp/bpf/main.c      |  3 +++
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++++++++++++++
 .../net/ethernet/netronome/nfp/flower/offload.c    | 12 ++++++++++++
 3 files changed, 29 insertions(+)

-- 
2.17.1

^ permalink raw reply

* [PATCH net 1/2] nfp: flower: fix mpls ether type detection
From: Jakub Kicinski @ 2018-06-25 20:22 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, Pieter Jansen van Vuuren
In-Reply-To: <20180625202246.28871-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously it was not possible to distinguish between mpls ether types and
other ether types. This leads to incorrect classification of offloaded
filters that match on mpls ether type. For example the following two
filters overlap:

 # tc filter add dev eth0 parent ffff: \
    protocol 0x8847 flower \
    action mirred egress redirect dev eth1

 # tc filter add dev eth0 parent ffff: \
    protocol 0x0800 flower \
    action mirred egress redirect dev eth2

The driver now correctly includes the mac_mpls layer where HW stores mpls
fields, when it detects an mpls ether type. It also sets the MPLS_Q bit to
indicate that the filter should match mpls packets.

Fixes: bb055c198d9b ("nfp: add mpls match offloading support")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 14 ++++++++++++++
 .../net/ethernet/netronome/nfp/flower/offload.c    |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index 91935405f586..84f7a5dbea9d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -123,6 +123,20 @@ nfp_flower_compile_mac(struct nfp_flower_mac_mpls *frame,
 			 NFP_FLOWER_MASK_MPLS_Q;
 
 		frame->mpls_lse = cpu_to_be32(t_mpls);
+	} else if (dissector_uses_key(flow->dissector,
+				      FLOW_DISSECTOR_KEY_BASIC)) {
+		/* Check for mpls ether type and set NFP_FLOWER_MASK_MPLS_Q
+		 * bit, which indicates an mpls ether type but without any
+		 * mpls fields.
+		 */
+		struct flow_dissector_key_basic *key_basic;
+
+		key_basic = skb_flow_dissector_target(flow->dissector,
+						      FLOW_DISSECTOR_KEY_BASIC,
+						      flow->key);
+		if (key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_UC) ||
+		    key_basic->n_proto == cpu_to_be16(ETH_P_MPLS_MC))
+			frame->mpls_lse = cpu_to_be32(NFP_FLOWER_MASK_MPLS_Q);
 	}
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index c42e64f32333..477f584f6d28 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -264,6 +264,14 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 		case cpu_to_be16(ETH_P_ARP):
 			return -EOPNOTSUPP;
 
+		case cpu_to_be16(ETH_P_MPLS_UC):
+		case cpu_to_be16(ETH_P_MPLS_MC):
+			if (!(key_layer & NFP_FLOWER_LAYER_MAC)) {
+				key_layer |= NFP_FLOWER_LAYER_MAC;
+				key_size += sizeof(struct nfp_flower_mac_mpls);
+			}
+			break;
+
 		/* Will be included in layer 2. */
 		case cpu_to_be16(ETH_P_8021Q):
 			break;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jakub Kicinski @ 2018-06-25 20:22 UTC (permalink / raw)
  To: davem; +Cc: oss-drivers, netdev, John Hurley, Jakub Kicinski
In-Reply-To: <20180625202246.28871-1-jakub.kicinski@netronome.com>

From: John Hurley <john.hurley@netronome.com>

TC shared blocks allow multiple qdiscs to be grouped together and filters
shared between them. Currently the chains of filters attached to a block
are only flushed when the block is removed. If a qdisc is removed from a
block but the block still exists, flow del messages are not passed to the
callback registered for that qdisc. For the NFP, this presents the
possibility of rules still existing in hw when they should be removed.

Prevent binding to shared blocks until the kernel can send per qdisc del
messages when block unbinds occur.

Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c       | 3 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index fcdfb8e7fdea..6b15e3b11956 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -202,6 +202,9 @@ static int nfp_bpf_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	if (tcf_block_shared(f->block))
+		return -EOPNOTSUPP;
+
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 477f584f6d28..525057bee0ed 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -631,6 +631,9 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	if (tcf_block_shared(f->block))
+		return -EOPNOTSUPP;
+
 	switch (f->command) {
 	case TC_BLOCK_BIND:
 		return tcf_block_cb_register(f->block,
-- 
2.17.1

^ permalink raw reply related

* [PATCH 0/2] sh_eth: RPADIR related clean-ups
From: Sergei Shtylyov @ 2018-06-25 20:34 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc

Hello!

Here's a set of 2 patches against DaveM's 'net-next.git' repo. They are
clean-ups related to RPADIR (DMA padding to NET_IP_ALIGN)...

[1/2] sh_eth: fix *enum* RPADIR_BIT
[2/2] sh_eth: remove sh_eth_cpu_data::rpadir_value

MBR, Sergei

^ permalink raw reply

* [PATCH 1/2] sh_eth: fix *enum* RPADIR_BIT
From: Sergei Shtylyov @ 2018-06-25 20:36 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

The *enum*  RPADIR_BIT  was declared in the commit 86a74ff21a7a ("net:
sh_eth: add support for Renesas SuperH Ethernet") adding SH771x support,
however the SH771x manual doesn't have the RPADIR register described and,
moreover, tells why the padding insertion must not be used. The newer SoC
manuals do have RPADIR documented, though with somewhat different layout --
update the *enum* according to these manuals...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
 drivers/net/ethernet/renesas/sh_eth.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -403,8 +403,7 @@ enum DESC_I_BIT {
 
 /* RPADIR */
 enum RPADIR_BIT {
-	RPADIR_PADS1 = 0x20000, RPADIR_PADS0 = 0x10000,
-	RPADIR_PADR = 0x0003f,
+	RPADIR_PADS = 0x1f0000, RPADIR_PADR = 0xffff,
 };
 
 /* FDR */

^ permalink raw reply

* [PATCH 2/2] sh_eth: remove sh_eth_cpu_data::rpadir_value
From: Sergei Shtylyov @ 2018-06-25 20:37 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

If RPADIR exists, the value written to it is always the same for all SoCs
(and derived from NET_IP_ALIGN), so there has not  been any need to store
it in the *struct* sh_eth_cpu_data...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
 drivers/net/ethernet/renesas/sh_eth.c |    8 +-------
 drivers/net/ethernet/renesas/sh_eth.h |    1 -
 2 files changed, 1 insertion(+), 8 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -622,7 +622,6 @@ static struct sh_eth_cpu_data r7s72100_d
 	.tpauser	= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -672,7 +671,6 @@ static struct sh_eth_cpu_data r8a7740_da
 	.bculr		= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -798,7 +796,6 @@ static struct sh_eth_cpu_data r8a77980_d
 	.hw_swap	= 1,
 	.nbst		= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -851,7 +848,6 @@ static struct sh_eth_cpu_data sh7724_dat
 	.tpauser	= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value	= 0x00020000, /* NET_IP_ALIGN assumed to be 2 */
 };
 
 static void sh_eth_set_rate_sh7757(struct net_device *ndev)
@@ -898,7 +894,6 @@ static struct sh_eth_cpu_data sh7757_dat
 	.hw_swap	= 1,
 	.no_ade		= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.rtrate		= 1,
 	.dual_port	= 1,
 };
@@ -978,7 +973,6 @@ static struct sh_eth_cpu_data sh7757_dat
 	.bculr		= 1,
 	.hw_swap	= 1,
 	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.xdfar_rw	= 1,
@@ -1467,7 +1461,7 @@ static int sh_eth_dev_init(struct net_de
 	/* Descriptor format */
 	sh_eth_ring_format(ndev);
 	if (mdp->cd->rpadir)
-		sh_eth_write(ndev, mdp->cd->rpadir_value, RPADIR);
+		sh_eth_write(ndev, NET_IP_ALIGN << 16, RPADIR);
 
 	/* all sh_eth int mask */
 	sh_eth_write(ndev, 0, EESIPR);
Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -487,7 +487,6 @@ struct sh_eth_cpu_data {
 	u32 ecsipr_value;
 	u32 fdr_value;
 	u32 fcftr_value;
-	u32 rpadir_value;
 
 	/* interrupt checking mask */
 	u32 tx_check;

^ permalink raw reply

* Re: [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jiri Pirko @ 2018-06-25 20:40 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, John Hurley
In-Reply-To: <20180625202246.28871-3-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 10:22:46PM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>TC shared blocks allow multiple qdiscs to be grouped together and filters
>shared between them. Currently the chains of filters attached to a block
>are only flushed when the block is removed. If a qdisc is removed from a
>block but the block still exists, flow del messages are not passed to the
>callback registered for that qdisc. For the NFP, this presents the
>possibility of rules still existing in hw when they should be removed.
>
>Prevent binding to shared blocks until the kernel can send per qdisc del
>messages when block unbinds occur.

This is not nfp-specific problem. Should be handled differently. The
driver has information about offloaded filters. On unbind, it have
enough info to do the flush, doesn't it?


>
>Fixes: 4861738775d7 ("net: sched: introduce shared filter blocks infrastructure")
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>Reviewed-by: Simon Horman <simon.horman@netronome.com>
>---
> drivers/net/ethernet/netronome/nfp/bpf/main.c       | 3 +++
> drivers/net/ethernet/netronome/nfp/flower/offload.c | 3 +++
> 2 files changed, 6 insertions(+)
>
>diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
>index fcdfb8e7fdea..6b15e3b11956 100644
>--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
>+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
>@@ -202,6 +202,9 @@ static int nfp_bpf_setup_tc_block(struct net_device *netdev,
> 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
> 		return -EOPNOTSUPP;
> 
>+	if (tcf_block_shared(f->block))
>+		return -EOPNOTSUPP;
>+
> 	switch (f->command) {
> 	case TC_BLOCK_BIND:
> 		return tcf_block_cb_register(f->block,
>diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>index 477f584f6d28..525057bee0ed 100644
>--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>@@ -631,6 +631,9 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
> 	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
> 		return -EOPNOTSUPP;
> 
>+	if (tcf_block_shared(f->block))
>+		return -EOPNOTSUPP;
>+
> 	switch (f->command) {
> 	case TC_BLOCK_BIND:
> 		return tcf_block_cb_register(f->block,
>-- 
>2.17.1
>

^ permalink raw reply

* Re: [PATCH net-next 2/7] net: sched: add tcf_proto_op to offload a rule
From: Jiri Pirko @ 2018-06-25 20:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625043431.13413-3-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 06:34:26AM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Create a new tcf_proto_op called 'reoffload' that generates a new offload
>message for each node in a tcf_proto. Pointers to the tcf_proto and
>whether the offload request is to add or delete the node are included.
>Also included is a callback function to send the offload message to and
>the option of priv data to go with the cb.
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/act_api.h     | 3 ---
> include/net/sch_generic.h | 6 ++++++
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/include/net/act_api.h b/include/net/act_api.h
>index 9e59ebfded62..5ff11adbe2a6 100644
>--- a/include/net/act_api.h
>+++ b/include/net/act_api.h
>@@ -190,9 +190,6 @@ static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
> #endif
> }
> 
>-typedef int tc_setup_cb_t(enum tc_setup_type type,
>-			  void *type_data, void *cb_priv);
>-
> #ifdef CONFIG_NET_CLS_ACT
> int tc_setup_cb_egdev_register(const struct net_device *dev,
> 			       tc_setup_cb_t *cb, void *cb_priv);
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 6488daa32f82..88ed64f60056 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -20,6 +20,9 @@ struct qdisc_walker;
> struct tcf_walker;
> struct module;
> 
>+typedef int tc_setup_cb_t(enum tc_setup_type type,
>+			  void *type_data, void *cb_priv);
>+
> struct qdisc_rate_table {
> 	struct tc_ratespec rate;
> 	u32		data[256];
>@@ -256,6 +259,9 @@ struct tcf_proto_ops {
> 					  bool *last,
> 					  struct netlink_ext_ack *);
> 	void			(*walk)(struct tcf_proto*, struct tcf_walker *arg);
>+	int			(*reoffload)(struct tcf_proto *, bool,
>+					     tc_setup_cb_t *, void *,
>+					     struct netlink_ext_ack *);

Please, name the args. Unnamed args are annoying...


> 	void			(*bind_class)(void *, u32, unsigned long);
> 
> 	/* rtnetlink specific */
>-- 
>2.17.1
>

^ permalink raw reply

* Re: [PATCH v3,net-next] vlan: implement vlan id and protocol changes
From: David Ahern @ 2018-06-25 20:45 UTC (permalink / raw)
  To: Chas Williams, davem; +Cc: netdev, Roopa Prabhu, Ido Schimmel
In-Reply-To: <20180625103043.25384-1-3chas3@gmail.com>

On 6/25/18 4:30 AM, Chas Williams wrote:
> vlan_changelink silently ignores attempts to change the vlan id
> or protocol id of an existing vlan interface.  Implement by adding
> the new vlan id and protocol to the interface's vlan group and then
> removing the old vlan id and protocol from the vlan group.
> 
> Signed-off-by: Chas Williams <3chas3@gmail.com>
> ---
>  include/linux/netdevice.h |  1 +
>  net/8021q/vlan.c          |  4 ++--
>  net/8021q/vlan.h          |  2 ++
>  net/8021q/vlan_netlink.c  | 38 ++++++++++++++++++++++++++++++++++++++
>  net/core/dev.c            |  1 +
>  5 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3ec9850c7936..a95ae238addf 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2409,6 +2409,7 @@ enum netdev_cmd {
>  	NETDEV_CVLAN_FILTER_DROP_INFO,
>  	NETDEV_SVLAN_FILTER_PUSH_INFO,
>  	NETDEV_SVLAN_FILTER_DROP_INFO,
> +	NETDEV_CHANGEVLAN,
>  };
>  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
>  

you add the new notifier, but do not add any hooks to catch and process it.

Personally, I think it is a bit sketchy to change the vlan id on an
existing device and I suspect it will cause latent errors.

What's your use case for trying to implement the change versus causing
it to generate an unsupported error?

If this patch does get accepted, I believe the mlxsw switchdev driver
will be impacted.


> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 73a65789271b..b5e0ad1a581a 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -51,8 +51,8 @@ const char vlan_version[] = DRV_VERSION;
>  
>  /* End of global variables definitions. */
>  
> -static int vlan_group_prealloc_vid(struct vlan_group *vg,
> -				   __be16 vlan_proto, u16 vlan_id)
> +int vlan_group_prealloc_vid(struct vlan_group *vg,
> +			    __be16 vlan_proto, u16 vlan_id)
>  {
>  	struct net_device **array;
>  	unsigned int pidx, vidx;
> diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
> index 44df1c3df02d..c734dd21d70d 100644
> --- a/net/8021q/vlan.h
> +++ b/net/8021q/vlan.h
> @@ -116,6 +116,8 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack);
>  void unregister_vlan_dev(struct net_device *dev, struct list_head *head);
>  bool vlan_dev_inherit_address(struct net_device *dev,
>  			      struct net_device *real_dev);
> +int vlan_group_prealloc_vid(struct vlan_group *vg,
> +			    __be16 vlan_proto, u16 vlan_id);
>  
>  static inline u32 vlan_get_ingress_priority(struct net_device *dev,
>  					    u16 vlan_tci)
> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
> index 9b60c1e399e2..ee27781939e0 100644
> --- a/net/8021q/vlan_netlink.c
> +++ b/net/8021q/vlan_netlink.c
> @@ -107,10 +107,48 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[],
>  			   struct nlattr *data[],
>  			   struct netlink_ext_ack *extack)
>  {
> +	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
>  	struct ifla_vlan_flags *flags;
>  	struct ifla_vlan_qos_mapping *m;
>  	struct nlattr *attr;
>  	int rem;
> +	__be16 vlan_proto = vlan->vlan_proto;
> +	u16 vlan_id = vlan->vlan_id;
> +
> +	if (data[IFLA_VLAN_ID])
> +		vlan_id = nla_get_u16(data[IFLA_VLAN_ID]);
> +
> +	if (data[IFLA_VLAN_PROTOCOL])
> +		vlan_proto = nla_get_be16(data[IFLA_VLAN_PROTOCOL]);
> +
> +	if (vlan->vlan_id != vlan_id || vlan->vlan_proto != vlan_proto) {
> +		struct net_device *real_dev = vlan->real_dev;
> +		struct vlan_info *vlan_info;
> +		struct vlan_group *grp;
> +		__be16 old_vlan_proto = vlan->vlan_proto;
> +		u16 old_vlan_id = vlan->vlan_id;
> +		int err;
> +
> +		err = vlan_vid_add(real_dev, vlan_proto, vlan_id);
> +		if (err)
> +			return err;
> +		vlan_info = rtnl_dereference(real_dev->vlan_info);
> +		grp = &vlan_info->grp;
> +		err = vlan_group_prealloc_vid(grp, vlan_proto, vlan_id);
> +		if (err < 0) {
> +			vlan_vid_del(real_dev, vlan_proto, vlan_id);
> +			return err;
> +		}
> +		vlan_group_set_device(grp, vlan_proto, vlan_id, dev);
> +		vlan->vlan_proto = vlan_proto;
> +		vlan->vlan_id = vlan_id;
> +
> +		vlan_group_set_device(grp, old_vlan_proto, old_vlan_id, NULL);
> +		vlan_vid_del(real_dev, old_vlan_proto, old_vlan_id);
> +
> +		err = call_netdevice_notifiers(NETDEV_CHANGEVLAN, dev);
> +		notifier_to_errno(err);
> +	}
>  
>  	if (data[IFLA_VLAN_FLAGS]) {
>  		flags = nla_data(data[IFLA_VLAN_FLAGS]);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a5aa1c7444e6..4bda5d2e3c85 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1587,6 +1587,7 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd)
>  	N(UDP_TUNNEL_DROP_INFO) N(CHANGE_TX_QUEUE_LEN)
>  	N(CVLAN_FILTER_PUSH_INFO) N(CVLAN_FILTER_DROP_INFO)
>  	N(SVLAN_FILTER_PUSH_INFO) N(SVLAN_FILTER_DROP_INFO)
> +	N(CHANGEVLAN)
>  	}
>  #undef N
>  	return "UNKNOWN_NETDEV_EVENT";
> 

^ permalink raw reply

* Re: [PATCH net-next 3/7] net: sched: cls_flower: implement offload tcf_proto_op
From: Jiri Pirko @ 2018-06-25 20:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625043431.13413-4-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 06:34:27AM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>
>
>Add the reoffload tcf_proto_op in flower to generate an offload message
>for each filter in the given tcf_proto. Call the specified callback with
>this new offload message. The function only returns an error if the
>callback rejects adding a 'hardware only' rule.
>
>A filter contains a flag to indicate if it is in hardware or not. To
>ensure the reoffload function properly maintains this flag, keep a
>reference counter for the number of instances of the filter that are in
>hardware. Only update the flag when this counter changes from or to 0. Add
>a generic helper function to implement this behaviour.
>
>Signed-off-by: John Hurley <john.hurley@netronome.com>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> include/net/sch_generic.h | 15 +++++++++++++
> net/sched/cls_flower.c    | 44 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 59 insertions(+)
>
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 88ed64f60056..c0bd11a928ed 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -336,6 +336,21 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
> 	block->offloadcnt--;
> }
> 
>+static inline void
>+tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt, u32 *flags,

Why u32? This is not uapi or hw facing. Just use "unsigned int".

[...]

^ permalink raw reply

* Re: [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jakub Kicinski @ 2018-06-25 20:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: davem, oss-drivers, netdev, John Hurley
In-Reply-To: <20180625204021.GE2161@nanopsycho>

On Mon, 25 Jun 2018 22:40:21 +0200, Jiri Pirko wrote:
> Mon, Jun 25, 2018 at 10:22:46PM CEST, jakub.kicinski@netronome.com wrote:
> >From: John Hurley <john.hurley@netronome.com>
> >
> >TC shared blocks allow multiple qdiscs to be grouped together and filters
> >shared between them. Currently the chains of filters attached to a block
> >are only flushed when the block is removed. If a qdisc is removed from a
> >block but the block still exists, flow del messages are not passed to the
> >callback registered for that qdisc. For the NFP, this presents the
> >possibility of rules still existing in hw when they should be removed.
> >
> >Prevent binding to shared blocks until the kernel can send per qdisc del
> >messages when block unbinds occur.  
> 
> This is not nfp-specific problem. Should be handled differently. The
> driver has information about offloaded filters. On unbind, it have
> enough info to do the flush, doesn't it?

Certainly.  But this fix is simpler and sufficient.  We need to
backport it back to 4.16.  If we have to go through driver tables 
and flush filters we may as well merge the reoffload series to net.

^ permalink raw reply

* Re: [PATCH net 2/2] nfp: reject binding to shared blocks
From: Jiri Pirko @ 2018-06-25 20:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, oss-drivers, netdev, John Hurley
In-Reply-To: <20180625135014.24f8f9c6@cakuba.netronome.com>

Mon, Jun 25, 2018 at 10:50:14PM CEST, jakub.kicinski@netronome.com wrote:
>On Mon, 25 Jun 2018 22:40:21 +0200, Jiri Pirko wrote:
>> Mon, Jun 25, 2018 at 10:22:46PM CEST, jakub.kicinski@netronome.com wrote:
>> >From: John Hurley <john.hurley@netronome.com>
>> >
>> >TC shared blocks allow multiple qdiscs to be grouped together and filters
>> >shared between them. Currently the chains of filters attached to a block
>> >are only flushed when the block is removed. If a qdisc is removed from a
>> >block but the block still exists, flow del messages are not passed to the
>> >callback registered for that qdisc. For the NFP, this presents the
>> >possibility of rules still existing in hw when they should be removed.
>> >
>> >Prevent binding to shared blocks until the kernel can send per qdisc del
>> >messages when block unbinds occur.  
>> 
>> This is not nfp-specific problem. Should be handled differently. The
>> driver has information about offloaded filters. On unbind, it have
>> enough info to do the flush, doesn't it?
>
>Certainly.  But this fix is simpler and sufficient.  We need to
>backport it back to 4.16.  If we have to go through driver tables 
>and flush filters we may as well merge the reoffload series to net.

Oh, I missed this is for net. Sorry.

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net-next 7/7] net: sched: call reoffload op on block callback reg
From: Jiri Pirko @ 2018-06-25 20:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers,
	John Hurley
In-Reply-To: <20180625043431.13413-8-jakub.kicinski@netronome.com>

Mon, Jun 25, 2018 at 06:34:31AM CEST, jakub.kicinski@netronome.com wrote:
>From: John Hurley <john.hurley@netronome.com>

[...]	
	
>+static int
>+tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
>+			    void *cb_priv, bool add, bool offload_in_use,
>+			    struct netlink_ext_ack *extack)
>+{
>+	struct tcf_chain *chain;
>+	struct tcf_proto *tp;
>+	int err;
>+
>+	list_for_each_entry(chain, &block->chain_list, list) {
>+		for (tp = rtnl_dereference(chain->filter_chain); tp;
>+		     tp = rtnl_dereference(tp->next)) {
>+			if (tp->ops->reoffload) {
>+				err = tp->ops->reoffload(tp, add, cb, cb_priv,
>+							 extack);
>+				if (err && add)
>+					goto err_playback_remove;
>+			} else if (add && offload_in_use) {
>+				err = -EOPNOTSUPP;
>+				NL_SET_ERR_MSG(extack, "Filter replay failed - a filters doesn't support re-offloading");

This msg sounds weird. Please fix it.

Otherwise this looks very good to me! Thanks!

^ permalink raw reply

* [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Jiri Pirko @ 2018-06-25 21:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jakub.kicinski, simon.horman,
	john.hurley, dsahern, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

For the TC clsact offload these days, some of HW drivers need
to hold a magic ball. The reason is, with the first inserted rule inside
HW they need to guess what fields will be used for the matching. If
later on this guess proves to be wrong and user adds a filter with a
different field to match, there's a problem. Mlxsw resolves it now with
couple of patterns. Those try to cover as many match fields as possible.
This aproach is far from optimal, both performance-wise and scale-wise.
Also, there is a combination of filters that in certain order won't
succeed.

Most of the time, when user inserts filters in chain, he knows right away
how the filters are going to look like - what type and option will they
have. For example, he knows that he will only insert filters of type
flower matching destination IP address. He can specify a template that
would cover all the filters in the chain.

This patchset is providing the possibility to user to provide such
template  to kernel and propagate it all the way down to device
drivers.

See the examples below.

Create dummy device with clsact first:
# ip link add type dummy
# tc qdisc add dev dummy0 clsact

There is no template assigned by default:
# tc filter template show dev dummy0 ingress

Add a template of type flower allowing to insert rules matching on last
2 bytes of destination mac address:
# tc filter template add dev dummy0 ingress proto ip flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF

The template is now showed in the list:
# tc filter template show dev dummy0 ingress
filter flower chain 0
  dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
  eth_type ipv4

Add another template, this time for chain number 22:
# tc filter template add dev dummy0 ingress proto ip chain 22 flower dst_ip 0.0.0.0/16
# tc filter template show dev dummy0 ingress
filter flower chain 0
  dst_mac 00:00:00:00:00:00/00:00:00:00:ff:ff
  eth_type ipv4
filter flower chain 22
  eth_type ipv4
  dst_ip 0.0.0.0/16

Add a filter that fits the template:
# tc filter add dev dummy0 ingress proto ip flower dst_mac aa:bb:cc:dd:ee:ff/00:00:00:00:00:0F action drop

Addition of filters that does not fit the template would fail:
# tc filter add dev dummy0 ingress proto ip flower dst_mac aa:11:22:33:44:55/00:00:00:FF:00:00 action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1
# tc filter add dev dummy0 ingress proto ip flower dst_ip 10.0.0.1 action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1

Additions of filters to chain 22:
# tc filter add dev dummy0 ingress proto ip chain 22 flower dst_ip 10.0.0.1/8 action drop
# tc filter add dev dummy0 ingress proto ip chain 22 flower dst_ip 10.0.0.1 action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1
# tc filter add dev dummy0 ingress proto ip chain 22 flower dst_ip 10.0.0.1/24 action drop
Error: Mask does not fit the template.
We have an error talking to the kernel, -1

Removal of a template from non-empty chain would fail:
# tc filter template del dev dummy0 ingress
Error: The chain is not empty, unable to delete template.
We have an error talking to the kernel, -1

Once the chain is flushed, the template could be removed:
# tc filter del dev dummy0 ingress
# tc filter template del dev dummy0 ingress

Jiri Pirko (9):
  net: sched: push ops lookup bits into tcf_proto_lookup_ops()
  net: sched: introduce chain templates
  net: sched: cls_flower: move key/mask dumping into a separate function
  net: sched: cls_flower: change fl_init_dissector to accept mask and
    dissector
  net: sched: cls_flower: implement chain templates
  net: sched: cls_flower: propagate chain teplate creation and
    destruction to drivers
  mlxsw: spectrum: Implement chain template hinting
  selftests: forwarding: move shblock tc support check to a separate
    helper
  selftests: forwarding: add tests for TC chain templates

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |   5 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  12 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c |  12 +-
 .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c    |  25 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  |  44 ++-
 include/net/pkt_cls.h                              |   2 +
 include/net/sch_generic.h                          |  14 +-
 include/uapi/linux/rtnetlink.h                     |   7 +
 net/sched/cls_api.c                                | 424 +++++++++++++++++++--
 net/sched/cls_basic.c                              |   2 +-
 net/sched/cls_bpf.c                                |   3 +-
 net/sched/cls_cgroup.c                             |   2 +-
 net/sched/cls_flow.c                               |   3 +-
 net/sched/cls_flower.c                             | 251 +++++++++---
 net/sched/cls_fw.c                                 |   3 +-
 net/sched/cls_matchall.c                           |   3 +-
 net/sched/cls_route.c                              |   2 +-
 net/sched/cls_rsvp.h                               |   3 +-
 net/sched/cls_tcindex.c                            |   2 +-
 net/sched/cls_u32.c                                |   2 +-
 security/selinux/nlmsgtab.c                        |   2 +-
 tools/testing/selftests/net/forwarding/lib.sh      |  12 +
 .../selftests/net/forwarding/tc_chaintemplates.sh  | 160 ++++++++
 .../selftests/net/forwarding/tc_shblocks.sh        |   2 +
 24 files changed, 901 insertions(+), 96 deletions(-)
 create mode 100755 tools/testing/selftests/net/forwarding/tc_chaintemplates.sh

-- 
2.14.4

^ permalink raw reply

* [patch net-next 1/9] net: sched: push ops lookup bits into tcf_proto_lookup_ops()
From: Jiri Pirko @ 2018-06-25 21:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jakub.kicinski, simon.horman,
	john.hurley, dsahern, mlxsw
In-Reply-To: <20180625210148.9386-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Push all bits that take care of ops lookup, including module loading
outside tcf_proto_create() function, into tcf_proto_lookup_ops()

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 53 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cdc3c87c53e6..db45931bbada 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,7 +39,7 @@ static DEFINE_RWLOCK(cls_mod_lock);
 
 /* Find classifier type by string name */
 
-static const struct tcf_proto_ops *tcf_proto_lookup_ops(const char *kind)
+static const struct tcf_proto_ops *__tcf_proto_lookup_ops(const char *kind)
 {
 	const struct tcf_proto_ops *t, *res = NULL;
 
@@ -57,6 +57,33 @@ static const struct tcf_proto_ops *tcf_proto_lookup_ops(const char *kind)
 	return res;
 }
 
+static const struct tcf_proto_ops *
+tcf_proto_lookup_ops(const char *kind, struct netlink_ext_ack *extack)
+{
+	const struct tcf_proto_ops *ops;
+
+	ops = __tcf_proto_lookup_ops(kind);
+	if (ops)
+		return ops;
+#ifdef CONFIG_MODULES
+	rtnl_unlock();
+	request_module("cls_%s", kind);
+	rtnl_lock();
+	ops = __tcf_proto_lookup_ops(kind);
+	/* We dropped the RTNL semaphore in order to perform
+	 * the module load. So, even if we succeeded in loading
+	 * the module we have to replay the request. We indicate
+	 * this using -EAGAIN.
+	 */
+	if (ops) {
+		module_put(ops->owner);
+		return ERR_PTR(-EAGAIN);
+	}
+#endif
+	NL_SET_ERR_MSG(extack, "TC classifier not found");
+	return ERR_PTR(-ENOENT);
+}
+
 /* Register(unregister) new classifier type */
 
 int register_tcf_proto_ops(struct tcf_proto_ops *ops)
@@ -133,27 +160,9 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	if (!tp)
 		return ERR_PTR(-ENOBUFS);
 
-	err = -ENOENT;
-	tp->ops = tcf_proto_lookup_ops(kind);
-	if (!tp->ops) {
-#ifdef CONFIG_MODULES
-		rtnl_unlock();
-		request_module("cls_%s", kind);
-		rtnl_lock();
-		tp->ops = tcf_proto_lookup_ops(kind);
-		/* We dropped the RTNL semaphore in order to perform
-		 * the module load. So, even if we succeeded in loading
-		 * the module we have to replay the request. We indicate
-		 * this using -EAGAIN.
-		 */
-		if (tp->ops) {
-			module_put(tp->ops->owner);
-			err = -EAGAIN;
-		} else {
-			NL_SET_ERR_MSG(extack, "TC classifier not found");
-			err = -ENOENT;
-		}
-#endif
+	tp->ops = tcf_proto_lookup_ops(kind, extack);
+	if (IS_ERR(tp->ops)) {
+		err = PTR_ERR(tp->ops);
 		goto errout;
 	}
 	tp->classify = tp->ops->classify;
-- 
2.14.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox