Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [RFC v2 06/12] qedr: Add support for QP verbs
From: Amrani, Ram @ 2016-09-21 14:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: davem@davemloft.net, dledford@redhat.com, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20160920163048.GQ26673@leon.nu>

> >  include/uapi/rdma/providers/qedr-abi.h     |   35 +
> 
> Please be aware that the last version for ABI header files doesn't have "provider"
> name in directory path (include/uapi/rdma/qedr-abi.h)

OK. Note that I was using https://www.spinics.net/lists/linux-rdma/msg40414.html that dates to 7 days ago and it contains the "providers" folder.
Can you point to the most recent patch?

> > diff --git a/drivers/infiniband/hw/qedr/main.c
> > b/drivers/infiniband/hw/qedr/main.c
> > index c99dd6a..10ad9ed 100644
> > --- a/drivers/infiniband/hw/qedr/main.c
> > +++ b/drivers/infiniband/hw/qedr/main.c
> > @@ -52,7 +52,7 @@ uint debug;
> >  module_param(debug, uint, 0);
> >  MODULE_PARM_DESC(debug, "Default debug msglevel");
> >
> > -static LIST_HEAD(qedr_dev_list);
> 
> You are removing code which was added in previous patches, why do you do it?
> 

Sure. Will fix. Thanks.

> > diff --git a/drivers/infiniband/hw/qedr/qedr.h
> > b/drivers/infiniband/hw/qedr/qedr.h
> > index dd974d5..05017be 100644
> > --- a/drivers/infiniband/hw/qedr/qedr.h
> > +++ b/drivers/infiniband/hw/qedr/qedr.h
> > @@ -49,6 +49,9 @@
> >  enum DP_QEDR_MODULE {
> >  	QEDR_MSG_INIT = 0x10000,
> 
> We prefer shift notation (1 << 16) instead of 0x10000.
> 

OK, I'll convert accordingly.

^ permalink raw reply

* RE: [RFC v2 06/12] qedr: Add support for QP verbs
From: Amrani, Ram @ 2016-09-21 14:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20160920152849.GD32020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

> Ugh, each patch keeps adding to this?

The logic in the patch series is to have each patch contain only what is necessary for the specific functionality it adds.
This made it harder on us to prepare but, IMHO, easier for the reviewer.
If you'd like to have this file in one chunk, I can do this. What do you prefer?



> > +struct qedr_create_qp_ureq {
> > +       u32 qp_handle_hi;
> > +       u32 qp_handle_lo;
> > +
> > +       /* SQ */
> > +       /* user space virtual address of SQ buffer */
> > +       u64 sq_addr;
> > +
> > +       /* length of SQ buffer */
> > +       size_t sq_len;
> 
> Do not use size_t, that is just asking for 32/64 bit interop problems.

OK

> > +struct qedr_create_qp_uresp {
> > +       u32 qp_id;
> > +       int atomic_supported;
> 
> int again, etc. You get the idea.

Yeah, I did. Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC v2 04/12] qedr: Add support for user context verbs
From: Amrani, Ram @ 2016-09-21 14:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem@davemloft.net, dledford@redhat.com, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20160920152729.GC32020@obsidianresearch.com>

> uapi headers need the __ varients and the #include <linux/types.h>
> 
> Follow what Leon has done.

OK

^ permalink raw reply

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
From: Tom Herbert @ 2016-09-21 14:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer
In-Reply-To: <20160921115545.GA12789@pox.localdomain>

On Wed, Sep 21, 2016 at 4:55 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/20/16 at 04:59pm, Tom Herbert wrote:
>> Well, need to measure to ascertain the cost. As for complexity, this
>> actually reduces complexity needed for XDP in the drivers which is a
>> good thing because that's where most of the support and development
>> pain will be.
>
> I'm not objecting to anything that simplifies the process of adding
> XDP capability to drivers. You have my full support here.
>
>> I am looking at using this for ILA router. The problem I am hitting is
>> that not all packets that we need to translate go through the XDP
>> path. Some would go through the kernel path, some through XDP path but
>
> When you say kernel path, what do you mean specifically? One aspect of
> XDP I love is that XDP can act as an acceleration option for existing
> BPF programs attached to cls_bpf. Support for direct packet read and
> write at clsact level have made it straight forward to write programs
> which are compatible or at minimum share a lot of common code. They
> can share data structures, lookup functionality, etc.
>
>> We can optimize for allowing only one hook, or maybe limit to only
>> allowing one hook to be set. In any case this obviously requires a lot
>> of performance evaluation, I am hoping to feedback on the design
>> first. My question about using a linear list for this was real, do you
>> know a better method off hand to implement a call list?
>
> My main concern is that we overload the XDP hook. Instead of making use
> of the programmable glue, we put a linked list in front where everybody
> can attach a program to.
>
> A possible alternative:
>  1. The XDP hook always has single consumer controlled by the user
>     through Netlink, BPF is one of them. If a user wants to hardcode
>     the ILA router to that hook, he can do that.
>
>  2. BPF for XDP is extended to allow returning a verdict which results
>     in something else to be invoked. If user wants to invoke the ILA
>     router for just some packets, he can do that.
>
> That said, I see so much value in a BPF implementation of ILA at XDP
> level with all of the parsing logic and exact semantics remain
> flexible without the cost of translating some configuration to a set
> of actions.

Thomas,

As I mentioned though, not all packets we need to translate are going
to go through XDP. ILA already integrates with the routing table via
LWT and that solution works incredibly well especially when we are
sourcing connections (ILA is by far the fastest most efficient
technique of network virtualization). We already have a lookup table
and the translation process coded in the kernel and configuration is
already implemented by netlink and iproute. So I'm not yet sure I want
to redesign all of this around BPF, maybe that is the right answer
maybe it isn't; but, my point is I don't want to be _forced_ into a
certain design that because of constraints on one kernel interface. As
a kernel developer I want flexibility on how we design and implement
things!

I think there are two questions that this patch set poses for the
community wrt XDP:

#1: Should we allow alternate code to run in XDP other than BPF?
#2: If #1 is true what is the best way to implement that?

If the answer to #1 is "no" then the answer to #2 is irrelevant. So
with this RFC I'm hoping we can come the agreement on questions #1.

IMO, there are at least three factors that would motivate the answer
to #1 be yes

1) There is precedence in nfhooks in creating generic hooks in the
critical data path that can have other uses than those originally
intended by the authors. I would take it as general principle that
hooks in the data path should be as generic as possible to get the
most utility.
2) The changes to make XDP work in drivers are quite invasive and as
we've seen in at least one attempt to modify a driver to support XDP
it is easy to break other mechanisms. If we're going to be modifying a
lot of drivers we really want to get the most value for that work.
3) The XDP interface is well designed and really has no dependencies
on BPF. Writing code to directly parse packets is not rocket science.
So it seems reasonable to me that a subsystem may want to use XDP
interface to accelerate existing functionality without introducing
additional dependencies on BPF or other non-kernel mechanisms

In any case, I ask everyone to be open minded. If there is agreement
that BPF is sufficient for all XDP use cases then I wont' pursue these
patches. But given the ramifications of XDP, especially the
considering the changes required across drivers to support it, I
really would like to get input from the community on this.

Thanks,
Tom

^ permalink raw reply

* RE: [RFC v2 00/11] QLogic RDMA Driver (qedr) RFC
From: Amrani, Ram @ 2016-09-21 14:19 UTC (permalink / raw)
  To: Leon Romanovsky, Elior, Ariel
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20160920162306.GP26673-2ukJVAZIZ/Y@public.gmane.org>

> 
> The module parameters is no-go.
> 

We'll remove it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC v2 04/12] qedr: Add support for user context verbs
From: Amrani, Ram @ 2016-09-21 14:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20160920164556.GS26673-2ukJVAZIZ/Y@public.gmane.org>

> Are you sure that you want GPL-only license?

Thanks for pointing this out. This is something we will obviously fix.

 
> > +#define QEDR_ABI_VERSION		(6)
> 
> Is it possible to start new file with new driver from ABI version 1 and not 6?

This is something that we cannot do since we already have a few libraries out there and we wouldn't like a user to use mismatch a library and a driver...

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: ethernet: mediatek: add the dts property to set if TRGMII supported on GMAC0
From: Andrew Lunn @ 2016-09-21 14:17 UTC (permalink / raw)
  To: Sean Wang; +Cc: john, davem, nbd, netdev, linux-mediatek, keyhaede
In-Reply-To: <1474438590-6855-1-git-send-email-sean.wang@mediatek.com>

On Wed, Sep 21, 2016 at 02:16:30PM +0800, Sean Wang wrote:
> Date: Tue, 20 Sep 2016 21:37:58 +0200, Andrew Lunn <andrew@lunn.ch> wrote:
> >On Tue, Sep 20, 2016 at 03:59:20PM +0800, sean.wang@mediatek.com wrote:
> >> From: Sean Wang <sean.wang@mediatek.com>
> >>
> >> Add the dts property for the capability if TRGMII supported on GAMC0
> >>
> >> Signed-off-by: Sean Wang <sean.wang@mediatek.com>
> >> ---
> >>  Documentation/devicetree/bindings/net/mediatek-net.txt | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
> >> index 6103e55..32f79d8 100644
> >> --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> >> +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> >> @@ -31,7 +31,10 @@ Optional properties:
> >>  Required properties:
> >>  - compatible: Should be "mediatek,eth-mac"
> >>  - reg: The number of the MAC
> >> -- phy-handle: see ethernet.txt file in the same directory.
> >> +- phy-handle: see ethernet.txt file in the same directory and
> >> +     the additional phy-mode "tgrmii" is provided in order to connect
> >> +     with the internal switch MT7530 which is only applicable when reg
> >> +     is equal to 0.
> >
> >Humm. How is the switch connected? Is it on the MDIO bus?
> 
> the switch is connected to MDIO bus
> 
> >If it is on the mdio bus, the binding is going to look something like:
> >
> >eth: ethernet@1b100000 {
> >        compatible = "mediatek,mt7623-eth";
> >        reg = <0 0x1b100000 0 0x20000>;
> >        clocks = <&topckgen CLK_TOP_ETHIF_SEL>,
> >                 <&ethsys CLK_ETHSYS_ESW>,
> >                 <&ethsys CLK_ETHSYS_GP2>,
> >                 <&ethsys CLK_ETHSYS_GP1>;
> >        clock-names = "ethif", "esw", "gp2", "gp1";
> >        interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW
> >                      GIC_SPI 199 IRQ_TYPE_LEVEL_LOW
> >                      GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>;
> >        power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
> >        resets = <&ethsys MT2701_ETHSYS_ETH_RST>;
> >        reset-names = "eth";
> >        mediatek,ethsys = <&ethsys>;
> >        mediatek,pctl = <&syscfg_pctl_a>;
> >        #address-cells = <1>;
> >        #size-cells = <0>;
> >
> >        gmac1: mac@0 {
> >                compatible = "mediatek,eth-mac";
> >                reg = <0>;
> >        };
> >
> >        gmac2: mac@1 {
> >                compatible = "mediatek,eth-mac";
> >                reg = <1>;
> >        };
> >
> >        mdio-bus {
> >               reg = <1>;
> >               #address-cells = <1>;
> >               #size-cells = <0>;
> >
> >               switch0: switch0@0 {
> >                       compatible = "marvell,mv88e6085";
> >                       #address-cells = <1>;
> >                       #size-cells = <0>;
> >                       reg = <0>;
> >                       dsa,member = <0 0>;
> >
> >                       ports {
> >                               #address-cells = <1>;
> >                               #size-cells = <0>;
> >                               port@0 {
> >                                       reg = <0>;
> >                                       label = "lan0";
> >...
> >...
> >In this case the switch is an MDIO device, not an PHY. It will not
> >have an phy-mode. It cannot have a phy mode, it is not a PHY.
> >
> >Or am i missing something here?
> >
> >Thanks
> >
> 
> 1)
> 
> The switch driver is not supported for DSA so far yet 
> but DSA is good thing and I will try make it happen
> in the near future.

O.K. But if i understand correctly, the TRGMII is so you can use the
switch. So it needs to work when you have DSA.
 
> And another question about DSA, that is
> if I use DSA for switch, how to know the relationship
> between MAC and DSA ? such like I could know relationship 
> between MAC and PHY by phy-handle.

It will look like what i stated above. But i missed the cpu node in
the ports, which is what you are asking about. There will also be a
node like:

                            port@6 {
                                     reg = <6>;
                                     label = "cpu";
                                     ethernet = <&gmac1>;
                             };

And this is how you couple the MAC to DSA.

> The cause I ask is becasue I think it's good if the topology
> about MAC/PHYs/Switch is known just by dts files.
> 
> 2)
> 
> The phy-mode I mention is for fixed-link. For current MAC driver, 
> it just uses fixed phy to adapt into the part of switch, so the 
> device tree looks something like the below. 
> 
> &eth {
>         status = "okay";
>         gmac0: mac@0 {
>                 compatible = "mediatek,eth-mac";
>                 reg = <0>;
>                 phy-mode = "trgmii";
>                 fixed-link {
>                         speed = <1000>;
>                         full-duplex;
>                         pause;
>                 };
>         };
> 
>         gmac1: mac@1 {
>                 compatible = "mediatek,eth-mac";
>                 reg = <1>;
>                 phy-handle = <&phy5>;
>         };


static int mtk_phy_connect(struct mtk_mac *mac)
{
        struct mtk_eth *eth = mac->hw;
        struct device_node *np;
        u32 val;

        np = of_parse_phandle(mac->of_node, "phy-handle", 0);
        if (!np && of_phy_is_fixed_link(mac->of_node))
                if (!of_phy_register_fixed_link(mac->of_node))
                        np = of_node_get(mac->of_node);
	...
        ...
        mtk_phy_connect_node(eth, mac, np);


So in the case of a fixed-phy, you do look in the MAC node, and when
there is a phy-handle, you look in the PHY node.

So this does work....

   Andrew

^ permalink raw reply

* [PATCH] net: fec: set mac address unconditionally
From: Gavin Schenk @ 2016-09-21 13:30 UTC (permalink / raw)
  To: fugang.duan; +Cc: netdev, kernel, Gavin Schenk

Fixes: 9638d19e4816 ("net: fec: add netif status check before set mac address")

If the mac address origin is not dt, you can only safe assign a
mac address after "link up" of the device. If the link is down the
clocks are disabled and because of issues assigning registers when
clocks are down the new mac address is discarded on some soc's. This fix
sets the mac address unconditionally in fec_restart(...) and ensures
consistens between fec registers and the network layer.

Signed-off-by: Gavin Schenk <g.schenk@eckelmann.de>
---
 drivers/net/ethernet/freescale/fec_main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2a03857cca18..bdabea6cd981 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -903,13 +903,11 @@ fec_restart(struct net_device *ndev)
 	 * enet-mac reset will reset mac address registers too,
 	 * so need to reconfigure it.
 	 */
-	if (fep->quirks & FEC_QUIRK_ENET_MAC) {
-		memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
-		writel((__force u32)cpu_to_be32(temp_mac[0]),
-		       fep->hwp + FEC_ADDR_LOW);
-		writel((__force u32)cpu_to_be32(temp_mac[1]),
-		       fep->hwp + FEC_ADDR_HIGH);
-	}
+	memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
+	writel((__force u32)cpu_to_be32(temp_mac[0]),
+	       fep->hwp + FEC_ADDR_LOW);
+	writel((__force u32)cpu_to_be32(temp_mac[1]),
+	       fep->hwp + FEC_ADDR_HIGH);

 	/* Clear any outstanding interrupt. */
 	writel(0xffffffff, fep->hwp + FEC_IEVENT);
-- 
1.9.1

Eckelmann AG
Vorstand: Dipl.-Ing. Peter Frankenbach (Sprecher) Dipl.-Wi.-Ing. Philipp Eckelmann
Dr.-Ing. Frank-Thomas Mellert Dr.-Ing. Marco Münchhof Dr.-Ing. Frank Uhlemann
Vorsitzender des Aufsichtsrats: Hubertus G. Krossa
Sitz der Gesellschaft: Berliner Str. 161, 65205 Wiesbaden, Amtsgericht Wiesbaden HRB 12636
http://www.eckelmann.de 

^ permalink raw reply related

* Re: [PATCHv7 net-next 00/15] BPF hardware offload (cls_bpf for now)
From: Daniel Borkmann @ 2016-09-21 13:35 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast
In-Reply-To: <1474454647-20137-1-git-send-email-jakub.kicinski@netronome.com>

On 09/21/2016 12:43 PM, Jakub Kicinski wrote:
[...]
> Rebased and improved.

Did a couple of tests with regards to the core bits with this set
applied on top of net-next and looks good to me now.

Thanks a lot, Jakub!

^ permalink raw reply

* [PATCH stable 4.4] tipc: move linearization of buffers to generic code
From: Juerg Haefliger @ 2016-09-21 13:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: jonas.arndt, Jon Paul Maloy, Juerg Haefliger

From: Jon Paul Maloy <jon.maloy@ericsson.com>

commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5 upstream.

In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.

We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>

---
This commit fixes an issue with nodes not joining a cluster over TIPC
after reboots. Upstream TIPC recommends to include this commit in the
stable kernel:
https://sourceforge.net/p/tipc/mailman/message/35368150/
---
 net/tipc/link.c       | 2 ++
 net/tipc/name_distr.c | 1 +
 net/tipc/udp_media.c  | 5 -----
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 91aea071ab27..72268eac4ec7 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1262,6 +1262,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 		/* fall thru' */
 
 	case ACTIVATE_MSG:
+		skb_linearize(skb);
+		hdr = buf_msg(skb);
 
 		/* Complete own link name with peer's interface name */
 		if_name =  strrchr(l->name, ':') + 1;
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index c07612bab95c..f51c8bdbea1c 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -397,6 +397,7 @@ void tipc_named_rcv(struct net *net, struct sk_buff_head *inputq)
 
 	spin_lock_bh(&tn->nametbl_lock);
 	for (skb = skb_dequeue(inputq); skb; skb = skb_dequeue(inputq)) {
+		skb_linearize(skb);
 		msg = buf_msg(skb);
 		mtype = msg_type(msg);
 		item = (struct distr_item *)msg_data(msg);
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 70c03271b798..6af78c6276b4 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -48,7 +48,6 @@
 #include <linux/tipc_netlink.h>
 #include "core.h"
 #include "bearer.h"
-#include "msg.h"
 
 /* IANA assigned UDP port */
 #define UDP_PORT_DEFAULT	6118
@@ -224,10 +223,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_bearer *ub;
 	struct tipc_bearer *b;
-	int usr = msg_user(buf_msg(skb));
-
-	if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
-		skb_linearize(skb);
 
 	ub = rcu_dereference_sk_user_data(sk);
 	if (!ub) {
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH v4 net-next 00/16] tcp: BBR congestion control algorithm
From: Thomas Graf @ 2016-09-21 12:51 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, Netdev
In-Reply-To: <CADVnQynOD2Hrq994BPzTb+AgtOU=tpyRM+odjSv5gNYTSvbhBQ@mail.gmail.com>

On 09/21/16 at 07:53am, Neal Cardwell wrote:
> On Wed, Sep 21, 2016 at 12:44 AM, David Miller <davem@davemloft.net> wrote:
> > From: Neal Cardwell <ncardwell@google.com>
> > Date: Mon, 19 Sep 2016 23:39:07 -0400
> >
> >> tcp: BBR congestion control algorithm
> >
> > Series applied, thanks Neal.
> 
> Thanks, David!

Let me just say that this is super awesome. Thank you to everybody who
was involved!

^ permalink raw reply

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
From: Thomas Graf @ 2016-09-21 11:55 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer
In-Reply-To: <CALx6S35LmUtwRA85Kg6s7OR=e5Pj9ssqmWLsjtmXfZTXeG02zQ@mail.gmail.com>

On 09/20/16 at 04:59pm, Tom Herbert wrote:
> Well, need to measure to ascertain the cost. As for complexity, this
> actually reduces complexity needed for XDP in the drivers which is a
> good thing because that's where most of the support and development
> pain will be.

I'm not objecting to anything that simplifies the process of adding
XDP capability to drivers. You have my full support here.

> I am looking at using this for ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through XDP path but

When you say kernel path, what do you mean specifically? One aspect of
XDP I love is that XDP can act as an acceleration option for existing
BPF programs attached to cls_bpf. Support for direct packet read and
write at clsact level have made it straight forward to write programs
which are compatible or at minimum share a lot of common code. They
can share data structures, lookup functionality, etc.

> We can optimize for allowing only one hook, or maybe limit to only
> allowing one hook to be set. In any case this obviously requires a lot
> of performance evaluation, I am hoping to feedback on the design
> first. My question about using a linear list for this was real, do you
> know a better method off hand to implement a call list?

My main concern is that we overload the XDP hook. Instead of making use
of the programmable glue, we put a linked list in front where everybody
can attach a program to.

A possible alternative:
 1. The XDP hook always has single consumer controlled by the user
    through Netlink, BPF is one of them. If a user wants to hardcode
    the ILA router to that hook, he can do that.

 2. BPF for XDP is extended to allow returning a verdict which results
    in something else to be invoked. If user wants to invoke the ILA
    router for just some packets, he can do that.

That said, I see so much value in a BPF implementation of ILA at XDP
level with all of the parsing logic and exact semantics remain
flexible without the cost of translating some configuration to a set
of actions.

^ permalink raw reply

* Re: [PATCH v4 net-next 00/16] tcp: BBR congestion control algorithm
From: Neal Cardwell @ 2016-09-21 11:53 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev
In-Reply-To: <20160921.004459.574306460460243541.davem@davemloft.net>

On Wed, Sep 21, 2016 at 12:44 AM, David Miller <davem@davemloft.net> wrote:
> From: Neal Cardwell <ncardwell@google.com>
> Date: Mon, 19 Sep 2016 23:39:07 -0400
>
>> tcp: BBR congestion control algorithm
>
> Series applied, thanks Neal.

Thanks, David!

neal

^ permalink raw reply

* [patch net-next 6/6] doc: update switchdev L3 section
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

This is to reflect the change of FIB offload infrastructure from
switchdev objects to FIB notifier.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 Documentation/networking/switchdev.txt | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 44235e8..c956ab8 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -314,30 +314,29 @@ the kernel, with the device doing the FIB lookup and forwarding.  The device
 does a longest prefix match (LPM) on FIB entries matching route prefix and
 forwards the packet to the matching FIB entry's nexthop(s) egress ports.
 
-To program the device, the driver implements support for
-SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops.
-switchdev_port_obj_add is used for both adding a new FIB entry to the device,
-or modifying an existing entry on the device.
+To program the device, the driver has to register a FIB notifier handler
+using register_fib_notifier. There are following events available:
+FIB_EVENT_ENTRY_ADD: used for both adding a new FIB entry to the device,
+                     or modifying an existing entry on the device.
+FIB_EVENT_ENTRY_DEL: used for removing a FIB entry
+FIB_EVENT_RULE_ADD, FIB_EVENT_RULE_DEL: used to propagate FIB rule changes
 
-XXX: Currently, only SWITCHDEV_OBJ_ID_IPV4_FIB objects are supported.
+FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass:
 
-SWITCHDEV_OBJ_ID_IPV4_FIB object passes:
-
-	struct switchdev_obj_ipv4_fib {         /* IPV4_FIB */
+	struct fib_entry_notifier_info {
+		struct fib_notifier_info info; /* must be first */
 		u32 dst;
 		int dst_len;
 		struct fib_info *fi;
 		u8 tos;
 		u8 type;
-		u32 nlflags;
 		u32 tb_id;
-	} ipv4_fib;
+		u32 nlflags;
+	};
 
 to add/modify/delete IPv4 dst/dest_len prefix on table tb_id.  The *fi
 structure holds details on the route and route's nexthops.  *dev is one of the
-port netdevs mentioned in the routes next hop list.  If the output port netdevs
-referenced in the route's nexthop list don't all have the same switch ID, the
-driver is not called to add/modify/delete the FIB entry.
+port netdevs mentioned in the routes next hop list.
 
 Routes offloaded to the device are labeled with "offload" in the ip route
 listing:
@@ -355,6 +354,8 @@ listing:
 	12.0.0.4 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
 	192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15
 
+The "offload" flag is set in case at least one device offloads the FIB entry.
+
 XXX: add/mod/del IPv6 FIB API
 
 Nexthop Resolution
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 5/6] switchdev: remove FIB offload infrastructure
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Since this is now taken care of by FIB notifier, remove the code, with
all unused dependencies.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h      |   2 -
 include/net/switchdev.h   |  40 ----------
 net/ipv4/fib_frontend.c   |  13 ----
 net/ipv4/fib_rules.c      |   2 -
 net/ipv4/fib_trie.c       | 104 +-------------------------
 net/switchdev/switchdev.c | 181 ----------------------------------------------
 6 files changed, 1 insertion(+), 341 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ffccf17..b9314b4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -243,7 +243,6 @@ int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
 		   struct netlink_callback *cb);
 int fib_table_flush(struct net *net, struct fib_table *table);
 struct fib_table *fib_trie_unmerge(struct fib_table *main_tb);
-void fib_table_flush_external(struct fib_table *table);
 void fib_free_table(struct fib_table *tb);
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
@@ -356,7 +355,6 @@ static inline int fib_num_tclassid_users(struct net *net)
 }
 #endif
 int fib_unmerge(struct net *net);
-void fib_flush_external(struct net *net);
 
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 729fe15..eba80c4 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -68,7 +68,6 @@ struct switchdev_attr {
 enum switchdev_obj_id {
 	SWITCHDEV_OBJ_ID_UNDEFINED,
 	SWITCHDEV_OBJ_ID_PORT_VLAN,
-	SWITCHDEV_OBJ_ID_IPV4_FIB,
 	SWITCHDEV_OBJ_ID_PORT_FDB,
 	SWITCHDEV_OBJ_ID_PORT_MDB,
 };
@@ -92,21 +91,6 @@ struct switchdev_obj_port_vlan {
 #define SWITCHDEV_OBJ_PORT_VLAN(obj) \
 	container_of(obj, struct switchdev_obj_port_vlan, obj)
 
-/* SWITCHDEV_OBJ_ID_IPV4_FIB */
-struct switchdev_obj_ipv4_fib {
-	struct switchdev_obj obj;
-	u32 dst;
-	int dst_len;
-	struct fib_info *fi;
-	u8 tos;
-	u8 type;
-	u32 nlflags;
-	u32 tb_id;
-};
-
-#define SWITCHDEV_OBJ_IPV4_FIB(obj) \
-	container_of(obj, struct switchdev_obj_ipv4_fib, obj)
-
 /* SWITCHDEV_OBJ_ID_PORT_FDB */
 struct switchdev_obj_port_fdb {
 	struct switchdev_obj obj;
@@ -209,11 +193,6 @@ int switchdev_port_bridge_setlink(struct net_device *dev,
 				  struct nlmsghdr *nlh, u16 flags);
 int switchdev_port_bridge_dellink(struct net_device *dev,
 				  struct nlmsghdr *nlh, u16 flags);
-int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 nlflags, u32 tb_id);
-int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 tb_id);
-void switchdev_fib_ipv4_abort(struct fib_info *fi);
 int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 			   struct net_device *dev, const unsigned char *addr,
 			   u16 vid, u16 nlm_flags);
@@ -304,25 +283,6 @@ static inline int switchdev_port_bridge_dellink(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
-static inline int switchdev_fib_ipv4_add(u32 dst, int dst_len,
-					 struct fib_info *fi,
-					 u8 tos, u8 type,
-					 u32 nlflags, u32 tb_id)
-{
-	return 0;
-}
-
-static inline int switchdev_fib_ipv4_del(u32 dst, int dst_len,
-					 struct fib_info *fi,
-					 u8 tos, u8 type, u32 tb_id)
-{
-	return 0;
-}
-
-static inline void switchdev_fib_ipv4_abort(struct fib_info *fi)
-{
-}
-
 static inline int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 					 struct net_device *dev,
 					 const unsigned char *addr,
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 86c43dc..c3b8047 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -189,19 +189,6 @@ static void fib_flush(struct net *net)
 		rt_cache_flush(net);
 }
 
-void fib_flush_external(struct net *net)
-{
-	struct fib_table *tb;
-	struct hlist_head *head;
-	unsigned int h;
-
-	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
-		head = &net->ipv4.fib_table_hash[h];
-		hlist_for_each_entry(tb, head, tb_hlist)
-			fib_table_flush_external(tb);
-	}
-}
-
 /*
  * Find address type as if only "dev" was present in the system. If
  * on_dev is NULL then all interfaces are taken into consideration.
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index ebadf6b..2e50062 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -228,7 +228,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	rule4->tos = frh->tos;
 
 	net->ipv4.fib_has_custom_rules = true;
-	fib_flush_external(rule->fr_net);
 	call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD);
 
 	err = 0;
@@ -251,7 +250,6 @@ static int fib4_rule_delete(struct fib_rule *rule)
 		net->ipv4.fib_num_tclassid_users--;
 #endif
 	net->ipv4.fib_has_custom_rules = true;
-	fib_flush_external(rule->fr_net);
 	call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL);
 errout:
 	return err;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 51a4537..31cef36 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -81,7 +81,6 @@
 #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
-#include <net/switchdev.h>
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
@@ -1215,17 +1214,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
 
-			err = switchdev_fib_ipv4_add(key, plen, fi,
-						     new_fa->fa_tos,
-						     cfg->fc_type,
-						     cfg->fc_nlflags,
-						     tb->tb_id);
-			if (err) {
-				switchdev_fib_ipv4_abort(fi);
-				kmem_cache_free(fn_alias_kmem, new_fa);
-				goto out;
-			}
-
 			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
 
 			alias_free_mem_rcu(fa);
@@ -1273,18 +1261,10 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	new_fa->tb_id = tb->tb_id;
 	new_fa->fa_default = -1;
 
-	/* (Optionally) offload fib entry to switch hardware. */
-	err = switchdev_fib_ipv4_add(key, plen, fi, tos, cfg->fc_type,
-				     cfg->fc_nlflags, tb->tb_id);
-	if (err) {
-		switchdev_fib_ipv4_abort(fi);
-		goto out_free_new_fa;
-	}
-
 	/* Insert new entry to the list. */
 	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
 	if (err)
-		goto out_sw_fib_del;
+		goto out_free_new_fa;
 
 	if (!plen)
 		tb->tb_num_default++;
@@ -1297,8 +1277,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 succeeded:
 	return 0;
 
-out_sw_fib_del:
-	switchdev_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
 out_free_new_fa:
 	kmem_cache_free(fn_alias_kmem, new_fa);
 out:
@@ -1591,9 +1569,6 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 	if (!fa_to_delete)
 		return -ESRCH;
 
-	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
-			       cfg->fc_type, tb->tb_id);
-
 	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
 				 fa_to_delete->fa_info, tos, cfg->fc_type,
 				 tb->tb_id, 0);
@@ -1785,80 +1760,6 @@ out:
 	return NULL;
 }
 
-/* Caller must hold RTNL */
-void fib_table_flush_external(struct fib_table *tb)
-{
-	struct trie *t = (struct trie *)tb->tb_data;
-	struct key_vector *pn = t->kv;
-	unsigned long cindex = 1;
-	struct hlist_node *tmp;
-	struct fib_alias *fa;
-
-	/* walk trie in reverse order */
-	for (;;) {
-		unsigned char slen = 0;
-		struct key_vector *n;
-
-		if (!(cindex--)) {
-			t_key pkey = pn->key;
-
-			/* cannot resize the trie vector */
-			if (IS_TRIE(pn))
-				break;
-
-			/* resize completed node */
-			pn = resize(t, pn);
-			cindex = get_index(pkey, pn);
-
-			continue;
-		}
-
-		/* grab the next available node */
-		n = get_child(pn, cindex);
-		if (!n)
-			continue;
-
-		if (IS_TNODE(n)) {
-			/* record pn and cindex for leaf walking */
-			pn = n;
-			cindex = 1ul << n->bits;
-
-			continue;
-		}
-
-		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
-			struct fib_info *fi = fa->fa_info;
-
-			/* if alias was cloned to local then we just
-			 * need to remove the local copy from main
-			 */
-			if (tb->tb_id != fa->tb_id) {
-				hlist_del_rcu(&fa->fa_list);
-				alias_free_mem_rcu(fa);
-				continue;
-			}
-
-			/* record local slen */
-			slen = fa->fa_slen;
-
-			if (!fi || !(fi->fib_flags & RTNH_F_OFFLOAD))
-				continue;
-
-			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
-					       fi, fa->fa_tos, fa->fa_type,
-					       tb->tb_id);
-		}
-
-		/* update leaf slen */
-		n->slen = slen;
-
-		if (hlist_empty(&n->leaf)) {
-			put_child_root(pn, n->key, NULL);
-			node_free(n);
-		}
-	}
-}
-
 /* Caller must hold RTNL. */
 int fib_table_flush(struct net *net, struct fib_table *tb)
 {
@@ -1909,9 +1810,6 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 				continue;
 			}
 
-			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
-					       fi, fa->fa_tos, fa->fa_type,
-					       tb->tb_id);
 			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
 						 n->key,
 						 KEYLENGTH - fa->fa_slen,
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index abd8d2a..02beb35 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -21,7 +21,6 @@
 #include <linux/workqueue.h>
 #include <linux/if_vlan.h>
 #include <linux/rtnetlink.h>
-#include <net/ip_fib.h>
 #include <net/switchdev.h>
 
 /**
@@ -344,8 +343,6 @@ static size_t switchdev_obj_size(const struct switchdev_obj *obj)
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
 		return sizeof(struct switchdev_obj_port_vlan);
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		return sizeof(struct switchdev_obj_ipv4_fib);
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		return sizeof(struct switchdev_obj_port_fdb);
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
@@ -1108,184 +1105,6 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_dump);
 
-static struct net_device *switchdev_get_lowest_dev(struct net_device *dev)
-{
-	const struct switchdev_ops *ops = dev->switchdev_ops;
-	struct net_device *lower_dev;
-	struct net_device *port_dev;
-	struct list_head *iter;
-
-	/* Recusively search down until we find a sw port dev.
-	 * (A sw port dev supports switchdev_port_attr_get).
-	 */
-
-	if (ops && ops->switchdev_port_attr_get)
-		return dev;
-
-	netdev_for_each_lower_dev(dev, lower_dev, iter) {
-		port_dev = switchdev_get_lowest_dev(lower_dev);
-		if (port_dev)
-			return port_dev;
-	}
-
-	return NULL;
-}
-
-static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
-{
-	struct switchdev_attr attr = {
-		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
-	};
-	struct switchdev_attr prev_attr;
-	struct net_device *dev = NULL;
-	int nhsel;
-
-	ASSERT_RTNL();
-
-	/* For this route, all nexthop devs must be on the same switch. */
-
-	for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
-		const struct fib_nh *nh = &fi->fib_nh[nhsel];
-
-		if (!nh->nh_dev)
-			return NULL;
-
-		dev = switchdev_get_lowest_dev(nh->nh_dev);
-		if (!dev)
-			return NULL;
-
-		attr.orig_dev = dev;
-		if (switchdev_port_attr_get(dev, &attr))
-			return NULL;
-
-		if (nhsel > 0 &&
-		    !netdev_phys_item_id_same(&prev_attr.u.ppid, &attr.u.ppid))
-				return NULL;
-
-		prev_attr = attr;
-	}
-
-	return dev;
-}
-
-/**
- *	switchdev_fib_ipv4_add - Add/modify switch IPv4 route entry
- *
- *	@dst: route's IPv4 destination address
- *	@dst_len: destination address length (prefix length)
- *	@fi: route FIB info structure
- *	@tos: route TOS
- *	@type: route type
- *	@nlflags: netlink flags passed in (NLM_F_*)
- *	@tb_id: route table ID
- *
- *	Add/modify switch IPv4 route entry.
- */
-int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 nlflags, u32 tb_id)
-{
-	struct switchdev_obj_ipv4_fib ipv4_fib = {
-		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
-		.dst = dst,
-		.dst_len = dst_len,
-		.fi = fi,
-		.tos = tos,
-		.type = type,
-		.nlflags = nlflags,
-		.tb_id = tb_id,
-	};
-	struct net_device *dev;
-	int err = 0;
-
-	/* Don't offload route if using custom ip rules or if
-	 * IPv4 FIB offloading has been disabled completely.
-	 */
-
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-	if (fi->fib_net->ipv4.fib_has_custom_rules)
-		return 0;
-#endif
-
-	if (fi->fib_net->ipv4.fib_offload_disabled)
-		return 0;
-
-	dev = switchdev_get_dev_by_nhs(fi);
-	if (!dev)
-		return 0;
-
-	ipv4_fib.obj.orig_dev = dev;
-	err = switchdev_port_obj_add(dev, &ipv4_fib.obj);
-	if (!err)
-		fib_info_offload_inc(fi);
-
-	return err == -EOPNOTSUPP ? 0 : err;
-}
-EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_add);
-
-/**
- *	switchdev_fib_ipv4_del - Delete IPv4 route entry from switch
- *
- *	@dst: route's IPv4 destination address
- *	@dst_len: destination address length (prefix length)
- *	@fi: route FIB info structure
- *	@tos: route TOS
- *	@type: route type
- *	@tb_id: route table ID
- *
- *	Delete IPv4 route entry from switch device.
- */
-int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 tb_id)
-{
-	struct switchdev_obj_ipv4_fib ipv4_fib = {
-		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
-		.dst = dst,
-		.dst_len = dst_len,
-		.fi = fi,
-		.tos = tos,
-		.type = type,
-		.nlflags = 0,
-		.tb_id = tb_id,
-	};
-	struct net_device *dev;
-	int err = 0;
-
-	if (!(fi->fib_flags & RTNH_F_OFFLOAD))
-		return 0;
-
-	dev = switchdev_get_dev_by_nhs(fi);
-	if (!dev)
-		return 0;
-
-	ipv4_fib.obj.orig_dev = dev;
-	err = switchdev_port_obj_del(dev, &ipv4_fib.obj);
-	if (!err)
-		fib_info_offload_dec(fi);
-
-	return err == -EOPNOTSUPP ? 0 : err;
-}
-EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_del);
-
-/**
- *	switchdev_fib_ipv4_abort - Abort an IPv4 FIB operation
- *
- *	@fi: route FIB info structure
- */
-void switchdev_fib_ipv4_abort(struct fib_info *fi)
-{
-	/* There was a problem installing this route to the offload
-	 * device.  For now, until we come up with more refined
-	 * policy handling, abruptly end IPv4 fib offloading for
-	 * for entire net by flushing offload device(s) of all
-	 * IPv4 routes, and mark IPv4 fib offloading broken from
-	 * this point forward.
-	 */
-
-	fib_flush_external(fi->fib_net);
-	fi->fib_net->ipv4.fib_offload_disabled = true;
-}
-EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort);
-
 bool switchdev_port_same_parent_id(struct net_device *a,
 				   struct net_device *b)
 {
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 4/6] rocker: use FIB notifications instead of switchdev calls
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Until now, in order to offload a FIB entry to HW we use switchdev op.
Use the newly introduced FIB notifier infrasturucture to process
FIB entries to offload the in HW.
Abort mechanism is now handled within the rocker driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker.h       |  15 ++--
 drivers/net/ethernet/rocker/rocker_main.c  | 120 +++++++++++++++++++++--------
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 115 ++++++++++++++++++++-------
 3 files changed, 185 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index 1ab995f..2eb9b49 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -15,6 +15,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/netdevice.h>
+#include <linux/notifier.h>
 #include <net/neighbour.h>
 #include <net/switchdev.h>
 
@@ -52,6 +53,9 @@ struct rocker_port {
 	struct rocker_dma_ring_info rx_ring;
 };
 
+struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
+					       struct rocker *rocker);
+
 struct rocker_world_ops;
 
 struct rocker {
@@ -66,6 +70,7 @@ struct rocker {
 	spinlock_t cmd_ring_lock;		/* for cmd ring accesses */
 	struct rocker_dma_ring_info cmd_ring;
 	struct rocker_dma_ring_info event_ring;
+	struct notifier_block fib_nb;
 	struct rocker_world_ops *wops;
 	void *wpriv;
 };
@@ -117,11 +122,6 @@ struct rocker_world_ops {
 	int (*port_obj_vlan_dump)(const struct rocker_port *rocker_port,
 				  struct switchdev_obj_port_vlan *vlan,
 				  switchdev_obj_dump_cb_t *cb);
-	int (*port_obj_fib4_add)(struct rocker_port *rocker_port,
-				 const struct switchdev_obj_ipv4_fib *fib4,
-				 struct switchdev_trans *trans);
-	int (*port_obj_fib4_del)(struct rocker_port *rocker_port,
-				 const struct switchdev_obj_ipv4_fib *fib4);
 	int (*port_obj_fdb_add)(struct rocker_port *rocker_port,
 				const struct switchdev_obj_port_fdb *fdb,
 				struct switchdev_trans *trans);
@@ -141,6 +141,11 @@ struct rocker_world_ops {
 	int (*port_ev_mac_vlan_seen)(struct rocker_port *rocker_port,
 				     const unsigned char *addr,
 				     __be16 vlan_id);
+	int (*fib4_add)(struct rocker *rocker,
+			const struct fib_entry_notifier_info *fen_info);
+	int (*fib4_del)(struct rocker *rocker,
+			const struct fib_entry_notifier_info *fen_info);
+	void (*fib4_abort)(struct rocker *rocker);
 };
 
 extern struct rocker_world_ops rocker_ofdpa_ops;
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 1f0c086..5424fb3 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1625,29 +1625,6 @@ rocker_world_port_obj_vlan_dump(const struct rocker_port *rocker_port,
 }
 
 static int
-rocker_world_port_obj_fib4_add(struct rocker_port *rocker_port,
-			       const struct switchdev_obj_ipv4_fib *fib4,
-			       struct switchdev_trans *trans)
-{
-	struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
-	if (!wops->port_obj_fib4_add)
-		return -EOPNOTSUPP;
-	return wops->port_obj_fib4_add(rocker_port, fib4, trans);
-}
-
-static int
-rocker_world_port_obj_fib4_del(struct rocker_port *rocker_port,
-			       const struct switchdev_obj_ipv4_fib *fib4)
-{
-	struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
-	if (!wops->port_obj_fib4_del)
-		return -EOPNOTSUPP;
-	return wops->port_obj_fib4_del(rocker_port, fib4);
-}
-
-static int
 rocker_world_port_obj_fdb_add(struct rocker_port *rocker_port,
 			      const struct switchdev_obj_port_fdb *fdb,
 			      struct switchdev_trans *trans)
@@ -1733,6 +1710,34 @@ static int rocker_world_port_ev_mac_vlan_seen(struct rocker_port *rocker_port,
 	return wops->port_ev_mac_vlan_seen(rocker_port, addr, vlan_id);
 }
 
+static int rocker_world_fib4_add(struct rocker *rocker,
+				 const struct fib_entry_notifier_info *fen_info)
+{
+	struct rocker_world_ops *wops = rocker->wops;
+
+	if (!wops->fib4_add)
+		return 0;
+	return wops->fib4_add(rocker, fen_info);
+}
+
+static int rocker_world_fib4_del(struct rocker *rocker,
+				 const struct fib_entry_notifier_info *fen_info)
+{
+	struct rocker_world_ops *wops = rocker->wops;
+
+	if (!wops->fib4_del)
+		return 0;
+	return wops->fib4_del(rocker, fen_info);
+}
+
+static void rocker_world_fib4_abort(struct rocker *rocker)
+{
+	struct rocker_world_ops *wops = rocker->wops;
+
+	if (wops->fib4_abort)
+		wops->fib4_abort(rocker);
+}
+
 /*****************
  * Net device ops
  *****************/
@@ -2096,11 +2101,6 @@ static int rocker_port_obj_add(struct net_device *dev,
 						     SWITCHDEV_OBJ_PORT_VLAN(obj),
 						     trans);
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = rocker_world_port_obj_fib4_add(rocker_port,
-						     SWITCHDEV_OBJ_IPV4_FIB(obj),
-						     trans);
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = rocker_world_port_obj_fdb_add(rocker_port,
 						    SWITCHDEV_OBJ_PORT_FDB(obj),
@@ -2125,10 +2125,6 @@ static int rocker_port_obj_del(struct net_device *dev,
 		err = rocker_world_port_obj_vlan_del(rocker_port,
 						     SWITCHDEV_OBJ_PORT_VLAN(obj));
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = rocker_world_port_obj_fib4_del(rocker_port,
-						     SWITCHDEV_OBJ_IPV4_FIB(obj));
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = rocker_world_port_obj_fdb_del(rocker_port,
 						    SWITCHDEV_OBJ_PORT_FDB(obj));
@@ -2175,6 +2171,31 @@ static const struct switchdev_ops rocker_port_switchdev_ops = {
 	.switchdev_port_obj_dump	= rocker_port_obj_dump,
 };
 
+static int rocker_router_fib_event(struct notifier_block *nb,
+				   unsigned long event, void *ptr)
+{
+	struct rocker *rocker = container_of(nb, struct rocker, fib_nb);
+	struct fib_entry_notifier_info *fen_info = ptr;
+	int err;
+
+	switch (event) {
+	case FIB_EVENT_ENTRY_ADD:
+		err = rocker_world_fib4_add(rocker, fen_info);
+		if (err)
+			rocker_world_fib4_abort(rocker);
+		else
+		break;
+	case FIB_EVENT_ENTRY_DEL:
+		rocker_world_fib4_del(rocker, fen_info);
+		break;
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		rocker_world_fib4_abort(rocker);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
 /********************
  * ethtool interface
  ********************/
@@ -2740,6 +2761,9 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_probe_ports;
 	}
 
+	rocker->fib_nb.notifier_call = rocker_router_fib_event;
+	register_fib_notifier(&rocker->fib_nb);
+
 	dev_info(&pdev->dev, "Rocker switch with id %*phN\n",
 		 (int)sizeof(rocker->hw.id), &rocker->hw.id);
 
@@ -2771,6 +2795,7 @@ static void rocker_remove(struct pci_dev *pdev)
 {
 	struct rocker *rocker = pci_get_drvdata(pdev);
 
+	unregister_fib_notifier(&rocker->fib_nb);
 	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
 	rocker_remove_ports(rocker);
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
@@ -2799,6 +2824,37 @@ static bool rocker_port_dev_check(const struct net_device *dev)
 	return dev->netdev_ops == &rocker_port_netdev_ops;
 }
 
+static bool rocker_port_dev_check_under(const struct net_device *dev,
+					struct rocker *rocker)
+{
+	struct rocker_port *rocker_port;
+
+	if (!rocker_port_dev_check(dev))
+		return false;
+
+	rocker_port = netdev_priv(dev);
+	if (rocker_port->rocker != rocker)
+		return false;
+
+	return true;
+}
+
+struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
+					       struct rocker *rocker)
+{
+	struct net_device *lower_dev;
+	struct list_head *iter;
+
+	if (rocker_port_dev_check_under(dev, rocker))
+		return netdev_priv(dev);
+
+	netdev_for_each_all_lower_dev(dev, lower_dev, iter) {
+		if (rocker_port_dev_check_under(lower_dev, rocker))
+			return netdev_priv(lower_dev);
+	}
+	return NULL;
+}
+
 static int rocker_netdevice_event(struct notifier_block *unused,
 				  unsigned long event, void *ptr)
 {
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index fcad907..431a608 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -99,6 +99,7 @@ struct ofdpa_flow_tbl_entry {
 	struct ofdpa_flow_tbl_key key;
 	size_t key_len;
 	u32 key_crc32; /* key */
+	struct fib_info *fi;
 };
 
 struct ofdpa_group_tbl_entry {
@@ -189,6 +190,7 @@ struct ofdpa {
 	spinlock_t neigh_tbl_lock;		/* for neigh tbl accesses */
 	u32 neigh_tbl_next_index;
 	unsigned long ageing_time;
+	bool fib_aborted;
 };
 
 struct ofdpa_port {
@@ -1043,7 +1045,8 @@ static int ofdpa_flow_tbl_ucast4_routing(struct ofdpa_port *ofdpa_port,
 					 __be16 eth_type, __be32 dst,
 					 __be32 dst_mask, u32 priority,
 					 enum rocker_of_dpa_table_id goto_tbl,
-					 u32 group_id, int flags)
+					 u32 group_id, struct fib_info *fi,
+					 int flags)
 {
 	struct ofdpa_flow_tbl_entry *entry;
 
@@ -1060,6 +1063,7 @@ static int ofdpa_flow_tbl_ucast4_routing(struct ofdpa_port *ofdpa_port,
 	entry->key.ucast_routing.group_id = group_id;
 	entry->key_len = offsetof(struct ofdpa_flow_tbl_key,
 				  ucast_routing.group_id);
+	entry->fi = fi;
 
 	return ofdpa_flow_tbl_do(ofdpa_port, trans, flags, entry);
 }
@@ -1425,7 +1429,7 @@ static int ofdpa_port_ipv4_neigh(struct ofdpa_port *ofdpa_port,
 						    eth_type, ip_addr,
 						    inet_make_mask(32),
 						    priority, goto_tbl,
-						    group_id, flags);
+						    group_id, NULL, flags);
 
 		if (err)
 			netdev_err(ofdpa_port->dev, "Error (%d) /32 unicast route %pI4 group 0x%08x\n",
@@ -2390,7 +2394,7 @@ found:
 
 static int ofdpa_port_fib_ipv4(struct ofdpa_port *ofdpa_port,
 			       struct switchdev_trans *trans, __be32 dst,
-			       int dst_len, const struct fib_info *fi,
+			       int dst_len, struct fib_info *fi,
 			       u32 tb_id, int flags)
 {
 	const struct fib_nh *nh;
@@ -2426,7 +2430,7 @@ static int ofdpa_port_fib_ipv4(struct ofdpa_port *ofdpa_port,
 
 	err = ofdpa_flow_tbl_ucast4_routing(ofdpa_port, trans, eth_type, dst,
 					    dst_mask, priority, goto_tbl,
-					    group_id, flags);
+					    group_id, fi, flags);
 	if (err)
 		netdev_err(ofdpa_port->dev, "Error (%d) IPv4 route %pI4\n",
 			   err, &dst);
@@ -2718,28 +2722,6 @@ static int ofdpa_port_obj_vlan_dump(const struct rocker_port *rocker_port,
 	return err;
 }
 
-static int ofdpa_port_obj_fib4_add(struct rocker_port *rocker_port,
-				   const struct switchdev_obj_ipv4_fib *fib4,
-				   struct switchdev_trans *trans)
-{
-	struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
-
-	return ofdpa_port_fib_ipv4(ofdpa_port, trans,
-				   htonl(fib4->dst), fib4->dst_len,
-				   fib4->fi, fib4->tb_id, 0);
-}
-
-static int ofdpa_port_obj_fib4_del(struct rocker_port *rocker_port,
-				   const struct switchdev_obj_ipv4_fib *fib4)
-{
-	struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
-
-	return ofdpa_port_fib_ipv4(ofdpa_port, NULL,
-				   htonl(fib4->dst), fib4->dst_len,
-				   fib4->fi, fib4->tb_id,
-				   OFDPA_OP_FLAG_REMOVE);
-}
-
 static int ofdpa_port_obj_fdb_add(struct rocker_port *rocker_port,
 				  const struct switchdev_obj_port_fdb *fdb,
 				  struct switchdev_trans *trans)
@@ -2922,6 +2904,82 @@ static int ofdpa_port_ev_mac_vlan_seen(struct rocker_port *rocker_port,
 	return ofdpa_port_fdb(ofdpa_port, NULL, addr, vlan_id, flags);
 }
 
+static struct ofdpa_port *ofdpa_port_dev_lower_find(struct net_device *dev,
+						    struct rocker *rocker)
+{
+	struct rocker_port *rocker_port;
+
+	rocker_port = rocker_port_dev_lower_find(dev, rocker);
+	return rocker_port ? rocker_port->wpriv : NULL;
+}
+
+static int ofdpa_fib4_add(struct rocker *rocker,
+			  const struct fib_entry_notifier_info *fen_info)
+{
+	struct ofdpa *ofdpa = rocker->wpriv;
+	struct ofdpa_port *ofdpa_port;
+	int err;
+
+	if (ofdpa->fib_aborted)
+		return 0;
+	ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
+	if (!ofdpa_port)
+		return 0;
+	err = ofdpa_port_fib_ipv4(ofdpa_port, NULL, htonl(fen_info->dst),
+				  fen_info->dst_len, fen_info->fi,
+				  fen_info->tb_id, 0);
+	if (err)
+		return err;
+	fib_info_offload_inc(fen_info->fi);
+	return 0;
+}
+
+static int ofdpa_fib4_del(struct rocker *rocker,
+			  const struct fib_entry_notifier_info *fen_info)
+{
+	struct ofdpa *ofdpa = rocker->wpriv;
+	struct ofdpa_port *ofdpa_port;
+
+	if (ofdpa->fib_aborted)
+		return 0;
+	ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
+	if (!ofdpa_port)
+		return 0;
+	fib_info_offload_dec(fen_info->fi);
+	return ofdpa_port_fib_ipv4(ofdpa_port, NULL, htonl(fen_info->dst),
+				   fen_info->dst_len, fen_info->fi,
+				   fen_info->tb_id, OFDPA_OP_FLAG_REMOVE);
+}
+
+static void ofdpa_fib4_abort(struct rocker *rocker)
+{
+	struct ofdpa *ofdpa = rocker->wpriv;
+	struct ofdpa_port *ofdpa_port;
+	struct ofdpa_flow_tbl_entry *flow_entry;
+	struct hlist_node *tmp;
+	unsigned long flags;
+	int bkt;
+
+	if (ofdpa->fib_aborted)
+		return;
+
+	spin_lock_irqsave(&ofdpa->flow_tbl_lock, flags);
+	hash_for_each_safe(ofdpa->flow_tbl, bkt, tmp, flow_entry, entry) {
+		if (flow_entry->key.tbl_id !=
+		    ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING)
+			continue;
+		ofdpa_port = ofdpa_port_dev_lower_find(flow_entry->fi->fib_dev,
+						       rocker);
+		if (!ofdpa_port)
+			continue;
+		fib_info_offload_dec(flow_entry->fi);
+		ofdpa_flow_tbl_del(ofdpa_port, NULL, OFDPA_OP_FLAG_REMOVE,
+				   flow_entry);
+	}
+	spin_unlock_irqrestore(&ofdpa->flow_tbl_lock, flags);
+	ofdpa->fib_aborted = true;
+}
+
 struct rocker_world_ops rocker_ofdpa_ops = {
 	.kind = "ofdpa",
 	.priv_size = sizeof(struct ofdpa),
@@ -2941,8 +2999,6 @@ struct rocker_world_ops rocker_ofdpa_ops = {
 	.port_obj_vlan_add = ofdpa_port_obj_vlan_add,
 	.port_obj_vlan_del = ofdpa_port_obj_vlan_del,
 	.port_obj_vlan_dump = ofdpa_port_obj_vlan_dump,
-	.port_obj_fib4_add = ofdpa_port_obj_fib4_add,
-	.port_obj_fib4_del = ofdpa_port_obj_fib4_del,
 	.port_obj_fdb_add = ofdpa_port_obj_fdb_add,
 	.port_obj_fdb_del = ofdpa_port_obj_fdb_del,
 	.port_obj_fdb_dump = ofdpa_port_obj_fdb_dump,
@@ -2951,4 +3007,7 @@ struct rocker_world_ops rocker_ofdpa_ops = {
 	.port_neigh_update = ofdpa_port_neigh_update,
 	.port_neigh_destroy = ofdpa_port_neigh_destroy,
 	.port_ev_mac_vlan_seen = ofdpa_port_ev_mac_vlan_seen,
+	.fib4_add = ofdpa_fib4_add,
+	.fib4_del = ofdpa_fib4_del,
+	.fib4_abort = ofdpa_fib4_abort,
 };
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 3/6] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Until now, in order to offload a FIB entry to HW we use switchdev op.
However that has limits. Mainly in case we need to make the HW aware of
all route prefixes configured in kernel. HW needs to know those in order
to properly trap appropriate packets and pass the to kernel to do
the forwarding. Abort mechanism is now handled within the mlxsw driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   9 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 386 ++++++++++++---------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 3 files changed, 216 insertions(+), 188 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 73cae21..9b22863 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -45,7 +45,7 @@
 #include <linux/list.h>
 #include <linux/dcbnl.h>
 #include <linux/in6.h>
-#include <net/switchdev.h>
+#include <linux/notifier.h>
 
 #include "port.h"
 #include "core.h"
@@ -257,6 +257,7 @@ struct mlxsw_sp_router {
 #define MLXSW_SP_UNRESOLVED_NH_PROBE_INTERVAL 5000 /* ms */
 	struct list_head nexthop_group_list;
 	struct list_head nexthop_neighs_list;
+	bool aborted;
 };
 
 struct mlxsw_sp {
@@ -296,6 +297,7 @@ struct mlxsw_sp {
 		struct mlxsw_sp_span_entry *entries;
 		int entries_count;
 	} span;
+	struct notifier_block fib_nb;
 };
 
 static inline struct mlxsw_sp_upper *
@@ -584,11 +586,6 @@ static inline void mlxsw_sp_port_dcb_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 
 int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp);
 void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp);
-int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4,
-			     struct switchdev_trans *trans);
-int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4);
 int mlxsw_sp_router_neigh_construct(struct net_device *dev,
 				    struct neighbour *n);
 void mlxsw_sp_router_neigh_destroy(struct net_device *dev,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index cc653ac..2186e25 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -43,6 +43,7 @@
 #include <net/netevent.h>
 #include <net/neighbour.h>
 #include <net/arp.h>
+#include <net/ip_fib.h>
 
 #include "spectrum.h"
 #include "core.h"
@@ -122,17 +123,20 @@ struct mlxsw_sp_nexthop_group;
 
 struct mlxsw_sp_fib_entry {
 	struct rhash_head ht_node;
+	struct list_head list;
 	struct mlxsw_sp_fib_key key;
 	enum mlxsw_sp_fib_entry_type type;
 	unsigned int ref_count;
 	u16 rif; /* used for action local */
 	struct mlxsw_sp_vr *vr;
+	struct fib_info *fi;
 	struct list_head nexthop_group_node;
 	struct mlxsw_sp_nexthop_group *nh_group;
 };
 
 struct mlxsw_sp_fib {
 	struct rhashtable ht;
+	struct list_head entry_list;
 	unsigned long prefix_ref_count[MLXSW_SP_PREFIX_COUNT];
 	struct mlxsw_sp_prefix_usage prefix_usage;
 };
@@ -154,6 +158,7 @@ static int mlxsw_sp_fib_entry_insert(struct mlxsw_sp_fib *fib,
 				     mlxsw_sp_fib_ht_params);
 	if (err)
 		return err;
+	list_add_tail(&fib_entry->list, &fib->entry_list);
 	if (fib->prefix_ref_count[prefix_len]++ == 0)
 		mlxsw_sp_prefix_usage_set(&fib->prefix_usage, prefix_len);
 	return 0;
@@ -166,6 +171,7 @@ static void mlxsw_sp_fib_entry_remove(struct mlxsw_sp_fib *fib,
 
 	if (--fib->prefix_ref_count[prefix_len] == 0)
 		mlxsw_sp_prefix_usage_clear(&fib->prefix_usage, prefix_len);
+	list_del(&fib_entry->list);
 	rhashtable_remove_fast(&fib->ht, &fib_entry->ht_node,
 			       mlxsw_sp_fib_ht_params);
 }
@@ -216,6 +222,7 @@ static struct mlxsw_sp_fib *mlxsw_sp_fib_create(void)
 	err = rhashtable_init(&fib->ht, &mlxsw_sp_fib_ht_params);
 	if (err)
 		goto err_rhashtable_init;
+	INIT_LIST_HEAD(&fib->entry_list);
 	return fib;
 
 err_rhashtable_init:
@@ -1520,85 +1527,6 @@ static void mlxsw_sp_nexthop_group_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_nexthop_group_destroy(mlxsw_sp, nh_grp);
 }
 
-static int __mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
-{
-	struct mlxsw_resources *resources;
-	char rgcr_pl[MLXSW_REG_RGCR_LEN];
-	int err;
-
-	resources = mlxsw_core_resources_get(mlxsw_sp->core);
-	if (!resources->max_rif_valid)
-		return -EIO;
-
-	mlxsw_sp->rifs = kcalloc(resources->max_rif,
-				 sizeof(struct mlxsw_sp_rif *), GFP_KERNEL);
-	if (!mlxsw_sp->rifs)
-		return -ENOMEM;
-
-	mlxsw_reg_rgcr_pack(rgcr_pl, true);
-	mlxsw_reg_rgcr_max_router_interfaces_set(rgcr_pl, resources->max_rif);
-	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
-	if (err)
-		goto err_rgcr_fail;
-
-	return 0;
-
-err_rgcr_fail:
-	kfree(mlxsw_sp->rifs);
-	return err;
-}
-
-static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
-{
-	struct mlxsw_resources *resources;
-	char rgcr_pl[MLXSW_REG_RGCR_LEN];
-	int i;
-
-	mlxsw_reg_rgcr_pack(rgcr_pl, false);
-	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
-
-	resources = mlxsw_core_resources_get(mlxsw_sp->core);
-	for (i = 0; i < resources->max_rif; i++)
-		WARN_ON_ONCE(mlxsw_sp->rifs[i]);
-
-	kfree(mlxsw_sp->rifs);
-}
-
-int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
-{
-	int err;
-
-	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
-	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_group_list);
-	err = __mlxsw_sp_router_init(mlxsw_sp);
-	if (err)
-		return err;
-
-	mlxsw_sp_lpm_init(mlxsw_sp);
-	err = mlxsw_sp_vrs_init(mlxsw_sp);
-	if (err)
-		goto err_vrs_init;
-
-	err =  mlxsw_sp_neigh_init(mlxsw_sp);
-	if (err)
-		goto err_neigh_init;
-
-	return 0;
-
-err_neigh_init:
-	mlxsw_sp_vrs_fini(mlxsw_sp);
-err_vrs_init:
-	__mlxsw_sp_router_fini(mlxsw_sp);
-	return err;
-}
-
-void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
-{
-	mlxsw_sp_neigh_fini(mlxsw_sp);
-	mlxsw_sp_vrs_fini(mlxsw_sp);
-	__mlxsw_sp_router_fini(mlxsw_sp);
-}
-
 static int mlxsw_sp_fib_entry_op4_remote(struct mlxsw_sp *mlxsw_sp,
 					 struct mlxsw_sp_fib_entry *fib_entry,
 					 enum mlxsw_reg_ralue_op op)
@@ -1706,45 +1634,34 @@ static int mlxsw_sp_fib_entry_del(struct mlxsw_sp *mlxsw_sp,
 				     MLXSW_REG_RALUE_OP_WRITE_DELETE);
 }
 
-struct mlxsw_sp_router_fib4_add_info {
-	struct switchdev_trans_item tritem;
-	struct mlxsw_sp *mlxsw_sp;
-	struct mlxsw_sp_fib_entry *fib_entry;
-};
-
-static void mlxsw_sp_router_fib4_add_info_destroy(void const *data)
-{
-	const struct mlxsw_sp_router_fib4_add_info *info = data;
-	struct mlxsw_sp_fib_entry *fib_entry = info->fib_entry;
-	struct mlxsw_sp *mlxsw_sp = info->mlxsw_sp;
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
-
-	mlxsw_sp_fib_entry_destroy(fib_entry);
-	mlxsw_sp_vr_put(mlxsw_sp, vr);
-	kfree(info);
-}
-
 static int
 mlxsw_sp_router_fib4_entry_init(struct mlxsw_sp *mlxsw_sp,
-				const struct switchdev_obj_ipv4_fib *fib4,
+				const struct fib_entry_notifier_info *fen_info,
 				struct mlxsw_sp_fib_entry *fib_entry)
 {
-	struct fib_info *fi = fib4->fi;
+	struct fib_info *fi = fen_info->fi;
 
-	if (fib4->type == RTN_LOCAL || fib4->type == RTN_BROADCAST) {
+	if (fen_info->type == RTN_LOCAL || fen_info->type == RTN_BROADCAST) {
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
 		return 0;
 	}
-	if (fib4->type != RTN_UNICAST)
+	if (fen_info->type != RTN_UNICAST)
 		return -EINVAL;
 
 	if (fi->fib_scope != RT_SCOPE_UNIVERSE) {
 		struct mlxsw_sp_rif *r;
 
-		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
 		r = mlxsw_sp_rif_find_by_dev(mlxsw_sp, fi->fib_dev);
-		if (!r)
-			return -EINVAL;
+		if (!r) {
+			/* In case router interface is not found, that means
+			 * the fib entry was added to some device unrelated
+			 * to us. Set trap and pass the packets for
+			 * this prefix to kernel.
+			 */
+			fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+			return 0;
+		}
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
 		fib_entry->rif = r->rif;
 		return 0;
 	}
@@ -1763,37 +1680,38 @@ mlxsw_sp_router_fib4_entry_fini(struct mlxsw_sp *mlxsw_sp,
 
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_fib_entry_get(struct mlxsw_sp *mlxsw_sp,
-		       const struct switchdev_obj_ipv4_fib *fib4)
+		       const struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
-	struct fib_info *fi = fib4->fi;
+	struct fib_info *fi = fen_info->fi;
 	struct mlxsw_sp_vr *vr;
 	int err;
 
-	vr = mlxsw_sp_vr_get(mlxsw_sp, fib4->dst_len, fib4->tb_id,
+	vr = mlxsw_sp_vr_get(mlxsw_sp, fen_info->dst_len, fen_info->tb_id,
 			     MLXSW_SP_L3_PROTO_IPV4);
 	if (IS_ERR(vr))
 		return ERR_CAST(vr);
 
-	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fib4->dst,
-					      sizeof(fib4->dst),
-					      fib4->dst_len, fi->fib_dev);
+	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fen_info->dst,
+					      sizeof(fen_info->dst),
+					      fen_info->dst_len, fi->fib_dev);
 	if (fib_entry) {
 		/* Already exists, just take a reference */
 		fib_entry->ref_count++;
 		return fib_entry;
 	}
-	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fib4->dst,
-					      sizeof(fib4->dst),
-					      fib4->dst_len, fi->fib_dev);
+	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fen_info->dst,
+					      sizeof(fen_info->dst),
+					      fen_info->dst_len, fi->fib_dev);
 	if (!fib_entry) {
 		err = -ENOMEM;
 		goto err_fib_entry_create;
 	}
 	fib_entry->vr = vr;
+	fib_entry->fi = fi;
 	fib_entry->ref_count = 1;
 
-	err = mlxsw_sp_router_fib4_entry_init(mlxsw_sp, fib4, fib_entry);
+	err = mlxsw_sp_router_fib4_entry_init(mlxsw_sp, fen_info, fib_entry);
 	if (err)
 		goto err_fib4_entry_init;
 
@@ -1809,17 +1727,19 @@ err_fib_entry_create:
 
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_fib_entry_find(struct mlxsw_sp *mlxsw_sp,
-			const struct switchdev_obj_ipv4_fib *fib4)
+			const struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_vr *vr;
 
-	vr = mlxsw_sp_vr_find(mlxsw_sp, fib4->tb_id, MLXSW_SP_L3_PROTO_IPV4);
+	vr = mlxsw_sp_vr_find(mlxsw_sp, fen_info->tb_id,
+			      MLXSW_SP_L3_PROTO_IPV4);
 	if (!vr)
 		return NULL;
 
-	return mlxsw_sp_fib_entry_lookup(vr->fib, &fib4->dst,
-					 sizeof(fib4->dst), fib4->dst_len,
-					 fib4->fi->fib_dev);
+	return mlxsw_sp_fib_entry_lookup(vr->fib, &fen_info->dst,
+					 sizeof(fen_info->dst),
+					 fen_info->dst_len,
+					 fen_info->fi->fib_dev);
 }
 
 static void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
@@ -1834,62 +1754,48 @@ static void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_vr_put(mlxsw_sp, vr);
 }
 
-static int
-mlxsw_sp_router_fib4_add_prepare(struct mlxsw_sp_port *mlxsw_sp_port,
-				 const struct switchdev_obj_ipv4_fib *fib4,
-				 struct switchdev_trans *trans)
+static void mlxsw_sp_fib_entry_put_all(struct mlxsw_sp *mlxsw_sp,
+				       struct mlxsw_sp_fib_entry *fib_entry)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct mlxsw_sp_router_fib4_add_info *info;
-	struct mlxsw_sp_fib_entry *fib_entry;
-	int err;
-
-	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fib4);
-	if (IS_ERR(fib_entry))
-		return PTR_ERR(fib_entry);
-
-	info = kmalloc(sizeof(*info), GFP_KERNEL);
-	if (!info) {
-		err = -ENOMEM;
-		goto err_alloc_info;
-	}
-	info->mlxsw_sp = mlxsw_sp;
-	info->fib_entry = fib_entry;
-	switchdev_trans_item_enqueue(trans, info,
-				     mlxsw_sp_router_fib4_add_info_destroy,
-				     &info->tritem);
-	return 0;
+	unsigned int last_ref_count;
 
-err_alloc_info:
-	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
-	return err;
+	do {
+		last_ref_count = fib_entry->ref_count;
+		mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
+	} while (last_ref_count != 1);
 }
 
-static int
-mlxsw_sp_router_fib4_add_commit(struct mlxsw_sp_port *mlxsw_sp_port,
-				const struct switchdev_obj_ipv4_fib *fib4,
-				struct switchdev_trans *trans)
+static int mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
+				    struct fib_entry_notifier_info *fen_info)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct mlxsw_sp_router_fib4_add_info *info;
 	struct mlxsw_sp_fib_entry *fib_entry;
 	struct mlxsw_sp_vr *vr;
 	int err;
 
-	info = switchdev_trans_item_dequeue(trans);
-	fib_entry = info->fib_entry;
-	kfree(info);
+	if (mlxsw_sp->router.aborted)
+		return 0;
+
+	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fen_info);
+	if (IS_ERR(fib_entry)) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to get FIB4 entry being added.\n");
+		return PTR_ERR(fib_entry);
+	}
 
 	if (fib_entry->ref_count != 1)
-		return 0;
+		goto skip_add;
 
 	vr = fib_entry->vr;
 	err = mlxsw_sp_fib_entry_insert(vr->fib, fib_entry);
-	if (err)
+	if (err) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to insert FIB4 entry being added.\n");
 		goto err_fib_entry_insert;
-	err = mlxsw_sp_fib_entry_update(mlxsw_sp_port->mlxsw_sp, fib_entry);
+	}
+	err = mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
 	if (err)
 		goto err_fib_entry_add;
+
+skip_add:
+	fib_info_offload_inc(fen_info->fi);
 	return 0;
 
 err_fib_entry_add:
@@ -1899,29 +1805,22 @@ err_fib_entry_insert:
 	return err;
 }
 
-int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4,
-			     struct switchdev_trans *trans)
-{
-	if (switchdev_trans_ph_prepare(trans))
-		return mlxsw_sp_router_fib4_add_prepare(mlxsw_sp_port,
-							fib4, trans);
-	return mlxsw_sp_router_fib4_add_commit(mlxsw_sp_port,
-					       fib4, trans);
-}
-
-int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4)
+static int mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp,
+				    struct fib_entry_notifier_info *fen_info)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct mlxsw_sp_fib_entry *fib_entry;
 
-	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fib4);
+	if (mlxsw_sp->router.aborted)
+		return 0;
+
+	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fen_info);
 	if (!fib_entry) {
 		dev_warn(mlxsw_sp->bus_info->dev, "Failed to find FIB4 entry being removed.\n");
 		return -ENOENT;
 	}
 
+	fib_info_offload_dec(fen_info->fi);
+
 	if (fib_entry->ref_count == 1) {
 		mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
 		mlxsw_sp_fib_entry_remove(fib_entry->vr->fib, fib_entry);
@@ -1930,3 +1829,144 @@ int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
 	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
 	return 0;
 }
+
+static void mlxsw_sp_router_fib4_abort(struct mlxsw_sp *mlxsw_sp)
+{
+	char ralue_pl[MLXSW_REG_RALUE_LEN];
+	struct mlxsw_resources *resources;
+	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_fib_entry *tmp;
+	struct mlxsw_sp_vr *vr;
+	int i;
+	int err;
+
+	resources = mlxsw_core_resources_get(mlxsw_sp->core);
+	for (i = 0; i < resources->max_virtual_routers; i++) {
+		vr = &mlxsw_sp->router.vrs[i];
+		if (!vr->used)
+			continue;
+
+		list_for_each_entry_safe(fib_entry, tmp,
+					 &vr->fib->entry_list, list) {
+			fib_info_offload_dec(fib_entry->fi);
+			mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
+			mlxsw_sp_fib_entry_remove(fib_entry->vr->fib,
+						  fib_entry);
+			mlxsw_sp_fib_entry_put_all(mlxsw_sp, fib_entry);
+		}
+	}
+	mlxsw_sp->router.aborted = true;
+
+	mlxsw_reg_ralue_pack4(ralue_pl, MLXSW_SP_L3_PROTO_IPV4,
+			      MLXSW_REG_RALUE_OP_WRITE_WRITE, 0, 0, 0);
+	mlxsw_reg_ralue_act_ip2me_pack(ralue_pl);
+	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
+	if (err)
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to set abort trap.\n");
+}
+
+static int __mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_resources *resources;
+	char rgcr_pl[MLXSW_REG_RGCR_LEN];
+	int err;
+
+	resources = mlxsw_core_resources_get(mlxsw_sp->core);
+	if (!resources->max_rif_valid)
+		return -EIO;
+
+	mlxsw_sp->rifs = kcalloc(resources->max_rif,
+				 sizeof(struct mlxsw_sp_rif *), GFP_KERNEL);
+	if (!mlxsw_sp->rifs)
+		return -ENOMEM;
+
+	mlxsw_reg_rgcr_pack(rgcr_pl, true);
+	mlxsw_reg_rgcr_max_router_interfaces_set(rgcr_pl, resources->max_rif);
+	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
+	if (err)
+		goto err_rgcr_fail;
+
+	return 0;
+
+err_rgcr_fail:
+	kfree(mlxsw_sp->rifs);
+	return err;
+}
+
+static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_resources *resources;
+	char rgcr_pl[MLXSW_REG_RGCR_LEN];
+	int i;
+
+	mlxsw_reg_rgcr_pack(rgcr_pl, false);
+	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
+
+	resources = mlxsw_core_resources_get(mlxsw_sp->core);
+	for (i = 0; i < resources->max_rif; i++)
+		WARN_ON_ONCE(mlxsw_sp->rifs[i]);
+
+	kfree(mlxsw_sp->rifs);
+}
+
+static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
+				     unsigned long event, void *ptr)
+{
+	struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
+	struct fib_entry_notifier_info *fen_info = ptr;
+	int err;
+
+	switch (event) {
+	case FIB_EVENT_ENTRY_ADD:
+		err = mlxsw_sp_router_fib4_add(mlxsw_sp, fen_info);
+		if (err)
+			mlxsw_sp_router_fib4_abort(mlxsw_sp);
+		break;
+	case FIB_EVENT_ENTRY_DEL:
+		mlxsw_sp_router_fib4_del(mlxsw_sp, fen_info);
+		break;
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		mlxsw_sp_router_fib4_abort(mlxsw_sp);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
+{
+	int err;
+
+	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
+	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_group_list);
+	err = __mlxsw_sp_router_init(mlxsw_sp);
+	if (err)
+		return err;
+
+	mlxsw_sp_lpm_init(mlxsw_sp);
+	err = mlxsw_sp_vrs_init(mlxsw_sp);
+	if (err)
+		goto err_vrs_init;
+
+	err =  mlxsw_sp_neigh_init(mlxsw_sp);
+	if (err)
+		goto err_neigh_init;
+
+	mlxsw_sp->fib_nb.notifier_call = mlxsw_sp_router_fib_event;
+	register_fib_notifier(&mlxsw_sp->fib_nb);
+	return 0;
+
+err_neigh_init:
+	mlxsw_sp_vrs_fini(mlxsw_sp);
+err_vrs_init:
+	__mlxsw_sp_router_fini(mlxsw_sp);
+	return err;
+}
+
+void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	unregister_fib_notifier(&mlxsw_sp->fib_nb);
+	mlxsw_sp_neigh_fini(mlxsw_sp);
+	mlxsw_sp_vrs_fini(mlxsw_sp);
+	__mlxsw_sp_router_fini(mlxsw_sp);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 2b04b76..5e00c79 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1044,11 +1044,6 @@ static int mlxsw_sp_port_obj_add(struct net_device *dev,
 					      SWITCHDEV_OBJ_PORT_VLAN(obj),
 					      trans);
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = mlxsw_sp_router_fib4_add(mlxsw_sp_port,
-					       SWITCHDEV_OBJ_IPV4_FIB(obj),
-					       trans);
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = mlxsw_sp_port_fdb_static_add(mlxsw_sp_port,
 						   SWITCHDEV_OBJ_PORT_FDB(obj),
@@ -1181,10 +1176,6 @@ static int mlxsw_sp_port_obj_del(struct net_device *dev,
 		err = mlxsw_sp_port_vlans_del(mlxsw_sp_port,
 					      SWITCHDEV_OBJ_PORT_VLAN(obj));
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = mlxsw_sp_router_fib4_del(mlxsw_sp_port,
-					       SWITCHDEV_OBJ_IPV4_FIB(obj));
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = mlxsw_sp_port_fdb_static_del(mlxsw_sp_port,
 						   SWITCHDEV_OBJ_PORT_FDB(obj));
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 2/6] fib: introduce FIB info offload flag helpers
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

These helpers are to be used in case someone offloads the FIB entry. The
result is that if the entry is offloaded to at least one device, the
offload flag is set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h      | 13 +++++++++++++
 net/switchdev/switchdev.c |  4 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 116a9c0..ffccf17 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -123,6 +123,7 @@ struct fib_info {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_weight;
 #endif
+	unsigned int		fib_offload_cnt;
 	struct rcu_head		rcu;
 	struct fib_nh		fib_nh[0];
 #define fib_dev		fib_nh[0].nh_dev
@@ -174,6 +175,18 @@ struct fib_result_nl {
 
 __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
 
+static inline void fib_info_offload_inc(struct fib_info *fi)
+{
+	fi->fib_offload_cnt++;
+	fi->fib_flags |= RTNH_F_OFFLOAD;
+}
+
+static inline void fib_info_offload_dec(struct fib_info *fi)
+{
+	if (--fi->fib_offload_cnt == 0)
+		fi->fib_flags &= ~RTNH_F_OFFLOAD;
+}
+
 #define FIB_RES_SADDR(net, res)				\
 	((FIB_RES_NH(res).nh_saddr_genid ==		\
 	  atomic_read(&(net)->ipv4.dev_addr_genid)) ?	\
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 10b8193..abd8d2a 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -1216,7 +1216,7 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 	ipv4_fib.obj.orig_dev = dev;
 	err = switchdev_port_obj_add(dev, &ipv4_fib.obj);
 	if (!err)
-		fi->fib_flags |= RTNH_F_OFFLOAD;
+		fib_info_offload_inc(fi);
 
 	return err == -EOPNOTSUPP ? 0 : err;
 }
@@ -1260,7 +1260,7 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 	ipv4_fib.obj.orig_dev = dev;
 	err = switchdev_port_obj_del(dev, &ipv4_fib.obj);
 	if (!err)
-		fi->fib_flags &= ~RTNH_F_OFFLOAD;
+		fib_info_offload_dec(fi);
 
 	return err == -EOPNOTSUPP ? 0 : err;
 }
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 1/6] fib: introduce FIB notification infrastructure
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

This allows to pass information about added/deleted FIB entries/rules to
whoever is interested. This is done in a very similar way as devinet
notifies address additions/removals.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h    | 34 ++++++++++++++++++++++++---
 net/ipv4/fib_frontend.c | 16 ++++++-------
 net/ipv4/fib_rules.c    | 10 ++++++++
 net/ipv4/fib_trie.c     | 62 ++++++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 108 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 7d4a72e..116a9c0 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <linux/notifier.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -185,6 +186,33 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
 #define FIB_RES_PREFSRC(net, res)	((res).fi->fib_prefsrc ? : \
 					 FIB_RES_SADDR(net, res))
 
+struct fib_notifier_info {
+	struct net *net;
+};
+
+struct fib_entry_notifier_info {
+	struct fib_notifier_info info; /* must be first */
+	u32 dst;
+	int dst_len;
+	struct fib_info *fi;
+	u8 tos;
+	u8 type;
+	u32 tb_id;
+	u32 nlflags;
+};
+
+enum fib_event_type {
+	FIB_EVENT_ENTRY_ADD,
+	FIB_EVENT_ENTRY_DEL,
+	FIB_EVENT_RULE_ADD,
+	FIB_EVENT_RULE_DEL,
+};
+
+int register_fib_notifier(struct notifier_block *nb);
+int unregister_fib_notifier(struct notifier_block *nb);
+int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+		       struct fib_notifier_info *info);
+
 struct fib_table {
 	struct hlist_node	tb_hlist;
 	u32			tb_id;
@@ -196,11 +224,11 @@ struct fib_table {
 
 int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		     struct fib_result *res, int fib_flags);
-int fib_table_insert(struct fib_table *, struct fib_config *);
-int fib_table_delete(struct fib_table *, struct fib_config *);
+int fib_table_insert(struct net *, struct fib_table *, struct fib_config *);
+int fib_table_delete(struct net *, struct fib_table *, struct fib_config *);
 int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
 		   struct netlink_callback *cb);
-int fib_table_flush(struct fib_table *table);
+int fib_table_flush(struct net *net, struct fib_table *table);
 struct fib_table *fib_trie_unmerge(struct fib_table *main_tb);
 void fib_table_flush_external(struct fib_table *table);
 void fib_free_table(struct fib_table *tb);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4e56a4c..86c43dc 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -182,7 +182,7 @@ static void fib_flush(struct net *net)
 		struct fib_table *tb;
 
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
-			flushed += fib_table_flush(tb);
+			flushed += fib_table_flush(net, tb);
 	}
 
 	if (flushed)
@@ -590,13 +590,13 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 			if (cmd == SIOCDELRT) {
 				tb = fib_get_table(net, cfg.fc_table);
 				if (tb)
-					err = fib_table_delete(tb, &cfg);
+					err = fib_table_delete(net, tb, &cfg);
 				else
 					err = -ESRCH;
 			} else {
 				tb = fib_new_table(net, cfg.fc_table);
 				if (tb)
-					err = fib_table_insert(tb, &cfg);
+					err = fib_table_insert(net, tb, &cfg);
 				else
 					err = -ENOBUFS;
 			}
@@ -719,7 +719,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	err = fib_table_delete(tb, &cfg);
+	err = fib_table_delete(net, tb, &cfg);
 errout:
 	return err;
 }
@@ -741,7 +741,7 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	err = fib_table_insert(tb, &cfg);
+	err = fib_table_insert(net, tb, &cfg);
 errout:
 	return err;
 }
@@ -828,9 +828,9 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
 		cfg.fc_scope = RT_SCOPE_HOST;
 
 	if (cmd == RTM_NEWROUTE)
-		fib_table_insert(tb, &cfg);
+		fib_table_insert(net, tb, &cfg);
 	else
-		fib_table_delete(tb, &cfg);
+		fib_table_delete(net, tb, &cfg);
 }
 
 void fib_add_ifaddr(struct in_ifaddr *ifa)
@@ -1254,7 +1254,7 @@ static void ip_fib_net_exit(struct net *net)
 
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) {
 			hlist_del(&tb->tb_hlist);
-			fib_table_flush(tb);
+			fib_table_flush(net, tb);
 			fib_free_table(tb);
 		}
 	}
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 770bebe..ebadf6b 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -164,6 +164,14 @@ static struct fib_table *fib_empty_table(struct net *net)
 	return NULL;
 }
 
+static int call_fib_rule_notifiers(struct net *net,
+				   enum fib_event_type event_type)
+{
+	struct fib_notifier_info info;
+
+	return call_fib_notifiers(net, event_type, &info);
+}
+
 static const struct nla_policy fib4_rule_policy[FRA_MAX+1] = {
 	FRA_GENERIC_POLICY,
 	[FRA_FLOW]	= { .type = NLA_U32 },
@@ -221,6 +229,7 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 
 	net->ipv4.fib_has_custom_rules = true;
 	fib_flush_external(rule->fr_net);
+	call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD);
 
 	err = 0;
 errout:
@@ -243,6 +252,7 @@ static int fib4_rule_delete(struct fib_rule *rule)
 #endif
 	net->ipv4.fib_has_custom_rules = true;
 	fib_flush_external(rule->fr_net);
+	call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL);
 errout:
 	return err;
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 241f27b..51a4537 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -73,6 +73,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/vmalloc.h>
+#include <linux/notifier.h>
 #include <net/net_namespace.h>
 #include <net/ip.h>
 #include <net/protocol.h>
@@ -84,6 +85,44 @@
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
+static BLOCKING_NOTIFIER_HEAD(fib_chain);
+
+int register_fib_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&fib_chain, nb);
+}
+EXPORT_SYMBOL(register_fib_notifier);
+
+int unregister_fib_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&fib_chain, nb);
+}
+EXPORT_SYMBOL(unregister_fib_notifier);
+
+int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+		       struct fib_notifier_info *info)
+{
+	info->net = net;
+	return blocking_notifier_call_chain(&fib_chain, event_type, info);
+}
+
+static int call_fib_entry_notifiers(struct net *net,
+				    enum fib_event_type event_type, u32 dst,
+				    int dst_len, struct fib_info *fi,
+				    u8 tos, u8 type, u32 tb_id, u32 nlflags)
+{
+	struct fib_entry_notifier_info info = {
+		.dst = dst,
+		.dst_len = dst_len,
+		.fi = fi,
+		.tos = tos,
+		.type = type,
+		.tb_id = tb_id,
+		.nlflags = nlflags,
+	};
+	return call_fib_notifiers(net, event_type, &info.info);
+}
+
 #define MAX_STAT_DEPTH 32
 
 #define KEYLENGTH	(8*sizeof(t_key))
@@ -1076,7 +1115,8 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
 }
 
 /* Caller must hold RTNL. */
-int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
+int fib_table_insert(struct net *net, struct fib_table *tb,
+		     struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
@@ -1193,6 +1233,11 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			fib_release_info(fi_drop);
 			if (state & FA_S_ACCESSED)
 				rt_cache_flush(cfg->fc_nlinfo.nl_net);
+
+			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD,
+						 key, plen, fi,
+						 new_fa->fa_tos, cfg->fc_type,
+						 tb->tb_id, cfg->fc_nlflags);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				tb->tb_id, &cfg->fc_nlinfo, nlflags);
 
@@ -1245,6 +1290,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		tb->tb_num_default++;
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
+	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD, key, plen, fi, tos,
+				 cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
 		  &cfg->fc_nlinfo, nlflags);
 succeeded:
@@ -1490,7 +1537,8 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 }
 
 /* Caller must hold RTNL. */
-int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
+int fib_table_delete(struct net *net, struct fib_table *tb,
+		     struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *fa_to_delete;
@@ -1546,6 +1594,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
 			       cfg->fc_type, tb->tb_id);
 
+	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
+				 fa_to_delete->fa_info, tos, cfg->fc_type,
+				 tb->tb_id, 0);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
@@ -1809,7 +1860,7 @@ void fib_table_flush_external(struct fib_table *tb)
 }
 
 /* Caller must hold RTNL. */
-int fib_table_flush(struct fib_table *tb)
+int fib_table_flush(struct net *net, struct fib_table *tb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct key_vector *pn = t->kv;
@@ -1861,6 +1912,11 @@ int fib_table_flush(struct fib_table *tb)
 			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
 					       fi, fa->fa_tos, fa->fa_type,
 					       tb->tb_id);
+			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
+						 n->key,
+						 KEYLENGTH - fa->fa_slen,
+						 fi, fa->fa_tos, fa->fa_type,
+						 tb->tb_id, 0);
 			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 0/6] fib offload: switch to notifier
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john

From: Jiri Pirko <jiri@mellanox.com>

The goal of this patchset is to allow driver to propagate all prefixes
configured in kernel down HW. This is necessary for routing to work
as expected. If we don't do that HW might forward prefixes known to kernel
incorrectly. Take an example when default route is set in switch HW and there
is an IP address set on a management (non-switch) port.

Currently, only FIB entries related to the switch port netdev are
offloaded using switchdev ops. This model is not extendable so the
first patch introduces a replacement: notifier to propagate FIB entry
additions and removals to whoever is interested.

The second patch introduces couple of helpers to deal with RTNH_F_OFFLOAD
flags. Currently it is set in switchdev core. There the assumption is
that only one offload device exists. But for FIB notifier, we assume
multiple offload devices. So the patch introduces a per FIB entry
reference counter and helpers use it in order to achieve this:
   0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
   n means RTNH_F_OFFLOAD is set and the entry is offloaded by n devices

Patches 3 and 4 convert mlxsw and rocker to adopt this new way, registering
one notifier block for each asic instance. Both of these patches also
implement internal "abort" mechanism.

Using switchdev ops, "abort" is called by switchdev core whenever there is
an error during FIB entry add offload. This leads to removal of all
offloaded entries on system by fib_trie code.

Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload an entry does not mean that the
   others can't do it. So let only one entity to abort and leave the rest
   to work happily.
2) The driver knows what to in order to properly abort. For example,
   currently abort is broken for mlxsw, as for Spectrum there is a need
   to set 0.0.0.0/0 trap in RALUE register.

The fifth patch removes the old, no longer used FIB offload infrastructure.

The last patch reflects the changes into switchdev documentation file.

Jiri Pirko (6):
  fib: introduce FIB notification infrastructure
  fib: introduce FIB info offload flag helpers
  mlxsw: spectrum_router: Use FIB notifications instead of switchdev
    calls
  rocker: use FIB notifications instead of switchdev calls
  switchdev: remove FIB offload infrastructure
  doc: update switchdev L3 section

 Documentation/networking/switchdev.txt             |  27 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   9 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 386 ++++++++++++---------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 drivers/net/ethernet/rocker/rocker.h               |  15 +-
 drivers/net/ethernet/rocker/rocker_main.c          | 120 +++++--
 drivers/net/ethernet/rocker/rocker_ofdpa.c         | 115 ++++--
 include/net/ip_fib.h                               |  49 ++-
 include/net/switchdev.h                            |  40 ---
 net/ipv4/fib_frontend.c                            |  29 +-
 net/ipv4/fib_rules.c                               |  12 +-
 net/ipv4/fib_trie.c                                | 166 ++++-----
 net/switchdev/switchdev.c                          | 181 ----------
 13 files changed, 537 insertions(+), 621 deletions(-)

-- 
2.5.5

^ permalink raw reply

* [PATCH next v3 2/2] sctp: make use of SCTP_TRUNC4 macro
From: Marcelo Ricardo Leitner @ 2016-09-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, David Miller,
	David Laight
In-Reply-To: <cover.1474457954.git.marcelo.leitner@gmail.com>

And avoid the usage of '&~3'. This is the last place still not using
the macro.
Also break the line to make it easier to read.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
When I checked it the other day I thought I had this patch applied by
the moment but I hadn't.

v2: updated patch summary

 net/sctp/chunk.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 76eae828ec891076bc2360bf0ee89c3b650ef986..8afe2e90d003b9e1bbf233fd4e2fde0538ac65f6 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -195,9 +195,10 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 	/* This is the biggest possible DATA chunk that can fit into
 	 * the packet
 	 */
-	max_data = (asoc->pathmtu -
-		sctp_sk(asoc->base.sk)->pf->af->net_header_len -
-		sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk)) & ~3;
+	max_data = asoc->pathmtu -
+		   sctp_sk(asoc->base.sk)->pf->af->net_header_len -
+		   sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
+	max_data = SCTP_TRUNC4(max_data);
 
 	max = asoc->frag_point;
 	/* If the the peer requested that we authenticate DATA chunks
-- 
2.7.4

^ permalink raw reply related

* [PATCH next v3 1/2] sctp: rename WORD_TRUNC/ROUND macros
From: Marcelo Ricardo Leitner @ 2016-09-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, David Miller,
	David Laight
In-Reply-To: <cover.1474457954.git.marcelo.leitner@gmail.com>

To something more meaningful these days, specially because this is
working on packet headers or lengths and which are not tied to any CPU
arch but to the protocol itself.

So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.

Reported-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
v3:
- Name it SCTP_PAD4 instead of SCTP_ALIGN4, as suggested by David Laight

 include/net/sctp/sctp.h  | 10 +++++-----
 net/netfilter/xt_sctp.c  |  2 +-
 net/sctp/associola.c     |  2 +-
 net/sctp/chunk.c         |  6 +++---
 net/sctp/input.c         |  8 ++++----
 net/sctp/inqueue.c       |  2 +-
 net/sctp/output.c        | 12 ++++++------
 net/sctp/sm_make_chunk.c | 28 ++++++++++++++--------------
 net/sctp/sm_statefuns.c  |  6 +++---
 net/sctp/transport.c     |  4 ++--
 net/sctp/ulpevent.c      |  4 ++--
 11 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 632e205ca54bfe85124753e09445251056e19aa7..87a7f42e763963255afc32deaa774570bf03551a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -83,9 +83,9 @@
 #endif
 
 /* Round an int up to the next multiple of 4.  */
-#define WORD_ROUND(s) (((s)+3)&~3)
+#define SCTP_PAD4(s) (((s)+3)&~3)
 /* Truncate to the previous multiple of 4.  */
-#define WORD_TRUNC(s) ((s)&~3)
+#define SCTP_TRUNC4(s) ((s)&~3)
 
 /*
  * Function declarations.
@@ -433,7 +433,7 @@ static inline int sctp_frag_point(const struct sctp_association *asoc, int pmtu)
 	if (asoc->user_frag)
 		frag = min_t(int, frag, asoc->user_frag);
 
-	frag = WORD_TRUNC(min_t(int, frag, SCTP_MAX_CHUNK_LEN));
+	frag = SCTP_TRUNC4(min_t(int, frag, SCTP_MAX_CHUNK_LEN));
 
 	return frag;
 }
@@ -462,7 +462,7 @@ _sctp_walk_params((pos), (chunk), ntohs((chunk)->chunk_hdr.length), member)
 for (pos.v = chunk->member;\
      pos.v <= (void *)chunk + end - ntohs(pos.p->length) &&\
      ntohs(pos.p->length) >= sizeof(sctp_paramhdr_t);\
-     pos.v += WORD_ROUND(ntohs(pos.p->length)))
+     pos.v += SCTP_PAD4(ntohs(pos.p->length)))
 
 #define sctp_walk_errors(err, chunk_hdr)\
 _sctp_walk_errors((err), (chunk_hdr), ntohs((chunk_hdr)->length))
@@ -472,7 +472,7 @@ for (err = (sctp_errhdr_t *)((void *)chunk_hdr + \
 	    sizeof(sctp_chunkhdr_t));\
      (void *)err <= (void *)chunk_hdr + end - ntohs(err->length) &&\
      ntohs(err->length) >= sizeof(sctp_errhdr_t); \
-     err = (sctp_errhdr_t *)((void *)err + WORD_ROUND(ntohs(err->length))))
+     err = (sctp_errhdr_t *)((void *)err + SCTP_PAD4(ntohs(err->length))))
 
 #define sctp_walk_fwdtsn(pos, chunk)\
 _sctp_walk_fwdtsn((pos), (chunk), ntohs((chunk)->chunk_hdr->length) - sizeof(struct sctp_fwdtsn_chunk))
diff --git a/net/netfilter/xt_sctp.c b/net/netfilter/xt_sctp.c
index ef36a56a02c6881c58296b2bf45c4b99d3836456..4dedb96d1a067de6c9766017642e16f9b148d616 100644
--- a/net/netfilter/xt_sctp.c
+++ b/net/netfilter/xt_sctp.c
@@ -68,7 +68,7 @@ match_packet(const struct sk_buff *skb,
 			 ++i, offset, sch->type, htons(sch->length),
 			 sch->flags);
 #endif
-		offset += WORD_ROUND(ntohs(sch->length));
+		offset += SCTP_PAD4(ntohs(sch->length));
 
 		pr_debug("skb->len: %d\toffset: %d\n", skb->len, offset);
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 1c23060c41a669f38f4e2d244cfb61c85a522be6..f10d3397f917986d25240fc42f6a33ae8049e7b7 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1408,7 +1408,7 @@ void sctp_assoc_sync_pmtu(struct sock *sk, struct sctp_association *asoc)
 				transports) {
 		if (t->pmtu_pending && t->dst) {
 			sctp_transport_update_pmtu(sk, t,
-						   WORD_TRUNC(dst_mtu(t->dst)));
+						   SCTP_TRUNC4(dst_mtu(t->dst)));
 			t->pmtu_pending = 0;
 		}
 		if (!pmtu || (t->pathmtu < pmtu))
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index af9cc8055465b18e9754a5542fc7bd43f9dad240..76eae828ec891076bc2360bf0ee89c3b650ef986 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -208,8 +208,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 		struct sctp_hmac *hmac_desc = sctp_auth_asoc_get_hmac(asoc);
 
 		if (hmac_desc)
-			max_data -= WORD_ROUND(sizeof(sctp_auth_chunk_t) +
-					    hmac_desc->hmac_len);
+			max_data -= SCTP_PAD4(sizeof(sctp_auth_chunk_t) +
+					      hmac_desc->hmac_len);
 	}
 
 	/* Now, check if we need to reduce our max */
@@ -229,7 +229,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 	    asoc->outqueue.out_qlen == 0 &&
 	    list_empty(&asoc->outqueue.retransmit) &&
 	    msg_len > max)
-		max_data -= WORD_ROUND(sizeof(sctp_sack_chunk_t));
+		max_data -= SCTP_PAD4(sizeof(sctp_sack_chunk_t));
 
 	/* Encourage Cookie-ECHO bundling. */
 	if (asoc->state < SCTP_STATE_COOKIE_ECHOED)
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 69444d32ecda6cd1a4924911172feba89c5ae976..a1d85065bfc0bb38cab03c925fe230103c13d593 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -605,7 +605,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 		/* PMTU discovery (RFC1191) */
 		if (ICMP_FRAG_NEEDED == code) {
 			sctp_icmp_frag_needed(sk, asoc, transport,
-					      WORD_TRUNC(info));
+					      SCTP_TRUNC4(info));
 			goto out_unlock;
 		} else {
 			if (ICMP_PROT_UNREACH == code) {
@@ -673,7 +673,7 @@ static int sctp_rcv_ootb(struct sk_buff *skb)
 		if (ntohs(ch->length) < sizeof(sctp_chunkhdr_t))
 			break;
 
-		ch_end = offset + WORD_ROUND(ntohs(ch->length));
+		ch_end = offset + SCTP_PAD4(ntohs(ch->length));
 		if (ch_end > skb->len)
 			break;
 
@@ -1121,7 +1121,7 @@ static struct sctp_association *__sctp_rcv_walk_lookup(struct net *net,
 		if (ntohs(ch->length) < sizeof(sctp_chunkhdr_t))
 			break;
 
-		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
+		ch_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length));
 		if (ch_end > skb_tail_pointer(skb))
 			break;
 
@@ -1190,7 +1190,7 @@ static struct sctp_association *__sctp_rcv_lookup_harder(struct net *net,
 	 * that the chunk length doesn't cause overflow.  Otherwise, we'll
 	 * walk off the end.
 	 */
-	if (WORD_ROUND(ntohs(ch->length)) > skb->len)
+	if (SCTP_PAD4(ntohs(ch->length)) > skb->len)
 		return NULL;
 
 	/* If this is INIT/INIT-ACK look inside the chunk too. */
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index 6437aa97cfd79f14c633499c2b131389204c435b..f731de3e8428c951583c3217586da0609bb25d3a 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -213,7 +213,7 @@ new_skb:
 	}
 
 	chunk->chunk_hdr = ch;
-	chunk->chunk_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
+	chunk->chunk_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length));
 	skb_pull(chunk->skb, sizeof(sctp_chunkhdr_t));
 	chunk->subh.v = NULL; /* Subheader is no longer valid.  */
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0c605ec74dc4ac2b300d7e52107597bb1be5fefd..2a5c1896d18fa674f0e0e84ff6787d04914b2b73 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -297,7 +297,7 @@ static sctp_xmit_t __sctp_packet_append_chunk(struct sctp_packet *packet,
 					      struct sctp_chunk *chunk)
 {
 	sctp_xmit_t retval = SCTP_XMIT_OK;
-	__u16 chunk_len = WORD_ROUND(ntohs(chunk->chunk_hdr->length));
+	__u16 chunk_len = SCTP_PAD4(ntohs(chunk->chunk_hdr->length));
 
 	/* Check to see if this chunk will fit into the packet */
 	retval = sctp_packet_will_fit(packet, chunk, chunk_len);
@@ -508,7 +508,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 		if (gso) {
 			pkt_size = packet->overhead;
 			list_for_each_entry(chunk, &packet->chunk_list, list) {
-				int padded = WORD_ROUND(chunk->skb->len);
+				int padded = SCTP_PAD4(chunk->skb->len);
 
 				if (pkt_size + padded > tp->pathmtu)
 					break;
@@ -538,7 +538,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 		 * included in the chunk length field.  The sender should
 		 * never pad with more than 3 bytes.
 		 *
-		 * [This whole comment explains WORD_ROUND() below.]
+		 * [This whole comment explains SCTP_PAD4() below.]
 		 */
 
 		pkt_size -= packet->overhead;
@@ -560,7 +560,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 				has_data = 1;
 			}
 
-			padding = WORD_ROUND(chunk->skb->len) - chunk->skb->len;
+			padding = SCTP_PAD4(chunk->skb->len) - chunk->skb->len;
 			if (padding)
 				memset(skb_put(chunk->skb, padding), 0, padding);
 
@@ -587,7 +587,7 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
 			 * acknowledged or have failed.
 			 * Re-queue auth chunks if needed.
 			 */
-			pkt_size -= WORD_ROUND(chunk->skb->len);
+			pkt_size -= SCTP_PAD4(chunk->skb->len);
 
 			if (!sctp_chunk_is_data(chunk) && chunk != packet->auth)
 				sctp_chunk_free(chunk);
@@ -911,7 +911,7 @@ static sctp_xmit_t sctp_packet_will_fit(struct sctp_packet *packet,
 		 */
 		maxsize = pmtu - packet->overhead;
 		if (packet->auth)
-			maxsize -= WORD_ROUND(packet->auth->skb->len);
+			maxsize -= SCTP_PAD4(packet->auth->skb->len);
 		if (chunk_len > maxsize)
 			retval = SCTP_XMIT_PMTU_FULL;
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 8c77b87a8565cb4f82c09cea65557dc9c8d1138f..79dd66079dd7f0f503c2493d66885b9939589ee4 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -253,7 +253,7 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
 	num_types = sp->pf->supported_addrs(sp, types);
 
 	chunksize = sizeof(init) + addrs_len;
-	chunksize += WORD_ROUND(SCTP_SAT_LEN(num_types));
+	chunksize += SCTP_PAD4(SCTP_SAT_LEN(num_types));
 	chunksize += sizeof(ecap_param);
 
 	if (asoc->prsctp_enable)
@@ -283,14 +283,14 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
 		/* Add HMACS parameter length if any were defined */
 		auth_hmacs = (sctp_paramhdr_t *)asoc->c.auth_hmacs;
 		if (auth_hmacs->length)
-			chunksize += WORD_ROUND(ntohs(auth_hmacs->length));
+			chunksize += SCTP_PAD4(ntohs(auth_hmacs->length));
 		else
 			auth_hmacs = NULL;
 
 		/* Add CHUNKS parameter length */
 		auth_chunks = (sctp_paramhdr_t *)asoc->c.auth_chunks;
 		if (auth_chunks->length)
-			chunksize += WORD_ROUND(ntohs(auth_chunks->length));
+			chunksize += SCTP_PAD4(ntohs(auth_chunks->length));
 		else
 			auth_chunks = NULL;
 
@@ -300,8 +300,8 @@ struct sctp_chunk *sctp_make_init(const struct sctp_association *asoc,
 
 	/* If we have any extensions to report, account for that */
 	if (num_ext)
-		chunksize += WORD_ROUND(sizeof(sctp_supported_ext_param_t) +
-					num_ext);
+		chunksize += SCTP_PAD4(sizeof(sctp_supported_ext_param_t) +
+				       num_ext);
 
 	/* RFC 2960 3.3.2 Initiation (INIT) (1)
 	 *
@@ -443,13 +443,13 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
 
 		auth_hmacs = (sctp_paramhdr_t *)asoc->c.auth_hmacs;
 		if (auth_hmacs->length)
-			chunksize += WORD_ROUND(ntohs(auth_hmacs->length));
+			chunksize += SCTP_PAD4(ntohs(auth_hmacs->length));
 		else
 			auth_hmacs = NULL;
 
 		auth_chunks = (sctp_paramhdr_t *)asoc->c.auth_chunks;
 		if (auth_chunks->length)
-			chunksize += WORD_ROUND(ntohs(auth_chunks->length));
+			chunksize += SCTP_PAD4(ntohs(auth_chunks->length));
 		else
 			auth_chunks = NULL;
 
@@ -458,8 +458,8 @@ struct sctp_chunk *sctp_make_init_ack(const struct sctp_association *asoc,
 	}
 
 	if (num_ext)
-		chunksize += WORD_ROUND(sizeof(sctp_supported_ext_param_t) +
-					num_ext);
+		chunksize += SCTP_PAD4(sizeof(sctp_supported_ext_param_t) +
+				       num_ext);
 
 	/* Now allocate and fill out the chunk.  */
 	retval = sctp_make_control(asoc, SCTP_CID_INIT_ACK, 0, chunksize, gfp);
@@ -1390,7 +1390,7 @@ static struct sctp_chunk *_sctp_make_chunk(const struct sctp_association *asoc,
 	struct sock *sk;
 
 	/* No need to allocate LL here, as this is only a chunk. */
-	skb = alloc_skb(WORD_ROUND(sizeof(sctp_chunkhdr_t) + paylen), gfp);
+	skb = alloc_skb(SCTP_PAD4(sizeof(sctp_chunkhdr_t) + paylen), gfp);
 	if (!skb)
 		goto nodata;
 
@@ -1482,7 +1482,7 @@ void *sctp_addto_chunk(struct sctp_chunk *chunk, int len, const void *data)
 	void *target;
 	void *padding;
 	int chunklen = ntohs(chunk->chunk_hdr->length);
-	int padlen = WORD_ROUND(chunklen) - chunklen;
+	int padlen = SCTP_PAD4(chunklen) - chunklen;
 
 	padding = skb_put(chunk->skb, padlen);
 	target = skb_put(chunk->skb, len);
@@ -1900,7 +1900,7 @@ static int sctp_process_missing_param(const struct sctp_association *asoc,
 	struct __sctp_missing report;
 	__u16 len;
 
-	len = WORD_ROUND(sizeof(report));
+	len = SCTP_PAD4(sizeof(report));
 
 	/* Make an ERROR chunk, preparing enough room for
 	 * returning multiple unknown parameters.
@@ -2098,9 +2098,9 @@ static sctp_ierror_t sctp_process_unk_param(const struct sctp_association *asoc,
 
 		if (*errp) {
 			if (!sctp_init_cause_fixed(*errp, SCTP_ERROR_UNKNOWN_PARAM,
-					WORD_ROUND(ntohs(param.p->length))))
+					SCTP_PAD4(ntohs(param.p->length))))
 				sctp_addto_chunk_fixed(*errp,
-						WORD_ROUND(ntohs(param.p->length)),
+						SCTP_PAD4(ntohs(param.p->length)),
 						param.v);
 		} else {
 			/* If there is no memory for generating the ERROR
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d88bb2b0b69913ad5962f9a5655d413f2c210ed0..026e3bca4a94bd34b418d5e6947f7182c1512358 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3454,7 +3454,7 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
 		}
 
 		/* Report violation if chunk len overflows */
-		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
+		ch_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length));
 		if (ch_end > skb_tail_pointer(skb))
 			return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
@@ -4185,7 +4185,7 @@ sctp_disposition_t sctp_sf_unk_chunk(struct net *net,
 		hdr = unk_chunk->chunk_hdr;
 		err_chunk = sctp_make_op_error(asoc, unk_chunk,
 					       SCTP_ERROR_UNKNOWN_CHUNK, hdr,
-					       WORD_ROUND(ntohs(hdr->length)),
+					       SCTP_PAD4(ntohs(hdr->length)),
 					       0);
 		if (err_chunk) {
 			sctp_add_cmd_sf(commands, SCTP_CMD_REPLY,
@@ -4203,7 +4203,7 @@ sctp_disposition_t sctp_sf_unk_chunk(struct net *net,
 		hdr = unk_chunk->chunk_hdr;
 		err_chunk = sctp_make_op_error(asoc, unk_chunk,
 					       SCTP_ERROR_UNKNOWN_CHUNK, hdr,
-					       WORD_ROUND(ntohs(hdr->length)),
+					       SCTP_PAD4(ntohs(hdr->length)),
 					       0);
 		if (err_chunk) {
 			sctp_add_cmd_sf(commands, SCTP_CMD_REPLY,
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 81b86678be4d6fccc527d3c3e509c12576b2194c..ce54dce13ddb2cba959bca080b06f9285f2fe16d 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -233,7 +233,7 @@ void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 	}
 
 	if (transport->dst) {
-		transport->pathmtu = WORD_TRUNC(dst_mtu(transport->dst));
+		transport->pathmtu = SCTP_TRUNC4(dst_mtu(transport->dst));
 	} else
 		transport->pathmtu = SCTP_DEFAULT_MAXSEGMENT;
 }
@@ -287,7 +287,7 @@ void sctp_transport_route(struct sctp_transport *transport,
 		return;
 	}
 	if (transport->dst) {
-		transport->pathmtu = WORD_TRUNC(dst_mtu(transport->dst));
+		transport->pathmtu = SCTP_TRUNC4(dst_mtu(transport->dst));
 
 		/* Initialize sk->sk_rcv_saddr, if the transport is the
 		 * association's active path for getsockname().
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index d85b803da11d21202d876c95811bcab4e1fb507e..bea00058ce354b7c910cab96d3f2ec4747763a10 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -383,7 +383,7 @@ sctp_ulpevent_make_remote_error(const struct sctp_association *asoc,
 
 	ch = (sctp_errhdr_t *)(chunk->skb->data);
 	cause = ch->cause;
-	elen = WORD_ROUND(ntohs(ch->length)) - sizeof(sctp_errhdr_t);
+	elen = SCTP_PAD4(ntohs(ch->length)) - sizeof(sctp_errhdr_t);
 
 	/* Pull off the ERROR header.  */
 	skb_pull(chunk->skb, sizeof(sctp_errhdr_t));
@@ -688,7 +688,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
 	 * MUST ignore the padding bytes.
 	 */
 	len = ntohs(chunk->chunk_hdr->length);
-	padding = WORD_ROUND(len) - len;
+	padding = SCTP_PAD4(len) - len;
 
 	/* Fixup cloned skb with just this chunks data.  */
 	skb_trim(skb, chunk->chunk_end - padding - skb->data);
-- 
2.7.4

^ permalink raw reply related

* [PATCH next v3 0/2] Rename WORD_TRUNC/ROUND macros and use them
From: Marcelo Ricardo Leitner @ 2016-09-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, David Miller,
	David Laight

This patchset aims to rename these macros to a non-confusing name, as
reported by David Laight and David Miller, and to update all remaining
places to make use of it, which was 1 last remaining spot.

v3:
- Name it SCTP_PAD4 instead of SCTP_ALIGN4, as suggested by David Laight
v2:
- fixed 2nd patch summary

Details on the specific changelogs.

Thanks!

Marcelo Ricardo Leitner (2):
  sctp: rename WORD_TRUNC/ROUND macros
  sctp: make use of WORD_TRUNC macro

 include/net/sctp/sctp.h  | 10 +++++-----
 net/netfilter/xt_sctp.c  |  2 +-
 net/sctp/associola.c     |  2 +-
 net/sctp/chunk.c         | 13 +++++++------
 net/sctp/input.c         |  8 ++++----
 net/sctp/inqueue.c       |  2 +-
 net/sctp/output.c        | 12 ++++++------
 net/sctp/sm_make_chunk.c | 28 ++++++++++++++--------------
 net/sctp/sm_statefuns.c  |  6 +++---
 net/sctp/transport.c     |  4 ++--
 net/sctp/ulpevent.c      |  4 ++--
 11 files changed, 46 insertions(+), 45 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [ANNOUNCE] ndiv: line-rate network traffic processing
From: Willy Tarreau @ 2016-09-21 11:28 UTC (permalink / raw)
  To: netdev

Hi,

Over the last 3 years I've been working a bit on high traffic processing
for various reasons. It started with the wish to capture line-rate GigE
traffic on very small fanless ARM machines and the framework has evolved
to be used at my company as a basis for our anti-DDoS engine capable of
dealing with multiple 10G links saturated with floods.

I know it comes a bit late now that there is XDP, but it's my first
vacation since then and I needed to have a bit of calm time to collect
the patches from the various branches and put them together. Anyway I'm
sending this here in case it can be of interest to anyone, for use or
just to study it.

I presented it in 2014 at kernel recipes :
  http://kernel-recipes.org/en/2014/ndiv-a-low-overhead-network-traffic-diverter/

It now supports drivers mvneta, ixgbe, e1000e, e1000 and igb. It is
very light, and retrieves the packets in the NIC's driver before they
are converted to an skb, then submits them to a registered RX handler
running in softirq context so we have the best of all worlds by
benefitting from CPU scalability, delayed processing, and not paying
the cost of switching to userland. Also an rx_done() function allows
handlers to batch their processing. The RX handler returns an action
among accepting the packet as-is, accepting it modified (eg: vlan or
tunnel decapsulation), dropping it, postponing the processing
(equivalent to EAGAIN), or building a new packet to send back.

This last function is the one requiring the most changes in existing
drivers, but offers the widest range of possibilities. We use it to
send SYN cookies, but I have also implemented a stateless HTTP server
supporting keep-alive using it, achieving line-rate traffic processing
on a single CPU core when the NIC supports it. It's very convenient to
test various stateful TCP components as it's easy to sustain millions
of connections per second on it.

It does not support forwarding between NICs. It was my first goal
because I wanted to implement a TAP with it, bridging the traffic
between two ports, but figured it was adding some complexity to the
system back then. However since then we've implemented traffic
capture in our product, exploiting this framework to capture without
losses at 14 Mpps. I may find some time to try to extract it later.
It uses the /sys API so that you can simply plug tcpdump -r on a
file there, though there's also an mmap version which uses less CPU
(that's important at 10G).

In its current form since the initial code's intent was to limit
core changes, it happens not to modify anything in the kernel by
default and to reuse the net_device's ax25_ptr to attach devices
(idea borrowed from netmap), so it can be used on an existing
kernel just by loading the patched network drivers (yes, I know
it's not a valid solution for the long term).

The current code is available here :

  http://git.kernel.org/cgit/linux/kernel/git/wtarreau/ndiv.git/

Please let me know if there could be some interest in rebasing it
on more recent versions (currently 3.10, 3.14 and 4.4 are supported).
I don't have much time to assign to it since it works fine as-is,
but will be glad to do so if that can be useful.

Also the stateless HTTP server provided in it definitely is a nice
use case for testing such a framework.

Regards,
Willy

^ permalink raw reply

* [PATCH 4/4] vti6: fix input path
From: Steffen Klassert @ 2016-09-21 11:05 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1474455946-674-1-git-send-email-steffen.klassert@secunet.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).

XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().

A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).

CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h      |  4 +++-
 net/ipv6/ip6_vti.c      |  4 +---
 net/ipv6/xfrm6_input.c  | 16 +++++++++++-----
 net/ipv6/xfrm6_tunnel.c |  2 +-
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index adfebd6..1793431 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1540,8 +1540,10 @@ int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 int xfrm6_extract_header(struct sk_buff *skb);
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
+		  struct ip6_tnl *t);
 int xfrm6_transport_finish(struct sk_buff *skb, int async);
+int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t);
 int xfrm6_rcv(struct sk_buff *skb);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 		     xfrm_address_t *saddr, u8 proto);
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 52a2f73..5bd3afd 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -321,11 +321,9 @@ static int vti6_rcv(struct sk_buff *skb)
 			goto discard;
 		}
 
-		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
-
 		rcu_read_unlock();
 
-		return xfrm6_rcv(skb);
+		return xfrm6_rcv_tnl(skb, t);
 	}
 	rcu_read_unlock();
 	return -EINVAL;
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 00a2d40..b578956 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -21,9 +21,10 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 	return xfrm6_extract_header(skb);
 }
 
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
+		  struct ip6_tnl *t)
 {
-	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = NULL;
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, 0);
@@ -49,13 +50,18 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	return -1;
 }
 
-int xfrm6_rcv(struct sk_buff *skb)
+int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t)
 {
 	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
-			     0);
+			     0, t);
 }
-EXPORT_SYMBOL(xfrm6_rcv);
+EXPORT_SYMBOL(xfrm6_rcv_tnl);
 
+int xfrm6_rcv(struct sk_buff *skb)
+{
+	return xfrm6_rcv_tnl(skb, NULL);
+}
+EXPORT_SYMBOL(xfrm6_rcv);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 		     xfrm_address_t *saddr, u8 proto)
 {
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 5743044..e1c0bbe 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -236,7 +236,7 @@ static int xfrm6_tunnel_rcv(struct sk_buff *skb)
 	__be32 spi;
 
 	spi = xfrm6_tunnel_spi_lookup(net, (const xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi);
+	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi, NULL);
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
-- 
1.9.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox