Netdev List
* Re: [PATCHv4 net-next 05/15] bpf: expose internal verifier structures
From: Alexei Starovoitov @ 2016-09-15 20:09 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici
In-Reply-To: <1473966755-30106-6-git-send-email-jakub.kicinski@netronome.com>

On Thu, Sep 15, 2016 at 08:12:25PM +0100, Jakub Kicinski wrote:
> Move verifier's internal structures to a header file and
> prefix their names with bpf_ to avoid potential namespace
> conflicts.  Those structures will soon be used by external
> analyzers.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> v4:
>  - separate from adding the analyzer;
>  - squash with the prefixing patch.
> ---
>  include/linux/bpf_verifier.h |  78 +++++++++++++
>  kernel/bpf/verifier.c        | 263 +++++++++++++++++--------------------------
>  2 files changed, 180 insertions(+), 161 deletions(-)
>  create mode 100644 include/linux/bpf_verifier.h
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> new file mode 100644
> index 000000000000..1c0511ef7eaf
> --- /dev/null
> +++ b/include/linux/bpf_verifier.h
...
> +#ifndef _LINUX_BPF_ANALYZER_H
> +#define _LINUX_BPF_ANALYZER_H 1

the macro doesn't match the file name.
Other than that
Acked-by: Alexei Starovoitov <ast@kernel.org>


* Re: [PATCHv4 net-next 06/15] bpf: enable non-core use of the verifier
From: Alexei Starovoitov @ 2016-09-15 20:10 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici
In-Reply-To: <1473966755-30106-7-git-send-email-jakub.kicinski@netronome.com>

On Thu, Sep 15, 2016 at 08:12:26PM +0100, Jakub Kicinski wrote:
> Advanced JIT compilers and translators may want to use
> eBPF verifier as a base for parsers or to perform custom
> checks and validations.
> 
> Add ability for external users to invoke the verifier
> and provide callbacks to be invoked for every instruction
> checked.  For now only the most basic callback for
> per-instruction pre-interpretation checks is added.  More
> advanced users may also like to have per-instruction post
> callback and state comparison callback.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> v4:
>  - separate from the header split patch.

Acked-by: Alexei Starovoitov <ast@kernel.org>
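
The callback mechanism described in the quoted commit message can be illustrated with a minimal standalone sketch. Every name here (toy_insn, toy_analyzer_ops, insn_hook, toy_verify) is hypothetical and chosen for illustration only; this is not the interface the patch adds, just the general shape of "walk the instructions and call a user hook before each check".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one instruction; not the kernel's struct bpf_insn. */
struct toy_insn {
	uint8_t code;
	int32_t imm;
};

/* Hypothetical callback table supplied by the external user. */
struct toy_analyzer_ops {
	/* Called before each instruction is checked; non-zero aborts the walk. */
	int (*insn_hook)(void *priv, const struct toy_insn *insn, size_t idx);
};

/* Walk the program, invoking the pre-check hook when one is registered. */
static int toy_verify(const struct toy_insn *prog, size_t len,
		      const struct toy_analyzer_ops *ops, void *priv)
{
	size_t i;

	assert(prog != NULL || len == 0);

	for (i = 0; i < len; i++) {
		if (ops && ops->insn_hook) {
			int err = ops->insn_hook(priv, &prog[i], i);

			if (err)
				return err;
		}
		/* The core per-instruction checks would run here. */
	}
	return 0;
}

/* Example hook: count instructions and reject the made-up opcode 0xff. */
static int toy_count_hook(void *priv, const struct toy_insn *insn, size_t idx)
{
	size_t *count = priv;

	(void)idx;
	if (insn->code == 0xff)
		return -1;
	(*count)++;
	return 0;
}
```

Passing NULL for the ops table leaves the walk unchanged, which is the property a core user would rely on.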


* Re: [PATCHv4 net-next 07/15] bpf: recognize 64bit immediate loads as consts
From: Alexei Starovoitov @ 2016-09-15 20:12 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici
In-Reply-To: <1473966755-30106-8-git-send-email-jakub.kicinski@netronome.com>

On Thu, Sep 15, 2016 at 08:12:27PM +0100, Jakub Kicinski wrote:
> When running as parser interpret BPF_LD | BPF_IMM | BPF_DW
> instructions as loading CONST_IMM with the value stored
> in imm.  The verifier will continue not recognizing those
> due to concerns about search space/program complexity
> increase.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> v3:
>  - limit to parsers.
> ---
>  kernel/bpf/verifier.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d93e78331b90..f5bed7cce08d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1766,9 +1766,19 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  	if (err)
>  		return err;
>  
> -	if (insn->src_reg == 0)
> -		/* generic move 64-bit immediate into a register */
> +	if (insn->src_reg == 0) {
> +		/* generic move 64-bit immediate into a register,
> +		 * only analyzer needs to collect the ld_imm value.
> +		 */
> +		u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
> +
> +		if (!env->analyzer_ops)
> +			return 0;

the check makes sense. thanks.
Acked-by: Alexei Starovoitov <ast@kernel.org>
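
The expression in the quoted hunk joins the imm fields of two consecutive instruction slots into one 64-bit value. A minimal standalone sketch of that arithmetic follows; the function names are hypothetical, and plain int32_t arguments stand in for the two imm fields. It also shows why the (u32) cast on the low half matters.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(int32_t),
	      "two 32-bit imm slots fill one 64-bit value");

/* Combine the two 32-bit imm fields of a 64-bit immediate load.
 * lo is the imm of the first instruction slot, hi the imm of the second.
 */
static uint64_t ld_imm64_value(int32_t lo, int32_t hi)
{
	/* Casting lo through uint32_t keeps a negative value from
	 * sign-extending into the upper 32 bits.
	 */
	return ((uint64_t)hi << 32) | (uint32_t)lo;
}

/* Same computation without the 32-bit cast, to show the pitfall:
 * a negative lo sign-extends and overwrites the upper half.
 */
static uint64_t ld_imm64_value_no_cast(int32_t lo, int32_t hi)
{
	return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

With lo = -1 and hi = 0, the first function yields 0x00000000ffffffff while the second yields 0xffffffffffffffff.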


* Re: [PATCHv4 net-next 00/15] BPF hardware offload (cls_bpf for now)
From: Alexei Starovoitov @ 2016-09-15 20:16 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici
In-Reply-To: <1473966755-30106-1-git-send-email-jakub.kicinski@netronome.com>

On Thu, Sep 15, 2016 at 08:12:20PM +0100, Jakub Kicinski wrote:
> In the last year a lot of progress has been made on offloading
> simpler TC classifiers.  There is also growing interest in using
> BPF for generic high-speed packet processing in the kernel.
> It seems beneficial to tie those two trends together and think
> about hardware offloads of BPF programs.  This patch set presents
> such offload to Netronome smart NICs.  cls_bpf is extended with
> hardware offload capabilities and NFP driver gets a JIT translator
> which in presence of capable firmware can be used to offload
> the BPF program onto the card.

Looks great! Thanks for all the hard work.


* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: Cyrill Gorcunov @ 2016-09-15 20:22 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, linux-kernel, David Miller, eric.dumazet, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen
In-Reply-To: <8260ff1f-6907-aed8-caae-68d63a4ad529@cumulusnetworks.com>

On Thu, Sep 15, 2016 at 01:53:13PM -0600, David Ahern wrote:
> On 9/13/16 11:19 AM, Cyrill Gorcunov wrote:
> > In criu we are actively using diag interface to collect sockets
> > present in the system when dumping applications. And while for
> > unix, tcp, udp[lite], packet, netlink it works as expected,
> > raw sockets have no such interface. Thus add one.
> > 
> > v2:
> >  - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
> >  - implement @destroy for diag requests (by dsa@)
> > 
> > v3:
> >  - add export of raw_abort for IPv6 (by dsa@)
> >  - pass net-admin flag into inet_sk_diag_fill due to
> >    changes in net-next branch (by dsa@)
> > 
> > CC: David S. Miller <davem@davemloft.net>
> > CC: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: David Ahern <dsa@cumulusnetworks.com>
> > CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> > CC: James Morris <jmorris@namei.org>
> > CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > CC: Patrick McHardy <kaber@trash.net>
> > CC: Andrey Vagin <avagin@openvz.org>
> > CC: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> > ---
> 
> ss -K is not working. Socket lookup fails to find a match due to a protocol mismatch.
> 
> haven't had time to track down why there is a mismatch since the kill uses the socket returned
> from the dump. Won't have time to come back to this until early next week.

Have you run a patched iproute2? I just ran ss -K and all sockets got closed
(including raw ones), which actually kicked me off the testing machine sshd :/

	Cyrill


* [PATCH 0/3] constify net_device_ops structures
From: Julia Lawall @ 2016-09-15 20:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernel-janitors, netdev

Constify net_device_ops structures.

---

 drivers/net/ethernet/hisilicon/hip04_eth.c  |    2 +-
 drivers/net/ethernet/synopsys/dwc_eth_qos.c |    2 +-
 net/l2tp/l2tp_eth.c                         |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: David Ahern @ 2016-09-15 20:25 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: netdev, linux-kernel, David Miller, eric.dumazet, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen
In-Reply-To: <20160915202219.GB1867@uranus.lan>

On 9/15/16 2:22 PM, Cyrill Gorcunov wrote:
>> ss -K is not working. Socket lookup fails to find a match due to a protocol mismatch.
>>
>> haven't had time to track down why there is a mismatch since the kill uses the socket returned
>> from the dump. Won't have time to come back to this until early next week.
> 
> Have you run a patched iproute2? I just ran ss -K and all sockets got closed
> (including raw ones), which actually kicked me off the testing machine sshd :/

yes.


* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: Eric Dumazet @ 2016-09-15 20:36 UTC (permalink / raw)
  To: David Ahern
  Cc: Cyrill Gorcunov, netdev, linux-kernel, David Miller, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen
In-Reply-To: <cc67e1b1-ad0c-f0f6-4b4d-2eab80e940da@cumulusnetworks.com>

On Thu, 2016-09-15 at 14:25 -0600, David Ahern wrote:
> On 9/15/16 2:22 PM, Cyrill Gorcunov wrote:
> >> ss -K is not working. Socket lookup fails to find a match due to a protocol mismatch.
> >>
> >> haven't had time to track down why there is a mismatch since the kill uses the socket returned
> >> from the dump. Won't have time to come back to this until early next week.
> > 
> > > Have you run a patched iproute2? I just ran ss -K and all sockets got closed
> > (including raw ones), which actually kicked me off the testing machine sshd :/
> 
> yes.
> 

And CONFIG_INET_DIAG_DESTROY is also set in your .config ?


* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: David Ahern @ 2016-09-15 20:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cyrill Gorcunov, netdev, linux-kernel, David Miller, kuznet,
	jmorris, yoshfuji, kaber, avagin, stephen
In-Reply-To: <1473971768.22679.53.camel@edumazet-glaptop3.roam.corp.google.com>

On 9/15/16 2:36 PM, Eric Dumazet wrote:
> On Thu, 2016-09-15 at 14:25 -0600, David Ahern wrote:
>> On 9/15/16 2:22 PM, Cyrill Gorcunov wrote:
>>>> ss -K is not working. Socket lookup fails to find a match due to a protocol mismatch.
>>>>
>>>> haven't had time to track down why there is a mismatch since the kill uses the socket returned
>>>> from the dump. Won't have time to come back to this until early next week.
>>>
>>> Have you run a patched iproute2? I just ran ss -K and all sockets got closed
>>> (including raw ones), which actually kicked me off the testing machine sshd :/
>>
>> yes.
>>
> 
> And CONFIG_INET_DIAG_DESTROY is also set in your .config ?
yes

dsa@kenny:~/kernel.git$ grep INET_DIAG_DESTROY kbuild/perf/.config
CONFIG_INET_DIAG_DESTROY=y

raw_diag_destroy is getting called, but protocol is 255:

diff --git a/net/ipv4/raw_diag.c b/net/ipv4/raw_diag.c
index c730e14618ab..95542b3dad76 100644
--- a/net/ipv4/raw_diag.c
+++ b/net/ipv4/raw_diag.c
@@ -192,6 +192,11 @@ static int raw_diag_destroy(struct sk_buff *in_skb,
        struct sock *sk;

        sk = raw_sock_get(net, r);
+
+if (r->sdiag_family == AF_INET)
+pr_warn("raw_diag_destroy: family IPv4 protocol %d dst %pI4 src %pI4 dev %d sk %p\n",
+        r->sdiag_protocol, &r->id.idiag_dst[0], &r->id.idiag_src[0], r->id.idiag_if, sk);
+
        if (IS_ERR(sk))
                return PTR_ERR(sk);
        return sock_diag_destroy(sk, ECONNABORTED);



so it never finds a match to an actual raw socket:

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 03618ed03532..6d0489629e74 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -124,9 +124,14 @@ EXPORT_SYMBOL_GPL(raw_unhash_sk);
 struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
                unsigned short num, __be32 raddr, __be32 laddr, int dif)
 {
+pr_warn("num %d raddr %pI4 laddr %pI4 dif %d\n", num, &raddr, &laddr, dif);
+
        sk_for_each_from(sk) {
                struct inet_sock *inet = inet_sk(sk);

+pr_warn("sk: num %d raddr %pI4 laddr %pI4 dif %d\n",
+       inet->inet_num, &inet->inet_daddr, &inet->inet_rcv_saddr,sk->sk_bound_dev_if);
+
                if (net_eq(sock_net(sk), net) && inet->inet_num == num  &&
                    !(inet->inet_daddr && inet->inet_daddr != raddr)    &&
                    !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) &&

so raw_abort is not called.


* [PATCH 1/3] hisilicon: constify net_device_ops structures
From: Julia Lawall @ 2016-09-15 20:23 UTC (permalink / raw)
  To: Yisen Zhuang; +Cc: kernel-janitors, Salil Mehta, netdev, linux-kernel
In-Reply-To: <1473971006-2074-1-git-send-email-Julia.Lawall@lip6.fr>

Check for net_device_ops structures that are only stored in the netdev_ops
field of a net_device structure.  This field is declared const, so
net_device_ops structures that have this property can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p = { ... };

@ok@
identifier r.i;
struct net_device e;
position p;
@@
e.netdev_ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct net_device_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct net_device_ops i = { ... };
// </smpl>

The result of size on this file before the change is:

   text    data     bss     dec     hex filename
   7995     848       8    8851    2293 drivers/net/ethernet/hisilicon/hip04_eth.o

and after the change it is:

   text    data     bss     dec     hex filename
   8571     256       8    8835    2283 drivers/net/ethernet/hisilicon/hip04_eth.o

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
 drivers/net/ethernet/hisilicon/hip04_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index a90ab40..415ffa1 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -761,7 +761,7 @@ static const struct ethtool_ops hip04_ethtool_ops = {
 	.get_drvinfo		= hip04_get_drvinfo,
 };
 
-static struct net_device_ops hip04_netdev_ops = {
+static const struct net_device_ops hip04_netdev_ops = {
 	.ndo_open		= hip04_mac_open,
 	.ndo_stop		= hip04_mac_stop,
 	.ndo_get_stats		= hip04_get_stats,


* [PATCH 2/3] dwc_eth_qos: constify net_device_ops structures
From: Julia Lawall @ 2016-09-15 20:23 UTC (permalink / raw)
  To: Lars Persson; +Cc: kernel-janitors, netdev, linux-kernel
In-Reply-To: <1473971006-2074-1-git-send-email-Julia.Lawall@lip6.fr>

Check for net_device_ops structures that are only stored in the netdev_ops
field of a net_device structure.  This field is declared const, so
net_device_ops structures that have this property can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p = { ... };

@ok@
identifier r.i;
struct net_device e;
position p;
@@
e.netdev_ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct net_device_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct net_device_ops i = { ... };
// </smpl>

The result of size on this file before the change is:
   text    data     bss     dec     hex filename
  21623    1316      40   22979    59c3 drivers/net/ethernet/synopsys/dwc_eth_qos.o

and after the change it is:
   text    data     bss     dec     hex filename
  22199     724      40   22963    59b3 drivers/net/ethernet/synopsys/dwc_eth_qos.o

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index c25d971..b5c4554 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -2761,7 +2761,7 @@ static const struct ethtool_ops dwceqos_ethtool_ops = {
 	.set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
-static struct net_device_ops netdev_ops = {
+static const struct net_device_ops netdev_ops = {
 	.ndo_open		= dwceqos_open,
 	.ndo_stop		= dwceqos_stop,
 	.ndo_start_xmit		= dwceqos_start_xmit,


* [PATCH 3/3] l2tp: constify net_device_ops structures
From: Julia Lawall @ 2016-09-15 20:23 UTC (permalink / raw)
  To: David S. Miller; +Cc: kernel-janitors, netdev, linux-kernel
In-Reply-To: <1473971006-2074-1-git-send-email-Julia.Lawall@lip6.fr>

Check for net_device_ops structures that are only stored in the netdev_ops
field of a net_device structure.  This field is declared const, so
net_device_ops structures that have this property can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct net_device_ops i@p = { ... };

@ok@
identifier r.i;
struct net_device e;
position p;
@@
e.netdev_ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct net_device_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct net_device_ops i = { ... };
// </smpl>

The result of size on this file before the change is:
   text	      data     bss     dec         hex	  filename
   3401        931      44    4376        1118	net/l2tp/l2tp_eth.o

and after the change it is:
   text	     data        bss	    dec	    hex	filename
   3993       347         44       4384    1120	net/l2tp/l2tp_eth.o

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
 net/l2tp/l2tp_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 57fc5a4..ddb744c 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -121,7 +121,7 @@ static struct rtnl_link_stats64 *l2tp_eth_get_stats64(struct net_device *dev,
 }
 
 
-static struct net_device_ops l2tp_eth_netdev_ops = {
+static const struct net_device_ops l2tp_eth_netdev_ops = {
 	.ndo_init		= l2tp_eth_dev_init,
 	.ndo_uninit		= l2tp_eth_dev_uninit,
 	.ndo_start_xmit		= l2tp_eth_dev_xmit,


* Re: MDB offloading of local ipv4 multicast groups
From: Ido Schimmel @ 2016-09-15 20:42 UTC (permalink / raw)
  To: John Crispin
  Cc: Elad Raz, netdev@vger.kernel.org, Ido Schimmel, Jiri Pirko,
	Nikolay Aleksandrov, David S. Miller
In-Reply-To: <ebb1092f-4fab-c490-0553-b108feb5e8d4@phrozen.org>

On Thu, Sep 15, 2016 at 08:58:50PM +0200, John Crispin wrote:
> Hi,
> 
> While adding MDB support to the qca8k dsa driver I found that ipv4 mcast
> groups don't always get propagated to the dsa driver. In my setup there
> are 2 clients connected to the switch, both running a mdns client. The
> .port_mdb_add() callback is properly called for 33:33:00:00:00:FB but
> 01:00:5E:00:00:FB never got propagated to the dsa driver.
> 
> The reason is that the call to ipv4_is_local_multicast() here [1] will
> return true and the notifier is never called. Is this intentional or is
> there something missing in the code ?

I believe this is based on RFC 4541:

"Packets with a destination IP (DIP) address in the 224.0.0.X range
which are not IGMP must be forwarded on all ports."
https://tools.ietf.org/html/rfc4541

But, we are missing the offloading of router ports, which is needed for
the device to correctly flood unregistered multicast packets. That's
also according to the mentioned RFC:

"If a switch receives an unregistered packet, it must forward that
packet on all ports to which an IGMP router is attached."

Implemented at br_flood_multicast()

However, the marking is done per-port and not per-{port, VID}. We need
that in case vlan filtering is enabled. I think Nik is working on that,
but he can correct me if I'm wrong :). The switchdev bits can be added
soon after.

> 
> 	John
> 
> [1]
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_multicast.c?id=refs/tags/v4.8-rc6#n737


* [PATCH net 1/2] bna: add missing per queue ethtool stat
From: Ivan Vecera @ 2016-09-15 20:47 UTC (permalink / raw)
  To: netdev; +Cc: jarod, rasesh.mody

Commit ba5ca784 "bna: check for dma mapping errors" added, among other
things, a statistic that counts the number of DMA buffer mapping failures
for each Rx queue. This counter is not included in ethtool stats output.

Fixes: ba5ca784 "bna: check for dma mapping errors"
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 0e4fdc3..5671353 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -31,7 +31,7 @@
 #define BNAD_NUM_TXF_COUNTERS 12
 #define BNAD_NUM_RXF_COUNTERS 10
 #define BNAD_NUM_CQ_COUNTERS (3 + 5)
-#define BNAD_NUM_RXQ_COUNTERS 6
+#define BNAD_NUM_RXQ_COUNTERS 7
 #define BNAD_NUM_TXQ_COUNTERS 5
 
 #define BNAD_ETHTOOL_STATS_NUM						\
@@ -658,6 +658,8 @@ bnad_get_strings(struct net_device *netdev, u32 stringset, u8 *string)
 				string += ETH_GSTRING_LEN;
 				sprintf(string, "rxq%d_allocbuf_failed", q_num);
 				string += ETH_GSTRING_LEN;
+				sprintf(string, "rxq%d_mapbuf_failed", q_num);
+				string += ETH_GSTRING_LEN;
 				sprintf(string, "rxq%d_producer_index", q_num);
 				string += ETH_GSTRING_LEN;
 				sprintf(string, "rxq%d_consumer_index", q_num);
@@ -678,6 +680,9 @@ bnad_get_strings(struct net_device *netdev, u32 stringset, u8 *string)
 					sprintf(string, "rxq%d_allocbuf_failed",
 								q_num);
 					string += ETH_GSTRING_LEN;
+					sprintf(string, "rxq%d_mapbuf_failed",
+						q_num);
+					string += ETH_GSTRING_LEN;
 					sprintf(string, "rxq%d_producer_index",
 								q_num);
 					string += ETH_GSTRING_LEN;
-- 
2.7.3


* [PATCH net 2/2] bna: fix crash in bnad_get_strings()
From: Ivan Vecera @ 2016-09-15 20:47 UTC (permalink / raw)
  To: netdev; +Cc: jarod, rasesh.mody
In-Reply-To: <1473972472-29681-1-git-send-email-ivecera@redhat.com>

Commit 6e7333d "net: add rx_nohandler stat counter" added the new entry
rx_nohandler to struct rtnl_link_stats64. Unfortunately the bna driver
foolishly depends on the layout of this structure. It uses part of it for
ethtool statistics, which is fine in itself, but it also assumes that the
structure's size is constant because it defines a string for each existing
entry. When the structure is extended, the bna driver has to be modified
as well; otherwise any attempt to retrieve ethtool statistics results in
a crash in bnad_get_strings().
The patch changes BNAD_ETHTOOL_STATS_NUM so that it counts the real number
of strings in the array, and also removes rtnl_link_stats64 entries that
are not used in the output and are always zero.

Fixes: 6e7333d "net: add rx_nohandler stat counter"
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/brocade/bna/bnad_ethtool.c | 50 ++++++++++++-------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
index 5671353..31f61a7 100644
--- a/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
+++ b/drivers/net/ethernet/brocade/bna/bnad_ethtool.c
@@ -34,12 +34,7 @@
 #define BNAD_NUM_RXQ_COUNTERS 7
 #define BNAD_NUM_TXQ_COUNTERS 5
 
-#define BNAD_ETHTOOL_STATS_NUM						\
-	(sizeof(struct rtnl_link_stats64) / sizeof(u64) +	\
-	sizeof(struct bnad_drv_stats) / sizeof(u64) +		\
-	offsetof(struct bfi_enet_stats, rxf_stats[0]) / sizeof(u64))
-
-static const char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
+static const char *bnad_net_stats_strings[] = {
 	"rx_packets",
 	"tx_packets",
 	"rx_bytes",
@@ -50,22 +45,10 @@ static const char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
 	"tx_dropped",
 	"multicast",
 	"collisions",
-
 	"rx_length_errors",
-	"rx_over_errors",
 	"rx_crc_errors",
 	"rx_frame_errors",
-	"rx_fifo_errors",
-	"rx_missed_errors",
-
-	"tx_aborted_errors",
-	"tx_carrier_errors",
 	"tx_fifo_errors",
-	"tx_heartbeat_errors",
-	"tx_window_errors",
-
-	"rx_compressed",
-	"tx_compressed",
 
 	"netif_queue_stop",
 	"netif_queue_wakeup",
@@ -254,6 +237,8 @@ static const char *bnad_net_stats_strings[BNAD_ETHTOOL_STATS_NUM] = {
 	"fc_tx_fid_parity_errors",
 };
 
+#define BNAD_ETHTOOL_STATS_NUM	ARRAY_SIZE(bnad_net_stats_strings)
+
 static int
 bnad_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
 {
@@ -859,9 +844,9 @@ bnad_get_ethtool_stats(struct net_device *netdev, struct ethtool_stats *stats,
 		       u64 *buf)
 {
 	struct bnad *bnad = netdev_priv(netdev);
-	int i, j, bi;
+	int i, j, bi = 0;
 	unsigned long flags;
-	struct rtnl_link_stats64 *net_stats64;
+	struct rtnl_link_stats64 net_stats64;
 	u64 *stats64;
 	u32 bmap;
 
@@ -876,14 +861,25 @@ bnad_get_ethtool_stats(struct net_device *netdev, struct ethtool_stats *stats,
 	 * under the same lock
 	 */
 	spin_lock_irqsave(&bnad->bna_lock, flags);
-	bi = 0;
-	memset(buf, 0, stats->n_stats * sizeof(u64));
-
-	net_stats64 = (struct rtnl_link_stats64 *)buf;
-	bnad_netdev_qstats_fill(bnad, net_stats64);
-	bnad_netdev_hwstats_fill(bnad, net_stats64);
 
-	bi = sizeof(*net_stats64) / sizeof(u64);
+	memset(&net_stats64, 0, sizeof(net_stats64));
+	bnad_netdev_qstats_fill(bnad, &net_stats64);
+	bnad_netdev_hwstats_fill(bnad, &net_stats64);
+
+	buf[bi++] = net_stats64.rx_packets;
+	buf[bi++] = net_stats64.tx_packets;
+	buf[bi++] = net_stats64.rx_bytes;
+	buf[bi++] = net_stats64.tx_bytes;
+	buf[bi++] = net_stats64.rx_errors;
+	buf[bi++] = net_stats64.tx_errors;
+	buf[bi++] = net_stats64.rx_dropped;
+	buf[bi++] = net_stats64.tx_dropped;
+	buf[bi++] = net_stats64.multicast;
+	buf[bi++] = net_stats64.collisions;
+	buf[bi++] = net_stats64.rx_length_errors;
+	buf[bi++] = net_stats64.rx_crc_errors;
+	buf[bi++] = net_stats64.rx_frame_errors;
+	buf[bi++] = net_stats64.tx_fifo_errors;
 
 	/* Get netif_queue_stopped from stack */
 	bnad->stats.drv_stats.netif_queue_stopped = netif_queue_stopped(netdev);
-- 
2.7.3


* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: David Ahern @ 2016-09-15 20:54 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: netdev, linux-kernel, David Miller, eric.dumazet, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen
In-Reply-To: <20160915202219.GB1867@uranus.lan>

On 9/15/16 2:22 PM, Cyrill Gorcunov wrote:
>> ss -K is not working. Socket lookup fails to find a match due to a protocol mismatch.
>>
>> haven't had time to track down why there is a mismatch since the kill uses the socket returned
>> from the dump. Won't have time to come back to this until early next week.
> 
> Have you run a patched iproute2? I just ran ss -K and all sockets got closed
> (including raw ones), which actually kicked me off the testing machine sshd :/
> 


This is the patch I applied to iproute2; the change in your goo.gl link plus a debug to confirm the kill action is initiated by ss:

diff --git a/misc/ss.c b/misc/ss.c
index 3b268d999426..4d98411738ea 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2334,6 +2334,10 @@ static int show_one_inet_sock(const struct sockaddr_nl *addr,
        if (diag_arg->f->f && run_ssfilter(diag_arg->f->f, &s) == 0)
                return 0;

+       if (diag_arg->f->kill) {
+printf("want to kill:\n");
+       err = inet_show_sock(h, &s, diag_arg->protocol);
+       }
        if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
                if (errno == EOPNOTSUPP || errno == ENOENT) {
                        /* Socket can't be closed, or is already closed. */
@@ -2631,6 +2635,10 @@ static int raw_show(struct filter *f)

        dg_proto = RAW_PROTO;

+if (!getenv("PROC_NET_RAW") && !getenv("PROC_ROOT") &&
+inet_show_netlink(f, NULL, IPPROTO_RAW) == 0)
+return 0;
+
        if (f->families&(1<<AF_INET)) {
                if ((fp = net_raw_open()) == NULL)
                        goto outerr;


* Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets
From: Cyrill Gorcunov @ 2016-09-15 21:01 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, linux-kernel, David Miller, eric.dumazet, kuznet, jmorris,
	yoshfuji, kaber, avagin, stephen
In-Reply-To: <f482796f-185e-a3b7-248c-20f1f39cb459@cumulusnetworks.com>

On Thu, Sep 15, 2016 at 02:54:57PM -0600, David Ahern wrote:
> On 9/15/16 2:22 PM, Cyrill Gorcunov wrote:
> >> ss -K is not working. Socket lookup fails to find a match due to a protocol mismatch.
> >>
> >> haven't had time to track down why there is a mismatch since the kill uses the socket returned
> >> from the dump. Won't have time to come back to this until early next week.
> > 
> > > Have you run a patched iproute2? I just ran ss -K and all sockets got closed
> > (including raw ones), which actually kicked me off the testing machine sshd :/
> > 
> 
> 
> This is the patch I applied to iproute2; the change in your goo.gl link plus a debug to confirm the kill action is initiated by ss:
> 
> diff --git a/misc/ss.c b/misc/ss.c
> index 3b268d999426..4d98411738ea 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -2334,6 +2334,10 @@ static int show_one_inet_sock(const struct sockaddr_nl *addr,
>         if (diag_arg->f->f && run_ssfilter(diag_arg->f->f, &s) == 0)
>                 return 0;
> 
> +       if (diag_arg->f->kill) {
> +printf("want to kill:\n");
> +       err = inet_show_sock(h, &s, diag_arg->protocol);
> +       }
>         if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
>                 if (errno == EOPNOTSUPP || errno == ENOENT) {
>                         /* Socket can't be closed, or is already closed. */
> @@ -2631,6 +2635,10 @@ static int raw_show(struct filter *f)
> 
>         dg_proto = RAW_PROTO;
> 
> +if (!getenv("PROC_NET_RAW") && !getenv("PROC_ROOT") &&
> +inet_show_netlink(f, NULL, IPPROTO_RAW) == 0)
> +return 0;
> +
>         if (f->families&(1<<AF_INET)) {
>                 if ((fp = net_raw_open()) == NULL)
>                         goto outerr;
> 

Hmm. Weird. I'm running net-next kernel
---
[root@pcs7 ~]# /root/sock &
[1] 5108

This is a trivial program which opens raw sockets.

[root@pcs7 iproute2]# misc/ss -A raw
State      Recv-Q Send-Q                                Local Address:Port                                                 Peer Address:Port                
ESTAB      0      0                                         127.0.0.1:ipproto-255                                            127.0.0.10:ipproto-9090         
UNCONN     0      0                                        127.0.0.10:ipproto-255                                                     *:*                    
UNCONN     0      0                                                :::ipv6-icmp                                                      :::*                    
UNCONN     0      0                                                :::ipv6-icmp                                                      :::*                    
ESTAB      0      0                                               ::1:ipproto-255                                                   ::1:ipproto-9091         
UNCONN     0      0                                               ::1:ipproto-255                                                    :::*                    
[root@pcs7 iproute2]# 

[root@pcs7 iproute2]# misc/ss -K
Netid  State      Recv-Q Send-Q                             Local Address:Port                                              Peer Address:Port                
u_str  ESTAB      0      0                /var/run/dbus/system_bus_socket 18071                                                        * 16297                
u_str  ESTAB      0      0                    /run/systemd/journal/stdout 18756                                                        * 16188                
u_str  ESTAB      0      0                    /run/systemd/journal/stdout 23014                                                        * 23013                
u_str  ESTAB      0      0                                              * 18909                                                        * 16298                
u_str  ESTAB      0      0                /var/run/dbus/system_bus_socket 19154                                                        * 18163                
...
???    ESTAB      0      0                                      127.0.0.1:ipproto-255                                         127.0.0.10:ipproto-9090         
???    UNCONN     0      0                                     127.0.0.10:ipproto-255                                                  *:*                    
???    ESTAB      0      0                                            ::1:ipproto-255                                                ::1:ipproto-9091         
???    UNCONN     0      0                                            ::1:ipproto-255                                                 :::*            
---

Here I get kicked off the server. Logging back in:

[cyrill@uranus ~] ssh root@pcs7 
Last login: Thu Sep 15 23:20:42 2016 from gateway
[root@pcs7 ~]# cd /home/iproute2/
[root@pcs7 iproute2]# misc/ss -A raw
State      Recv-Q Send-Q                                Local Address:Port                                                 Peer Address:Port                
UNCONN     0      0                                                :::ipv6-icmp                                                      :::*                    
UNCONN     0      0                                                :::ipv6-icmp                                                      :::*                    

Maybe I am doing something wrong in my testing?

^ permalink raw reply

* [PATCH v2 net-next 1/7] lwt: Add net to build_state argument
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

Users of LWT need to know the net namespace if they want to perform
per-net operations in LWT.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/lwtunnel.h    | 14 +++++++-------
 net/core/lwtunnel.c       | 11 +++++++----
 net/ipv4/fib_semantics.c  |  7 ++++---
 net/ipv4/ip_tunnel_core.c | 12 ++++++------
 net/ipv6/ila/ila_lwt.c    |  6 +++---
 net/ipv6/route.c          |  2 +-
 net/mpls/mpls_iptunnel.c  |  6 +++---
 7 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index ea3f80f..9d1e172 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -33,9 +33,9 @@ struct lwtunnel_state {
 };
 
 struct lwtunnel_encap_ops {
-	int (*build_state)(struct net_device *dev, struct nlattr *encap,
-			   unsigned int family, const void *cfg,
-			   struct lwtunnel_state **ts);
+	int (*build_state)(struct net *net, struct net_device *dev,
+			   struct nlattr *encap, unsigned int family,
+			   const void *cfg, struct lwtunnel_state **ts);
 	int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
 	int (*input)(struct sk_buff *skb);
 	int (*fill_encap)(struct sk_buff *skb,
@@ -106,8 +106,8 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
 			   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
 			   unsigned int num);
-int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
-			 struct nlattr *encap,
+int lwtunnel_build_state(struct net *net, struct net_device *dev,
+			 u16 encap_type, struct nlattr *encap,
 			 unsigned int family, const void *cfg,
 			 struct lwtunnel_state **lws);
 int lwtunnel_fill_encap(struct sk_buff *skb,
@@ -169,8 +169,8 @@ static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
 	return -EOPNOTSUPP;
 }
 
-static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
-				       struct nlattr *encap,
+static inline int lwtunnel_build_state(struct net *net, struct net_device *dev,
+				       u16 encap_type, struct nlattr *encap,
 				       unsigned int family, const void *cfg,
 				       struct lwtunnel_state **lws)
 {
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index e5f84c2..ba8be0b 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -39,6 +39,8 @@ static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
 		return "MPLS";
 	case LWTUNNEL_ENCAP_ILA:
 		return "ILA";
+	case LWTUNNEL_ENCAP_ILA_NOTIFY:
+		return "ILA_NOTIFY";
 	case LWTUNNEL_ENCAP_IP6:
 	case LWTUNNEL_ENCAP_IP:
 	case LWTUNNEL_ENCAP_NONE:
@@ -96,9 +98,10 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops,
 }
 EXPORT_SYMBOL(lwtunnel_encap_del_ops);
 
-int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
-			 struct nlattr *encap, unsigned int family,
-			 const void *cfg, struct lwtunnel_state **lws)
+int lwtunnel_build_state(struct net *net, struct net_device *dev,
+			 u16 encap_type, struct nlattr *encap,
+			 unsigned int family, const void *cfg,
+			 struct lwtunnel_state **lws)
 {
 	const struct lwtunnel_encap_ops *ops;
 	int ret = -EINVAL;
@@ -123,7 +126,7 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
 	}
 #endif
 	if (likely(ops && ops->build_state))
-		ret = ops->build_state(dev, encap, family, cfg, lws);
+		ret = ops->build_state(net, dev, encap, family, cfg, lws);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 388d3e2..aee4e95 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -511,7 +511,8 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
 					goto err_inval;
 				if (cfg->fc_oif)
 					dev = __dev_get_by_index(net, cfg->fc_oif);
-				ret = lwtunnel_build_state(dev, nla_get_u16(
+				ret = lwtunnel_build_state(net, dev,
+							   nla_get_u16(
 							   nla_entype),
 							   nla,  AF_INET, cfg,
 							   &lwtstate);
@@ -610,7 +611,7 @@ static int fib_encap_match(struct net *net, u16 encap_type,
 
 	if (oif)
 		dev = __dev_get_by_index(net, oif);
-	ret = lwtunnel_build_state(dev, encap_type, encap,
+	ret = lwtunnel_build_state(net, dev, encap_type, encap,
 				   AF_INET, cfg, &lwtstate);
 	if (!ret) {
 		result = lwtunnel_cmp_encap(lwtstate, nh->nh_lwtstate);
@@ -1098,7 +1099,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
 				goto err_inval;
 			if (cfg->fc_oif)
 				dev = __dev_get_by_index(net, cfg->fc_oif);
-			err = lwtunnel_build_state(dev, cfg->fc_encap_type,
+			err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
 						   cfg->fc_encap, AF_INET, cfg,
 						   &lwtstate);
 			if (err)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 777bc18..6a0cac3 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -239,9 +239,9 @@ static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = {
 	[LWTUNNEL_IP_FLAGS]	= { .type = NLA_U16 },
 };
 
-static int ip_tun_build_state(struct net_device *dev, struct nlattr *attr,
-			      unsigned int family, const void *cfg,
-			      struct lwtunnel_state **ts)
+static int ip_tun_build_state(struct net *net, struct net_device *dev,
+			      struct nlattr *attr, unsigned int family,
+			      const void *cfg, struct lwtunnel_state **ts)
 {
 	struct ip_tunnel_info *tun_info;
 	struct lwtunnel_state *new_state;
@@ -335,9 +335,9 @@ static const struct nla_policy ip6_tun_policy[LWTUNNEL_IP6_MAX + 1] = {
 	[LWTUNNEL_IP6_FLAGS]		= { .type = NLA_U16 },
 };
 
-static int ip6_tun_build_state(struct net_device *dev, struct nlattr *attr,
-			       unsigned int family, const void *cfg,
-			       struct lwtunnel_state **ts)
+static int ip6_tun_build_state(struct net *net, struct net_device *dev,
+			       struct nlattr *attr, unsigned int family,
+			       const void *cfg, struct lwtunnel_state **ts)
 {
 	struct ip_tunnel_info *tun_info;
 	struct lwtunnel_state *new_state;
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index e50c27a..30a6920 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -56,9 +56,9 @@ static const struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
 	[ILA_ATTR_CSUM_MODE] = { .type = NLA_U8, },
 };
 
-static int ila_build_state(struct net_device *dev, struct nlattr *nla,
-			   unsigned int family, const void *cfg,
-			   struct lwtunnel_state **ts)
+static int ila_build_state(struct net *net, struct net_device *dev,
+			   struct nlattr *nla, unsigned int family,
+			   const void *cfg, struct lwtunnel_state **ts)
 {
 	struct ila_params *p;
 	struct nlattr *tb[ILA_ATTR_MAX + 1];
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ad4a7ff..48c3aa7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1884,7 +1884,7 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg)
 	if (cfg->fc_encap) {
 		struct lwtunnel_state *lwtstate;
 
-		err = lwtunnel_build_state(dev, cfg->fc_encap_type,
+		err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
 					   cfg->fc_encap, AF_INET6, cfg,
 					   &lwtstate);
 		if (err)
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index cf52cf3..d2df225 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -126,9 +126,9 @@ drop:
 	return -EINVAL;
 }
 
-static int mpls_build_state(struct net_device *dev, struct nlattr *nla,
-			    unsigned int family, const void *cfg,
-			    struct lwtunnel_state **ts)
+static int mpls_build_state(struct net *net, struct net_device *dev,
+			    struct nlattr *nla, unsigned int family,
+			    const void *cfg, struct lwtunnel_state **ts)
 {
 	struct mpls_iptunnel_encap *tun_encap_info;
 	struct nlattr *tb[MPLS_IPTUNNEL_MAX + 1];
-- 
2.8.0.rc2

^ permalink raw reply related

* [PATCH v2 net-next 0/7] net: ILA resolver and generic resolver backend
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf

This patch set implements an ILA host side resolver. This uses LWT to
implement the hook to a userspace resolver and tracks pending unresolved
addresses using the backend net resolver.

This patch set contains:

- A new library function to allocate an array of spinlocks for use
  with locking hash buckets.
- Make hash function in rhashtable directly callable.
- A generic resolver backend infrastructure. This primarily does two
  things: track unresolved addresses and implement a timeout for
  resolution not happening. These mechanisms provide rate limiting
  control over resolution requests (for instance in ILA it is used
  to rate limit requests to userspace to resolve addresses).
- The ILA resolver. This implements the path from the kernel ILA
  implementation to a userspace daemon to indicate that an identifier
  address needs to be resolved.
- Routing messages are used over netlink to indicate resolution
  requests.

Changes from initial RFC:

 - Added net argument to LWT build_state
 - Made resolve timeout an attribute of the LWT encap route
 - Changed ILA notifications to be regular routing messages of event
   RTM_ADDR_RESOLVE, family RTNL_FAMILY_ILA, and group
   RTNLGRP_ILA_NOTIFY

Tested:
 - Ran a UDP flood to random addresses in a resolver prefix. Observed
   timeout and limits were working (watching "ip monitor").
 - Also ran against an ILA client daemon that runs the resolver
   protocol. Observed that when resolution completes (ILA encap route is
   installed) routing messages are no longer sent.

v2:
 - Fixed function prototype issue found by kbuild
 - Fix incorrect interpretation of return code from
   net_rslv_lookup_and_create

Tom Herbert (7):
  lwt: Add net to build_state argument
  spinlock: Add library function to allocate spinlock buckets array
  rhashtable: Call library function alloc_bucket_locks
  ila: Call library function alloc_bucket_locks
  rhashtable: abstract out function to get hash
  net: Generic resolver backend
  ila: Resolver mechanism

 include/linux/rhashtable.h     |  28 +++--
 include/linux/spinlock.h       |   6 +
 include/net/lwtunnel.h         |  14 +--
 include/net/resolver.h         |  58 +++++++++
 include/uapi/linux/ila.h       |   9 ++
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |   8 +-
 lib/Makefile                   |   2 +-
 lib/bucket_locks.c             |  63 ++++++++++
 lib/rhashtable.c               |  46 +------
 net/Kconfig                    |   4 +
 net/core/Makefile              |   1 +
 net/core/lwtunnel.c            |  11 +-
 net/core/resolver.c            | 272 +++++++++++++++++++++++++++++++++++++++++
 net/ipv4/fib_semantics.c       |   7 +-
 net/ipv4/ip_tunnel_core.c      |  12 +-
 net/ipv6/Kconfig               |   1 +
 net/ipv6/ila/Makefile          |   2 +-
 net/ipv6/ila/ila.h             |  16 +++
 net/ipv6/ila/ila_common.c      |   7 ++
 net/ipv6/ila/ila_lwt.c         |  15 ++-
 net/ipv6/ila/ila_resolver.c    | 249 +++++++++++++++++++++++++++++++++++++
 net/ipv6/ila/ila_xlat.c        |  51 ++------
 net/ipv6/route.c               |   2 +-
 net/mpls/mpls_iptunnel.c       |   6 +-
 25 files changed, 770 insertions(+), 121 deletions(-)
 create mode 100644 include/net/resolver.h
 create mode 100644 lib/bucket_locks.c
 create mode 100644 net/core/resolver.c
 create mode 100644 net/ipv6/ila/ila_resolver.c

-- 
2.8.0.rc2

^ permalink raw reply

* [PATCH v2 net-next 2/7] spinlock: Add library function to allocate spinlock buckets array
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

Add two new library functions alloc_bucket_spinlocks and
free_bucket_spinlocks. These are used to allocate and free an array
of spinlocks that are useful as locks for hash buckets. The interface
specifies the maximum number of spinlocks in the array as well
as a CPU multiplier to derive the number of spinlocks to allocate.
The number allocated is rounded up to a power of two to make
the array amenable to hash lookup.
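
The sizing rule described above can be sketched in userspace C as
follows. This is only an illustration of the arithmetic, not the kernel
code: bucket_lock_count() and round_up_pow2() are hypothetical names,
and the cap of 64 CPUs is taken from the patch below. Because rounding
happens after the cap is applied, the result can exceed max_size when
max_size is not itself a power of two.

```c
#include <assert.h>

/* Smallest power of two >= n, for n >= 1. Userspace stand-in
 * (hypothetical helper) for the kernel's roundup_pow_of_two().
 */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= (1u << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/* Number of locks that would be allocated; 0 means "invalid request".
 * nr_cpus stands in for num_possible_cpus().
 */
static unsigned int bucket_lock_count(unsigned int nr_cpus,
				      unsigned int cpu_mult,
				      unsigned int max_size)
{
	unsigned int size;

	if (cpu_mult) {
		/* At most 64 CPUs are counted, then capped by max_size */
		if (nr_cpus > 64)
			nr_cpus = 64;
		size = nr_cpus * cpu_mult;
		if (size > max_size)
			size = max_size;
	} else {
		/* No multiplier: max_size alone determines the count */
		size = max_size;
	}

	if (!size)
		return 0;

	return round_up_pow2(size);
}
```

The lock mask is then the returned count minus one, so a lock can be
picked with "hash & mask".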

Reviewed-by: Greg Rose <grose@lightfleet.com>
Acked-by: Thomas Graf <tgraf@suug.ch>

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/spinlock.h |  6 +++++
 lib/Makefile             |  2 +-
 lib/bucket_locks.c       | 63 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 lib/bucket_locks.c

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 47dd0ce..4ebdfbf 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -416,4 +416,10 @@ extern int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
 #define atomic_dec_and_lock(atomic, lock) \
 		__cond_lock(lock, _atomic_dec_and_lock(atomic, lock))
 
+int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
+			   unsigned int max_size, unsigned int cpu_mult,
+			   gfp_t gfp);
+
+void free_bucket_spinlocks(spinlock_t *locks);
+
 #endif /* __LINUX_SPINLOCK_H */
diff --git a/lib/Makefile b/lib/Makefile
index 5dc77a8..f91185e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -36,7 +36,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-	 once.o
+	 once.o bucket_locks.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
new file mode 100644
index 0000000..bb9bf11
--- /dev/null
+++ b/lib/bucket_locks.c
@@ -0,0 +1,63 @@
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+#include <linux/export.h>
+
+/* Allocate an array of spinlocks to be accessed by a hash. Two arguments
+ * indicate the number of elements to allocate in the array. max_size
+ * gives the maximum number of elements to allocate. cpu_mult gives
+ * the number of locks per CPU to allocate. The size is rounded up
+ * to a power of 2 to be suitable as a hash table.
+ */
+int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
+			   unsigned int max_size, unsigned int cpu_mult,
+			   gfp_t gfp)
+{
+	unsigned int i, size;
+#if defined(CONFIG_PROVE_LOCKING)
+	unsigned int nr_pcpus = 2;
+#else
+	unsigned int nr_pcpus = num_possible_cpus();
+#endif
+	spinlock_t *tlocks = NULL;
+
+	if (cpu_mult) {
+		nr_pcpus = min_t(unsigned int, nr_pcpus, 64UL);
+		size = min_t(unsigned int, nr_pcpus * cpu_mult, max_size);
+	} else {
+		size = max_size;
+	}
+	size = roundup_pow_of_two(size);
+
+	if (!size)
+		return -EINVAL;
+
+	if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
+			tlocks = vmalloc(size * sizeof(spinlock_t));
+#endif
+		if (gfp != GFP_KERNEL)
+			gfp |= __GFP_NOWARN | __GFP_NORETRY;
+
+		if (!tlocks)
+			tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp);
+		if (!tlocks)
+			return -ENOMEM;
+		for (i = 0; i < size; i++)
+			spin_lock_init(&tlocks[i]);
+	}
+	*locks = tlocks;
+	*locks_mask = size - 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(alloc_bucket_spinlocks);
+
+void free_bucket_spinlocks(spinlock_t *locks)
+{
+	kvfree(locks);
+}
+EXPORT_SYMBOL(free_bucket_spinlocks);
-- 
2.8.0.rc2

^ permalink raw reply related

* [PATCH v2 net-next 4/7] ila: Call library function alloc_bucket_locks
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/ila/ila_xlat.c | 36 +++++-------------------------------
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index e604013..7d1c34b 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -30,34 +30,6 @@ struct ila_net {
 	bool hooks_registered;
 };
 
-#define	LOCKS_PER_CPU 10
-
-static int alloc_ila_locks(struct ila_net *ilan)
-{
-	unsigned int i, size;
-	unsigned int nr_pcpus = num_possible_cpus();
-
-	nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
-	size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
-
-	if (sizeof(spinlock_t) != 0) {
-#ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
-			ilan->locks = vmalloc(size * sizeof(spinlock_t));
-		else
-#endif
-		ilan->locks = kmalloc_array(size, sizeof(spinlock_t),
-					    GFP_KERNEL);
-		if (!ilan->locks)
-			return -ENOMEM;
-		for (i = 0; i < size; i++)
-			spin_lock_init(&ilan->locks[i]);
-	}
-	ilan->locks_mask = size - 1;
-
-	return 0;
-}
-
 static u32 hashrnd __read_mostly;
 static __always_inline void __ila_hash_secret_init(void)
 {
@@ -561,14 +533,16 @@ static const struct genl_ops ila_nl_ops[] = {
 	},
 };
 
-#define ILA_HASH_TABLE_SIZE 1024
+#define LOCKS_PER_CPU 10
+#define MAX_LOCKS 1024
 
 static __net_init int ila_init_net(struct net *net)
 {
 	int err;
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 
-	err = alloc_ila_locks(ilan);
+	err = alloc_bucket_spinlocks(&ilan->locks, &ilan->locks_mask,
+				     MAX_LOCKS, LOCKS_PER_CPU, GFP_KERNEL);
 	if (err)
 		return err;
 
@@ -583,7 +557,7 @@ static __net_exit void ila_exit_net(struct net *net)
 
 	rhashtable_free_and_destroy(&ilan->rhash_table, ila_free_cb, NULL);
 
-	kvfree(ilan->locks);
+	free_bucket_spinlocks(ilan->locks);
 
 	if (ilan->hooks_registered)
 		nf_unregister_net_hooks(net, ila_nf_hook_ops,
-- 
2.8.0.rc2

^ permalink raw reply related

* [PATCH v2 net-next 3/7] rhashtable: Call library function alloc_bucket_locks
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks. This function is
based on the old alloc_bucket_locks in rhashtable and should
produce the same effect.

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 lib/rhashtable.c | 46 ++++------------------------------------------
 1 file changed, 4 insertions(+), 42 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 06c2872..5b53304 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -59,50 +59,10 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #define ASSERT_RHT_MUTEX(HT)
 #endif
 
-
-static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
-			      gfp_t gfp)
-{
-	unsigned int i, size;
-#if defined(CONFIG_PROVE_LOCKING)
-	unsigned int nr_pcpus = 2;
-#else
-	unsigned int nr_pcpus = num_possible_cpus();
-#endif
-
-	nr_pcpus = min_t(unsigned int, nr_pcpus, 64UL);
-	size = roundup_pow_of_two(nr_pcpus * ht->p.locks_mul);
-
-	/* Never allocate more than 0.5 locks per bucket */
-	size = min_t(unsigned int, size, tbl->size >> 1);
-
-	if (sizeof(spinlock_t) != 0) {
-		tbl->locks = NULL;
-#ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
-		    gfp == GFP_KERNEL)
-			tbl->locks = vmalloc(size * sizeof(spinlock_t));
-#endif
-		if (gfp != GFP_KERNEL)
-			gfp |= __GFP_NOWARN | __GFP_NORETRY;
-
-		if (!tbl->locks)
-			tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
-						   gfp);
-		if (!tbl->locks)
-			return -ENOMEM;
-		for (i = 0; i < size; i++)
-			spin_lock_init(&tbl->locks[i]);
-	}
-	tbl->locks_mask = size - 1;
-
-	return 0;
-}
-
 static void bucket_table_free(const struct bucket_table *tbl)
 {
 	if (tbl)
-		kvfree(tbl->locks);
+		free_bucket_spinlocks(tbl->locks);
 
 	kvfree(tbl);
 }
@@ -131,7 +91,9 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 
 	tbl->size = nbuckets;
 
-	if (alloc_bucket_locks(ht, tbl, gfp) < 0) {
+	/* Never allocate more than 0.5 locks per bucket */
+	if (alloc_bucket_spinlocks(&tbl->locks, &tbl->locks_mask,
+				   tbl->size >> 1, ht->p.locks_mul, gfp)) {
 		bucket_table_free(tbl);
 		return NULL;
 	}
-- 
2.8.0.rc2

^ permalink raw reply related

* [PATCH v2 net-next 5/7] rhashtable: abstract out function to get hash
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

Split out most of rht_key_hashfn which is calculating the hash into
its own function. This way the hash function can be called separately to
get the hash value.

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/rhashtable.h | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index fd82584..e398a62 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -208,34 +208,42 @@ static inline unsigned int rht_bucket_index(const struct bucket_table *tbl,
 	return (hash >> RHT_HASH_RESERVED_SPACE) & (tbl->size - 1);
 }
 
-static inline unsigned int rht_key_hashfn(
-	struct rhashtable *ht, const struct bucket_table *tbl,
-	const void *key, const struct rhashtable_params params)
+static inline unsigned int rht_key_get_hash(struct rhashtable *ht,
+	const void *key, const struct rhashtable_params params,
+	unsigned int hash_rnd)
 {
 	unsigned int hash;
 
 	/* params must be equal to ht->p if it isn't constant. */
 	if (!__builtin_constant_p(params.key_len))
-		hash = ht->p.hashfn(key, ht->key_len, tbl->hash_rnd);
+		hash = ht->p.hashfn(key, ht->key_len, hash_rnd);
 	else if (params.key_len) {
 		unsigned int key_len = params.key_len;
 
 		if (params.hashfn)
-			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+			hash = params.hashfn(key, key_len, hash_rnd);
 		else if (key_len & (sizeof(u32) - 1))
-			hash = jhash(key, key_len, tbl->hash_rnd);
+			hash = jhash(key, key_len, hash_rnd);
 		else
-			hash = jhash2(key, key_len / sizeof(u32),
-				      tbl->hash_rnd);
+			hash = jhash2(key, key_len / sizeof(u32), hash_rnd);
 	} else {
 		unsigned int key_len = ht->p.key_len;
 
 		if (params.hashfn)
-			hash = params.hashfn(key, key_len, tbl->hash_rnd);
+			hash = params.hashfn(key, key_len, hash_rnd);
 		else
-			hash = jhash(key, key_len, tbl->hash_rnd);
+			hash = jhash(key, key_len, hash_rnd);
 	}
 
+	return hash;
+}
+
+static inline unsigned int rht_key_hashfn(
+	struct rhashtable *ht, const struct bucket_table *tbl,
+	const void *key, const struct rhashtable_params params)
+{
+	unsigned int hash = rht_key_get_hash(ht, key, params, tbl->hash_rnd);
+
 	return rht_bucket_index(tbl, hash);
 }
 
-- 
2.8.0.rc2

^ permalink raw reply related

* [PATCH v2 net-next 6/7] net: Generic resolver backend
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

This patch implements the backend of a resolver; specifically, it
provides a means to track unresolved addresses and to time them out.

The resolver is mostly a frontend to an rhashtable where the key
of the table is whatever address type or object is tracked. A resolver
instance is created by net_rslv_create. A resolver is destroyed by
net_rslv_destroy.

There are two functions that are used to manipulate entries in the
table: net_rslv_lookup_and_create and net_rslv_resolved.

net_rslv_lookup_and_create is called with an unresolved address as
the argument. It returns a structure of type net_rslv_ent. When
called, a lookup is performed to see if an entry for the address
is already in the table. If it is, the entry is returned and false
is returned in the created bool pointer argument to indicate that
the entry was preexisting. If an entry is not found, one is created
and true is returned in the created pointer argument. It is expected
that when an entry is new the address resolution protocol is
initiated (for instance a RTM_ADDR_RESOLVE message may be sent to a
userspace daemon as we will do in ILA). If net_rslv_lookup_and_create
returns NULL then presumably the hash table has reached the limit on
the number of outstanding unresolved addresses; the caller should take
appropriate action to avoid spamming the resolution protocol.

net_rslv_resolved is called when resolution is complete (e.g. an
ILA locator mapping was instantiated for a locator). The entry is
removed from the hash table.

An argument to net_rslv_lookup_and_create indicates a timeout for the
pending resolution in milliseconds. If the timer fires before
resolution then the entry is removed from the table. Subsequently,
another attempt to resolve the same address will result in a new entry
in the table.

net_rslv_lookup_and_create allocates a net_rslv_ent struct, which
includes allocating related user data. This is the object[] field
in the structure. The key (unresolved address) is always the first
field in the object. Following that the caller may add its
own private field data. The key length and size of the user object
(including the key) are specified in net_rslv_create.
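
The layout described above (key first, caller's private data after it,
all inside a trailing flexible array) can be sketched in userspace C.
This is a simplified model under stated assumptions, not the kernel
structure: struct demo_ent, struct demo_obj and demo_ent_new() are
hypothetical names, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct net_rslv_ent */
struct demo_ent {
	unsigned int timeout_ms;
	char object[];		/* key first, then caller's private data */
};

/* Hypothetical user object: the key must be the first field */
struct demo_obj {
	unsigned char key[16];	/* e.g. an IPv6-sized address */
	unsigned int pending;	/* private field following the key */
};

/* Allocate a zeroed entry with obj_size bytes of user data and copy
 * the key to the start of the object. Returns NULL on allocation
 * failure; the caller releases the entry with free().
 */
static struct demo_ent *demo_ent_new(const void *key, size_t key_len,
				     size_t obj_size)
{
	struct demo_ent *ent;

	/* The object size includes the key, so it cannot be smaller */
	assert(key_len <= obj_size);

	ent = calloc(1, sizeof(*ent) + obj_size);
	if (!ent)
		return NULL;

	memcpy(ent->object, key, key_len);
	return ent;
}
```

Since the key sits at offset zero of object[], the same pointer can be
handed to a hash or compare function without knowing anything about
the private fields that follow it.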

There are three callback functions that can be set as arguments in
net_rslv_create:

   - cmp_fn: Compare function for hash table. Arguments are the
       key and an object in the table. If this is NULL then the
       default memcmp of rhashtable is used.

   - init_fn: Initialize a new net_rslv_ent structure. This allows
       initialization of the user portion of the structure
       (the object[]).

   - destroy_fn: Called right before a net_rslv_ent is freed.
       This allows cleanup of user data associated with the
       entry.

Note that the resolver backend only tracks unresolved addresses; it
is up to the caller to perform the mechanism of resolution. This
includes the possibility of queuing packets awaiting resolution; this
can be accomplished for instance by maintaining an skbuff queue
in the net_rslv_ent user object[] data.

DOS mitigation is done by limiting the number of entries in the
resolver table (the max_size argument of net_rslv_create)
and setting a timeout. If the timeout is set then the maximum rate
of new resolution requests is max_table_size / timeout. For
instance, with a maximum size of 1000 entries and a timeout of 100
msecs the maximum rate of resolution requests is 10000/s.
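
The rate bound above is plain arithmetic and can be checked with a
small userspace helper. max_resolve_rate() is a hypothetical name used
only for this illustration; it assumes the timeout is given in
milliseconds and is nonzero.

```c
#include <assert.h>

/* Upper bound on new resolution requests per second when at most
 * max_size entries can be pending and each one expires after
 * timeout_ms milliseconds: max_size / (timeout_ms / 1000).
 */
static unsigned long long max_resolve_rate(unsigned long long max_size,
					   unsigned long long timeout_ms)
{
	assert(timeout_ms > 0);

	/* Multiply first so that integer division does not lose the
	 * sub-second part of the timeout.
	 */
	return max_size * 1000ULL / timeout_ms;
}
```

With max_size = 1000 and timeout_ms = 100 this gives 10000 requests
per second, matching the figure quoted above.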

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/resolver.h |  58 +++++++++++
 net/Kconfig            |   4 +
 net/core/Makefile      |   1 +
 net/core/resolver.c    | 272 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 335 insertions(+)
 create mode 100644 include/net/resolver.h
 create mode 100644 net/core/resolver.c

diff --git a/include/net/resolver.h b/include/net/resolver.h
new file mode 100644
index 0000000..9274237
--- /dev/null
+++ b/include/net/resolver.h
@@ -0,0 +1,58 @@
+#ifndef __NET_RESOLVER_H
+#define __NET_RESOLVER_H
+
+#include <linux/rhashtable.h>
+
+struct net_rslv;
+struct net_rslv_ent;
+
+typedef int (*net_rslv_cmpfn)(struct net_rslv *nrslv, const void *key,
+			      const void *object);
+typedef void (*net_rslv_initfn)(struct net_rslv *nrslv, void *object);
+typedef void (*net_rslv_destroyfn)(struct net_rslv_ent *nrent);
+
+struct net_rslv {
+	struct rhashtable rhash_table;
+	struct rhashtable_params params;
+	net_rslv_cmpfn rslv_cmp;
+	net_rslv_initfn rslv_init;
+	net_rslv_destroyfn rslv_destroy;
+	size_t obj_size;
+	spinlock_t *locks;
+	unsigned int locks_mask;
+	unsigned int hash_rnd;
+};
+
+struct net_rslv_ent {
+	struct rcu_head rcu;
+	union {
+		/* Fields set when entry is in hash table */
+		struct {
+			struct rhash_head node;
+			struct delayed_work timeout_work;
+			struct net_rslv *nrslv;
+		};
+
+		/* Fields set when rcu freeing structure */
+		struct {
+			net_rslv_destroyfn destroy;
+		};
+	};
+	char object[];
+};
+
+struct net_rslv *net_rslv_create(size_t size, size_t key_len,
+				 size_t max_size,
+				 net_rslv_cmpfn cmp_fn,
+				 net_rslv_initfn init_fn,
+				 net_rslv_destroyfn destroy_fn);
+
+struct net_rslv_ent *net_rslv_lookup_and_create(struct net_rslv *nrslv,
+						void *key, bool *created,
+						unsigned int timeout);
+
+void net_rslv_resolved(struct net_rslv *nrslv, void *key);
+
+void net_rslv_destroy(struct net_rslv *nrslv);
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 7b6cd34..fad4fac 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -255,6 +255,10 @@ config XPS
 	depends on SMP
 	default y
 
+config NET_EXT_RESOLVER
+	bool
+	default n
+
 config HWBM
        bool
 
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..c0a0208 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NET_EXT_RESOLVER) += resolver.o
diff --git a/net/core/resolver.c b/net/core/resolver.c
new file mode 100644
index 0000000..49812aa
--- /dev/null
+++ b/net/core/resolver.c
@@ -0,0 +1,272 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <net/protocol.h>
+#include <net/resolver.h>
+#include <uapi/linux/ila.h>
+
+static void net_rslv_destroy_rcu(struct rcu_head *head)
+{
+	struct net_rslv_ent *nrent = container_of(head, struct net_rslv_ent,
+						  rcu);
+	if (nrent->destroy) {
+		/* Call user's destroy function just before freeing */
+		nrent->destroy(nrent);
+	}
+
+	kfree(nrent);
+}
+
+static void net_rslv_destroy_entry(struct net_rslv *nrslv,
+				   struct net_rslv_ent *nrent)
+{
+	nrent->destroy = nrslv->rslv_destroy;
+	call_rcu(&nrent->rcu, net_rslv_destroy_rcu);
+}
+
+static inline spinlock_t *net_rslv_get_lock(struct net_rslv *nrslv, void *key)
+{
+	unsigned int hash;
+
+	/* Use the rhashtable hash function */
+	hash = rht_key_get_hash(&nrslv->rhash_table, key, nrslv->params,
+				nrslv->hash_rnd);
+
+	return &nrslv->locks[hash & nrslv->locks_mask];
+}
+
+static void net_rslv_delayed_work(struct work_struct *w)
+{
+	struct delayed_work *delayed_work = to_delayed_work(w);
+	struct net_rslv_ent *nrent = container_of(delayed_work,
+						  struct net_rslv_ent,
+						  timeout_work);
+	struct net_rslv *nrslv = nrent->nrslv;
+	spinlock_t *lock = net_rslv_get_lock(nrslv, nrent->object);
+
+	spin_lock(lock);
+	rhashtable_remove_fast(&nrslv->rhash_table, &nrent->node,
+			       nrslv->params);
+	spin_unlock(lock);
+
+	net_rslv_destroy_entry(nrslv, nrent);
+}
+
+static void net_rslv_ent_free_cb(void *ptr, void *arg)
+{
+	struct net_rslv_ent *nrent = (struct net_rslv_ent *)ptr;
+	struct net_rslv *nrslv = nrent->nrslv;
+
+	net_rslv_destroy_entry(nrslv, nrent);
+}
+
+void net_rslv_resolved(struct net_rslv *nrslv, void *key)
+{
+	spinlock_t *lock = net_rslv_get_lock(nrslv, key);
+	struct net_rslv_ent *nrent;
+
+	rcu_read_lock();
+
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (!nrent)
+		goto out;
+
+	/* Cancel timer first */
+	cancel_delayed_work_sync(&nrent->timeout_work);
+
+	spin_lock(lock);
+
+	/* Lookup again just in case someone already removed */
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (unlikely(!nrent)) {
+		spin_unlock(lock);
+		goto out;
+	}
+
+	rhashtable_remove_fast(&nrslv->rhash_table, &nrent->node,
+			       nrslv->params);
+	spin_unlock(lock);
+
+	net_rslv_destroy_entry(nrslv, nrent);
+
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(net_rslv_resolved);
+
+/* Called with hash bucket lock held */
+static struct net_rslv_ent *net_rslv_new_ent(struct net_rslv *nrslv,
+					     void *key, bool *created,
+					     unsigned int timeout)
+{
+	struct net_rslv_ent *nrent;
+	int err;
+
+	nrent = kzalloc(sizeof(*nrent) + nrslv->obj_size, GFP_ATOMIC);
+	if (!nrent)
+		return NULL;
+
+	/* Key is always at beginning of object data */
+	memcpy(nrent->object, key, nrslv->params.key_len);
+
+	nrent->nrslv = nrslv;
+
+	/* Initialize user data */
+	if (nrslv->rslv_init)
+		nrslv->rslv_init(nrslv, nrent);
+
+	/* Put in hash table */
+	err = rhashtable_lookup_insert_fast(&nrslv->rhash_table,
+					    &nrent->node, nrslv->params);
+	if (err) {
+		kfree(nrent);
+		return NULL;
+	}
+
+	if (timeout) {
+		/* Schedule timeout for resolver */
+		INIT_DELAYED_WORK(&nrent->timeout_work, net_rslv_delayed_work);
+		schedule_delayed_work(&nrent->timeout_work, timeout);
+	}
+
+	*created = true;
+
+	return nrent;
+}
+
+struct net_rslv_ent *net_rslv_lookup_and_create(struct net_rslv *nrslv,
+						void *key, bool *created,
+						unsigned int timeout)
+{
+	spinlock_t *lock = net_rslv_get_lock(nrslv, key);
+	struct net_rslv_ent *nrent;
+
+	*created = false;
+
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (nrent)
+		return nrent;
+
+	spin_lock(lock);
+
+	/* Check if someone beat us to the punch */
+	nrent = rhashtable_lookup_fast(&nrslv->rhash_table, key,
+				       nrslv->params);
+	if (nrent) {
+		spin_unlock(lock);
+		return nrent;
+	}
+
+	nrent = net_rslv_new_ent(nrslv, key, created, timeout);
+
+	spin_unlock(lock);
+
+	return nrent;
+}
+EXPORT_SYMBOL_GPL(net_rslv_lookup_and_create);
+
+static int net_rslv_cmp(struct rhashtable_compare_arg *arg,
+			const void *obj)
+{
+	struct net_rslv *nrslv = container_of(arg->ht, struct net_rslv,
+					      rhash_table);
+
+	return nrslv->rslv_cmp(nrslv, arg->key, obj);
+}
+
+#define LOCKS_PER_CPU	10
+#define MAX_LOCKS 1024
+
+struct net_rslv *net_rslv_create(size_t obj_size, size_t key_len,
+				 size_t max_size,
+				 net_rslv_cmpfn cmp_fn,
+				 net_rslv_initfn init_fn,
+				 net_rslv_destroyfn destroy_fn)
+{
+	struct net_rslv *nrslv;
+	int err;
+
+	if (key_len > obj_size)
+		return ERR_PTR(-EINVAL);
+
+	nrslv = kzalloc(sizeof(*nrslv), GFP_KERNEL);
+	if (!nrslv)
+		return ERR_PTR(-ENOMEM);
+
+	err = alloc_bucket_spinlocks(&nrslv->locks, &nrslv->locks_mask,
+				     MAX_LOCKS, LOCKS_PER_CPU, GFP_KERNEL);
+	if (err)
+		return ERR_PTR(err);
+
+	nrslv->obj_size = obj_size;
+	nrslv->rslv_init = init_fn;
+	nrslv->rslv_cmp = cmp_fn;
+	nrslv->rslv_destroy = destroy_fn;
+	get_random_bytes(&nrslv->hash_rnd, sizeof(nrslv->hash_rnd));
+
+	nrslv->params.head_offset = offsetof(struct net_rslv_ent, node);
+	nrslv->params.key_offset = offsetof(struct net_rslv_ent, object);
+	nrslv->params.key_len = key_len;
+	nrslv->params.max_size = max_size;
+	nrslv->params.min_size = 256;
+	nrslv->params.automatic_shrinking = true;
+	nrslv->params.obj_cmpfn = cmp_fn ? net_rslv_cmp : NULL;
+
+	rhashtable_init(&nrslv->rhash_table, &nrslv->params);
+
+	return nrslv;
+}
+EXPORT_SYMBOL_GPL(net_rslv_create);
+
+static void net_rslv_cancel_all_delayed_work(struct net_rslv *nrslv)
+{
+	struct rhashtable_iter iter;
+	struct net_rslv_ent *nrent;
+	int ret;
+
+	ret = rhashtable_walk_init(&nrslv->rhash_table, &iter, GFP_ATOMIC);
+	if (WARN_ON(ret))
+		return;
+
+	ret = rhashtable_walk_start(&iter);
+	if (WARN_ON(ret && ret != -EAGAIN))
+		goto err;
+
+	while ((nrent = rhashtable_walk_next(&iter)))
+		cancel_delayed_work_sync(&nrent->timeout_work);
+
+err:
+	rhashtable_walk_stop(&iter);
+	rhashtable_walk_exit(&iter);
+}
+
+void net_rslv_destroy(struct net_rslv *nrslv)
+{
+	/* First cancel delayed work in all the nodes. We don't want
+	 * delayed work trying to remove nodes from the table while
+	 * rhashtable_free_and_destroy is walking.
+	 */
+	net_rslv_cancel_all_delayed_work(nrslv);
+
+	rhashtable_free_and_destroy(&nrslv->rhash_table,
+				    net_rslv_ent_free_cb, NULL);
+
+	free_bucket_spinlocks(nrslv->locks);
+
+	kfree(nrslv);
+}
+EXPORT_SYMBOL_GPL(net_rslv_destroy);
+
-- 
2.8.0.rc2

* [PATCH v2 net-next 7/7] ila: Resolver mechanism
From: Tom Herbert @ 2016-09-15 21:19 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team, roopa, tgraf
In-Reply-To: <1473974361-2275254-1-git-send-email-tom@herbertland.com>

Implement an ILA resolver. This uses LWT to implement the hook to a
userspace resolver and tracks pending unresolved addresses using the
backend net resolver.

The idea is that the kernel sets an ILA resolver route to the
SIR prefix, something like:

ip route add 3333::/64 encap ila-resolve \
     via 2401:db00:20:911a::27:0 dev eth0

When a packet hits the route, the destination address is looked up in
a resolver table. If the entry is created (no entry with the address
already exists), an rtnl message is generated with group
RTNLGRP_ILA_NOTIFY and type RTM_ADDR_RESOLVE. A userspace
daemon can listen for such messages and perform an ILA resolution
protocol to determine the ILA mapping. If the mapping is resolved,
a /128 ila encap route is set so that the host can perform
ILA translation and send directly to the destination.
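
The "notify only when the entry is created" behaviour described above
can be sketched in plain userspace C. This is only an illustration of
the semantics, not the kernel code: the fixed-size array stands in for
the rhashtable, and every name here (pending_table,
pending_lookup_and_create, pending_resolved, pending_demo) is
hypothetical and does not appear in the patch.

```c
#include <stdbool.h>
#include <string.h>

#define ADDR_LEN   16	/* bytes in an IPv6 address */
#define TABLE_SIZE 8	/* hypothetical, kept tiny for the sketch */

/* Stand-in for the resolver table: a flat array of pending addresses. */
struct pending_table {
	unsigned char addrs[TABLE_SIZE][ADDR_LEN];
	bool used[TABLE_SIZE];
};

/* Return true only if a new pending entry was created, which is the
 * case where a notification would be sent. Return false if the address
 * is already pending or the table is full.
 */
static bool pending_lookup_and_create(struct pending_table *t,
				      const unsigned char *addr)
{
	int free_slot = -1;
	int i;

	for (i = 0; i < TABLE_SIZE; i++) {
		if (t->used[i] && !memcmp(t->addrs[i], addr, ADDR_LEN))
			return false;	/* resolution already pending */
		if (!t->used[i] && free_slot < 0)
			free_slot = i;
	}

	if (free_slot < 0)
		return false;		/* table full, nothing created */

	memcpy(t->addrs[free_slot], addr, ADDR_LEN);
	t->used[free_slot] = true;
	return true;
}

/* Models the /128 route being installed: drop the pending entry. */
static void pending_resolved(struct pending_table *t,
			     const unsigned char *addr)
{
	int i;

	for (i = 0; i < TABLE_SIZE; i++) {
		if (t->used[i] && !memcmp(t->addrs[i], addr, ADDR_LEN))
			t->used[i] = false;
	}
}

/* Three packets to one address, with a resolution after the second.
 * Returns how many notifications would have been generated.
 */
static int pending_demo(void)
{
	struct pending_table t;
	unsigned char addr[ADDR_LEN] = { 0x33, 0x33 };
	int notifications = 0;

	memset(&t, 0, sizeof(t));

	notifications += pending_lookup_and_create(&t, addr); /* created */
	notifications += pending_lookup_and_create(&t, addr); /* pending */
	pending_resolved(&t, addr);
	notifications += pending_lookup_and_create(&t, addr); /* created */

	return notifications;
}
```

In this sketch the second packet is suppressed because the address is
still pending. The third one triggers a notification again only because
the entry was removed in between, which in the patch happens either on
resolution or when the timeout fires.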

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/ila.h       |   9 ++
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |   8 +-
 net/ipv6/Kconfig               |   1 +
 net/ipv6/ila/Makefile          |   2 +-
 net/ipv6/ila/ila.h             |  16 +++
 net/ipv6/ila/ila_common.c      |   7 ++
 net/ipv6/ila/ila_lwt.c         |   9 ++
 net/ipv6/ila/ila_resolver.c    | 249 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/ila/ila_xlat.c        |  15 ++-
 10 files changed, 307 insertions(+), 10 deletions(-)
 create mode 100644 net/ipv6/ila/ila_resolver.c

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 948c0a9..f186f8b 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -42,4 +42,13 @@ enum {
 	ILA_CSUM_NO_ACTION,
 };
 
+enum {
+	ILA_NOTIFY_ATTR_UNSPEC,
+	ILA_NOTIFY_ATTR_TIMEOUT,		/* u32 */
+
+	__ILA_NOTIFY_ATTR_MAX,
+};
+
+#define ILA_NOTIFY_ATTR_MAX	(__ILA_NOTIFY_ATTR_MAX - 1)
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index a478fe8..d880e49 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
 	LWTUNNEL_ENCAP_IP,
 	LWTUNNEL_ENCAP_ILA,
 	LWTUNNEL_ENCAP_IP6,
+	LWTUNNEL_ENCAP_ILA_NOTIFY,
 	__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f037..a775464 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -12,7 +12,8 @@
  */
 #define RTNL_FAMILY_IPMR		128
 #define RTNL_FAMILY_IP6MR		129
-#define RTNL_FAMILY_MAX			129
+#define RTNL_FAMILY_ILA			130
+#define RTNL_FAMILY_MAX			130
 
 /****
  *		Routing/neighbour discovery messages.
@@ -144,6 +145,9 @@ enum {
 	RTM_GETSTATS = 94,
 #define RTM_GETSTATS RTM_GETSTATS
 
+	RTM_ADDR_RESOLVE = 95,
+#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -656,6 +660,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_MPLS_ROUTE	RTNLGRP_MPLS_ROUTE
 	RTNLGRP_NSID,
 #define RTNLGRP_NSID		RTNLGRP_NSID
+	RTNLGRP_ILA_NOTIFY,
+#define RTNLGRP_ILA_NOTIFY	RTNLGRP_ILA_NOTIFY
 	__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 2343e4f..cf3ea8e 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -97,6 +97,7 @@ config IPV6_ILA
 	tristate "IPv6: Identifier Locator Addressing (ILA)"
 	depends on NETFILTER
 	select LWTUNNEL
+	select NET_EXT_RESOLVER
 	---help---
 	  Support for IPv6 Identifier Locator Addressing (ILA).
 
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 4b32e59..f2aadc3 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o ila_xlat.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o ila_resolver.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index e0170f6..e369611 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -15,6 +15,7 @@
 #include <linux/ip.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/rhashtable.h>
 #include <linux/socket.h>
 #include <linux/skbuff.h>
 #include <linux/types.h>
@@ -23,6 +24,16 @@
 #include <net/protocol.h>
 #include <uapi/linux/ila.h>
 
+extern unsigned int ila_net_id;
+
+struct ila_net {
+	struct rhashtable rhash_table;
+	spinlock_t *locks; /* Bucket locks for entry manipulation */
+	unsigned int locks_mask;
+	bool hooks_registered;
+	struct net_rslv *nrslv;
+};
+
 struct ila_locator {
 	union {
 		__u8            v8[8];
@@ -114,9 +125,14 @@ void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p,
 
 void ila_init_saved_csum(struct ila_params *p);
 
+void ila_rslv_resolved(struct ila_net *ilan, struct ila_addr *iaddr);
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
 int ila_xlat_init(void);
 void ila_xlat_fini(void);
+int ila_rslv_init(void);
+void ila_rslv_fini(void);
+int ila_init_resolver_net(struct ila_net *ilan);
+void ila_exit_resolver_net(struct ila_net *ilan);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index aba0998..83c7d4a 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -157,7 +157,13 @@ static int __init ila_init(void)
 	if (ret)
 		goto fail_xlat;
 
+	ret = ila_rslv_init();
+	if (ret)
+		goto fail_rslv;
+
 	return 0;
+fail_rslv:
+	ila_xlat_fini();
 fail_xlat:
 	ila_lwt_fini();
 fail_lwt:
@@ -168,6 +174,7 @@ static void __exit ila_fini(void)
 {
 	ila_xlat_fini();
 	ila_lwt_fini();
+	ila_rslv_fini();
 }
 
 module_init(ila_init);
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 30a6920..70d8988 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -9,6 +9,7 @@
 #include <net/ip.h>
 #include <net/ip6_fib.h>
 #include <net/lwtunnel.h>
+#include <net/netns/generic.h>
 #include <net/protocol.h>
 #include <uapi/linux/ila.h>
 #include "ila.h"
@@ -122,6 +123,14 @@ static int ila_build_state(struct net *net, struct net_device *dev,
 
 	*ts = newts;
 
+	if (cfg6->fc_dst_len >= sizeof(struct ila_addr)) {
+		struct net *net = dev_net(dev);
+		struct ila_net *ilan = net_generic(net, ila_net_id);
+
+		/* Cancel any pending resolution on this address */
+		ila_rslv_resolved(ilan, iaddr);
+	}
+
 	return 0;
 }
 
diff --git a/net/ipv6/ila/ila_resolver.c b/net/ipv6/ila/ila_resolver.c
new file mode 100644
index 0000000..0f5a819
--- /dev/null
+++ b/net/ipv6/ila/ila_resolver.c
@@ -0,0 +1,249 @@
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ip6_fib.h>
+#include <net/lwtunnel.h>
+#include <net/netns/generic.h>
+#include <net/protocol.h>
+#include <net/resolver.h>
+#include <uapi/linux/ila.h>
+#include "ila.h"
+
+struct ila_notify_params {
+	unsigned int timeout;
+};
+
+static inline struct ila_notify_params *ila_notify_params_lwtunnel(
+	struct lwtunnel_state *lwstate)
+{
+	return (struct ila_notify_params *)lwstate->data;
+}
+
+static int ila_fill_notify(struct sk_buff *skb, struct in6_addr *addr,
+			   u32 pid, u32 seq, int event, int flags)
+{
+	struct nlmsghdr *nlh;
+	struct rtmsg *rtm;
+
+	nlh = nlmsg_put(skb, pid, seq, event, sizeof(*rtm), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	rtm = nlmsg_data(nlh);
+	rtm->rtm_family   = RTNL_FAMILY_ILA;
+	rtm->rtm_dst_len  = 128;
+	rtm->rtm_src_len  = 0;
+	rtm->rtm_tos      = 0;
+	rtm->rtm_table    = RT6_TABLE_UNSPEC;
+	rtm->rtm_type     = RTN_UNICAST;
+	rtm->rtm_scope    = RT_SCOPE_UNIVERSE;
+
+	if (nla_put_in6_addr(skb, RTA_DST, addr)) {
+		nlmsg_cancel(skb, nlh);
+		return -EMSGSIZE;
+	}
+
+	nlmsg_end(skb, nlh);
+	return 0;
+}
+
+static size_t ila_rslv_msgsize(void)
+{
+	size_t len =
+		NLMSG_ALIGN(sizeof(struct rtmsg))
+		+ nla_total_size(16)     /* RTA_DST */
+		;
+
+	return len;
+}
+
+void ila_rslv_notify(struct net *net, struct sk_buff *skb)
+{
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct sk_buff *nlskb;
+	int err = 0;
+
+	/* Send ILA notification to user */
+	nlskb = nlmsg_new(ila_rslv_msgsize(), GFP_ATOMIC);
+	if (!nlskb)
+		goto errout;
+
+	err = ila_fill_notify(nlskb, &ip6h->daddr, 0, 0, RTM_ADDR_RESOLVE,
+			      NLM_F_MULTI);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(nlskb);
+		goto errout;
+	}
+	rtnl_notify(nlskb, net, 0, RTNLGRP_ILA_NOTIFY, NULL, GFP_ATOMIC);
+	return;
+
+errout:
+	if (err < 0)
+		rtnl_set_sk_err(net, RTNLGRP_ILA_NOTIFY, err);
+}
+
+static int ila_rslv_output(struct net *net, struct sock *sk,
+			   struct sk_buff *skb)
+{
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	struct dst_entry *dst = skb_dst(skb);
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct ila_notify_params *p;
+	bool new;
+
+	p = ila_notify_params_lwtunnel(dst->lwtstate);
+
+	/* Don't bother taking rcu lock, we only want to know if the entry
+	 * exists or not.
+	 */
+	net_rslv_lookup_and_create(ilan->nrslv, &ip6h->daddr, &new,
+				   p->timeout);
+
+	if (new)
+		ila_rslv_notify(net, skb);
+
+	return dst->lwtstate->orig_output(net, sk, skb);
+}
+
+void ila_rslv_resolved(struct ila_net *ilan, struct ila_addr *iaddr)
+{
+	if (ilan->nrslv)
+		net_rslv_resolved(ilan->nrslv, iaddr);
+}
+
+static int ila_rslv_input(struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	return dst->lwtstate->orig_input(skb);
+}
+
+static const struct nla_policy ila_notify_nl_policy[ILA_NOTIFY_ATTR_MAX + 1] = {
+	[ILA_NOTIFY_ATTR_TIMEOUT] = { .type = NLA_U32, },
+};
+
+static int ila_rslv_build_state(struct net *net, struct net_device *dev,
+				struct nlattr *nla, unsigned int family,
+				const void *cfg, struct lwtunnel_state **ts)
+{
+	struct ila_notify_params *p;
+	struct nlattr *tb[ILA_NOTIFY_ATTR_MAX + 1];
+	struct lwtunnel_state *newts;
+	struct ila_net *ilan = net_generic(net, ila_net_id);
+	size_t encap_len = sizeof(*p);
+	int ret;
+
+	if (unlikely(!ilan->nrslv)) {
+		int err;
+
+		/* Only create net resolver on demand */
+		err = ila_init_resolver_net(ilan);
+		if (err)
+			return err;
+	}
+
+	if (family != AF_INET6)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, ILA_NOTIFY_ATTR_MAX, nla,
+			       ila_notify_nl_policy);
+
+	if (ret < 0)
+		return ret;
+
+	newts = lwtunnel_state_alloc(encap_len);
+	if (!newts)
+		return -ENOMEM;
+
+	newts->len = 0;
+	newts->type = LWTUNNEL_ENCAP_ILA_NOTIFY;
+	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
+			LWTUNNEL_STATE_INPUT_REDIRECT;
+
+	p = ila_notify_params_lwtunnel(newts);
+
+	if (tb[ILA_NOTIFY_ATTR_TIMEOUT])
+		p->timeout = msecs_to_jiffies(nla_get_u32(
+			tb[ILA_NOTIFY_ATTR_TIMEOUT]));
+
+	*ts = newts;
+
+	return 0;
+}
+
+static int ila_rslv_fill_encap_info(struct sk_buff *skb,
+				    struct lwtunnel_state *lwtstate)
+{
+	struct ila_notify_params *p = ila_notify_params_lwtunnel(lwtstate);
+
+	if (nla_put_u32(skb, ILA_NOTIFY_ATTR_TIMEOUT,
+			(__force u32)jiffies_to_msecs(p->timeout)))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int ila_rslv_nlsize(struct lwtunnel_state *lwtstate)
+{
+	return nla_total_size(sizeof(u32)) + /* ILA_NOTIFY_ATTR_TIMEOUT */
+	       0;
+}
+
+static int ila_rslv_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
+{
+	return 0;
+}
+
+static const struct lwtunnel_encap_ops ila_rslv_ops = {
+	.build_state = ila_rslv_build_state,
+	.output = ila_rslv_output,
+	.input = ila_rslv_input,
+	.fill_encap = ila_rslv_fill_encap_info,
+	.get_encap_size = ila_rslv_nlsize,
+	.cmp_encap = ila_rslv_cmp,
+};
+
+#define ILA_MAX_SIZE 8192
+
+int ila_init_resolver_net(struct ila_net *ilan)
+{
+	struct net_rslv *nrslv;
+
+	nrslv = net_rslv_create(sizeof(struct ila_addr),
+				sizeof(struct ila_addr), ILA_MAX_SIZE,
+				NULL, NULL, NULL);
+
+	if (IS_ERR(nrslv))
+		return PTR_ERR(nrslv);
+
+	ilan->nrslv = nrslv;
+
+	return 0;
+}
+
+void ila_exit_resolver_net(struct ila_net *ilan)
+{
+	if (ilan->nrslv)
+		net_rslv_destroy(ilan->nrslv);
+}
+
+int ila_rslv_init(void)
+{
+	return lwtunnel_encap_add_ops(&ila_rslv_ops, LWTUNNEL_ENCAP_ILA_NOTIFY);
+}
+
+void ila_rslv_fini(void)
+{
+	lwtunnel_encap_del_ops(&ila_rslv_ops, LWTUNNEL_ENCAP_ILA_NOTIFY);
+}
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index 7d1c34b..857f8b5 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -21,14 +21,7 @@ struct ila_map {
 	struct rcu_head rcu;
 };
 
-static unsigned int ila_net_id;
-
-struct ila_net {
-	struct rhashtable rhash_table;
-	spinlock_t *locks; /* Bucket locks for entry manipulation */
-	unsigned int locks_mask;
-	bool hooks_registered;
-};
+unsigned int ila_net_id;
 
 static u32 hashrnd __read_mostly;
 static __always_inline void __ila_hash_secret_init(void)
@@ -546,6 +539,10 @@ static __net_init int ila_init_net(struct net *net)
 	if (err)
 		return err;
 
+	/* Resolver net is created on demand when LWT ILA resolver route
+	 * is made.
+	 */
+
 	rhashtable_init(&ilan->rhash_table, &rht_params);
 
 	return 0;
@@ -557,6 +554,8 @@ static __net_exit void ila_exit_net(struct net *net)
 
 	rhashtable_free_and_destroy(&ilan->rhash_table, ila_free_cb, NULL);
 
+	ila_exit_resolver_net(ilan);
+
 	free_bucket_spinlocks(ilan->locks);
 
 	if (ilan->hooks_registered)
-- 
2.8.0.rc2
