* Re: [PATCH net] net: ipv6: regenerate host route if moved to gc list
From: David Ahern @ 2017-04-22 14:14 UTC (permalink / raw)
To: Dmitry Vyukov, Martin KaFai Lau; +Cc: netdev, andreyknvl, mmanning
In-Reply-To: <CACT4Y+aRBZApshv2T-edaErPmiCsbMxNNkzC5Hs5jGJdvVwOAg@mail.gmail.com>
On 4/22/17 3:14 AM, Dmitry Vyukov wrote:
>> One small question. Why cmpxchg is needed instead
>> of a ip6_rt_put() and then assign?
>> Is it fixing another bug?
> cmpxchg here looks fishy.
> If there are no concurrent modifications, then it is not needed.
> If there are and cmpxchg fails, then we will put the installed rt and
> leak the new one.
>
Yes, I need to convert that to changing the rt under a lock.
Leftover from the beginning of the investigation when I suspected
locking and recalled Li's patch. I'll send a v2.
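
To make the concern concrete, here is a minimal userspace sketch of an unconditional swap that always hands back the previous pointer, so neither the old nor the new route can be leaked. It is not the kernel code: `struct route`, `struct ifaddr`, `route_put()` and `ifaddr_replace_route()` are hypothetical stand-ins, and C11 `atomic_exchange()` is used here in place of the lock mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects, for illustration only. */
struct route {
	int refcnt;
};

struct ifaddr {
	_Atomic(struct route *) rt;
};

/* Drop one reference; a NULL route is ignored. */
static void route_put(struct route *rt)
{
	if (rt)
		rt->refcnt--;
}

/*
 * Install new_rt unconditionally and release whatever was there before.
 * Unlike a compare-and-swap, this cannot fail, so there is no path on
 * which new_rt is dropped on the floor while the installed route is put.
 */
static void ifaddr_replace_route(struct ifaddr *ifp, struct route *new_rt)
{
	struct route *prev = atomic_exchange(&ifp->rt, new_rt);

	route_put(prev);
}
```

With a compare-and-swap, the failure branch would also have to release the new route, which is the leak pointed out above.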
^ permalink raw reply
* Re: compile issue in latest iproute2
From: Daniel Borkmann @ 2017-04-22 15:00 UTC (permalink / raw)
To: Jamal Hadi Salim, Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <cecf76aa-2883-741e-1901-2f4850e2c188@mojatatu.com>
On 04/22/2017 02:31 PM, Jamal Hadi Salim wrote:
>
> I dont think is a kernel uapi - but it was failing compiling
> when HAVE_ELF is false.
> -----
> jhs@jhs-UX:~/git-trees/others/iproute-with-ck$ git diff include/bpf_util.h
> diff --git a/include/bpf_util.h b/include/bpf_util.h
> index 5361dab..edca339 100644
> --- a/include/bpf_util.h
> +++ b/include/bpf_util.h
> @@ -266,7 +266,7 @@ int bpf_send_map_fds(const char *path, const char *obj);
> int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
> unsigned int entries);
> #else
> -static inline int bpf_send_map_fds(const char *path, const char *obj)
> +inline int bpf_send_map_fds(const char *path, const char *obj)
> {
> return 0;
> }
> -----
>
> Let me know if you want a formal patch or feel free to take it.
Will resolve it and send a patch later today, thanks!
^ permalink raw reply
* Re: [PATCH 2/2] sparc64: Add eBPF JIT.
From: Alexei Starovoitov @ 2017-04-22 15:32 UTC (permalink / raw)
To: David Miller; +Cc: sparclinux, netdev, ast, daniel
In-Reply-To: <20170421.201711.317784995765325131.davem@davemloft.net>
On Fri, Apr 21, 2017 at 08:17:11PM -0700, David Miller wrote:
>
> This is an eBPF JIT for sparc64. All major features are supported.
>
> All tests under tools/testing/selftests/bpf/ pass.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
...
> + /* tail call */
> + case BPF_JMP | BPF_CALL |BPF_X:
> + emit_tail_call(ctx);
> +
I think 'break;' is missing here.
When tail_call's target program is null the current program should
continue instead of aborting.
Like in our current ddos+lb setup the program looks like:
bpf_tail_call(ctx, &prog_array, 1);
bpf_tail_call(ctx, &prog_array, 2);
bpf_tail_call(ctx, &prog_array, 3);
return XDP_DROP;
this way it will jump into the program that is installed in slot 1.
If it's empty, it will try slot 2...
If no programs installed it will drop the packet.
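
The fall-through behaviour described above can be modelled in plain C. This is only a userspace sketch under stated assumptions: `try_tail_call()`, `dispatch()`, `prog_fn` and the verdict enum are hypothetical names, and a real bpf_tail_call() never returns to the caller on success, which the model approximates by returning the callee's verdict.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical verdicts; VERDICT_NONE means "slot empty, keep going". */
enum verdict { VERDICT_NONE = -1, VERDICT_DROP = 0, VERDICT_PASS = 1 };

typedef enum verdict (*prog_fn)(void *ctx);

/*
 * Model of a tail call: run the program in the slot if one is installed,
 * otherwise report VERDICT_NONE so the caller continues with its next
 * instruction instead of aborting.
 */
static enum verdict try_tail_call(void *ctx, prog_fn *prog_array, size_t slot)
{
	if (slot >= NSLOTS || prog_array[slot] == NULL)
		return VERDICT_NONE;
	return prog_array[slot](ctx);
}

/* Try slots 1, 2, 3 in order; drop the packet if all of them are empty. */
static enum verdict dispatch(void *ctx, prog_fn *prog_array)
{
	for (size_t slot = 1; slot <= 3; slot++) {
		enum verdict v = try_tail_call(ctx, prog_array, slot);

		if (v != VERDICT_NONE)
			return v;
	}
	return VERDICT_DROP;
}

/* Example program that accepts every packet. */
static enum verdict pass_prog(void *ctx)
{
	(void)ctx;
	return VERDICT_PASS;
}
```

A JIT that falls through from the tail-call case into the next case would break exactly the "slot empty, keep going" path of this model.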
> + /* function return */
> + case BPF_JMP | BPF_EXIT:
> + /* Optimization: when last instruction is EXIT,
> + simply fallthrough to epilogue. */
> + if (i == ctx->prog->len - 1)
> + break;
> + emit_branch(BA, ctx->idx, ctx->epilogue_offset, ctx);
> + emit_nop(ctx);
^ permalink raw reply
* Re: Why max netlink msg size is limited to 16k
From: Eric Dumazet @ 2017-04-22 15:36 UTC (permalink / raw)
To: prashantkumar dhotre; +Cc: netdev
In-Reply-To: <CA+VDgmNYc=djp6ab=n49CTpyvof8CPgHvXRst5Vnm706e-nRvQ@mail.gmail.com>
On Sat, 2017-04-22 at 19:43 +0530, prashantkumar dhotre wrote:
> I am observing that max netlink msg that my kernel module can send to
> user app is close to 16K.
>
> For larger sizes, genlmsg_unicast() succeeds but my app does not receive data.
>
> I have tried increasing RECV buffer size in my user app but that does not help.
>
> Regards
You need a kernel >= linux-4.9 to get about 32KB.
Why is this limited? Please read
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d35c99ff77ecb2eb239731b799386f3b3637a31e
^ permalink raw reply
* [PATCH v2 net-next] net: ipv6: send unsolicited NA if enabled for all interfaces
From: David Ahern @ 2017-04-22 16:10 UTC (permalink / raw)
To: netdev; +Cc: hannes, David Ahern
When arp_notify is set to 1 for either a specific interface or for 'all'
interfaces, gratuitous arp requests are sent. Since ndisc_notify is the
ipv6 equivalent to arp_notify, it should follow the same semantics.
Commit 4a6e3c5def13 ("net: ipv6: send unsolicited NA on admin up") sends
the NA on admin up. The final piece is checking devconf_all->ndisc_notify
in addition to the per device setting. Add it.
Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer address change")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v2
- update commit message with subject of commit 4a6e3c5def13 per comment
from Sergei
net/ipv6/ndisc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index b23822e64228..d310dc41209a 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1753,7 +1753,8 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
idev = in6_dev_get(dev);
if (!idev)
break;
- if (idev->cnf.ndisc_notify)
+ if (idev->cnf.ndisc_notify ||
+ net->ipv6.devconf_all->ndisc_notify)
ndisc_send_unsol_na(dev);
in6_dev_put(idev);
break;
--
2.1.4
^ permalink raw reply related
* Re: compile issue in latest iproute2
From: Daniel Borkmann @ 2017-04-22 16:18 UTC (permalink / raw)
To: Jamal Hadi Salim, Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <58FB7012.4080907@iogearbox.net>
On 04/22/2017 05:00 PM, Daniel Borkmann wrote:
> On 04/22/2017 02:31 PM, Jamal Hadi Salim wrote:
>>
>> I dont think is a kernel uapi - but it was failing compiling
>> when HAVE_ELF is false.
>> -----
>> jhs@jhs-UX:~/git-trees/others/iproute-with-ck$ git diff include/bpf_util.h
>> diff --git a/include/bpf_util.h b/include/bpf_util.h
>> index 5361dab..edca339 100644
>> --- a/include/bpf_util.h
>> +++ b/include/bpf_util.h
>> @@ -266,7 +266,7 @@ int bpf_send_map_fds(const char *path, const char *obj);
>> int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
>> unsigned int entries);
>> #else
>> -static inline int bpf_send_map_fds(const char *path, const char *obj)
>> +inline int bpf_send_map_fds(const char *path, const char *obj)
>> {
>> return 0;
>> }
>> -----
>>
>> Let me know if you want a formal patch or feel free to take it.
>
> Will resolve it and send a patch later today, thanks!
Hmm, I'm on latest f443565f8df6 ("ip vrf: Add command name next to
pid") commit in master branch. Compiles fine for me with and without
ELF support. I verified that there's no HAVE_ELF defined and I'm
not seeing an error.
Without ELF support:
# ./configure
TC schedulers
ATM no
libc has setns: yes
SELinux support: yes
ELF support: no
libmnl support: yes
Berkeley DB: yes
docs: latex: yes
pdflatex: yes
sgml2latex: no
WARNING: no LaTeX files can be build from SGML files
sgml2html: no
WARNING: no HTML docs can be built from SGML
# make > /dev/null
ssfilter.y: conflicts: 35 shift/reduce
#
With ELF support:
# ./configure
TC schedulers
ATM no
libc has setns: yes
SELinux support: yes
ELF support: yes
libmnl support: yes
Berkeley DB: yes
docs: latex: yes
pdflatex: yes
sgml2latex: no
WARNING: no LaTeX files can be build from SGML files
sgml2html: no
WARNING: no HTML docs can be built from SGML
# make > /dev/null
ssfilter.y: conflicts: 35 shift/reduce
#
Anything I'm missing?
^ permalink raw reply
* [PATCH net-next] net: add rcu locking when changing early demux
From: David Ahern @ 2017-04-22 16:33 UTC (permalink / raw)
To: netdev; +Cc: subashab, David Ahern
systemd-sysctl is triggering a suspicious RCU usage message when
net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
a sysctl config file:
[ 33.896184] ===============================
[ 33.899558] [ ERR: suspicious RCU usage. ]
[ 33.900624] 4.11.0-rc7+ #104 Not tainted
[ 33.901698] -------------------------------
[ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
[ 33.905724]
other info that might help us debug this:
[ 33.907656]
rcu_scheduler_active = 2, debug_locks = 0
[ 33.909288] 1 lock held by systemd-sysctl/143:
[ 33.910373] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48
[ 33.912407]
stack backtrace:
[ 33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
[ 33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 33.917870] Call Trace:
[ 33.918431] dump_stack+0x81/0xb6
[ 33.919241] lockdep_rcu_suspicious+0x10f/0x118
[ 33.920263] proc_configure_early_demux+0x65/0x10a
[ 33.921391] proc_udp_early_demux+0x3a/0x41
add rcu locking to proc_configure_early_demux.
Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
net/ipv4/sysctl_net_ipv4.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 6fb25693c00b..ddac9e64b702 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -302,6 +302,8 @@ static void proc_configure_early_demux(int enabled, int protocol)
struct inet6_protocol *ip6prot;
#endif
+ rcu_read_lock();
+
ipprot = rcu_dereference(inet_protos[protocol]);
if (ipprot)
ipprot->early_demux = enabled ? ipprot->early_demux_handler :
@@ -313,6 +315,7 @@ static void proc_configure_early_demux(int enabled, int protocol)
ip6prot->early_demux = enabled ? ip6prot->early_demux_handler :
NULL;
#endif
+ rcu_read_unlock();
}
static int proc_tcp_early_demux(struct ctl_table *table, int write,
--
2.1.4
^ permalink raw reply related
* [PATCH v2 net] net: ipv6: regenerate host route if moved to gc list
From: David Ahern @ 2017-04-22 16:40 UTC (permalink / raw)
To: netdev; +Cc: dvyukov, andreyknvl, mmanning, David Ahern
Taking down the loopback device wreaks havoc on IPv6 routes. By
extension, taking down a VRF device wreaks havoc on its table.
Dmitry and Andrey both reported heap out-of-bounds accesses in the IPv6
FIB code while running the syzkaller fuzzer. The root cause is that a dead
dst on the garbage list gets reinserted into the IPv6 FIB. While on
the gc list (or perhaps when it gets added to the gc list) the dst->next is
set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
out-of-bounds access.
Andrey's reproducer was the key to getting to the bottom of this.
With IPv6, host routes for an address have the dst->dev set to the
loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
a walk of the fib evicting routes with the 'lo' device which means all
host routes are removed. That process moves the dst which is attached to
an inet6_ifaddr to the gc list and marks it as dead.
The recent change to keep global IPv6 addresses added a new function
fixup_permanent_addr that is called on admin up. That function restarts
dad for an inet6_ifaddr and when it completes the host route attached
to it is inserted into the fib. Since the route was marked dead and
moved to the gc list, we get the reported out-of-bounds accesses. If
the device with the address is taken down or the address is removed, the
WARN_ON in fib6_del is triggered.
All of those faults are fixed by regenerating the host route if the
existing one has been moved to the gc list, something that can be
determined by checking if the rt6i_ref counter is 0.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v2
- change ifp->rt under spinlock vs cmpxchg
- add comment about rt6i_ref == 0
Dmitry / Andrey: can you guys add this patch to your tree and run
syzkaller tests? I'd like to confirm that all of the fib traces
are fixed. Thanks.
net/ipv6/addrconf.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 08f9e8ea7a81..97e86158bbcb 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3303,14 +3303,24 @@ static void addrconf_gre_config(struct net_device *dev)
static int fixup_permanent_addr(struct inet6_dev *idev,
struct inet6_ifaddr *ifp)
{
- if (!ifp->rt) {
- struct rt6_info *rt;
+ /* rt6i_ref == 0 means the host route was removed from the
+ * FIB, for example, if 'lo' device is taken down. In that
+ * case regenerate the host route.
+ */
+ if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
+ struct rt6_info *rt, *prev;
rt = addrconf_dst_alloc(idev, &ifp->addr, false);
if (unlikely(IS_ERR(rt)))
return PTR_ERR(rt);
+ spin_lock(&ifp->lock);
+ prev = ifp->rt;
ifp->rt = rt;
+ spin_unlock(&ifp->lock);
+
+ if (prev)
+ ip6_rt_put(prev);
}
if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
--
2.1.4
^ permalink raw reply related
* Re: compile issue in latest iproute2
From: Jamal Hadi Salim @ 2017-04-22 16:43 UTC (permalink / raw)
To: Daniel Borkmann, Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <58FB8272.5080306@iogearbox.net>
On 17-04-22 12:18 PM, Daniel Borkmann wrote:
[..]
>
> Anything I'm missing?
Let me get back to that machine (couple of hours) and try to see how I
created the issue.
Should've cut-and-pasted the error msg. Can't recreate it on this laptop.
cheers,
jamal
^ permalink raw reply
* Fw: [Bug 195503] New: tipc: unchecked return value of nlmsg_new() in function tipc_nl_node_get_monitor()
From: Stephen Hemminger @ 2017-04-22 16:48 UTC (permalink / raw)
To: jon.maloy, ying.xue; +Cc: netdev
Begin forwarded message:
Date: Sat, 22 Apr 2017 14:56:25 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 195503] New: tipc: unchecked return value of nlmsg_new() in function tipc_nl_node_get_monitor()
https://bugzilla.kernel.org/show_bug.cgi?id=195503
Bug ID: 195503
Summary: tipc: unchecked return value of nlmsg_new() in
function tipc_nl_node_get_monitor()
Product: Networking
Version: 2.5
Kernel Version: linux-4.11-rc7
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: stephen@networkplumber.org
Reporter: bianpan2010@ruc.edu.cn
Regression: No
Function nlmsg_new() will return a NULL pointer if there is not enough memory.
In function tipc_nl_node_get_monitor(), the return value of nlmsg_new() is not
checked (see line 2100), which may result in bad memory access.
tipc_nl_node_get_monitor @@ net/tipc/node.c
2094 int tipc_nl_node_get_monitor(struct sk_buff *skb, struct genl_info *info)
2095 {
2096 struct net *net = sock_net(skb->sk);
2097 struct tipc_nl_msg msg;
2098 int err;
2099
2100 msg.skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
2101 msg.portid = info->snd_portid;
2102 msg.seq = info->snd_seq;
2103
2104 err = __tipc_nl_add_monitor_prop(net, &msg);
2105 if (err) {
2106 nlmsg_free(msg.skb);
2107 return err;
2108 }
2109
2110 return genlmsg_reply(msg.skb, info);
2111 }
Generally, the return value of nlmsg_new() should be checked against NULL, as
follows.
nfc_genl_target_lost @@ net/nfc/netlink.c:
213 int nfc_genl_target_lost(struct nfc_dev *dev, u32 target_idx)
214 {
215 struct sk_buff *msg;
216 void *hdr;
217
218 msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
219 if (!msg)
220 return -ENOMEM;
...
237 nla_put_failure:
238 genlmsg_cancel(msg, hdr);
239 free_msg:
240 nlmsg_free(msg);
241 return -EMSGSIZE;
242 }
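
The check-before-use pattern can be sketched in userspace C as follows. This is only an illustration, not the TIPC fix: `struct msg`, `build_reply()` and `failing_alloc()` are hypothetical names, and the allocator is passed in so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reply message; a stand-in for the skb-backed message. */
struct msg {
	void *buf;
	size_t len;
};

/*
 * Allocate the reply buffer and bail out with -ENOMEM before any field
 * of the message is touched if the allocation failed.
 */
static int build_reply(struct msg *m, size_t size, void *(*alloc)(size_t))
{
	m->buf = alloc(size);
	if (!m->buf)
		return -ENOMEM;

	m->len = size;
	return 0;
}

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```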
Thanks very much for your attention!
Pan Bian
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply
* Fw: [Bug 195497] New: openvswitch: unchecked return value of nla_nest_start() in function queue_userspace_packet()
From: Stephen Hemminger @ 2017-04-22 16:49 UTC (permalink / raw)
To: pshelar; +Cc: netdev
Begin forwarded message:
Date: Sat, 22 Apr 2017 14:52:46 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 195497] New: openvswitch: unchecked return value of nla_nest_start() in function queue_userspace_packet()
https://bugzilla.kernel.org/show_bug.cgi?id=195497
Bug ID: 195497
Summary: openvswitch: unchecked return value of
nla_nest_start() in function queue_userspace_packet()
Product: Networking
Version: 2.5
Kernel Version: linux-4.11-rc7
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: stephen@networkplumber.org
Reporter: bianpan2010@ruc.edu.cn
Regression: No
Function nla_nest_start() may return a NULL pointer on error. However, in
function queue_userspace_packet(), the return value of nla_nest_start() is not
checked against NULL (see lines 489 and 496), and may result in bad memory
access. Related code snippets are shown as follows.
queue_userspace_packet @@ net/openvswitch/datapath.c:420
420 static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
421 const struct sw_flow_key *key,
422 const struct dp_upcall_info *upcall_info,
423 uint32_t cutlen)
424 {
425 struct ovs_header *upcall;
...
468 len = upcall_msg_size(upcall_info, hlen - cutlen);
469 user_skb = genlmsg_new(len, GFP_ATOMIC);
470 if (!user_skb) {
471 err = -ENOMEM;
472 goto out;
473 }
474
475 upcall = genlmsg_put(user_skb, 0, 0, &dp_packet_genl_family,
476 0, upcall_info->cmd);
477 upcall->dp_ifindex = dp_ifindex;
...
487 if (upcall_info->egress_tun_info) {
488 nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_EGRESS_TUN_KEY);
489 err = ovs_nla_put_tunnel_info(user_skb,
490 upcall_info->egress_tun_info);
491 BUG_ON(err);
492 nla_nest_end(user_skb, nla);
493 }
494
495 if (upcall_info->actions_len) {
496 nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS);
497 err = ovs_nla_put_actions(upcall_info->actions,
498 upcall_info->actions_len,
499 user_skb);
500 if (!err)
501 nla_nest_end(user_skb, nla);
502 else
503 nla_nest_cancel(user_skb, nla);
504 }
...
545 out:
546 if (err)
547 skb_tx_error(skb);
548 kfree_skb(user_skb);
549 kfree_skb(nskb);
550 return err;
551 }
Generally, the return value of function nla_nest_start() should be checked
against NULL, as follows.
rtnetlink_put_metrics @@ net/core/rtnetlink.c:
686 int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
687 {
688 struct nlattr *mx;
689 int i, valid = 0;
690
691 mx = nla_nest_start(skb, RTA_METRICS);
692 if (mx == NULL)
693 return -ENOBUFS;
...
726 return nla_nest_end(skb, mx);
727
728 nla_put_failure:
729 nla_nest_cancel(skb, mx);
730 return -EMSGSIZE;
731 }
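
As a rough model of why the nest pointer has to be checked, the following userspace sketch uses a small fixed-size buffer in place of an skb. All names (`struct msgbuf`, `nest_start()`, `put_nested()`) are hypothetical and the one-byte header is a simplification; it only shows the shape of the start / check / cancel-or-end sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size message buffer standing in for an skb. */
struct msgbuf {
	unsigned char data[16];
	size_t used;
};

/*
 * Reserve a one-byte nest header. Returns NULL when the buffer is full,
 * which is the case the caller must check for.
 */
static unsigned char *nest_start(struct msgbuf *b)
{
	if (b->used >= sizeof(b->data))
		return NULL;
	return &b->data[b->used++];
}

/* Append payload inside a nest; roll the buffer back on any failure. */
static int put_nested(struct msgbuf *b, const void *payload, size_t len)
{
	size_t mark = b->used;
	unsigned char *hdr = nest_start(b);

	if (!hdr)
		return -EMSGSIZE;

	if (len > sizeof(b->data) - b->used) {
		b->used = mark;		/* cancel the nest */
		return -EMSGSIZE;
	}

	memcpy(&b->data[b->used], payload, len);
	b->used += len;
	*hdr = (unsigned char)(b->used - mark);	/* end the nest: store its length */
	return 0;
}
```

Without the NULL check, the final store through `hdr` is the bad memory access the report warns about.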
Thanks very much for your attention!
Pan Bian
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply
* Fw: [Bug 195495] New: unchecked return value of nla_nest_start() in function lwtunnel_fill_encap()
From: Stephen Hemminger @ 2017-04-22 16:49 UTC (permalink / raw)
To: netdev
Begin forwarded message:
Date: Sat, 22 Apr 2017 14:49:46 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 195495] New: unchecked return value of nla_nest_start() in function lwtunnel_fill_encap()
https://bugzilla.kernel.org/show_bug.cgi?id=195495
Bug ID: 195495
Summary: unchecked return value of nla_nest_start() in function
lwtunnel_fill_encap()
Product: Networking
Version: 2.5
Kernel Version: linux-4.11-rc7
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: stephen@networkplumber.org
Reporter: bianpan2010@ruc.edu.cn
Regression: No
Function nla_nest_start() may return a NULL pointer on error. However, in
function lwtunnel_fill_encap(), the return value of nla_nest_start() is not
checked against NULL (see line 218), and may result in bad memory access.
Related code snippets are shown as follows.
lwtunnel_fill_encap @@ net/core/lwtunnel.c: 204
204 int lwtunnel_fill_encap(struct sk_buff *skb, struct lwtunnel_state *lwtstate)
205 {
206 const struct lwtunnel_encap_ops *ops;
207 struct nlattr *nest;
208 int ret = -EINVAL;
209
210 if (!lwtstate)
211 return 0;
212
213 if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
214 lwtstate->type > LWTUNNEL_ENCAP_MAX)
215 return 0;
216
217 ret = -EOPNOTSUPP;
218 nest = nla_nest_start(skb, RTA_ENCAP);
219 rcu_read_lock();
220 ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
221 if (likely(ops && ops->fill_encap))
222 ret = ops->fill_encap(skb, lwtstate);
223 rcu_read_unlock();
224
225 if (ret)
226 goto nla_put_failure;
227 nla_nest_end(skb, nest);
228 ret = nla_put_u16(skb, RTA_ENCAP_TYPE, lwtstate->type);
229 if (ret)
230 goto nla_put_failure;
231
232 return 0;
233
234 nla_put_failure:
235 nla_nest_cancel(skb, nest);
236
237 return (ret == -EOPNOTSUPP ? 0 : ret);
238 }
Generally, the return value of function nla_nest_start() should be checked
against NULL, as follows.
rtnetlink_put_metrics @@ net/core/rtnetlink.c:
686 int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics)
687 {
688 struct nlattr *mx;
689 int i, valid = 0;
690
691 mx = nla_nest_start(skb, RTA_METRICS);
692 if (mx == NULL)
693 return -ENOBUFS;
...
726 return nla_nest_end(skb, mx);
727
728 nla_put_failure:
729 nla_nest_cancel(skb, mx);
730 return -EMSGSIZE;
731 }
Thanks very much for your attention!
Pan Bian
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply
* Re: compile issue in latest iproute2
From: Stephen Hemminger @ 2017-04-22 16:54 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Daniel Borkmann, netdev@vger.kernel.org
In-Reply-To: <b6773e0d-a8cb-181c-4aee-15d51439675f@mojatatu.com>
On Sat, 22 Apr 2017 12:43:50 -0400
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 17-04-22 12:18 PM, Daniel Borkmann wrote:
> [..]
> >
> > Anything I'm missing?
>
>
> Let me get back to that machine (couple of hours) and try to see how I
> created the issue.
> Should've cut-and-pasted the error msg. Can't recreate it on this laptop.
>
> cheers,
> jamal
Current tip of iproute2 master compiles fine for me
both with and without HAVE_ELF
^ permalink raw reply
* [PATCH] net: can: usb: gs_usb: Fix buffer on stack
From: Maksim Salau @ 2017-04-22 16:56 UTC (permalink / raw)
To: Wolfgang Grandegger, Marc Kleine-Budde, Maximilian Schneider,
Hubert Denkmair, Wolfram Sang, Ethan Zonca, linux-can, netdev
Cc: Maksim Salau
Allocate buffer on HEAP instead of STACK for a local structure
that is to be sent using usb_control_msg().
Signed-off-by: Maksim Salau <maksim.salau@gmail.com>
---
drivers/net/can/usb/gs_usb.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index a0dabd4..98f972a 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -740,13 +740,18 @@ static const struct net_device_ops gs_usb_netdev_ops = {
static int gs_usb_set_identify(struct net_device *netdev, bool do_identify)
{
struct gs_can *dev = netdev_priv(netdev);
- struct gs_identify_mode imode;
+ struct gs_identify_mode *imode = NULL;
int rc;
+ imode = kmalloc(sizeof(*imode), GFP_KERNEL);
+
+ if (!imode)
+ return -ENOMEM;
+
if (do_identify)
- imode.mode = GS_CAN_IDENTIFY_ON;
+ imode->mode = GS_CAN_IDENTIFY_ON;
else
- imode.mode = GS_CAN_IDENTIFY_OFF;
+ imode->mode = GS_CAN_IDENTIFY_OFF;
rc = usb_control_msg(interface_to_usbdev(dev->iface),
usb_sndctrlpipe(interface_to_usbdev(dev->iface),
@@ -756,10 +761,12 @@ static int gs_usb_set_identify(struct net_device *netdev, bool do_identify)
USB_RECIP_INTERFACE,
dev->channel,
0,
- &imode,
- sizeof(imode),
+ imode,
+ sizeof(*imode),
100);
+ kfree(imode);
+
return (rc > 0) ? 0 : rc;
}
--
2.9.3
^ permalink raw reply related
* [PATCH] net: wireless: orinoco: usb: Fix buffer on stack
From: Maksim Salau @ 2017-04-22 17:03 UTC (permalink / raw)
To: Kalle Valo, David S . Miller, Wolfram Sang, Mugunthan V N,
Florian Westphal, linux-wireless, netdev
Cc: Maksim Salau
Allocate buffer on HEAP instead of STACK for a local variable
that is to be sent using usb_control_msg().
Signed-off-by: Maksim Salau <maksim.salau@gmail.com>
---
drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index bca6935..eb4528b 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -770,18 +770,31 @@ static int ezusb_submit_in_urb(struct ezusb_priv *upriv)
static inline int ezusb_8051_cpucs(struct ezusb_priv *upriv, int reset)
{
- u8 res_val = reset; /* avoid argument promotion */
+ int ret;
+ u8 *res_val = NULL;
if (!upriv->udev) {
err("%s: !upriv->udev", __func__);
return -EFAULT;
}
- return usb_control_msg(upriv->udev,
+
+ res_val = kmalloc(sizeof(*res_val), GFP_KERNEL);
+
+ if (!res_val)
+ return -ENOMEM;
+
+ *res_val = reset; /* avoid argument promotion */
+
+ ret = usb_control_msg(upriv->udev,
usb_sndctrlpipe(upriv->udev, 0),
EZUSB_REQUEST_FW_TRANS,
USB_TYPE_VENDOR | USB_RECIP_DEVICE |
- USB_DIR_OUT, EZUSB_CPUCS_REG, 0, &res_val,
- sizeof(res_val), DEF_TIMEOUT);
+ USB_DIR_OUT, EZUSB_CPUCS_REG, 0, res_val,
+ sizeof(*res_val), DEF_TIMEOUT);
+
+ kfree(res_val);
+
+ return ret;
}
static int ezusb_firmware_download(struct ezusb_priv *upriv,
--
2.9.3
^ permalink raw reply related
* Re: PROBLEM: IPVS incorrectly reverse-NATs traffic to LVS host
From: Julian Anastasov @ 2017-04-22 17:06 UTC (permalink / raw)
To: Nick Moriarty; +Cc: Wensong Zhang, Simon Horman, netdev, lvs-devel
In-Reply-To: <CAEo=OUnG+Bn04XaEkwHpL2bt6bEZXP_F6RSa9HDN=VjXdOxpoQ@mail.gmail.com>
Hello,
On Wed, 12 Apr 2017, Nick Moriarty wrote:
> Hi,
>
> I've experienced a problem in how traffic returning to an LVS host is
> handled in certain circumstances. Please find a bug report below - if
> there's any further information you'd like, please let me know.
>
> [1.] One line summary of the problem:
> IPVS incorrectly reverse-NATs traffic to LVS host
>
> [2.] Full description of the problem/report:
> When using IPVS in direct-routing mode, normal traffic from the LVS
> host to a back-end server is sometimes incorrectly NATed on the way
> back into the LVS host. Using tcpdump shows that the return packets
> have the correct source IP, but by the time it makes it back to the
> application, it's been changed.
>
> To reproduce this, a configuration such as the following will work:
> - Set up an LVS system with a VIP serving UDP to a backend DNS server
> using the direct-routing method in IPVS
> - Make an outgoing UDP request to the VIP from the LVS system itself
> (this causes a connection to be added to the IPVS connection table)
> - The request should succeed as normal
> - Note the UDP source port used
> - Within 5 minutes (before the UDP connection entry expires), make an
> outgoing UDP request directly to the backend DNS server
> - The request will fail as the reply is incorrectly modified on its
> way back and appears to return from the VIP
>
> Monitoring the above sequence with tcpdump verifies that the returned
> packet (as it enters the host) is from the DNS IP, even though the
> application sees the VIP.
>
> If an outgoing request direct to the DNS server is made from a port
> not in the connection table, everything is fine.
Thanks for the detailed report! I think, I fixed the
problem. Let me know if you are able to test the appended fix.
> I expect that somewhere, something (e.g. functionality for IPVS MASQ
> responses) is applying IPVS connection
> information to incoming traffic, matching a DROUTE rule, and treating
> it as NAT traffic.
Yep, that is what happens.
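
Reduced to its core, the decision looks roughly like the sketch below: a reply that matches a connection entry is only rewritten when that entry uses the NAT forwarding method. The enum and function names here are hypothetical and much simpler than the IPVS structures in the appended patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical forwarding methods: NAT, direct routing, tunneling. */
enum fwd_method { FWD_MASQ, FWD_DROUTE, FWD_TUNNEL };

struct conn {
	enum fwd_method method;
};

/*
 * Rewrite the reply source only for NATed connections. A reply that
 * merely happens to match a DR/TUN entry is left untouched, as is a
 * reply with no matching entry at all.
 */
static bool should_snat_reply(const struct conn *cp)
{
	return cp != NULL && cp->method == FWD_MASQ;
}
```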
================================================================
[PATCH net] ipvs: SNAT packet replies only for NATed connections
We do not check if packet from real server is for NAT
connection before performing SNAT. This causes problems
for setups that use DR/TUN and allow local clients to
access the real server directly, for example:
- local client in director creates IPVS-DR/TUN connection
CIP->VIP and the request packets are routed to RIP.
Talks are finished but IPVS connection is not expired yet.
- second local client creates non-IPVS connection CIP->RIP
with same reply tuple RIP->CIP and when replies are received
on LOCAL_IN we wrongly assign them for the first client
connection because RIP->CIP matches the reply direction.
The problem is more visible to local UDP clients but in rare
cases it can happen also for TCP or remote clients when the
real server sends the reply traffic via the director.
So, better to be more precise for the reply traffic.
As replies are not expected for DR/TUN connections, better
to not touch them.
Reported-by: Nick Moriarty <nick.moriarty@york.ac.uk>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
net/netfilter/ipvs/ip_vs_core.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index db40050..ee44ed5 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -849,10 +849,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
{
unsigned int verdict = NF_DROP;
- if (IP_VS_FWD_METHOD(cp) != 0) {
- pr_err("shouldn't reach here, because the box is on the "
- "half connection in the tun/dr module.\n");
- }
+ if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+ goto ignore_cp;
/* Ensure the checksum is correct */
if (!skb_csum_unnecessary(skb) && ip_vs_checksum_complete(skb, ihl)) {
@@ -886,6 +884,8 @@ static int handle_response_icmp(int af, struct sk_buff *skb,
ip_vs_notrack(skb);
else
ip_vs_update_conntrack(skb, cp, 0);
+
+ignore_cp:
verdict = NF_ACCEPT;
out:
@@ -1385,8 +1385,11 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
*/
cp = pp->conn_out_get(ipvs, af, skb, &iph);
- if (likely(cp))
+ if (likely(cp)) {
+ if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)
+ goto ignore_cp;
return handle_response(af, skb, pd, cp, &iph, hooknum);
+ }
/* Check for real-server-started requests */
if (atomic_read(&ipvs->conn_out_counter)) {
@@ -1444,9 +1447,15 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
}
}
}
+
+out:
IP_VS_DBG_PKT(12, af, pp, skb, iph.off,
"ip_vs_out: packet continues traversal as normal");
return NF_ACCEPT;
+
+ignore_cp:
+ __ip_vs_conn_put(cp);
+ goto out;
}
/*
--
2.9.3
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply related
* Re: [PATCH] net: can: usb: gs_usb: Fix buffer on stack
From: Fabio Estevam @ 2017-04-22 17:30 UTC (permalink / raw)
To: Maksim Salau
Cc: Wolfgang Grandegger, Marc Kleine-Budde, Maximilian Schneider,
Hubert Denkmair, Wolfram Sang, Ethan Zonca, linux-can,
netdev@vger.kernel.org
In-Reply-To: <20170422165626.10534-1-maksim.salau@gmail.com>
On Sat, Apr 22, 2017 at 1:56 PM, Maksim Salau <maksim.salau@gmail.com> wrote:
> Allocate buffer on HEAP instead of STACK for a local structure
> that is to be sent using usb_control_msg().
>
> Signed-off-by: Maksim Salau <maksim.salau@gmail.com>
> ---
> drivers/net/can/usb/gs_usb.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
> index a0dabd4..98f972a 100644
> --- a/drivers/net/can/usb/gs_usb.c
> +++ b/drivers/net/can/usb/gs_usb.c
> @@ -740,13 +740,18 @@ static const struct net_device_ops gs_usb_netdev_ops = {
> static int gs_usb_set_identify(struct net_device *netdev, bool do_identify)
> {
> struct gs_can *dev = netdev_priv(netdev);
> - struct gs_identify_mode imode;
> + struct gs_identify_mode *imode = NULL;
No need to assign imode to NULL here.
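
For illustration, a userspace sketch of the same shape without the redundant initializer might look like this. It assumes a hypothetical transfer callback `xfer_fn` in place of usb_control_msg(), and `struct identify_mode`, `set_identify()` and `fake_xfer()` are stand-in names, not the driver's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload; stands in for the driver's identify structure. */
struct identify_mode {
	uint32_t mode;
};

/* Hypothetical transfer hook; returns bytes sent or a negative errno. */
typedef int (*xfer_fn)(void *buf, size_t len);

static int set_identify(int on, xfer_fn xfer)
{
	struct identify_mode *imode;	/* set by malloc() below, no "= NULL" */
	int rc;

	/* Heap buffer: transfer buffers must not live on the stack. */
	imode = malloc(sizeof(*imode));
	if (!imode)
		return -ENOMEM;

	imode->mode = on ? 1 : 0;
	rc = xfer(imode, sizeof(*imode));
	free(imode);

	return (rc > 0) ? 0 : rc;
}

/* Test double that records the mode it was asked to send. */
static uint32_t last_mode_sent;

static int fake_xfer(void *buf, size_t len)
{
	struct identify_mode m;

	memcpy(&m, buf, sizeof(m));
	last_mode_sent = m.mode;
	return (int)len;
}
```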
^ permalink raw reply
* Re: [PATCH] rtl_bt: Update firmware for BT part of rtl8822be
From: Kyle McMartin @ 2017-04-22 18:25 UTC (permalink / raw)
To: Larry Finger
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-firmware-DgEjT+Ai2ygdnm+yROfE0A
In-Reply-To: <20170414055552.20762-1-Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
On Fri, Apr 14, 2017 at 12:55:52AM -0500, Larry Finger wrote:
> These files were supplied by Realtek.
>
> Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
Applied, thanks Larry.
--Kyle
^ permalink raw reply
* Re: [PATCH 2/2] sparc64: Add eBPF JIT.
From: David Miller @ 2017-04-22 18:27 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: sparclinux, netdev, ast, daniel
In-Reply-To: <20170422153233.GA47144@ast-mbp.thefacebook.com>
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Sat, 22 Apr 2017 08:32:35 -0700
> On Fri, Apr 21, 2017 at 08:17:11PM -0700, David Miller wrote:
>>
>> This is an eBPF JIT for sparc64. All major features are supported.
>>
>> All tests under tools/testing/selftests/bpf/ pass.
>>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
> ...
>> + /* tail call */
>> + case BPF_JMP | BPF_CALL |BPF_X:
>> + emit_tail_call(ctx);
>> +
>
> I think 'break;' is missing here.
Good catch, I'll fix that.
> When tail_call's target program is null the current program should
> continue instead of aborting.
> Like in our current ddos+lb setup the program looks like:
> bpf_tail_call(ctx, &prog_array, 1);
> bpf_tail_call(ctx, &prog_array, 2);
> bpf_tail_call(ctx, &prog_array, 3);
> return XDP_DROP;
>
> this way it will jump into the program that is installed in slot 1.
> If it's empty, it will try slot 2...
> If no programs installed it will drop the packet.
Yes, with the break; fixed above that's what the sparc64 JIT will
end up doing. If any of the tests don't pass in emit_tail_call()
we branch to the end of the emit_tail_call() sequence.
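
A rough userspace model of the fall-through behaviour discussed above, for illustration only. Nothing here is the JIT or the BPF helper API: struct prog_array, tail_call_target(), dispatch() and the verdict values are hypothetical names for this sketch, and the tail-call count limit is left out. In the kernel a successful tail call never returns to the caller; the model expresses that by returning the target's result directly.

```c
#include <stddef.h>

/* Hypothetical verdicts for this sketch; real XDP codes live in <linux/bpf.h>. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

#define SLOTS 4

typedef enum verdict (*prog_fn)(void *ctx);

/* Userspace stand-in for a program array map. */
struct prog_array {
	prog_fn progs[SLOTS];
};

/* Lookup half of a tail call: NULL means "index out of range or slot empty". */
static prog_fn tail_call_target(const struct prog_array *pa, size_t idx)
{
	if (idx >= SLOTS)
		return NULL;
	return pa->progs[idx];
}

/* Example target program that accepts the packet. */
static enum verdict pass_prog(void *ctx)
{
	(void)ctx;
	return VERDICT_PASS;
}

/* Mirrors the quoted program: try slots 1, 2, 3, then drop. */
static enum verdict dispatch(const struct prog_array *pa, void *ctx)
{
	size_t slot;

	for (slot = 1; slot <= 3; slot++) {
		prog_fn target = tail_call_target(pa, slot);

		if (target)
			return target(ctx);	/* control does not come back */
		/* empty slot: fall through to the next attempt */
	}
	return VERDICT_DROP;
}
```

With every slot empty, dispatch() returns VERDICT_DROP; installing pass_prog in slot 2 makes it return VERDICT_PASS, since slot 1 is skipped rather than treated as an error.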
Thanks.
^ permalink raw reply
* [net-next 4/5] mlx5: hide unused functions
From: Saeed Mahameed @ 2017-04-22 18:45 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Roi Dayan, Stephen Hemminger,
Stephen Hemminger, Saeed Mahameed
In-Reply-To: <20170422184507.26569-1-saeedm@mellanox.com>
From: Stephen Hemminger <stephen@networkplumber.org>
Fix sparse warnings in recent ipoib support.
The RDMA functions are not used yet, hide behind #ifdef.
Based on comment, they will eventually be local so make static.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index ec78e637840f..3c84e36af018 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -178,7 +178,7 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv)
return 0;
}
-void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
+static void mlx5i_cleanup_tx(struct mlx5e_priv *priv)
{
struct mlx5i_priv *ipriv = priv->ppriv;
@@ -359,9 +359,10 @@ static int mlx5i_close(struct net_device *netdev)
return 0;
}
+#ifdef notusedyet
/* IPoIB RDMA netdev callbacks */
-int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
- union ib_gid *gid, u16 lid, int set_qkey)
+static int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
+ union ib_gid *gid, u16 lid, int set_qkey)
{
struct mlx5e_priv *epriv = mlx5i_epriv(netdev);
struct mlx5_core_dev *mdev = epriv->mdev;
@@ -377,8 +378,8 @@ int mlx5i_attach_mcast(struct net_device *netdev, struct ib_device *hca,
return err;
}
-int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
- union ib_gid *gid, u16 lid)
+static int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
+ union ib_gid *gid, u16 lid)
{
struct mlx5e_priv *epriv = mlx5i_epriv(netdev);
struct mlx5_core_dev *mdev = epriv->mdev;
@@ -395,7 +396,7 @@ int mlx5i_detach_mcast(struct net_device *netdev, struct ib_device *hca,
return err;
}
-int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
+static int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
struct ib_ah *address, u32 dqpn, u32 dqkey)
{
struct mlx5e_priv *epriv = mlx5i_epriv(dev);
@@ -404,6 +405,7 @@ int mlx5i_xmit(struct net_device *dev, struct sk_buff *skb,
return mlx5i_sq_xmit(sq, skb, &mah->av, dqpn, dqkey);
}
+#endif
static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
{
@@ -418,10 +420,10 @@ static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
return 0;
}
-struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
- struct ib_device *ibdev,
- const char *name,
- void (*setup)(struct net_device *))
+static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
+ struct ib_device *ibdev,
+ const char *name,
+ void (*setup)(struct net_device *))
{
const struct mlx5e_profile *profile = &mlx5i_nic_profile;
int nch = profile->max_nch(mdev);
@@ -480,7 +482,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
}
EXPORT_SYMBOL(mlx5_rdma_netdev_alloc);
-void mlx5_rdma_netdev_free(struct net_device *netdev)
+static void mlx5_rdma_netdev_free(struct net_device *netdev)
{
struct mlx5e_priv *priv = mlx5i_epriv(netdev);
const struct mlx5e_profile *profile = priv->profile;
--
2.11.0
^ permalink raw reply related
* [pull request][net-next 0/5] Mellanox, mlx5 updates 2017-04-22
From: Saeed Mahameed @ 2017-04-22 18:45 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Roi Dayan, Stephen Hemminger, Saeed Mahameed
Hi Dave,
This series contains some updates to mlx5 driver.
Sparse and compiler warnings fixes from Stephen Hemminger.
From Roi Dayan and Or Gerlitz: add devlink and mlx5 support for controlling
E-Switch encapsulation mode. This knob enables HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.
Please pull and let me know if there's any problem.
Thanks,
Saeed.
---
The following changes since commit fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-21 20:23:53 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-04-22
for you to fetch changes up to 8bf3198a5e394ed6815aeb8fedaf49722986bbd3:
mlx5: fix warning about missing prototype (2017-04-22 20:26:42 +0300)
Or Gerlitz (1):
net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode
Roi Dayan (2):
net/devlink: Add E-Switch encapsulation control
net/mlx5: E-Switch, Add control for encapsulation
Stephen Hemminger (2):
mlx5: hide unused functions
mlx5: fix warning about missing prototype
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 +
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 3 +
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 132 +++++++++++++++++----
drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 24 ++--
drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +
include/net/devlink.h | 2 +
include/uapi/linux/devlink.h | 7 ++
net/core/devlink.c | 26 +++-
10 files changed, 167 insertions(+), 36 deletions(-)
^ permalink raw reply
* [net-next 2/5] net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode
From: Saeed Mahameed @ 2017-04-22 18:45 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Roi Dayan, Stephen Hemminger, Saeed Mahameed
In-Reply-To: <20170422184507.26569-1-saeedm@mellanox.com>
From: Or Gerlitz <ogerlitz@mellanox.com>
Refactor the creation of the fast path FDB table that holds the
offloaded rules in SRIOV switchdev mode into its own function.
This will be used in the next patch to be able to re-create the
table under different settings without going through legacy mode.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 69 +++++++++++++++-------
1 file changed, 49 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 992b380d36be..ce3a2c040706 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -426,31 +426,21 @@ static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
return err;
}
-#define MAX_PF_SQ 256
#define ESW_OFFLOADS_NUM_GROUPS 4
-static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
+static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
{
- int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
- struct mlx5_flow_table_attr ft_attr = {};
- int table_size, ix, esw_size, err = 0;
struct mlx5_core_dev *dev = esw->dev;
struct mlx5_flow_namespace *root_ns;
struct mlx5_flow_table *fdb = NULL;
- struct mlx5_flow_group *g;
- u32 *flow_group_in;
- void *match_criteria;
+ int esw_size, err = 0;
u32 flags = 0;
- flow_group_in = mlx5_vzalloc(inlen);
- if (!flow_group_in)
- return -ENOMEM;
-
root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
if (!root_ns) {
esw_warn(dev, "Failed to get FDB flow namespace\n");
err = -EOPNOTSUPP;
- goto ns_err;
+ goto out;
}
esw_debug(dev, "Create offloads FDB table, min (max esw size(2^%d), max counters(%d)*groups(%d))\n",
@@ -471,10 +461,49 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
if (IS_ERR(fdb)) {
err = PTR_ERR(fdb);
esw_warn(dev, "Failed to create Fast path FDB Table err %d\n", err);
- goto fast_fdb_err;
+ goto out;
}
esw->fdb_table.fdb = fdb;
+out:
+ return err;
+}
+
+static void esw_destroy_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
+{
+ mlx5_destroy_flow_table(esw->fdb_table.fdb);
+}
+
+#define MAX_PF_SQ 256
+
+static int esw_create_offloads_fdb_tables(struct mlx5_eswitch *esw, int nvports)
+{
+ int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
+ struct mlx5_flow_table_attr ft_attr = {};
+ struct mlx5_core_dev *dev = esw->dev;
+ struct mlx5_flow_namespace *root_ns;
+ struct mlx5_flow_table *fdb = NULL;
+ int table_size, ix, err = 0;
+ struct mlx5_flow_group *g;
+ void *match_criteria;
+ u32 *flow_group_in;
+
+ esw_debug(esw->dev, "Create offloads FDB Tables\n");
+ flow_group_in = mlx5_vzalloc(inlen);
+ if (!flow_group_in)
+ return -ENOMEM;
+
+ root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
+ if (!root_ns) {
+ esw_warn(dev, "Failed to get FDB flow namespace\n");
+ err = -EOPNOTSUPP;
+ goto ns_err;
+ }
+
+ err = esw_create_offloads_fast_fdb_table(esw);
+ if (err)
+ goto fast_fdb_err;
+
table_size = nvports + MAX_PF_SQ + 1;
ft_attr.max_fte = table_size;
@@ -545,18 +574,18 @@ static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
return err;
}
-static void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
+static void esw_destroy_offloads_fdb_tables(struct mlx5_eswitch *esw)
{
if (!esw->fdb_table.fdb)
return;
- esw_debug(esw->dev, "Destroy offloads FDB Table\n");
+ esw_debug(esw->dev, "Destroy offloads FDB Tables\n");
mlx5_del_flow_rules(esw->fdb_table.offloads.miss_rule);
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
mlx5_destroy_flow_table(esw->fdb_table.offloads.fdb);
- mlx5_destroy_flow_table(esw->fdb_table.fdb);
+ esw_destroy_offloads_fast_fdb_table(esw);
}
static int esw_create_offloads_table(struct mlx5_eswitch *esw)
@@ -716,7 +745,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
mlx5_dev_list_unlock();
- err = esw_create_offloads_fdb_table(esw, nvports);
+ err = esw_create_offloads_fdb_tables(esw, nvports);
if (err)
goto create_fdb_err;
@@ -753,7 +782,7 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
esw_destroy_offloads_table(esw);
create_ft_err:
- esw_destroy_offloads_fdb_table(esw);
+ esw_destroy_offloads_fdb_tables(esw);
create_fdb_err:
/* enable back PF RoCE */
@@ -799,7 +828,7 @@ void esw_offloads_cleanup(struct mlx5_eswitch *esw, int nvports)
esw_destroy_vport_rx_group(esw);
esw_destroy_offloads_table(esw);
- esw_destroy_offloads_fdb_table(esw);
+ esw_destroy_offloads_fdb_tables(esw);
}
static int esw_mode_from_devlink(u16 mode, u16 *mlx5_mode)
--
2.11.0
^ permalink raw reply related
* [net-next 5/5] mlx5: fix warning about missing prototype
From: Saeed Mahameed @ 2017-04-22 18:45 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Roi Dayan, Stephen Hemminger,
Stephen Hemminger, Saeed Mahameed
In-Reply-To: <20170422184507.26569-1-saeedm@mellanox.com>
From: Stephen Hemminger <stephen@networkplumber.org>
Fix sparse warning about missing prototypes. The rx/tx code path
defines functions with prototypes in ipoib.h.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 43308243f519..ae66fad98244 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -39,6 +39,7 @@
#include "en.h"
#include "en_tc.h"
#include "eswitch.h"
+#include "ipoib.h"
static inline bool mlx5e_rx_hw_stamp(struct mlx5e_tstamp *tstamp)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index dda7db503043..ab3bb026ff9e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -33,6 +33,7 @@
#include <linux/tcp.h>
#include <linux/if_vlan.h>
#include "en.h"
+#include "ipoib.h"
#define MLX5E_SQ_NOPS_ROOM MLX5_SEND_WQE_MAX_WQEBBS
#define MLX5E_SQ_STOP_ROOM (MLX5_SEND_WQE_MAX_WQEBBS +\
--
2.11.0
^ permalink raw reply related
* [net-next 3/5] net/mlx5: E-Switch, Add control for encapsulation
From: Saeed Mahameed @ 2017-04-22 18:45 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Roi Dayan, Stephen Hemminger, Saeed Mahameed
In-Reply-To: <20170422184507.26569-1-saeedm@mellanox.com>
From: Roi Dayan <roid@mellanox.com>
Implement the devlink e-switch encapsulation control set and get
callbacks. Apply the value set by the user on the switchdev offloads
mode when creating the fast FDB table where offloaded rules will be set.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 ++
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 3 ++
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 63 +++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +
4 files changed, 71 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index b3281d1118b3..21bed3c3334d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1806,6 +1806,11 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
esw->enabled_vports = 0;
esw->mode = SRIOV_NONE;
esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
+ if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) &&
+ MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
+ esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
+ else
+ esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
dev->priv.eswitch = esw;
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 1f56ed9f5a6f..1e7f21be1233 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -210,6 +210,7 @@ struct mlx5_esw_offload {
DECLARE_HASHTABLE(encap_tbl, 8);
u8 inline_mode;
u64 num_flows;
+ u8 encap;
};
struct mlx5_eswitch {
@@ -322,6 +323,8 @@ int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode);
int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode);
int mlx5_devlink_eswitch_inline_mode_get(struct devlink *devlink, u8 *mode);
int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode);
+int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap);
+int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink, u8 *encap);
void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
int vport_index,
struct mlx5_eswitch_rep *rep);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index ce3a2c040706..189d24dbd3e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -450,8 +450,7 @@ static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
esw_size = min_t(int, MLX5_CAP_GEN(dev, max_flow_counter) * ESW_OFFLOADS_NUM_GROUPS,
1 << MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
- if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) &&
- MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
+ if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
flags |= MLX5_FLOW_TABLE_TUNNEL_EN;
fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH,
@@ -1045,6 +1044,66 @@ int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
return 0;
}
+int mlx5_devlink_eswitch_encap_mode_set(struct devlink *devlink, u8 encap)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ struct mlx5_eswitch *esw = dev->priv.eswitch;
+ int err;
+
+ if (!MLX5_CAP_GEN(dev, vport_group_manager))
+ return -EOPNOTSUPP;
+
+ if (esw->mode == SRIOV_NONE)
+ return -EOPNOTSUPP;
+
+ if (encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE &&
+ (!MLX5_CAP_ESW_FLOWTABLE_FDB(dev, encap) ||
+ !MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap)))
+ return -EOPNOTSUPP;
+
+ if (encap && encap != DEVLINK_ESWITCH_ENCAP_MODE_BASIC)
+ return -EOPNOTSUPP;
+
+ if (esw->mode == SRIOV_LEGACY) {
+ esw->offloads.encap = encap;
+ return 0;
+ }
+
+ if (esw->offloads.encap == encap)
+ return 0;
+
+ if (esw->offloads.num_flows > 0) {
+ esw_warn(dev, "Can't set encapsulation when flows are configured\n");
+ return -EOPNOTSUPP;
+ }
+
+ esw_destroy_offloads_fast_fdb_table(esw);
+
+ esw->offloads.encap = encap;
+ err = esw_create_offloads_fast_fdb_table(esw);
+ if (err) {
+ esw_warn(esw->dev, "Failed re-creating fast FDB table, err %d\n", err);
+ esw->offloads.encap = !encap;
+ (void) esw_create_offloads_fast_fdb_table(esw);
+ }
+ return err;
+}
+
+int mlx5_devlink_eswitch_encap_mode_get(struct devlink *devlink, u8 *encap)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ struct mlx5_eswitch *esw = dev->priv.eswitch;
+
+ if (!MLX5_CAP_GEN(dev, vport_group_manager))
+ return -EOPNOTSUPP;
+
+ if (esw->mode == SRIOV_NONE)
+ return -EOPNOTSUPP;
+
+ *encap = esw->offloads.encap;
+ return 0;
+}
+
void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
int vport_index,
struct mlx5_eswitch_rep *__rep)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 9c2bec732af9..bde91a8bec73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1280,6 +1280,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
.eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
.eswitch_inline_mode_set = mlx5_devlink_eswitch_inline_mode_set,
.eswitch_inline_mode_get = mlx5_devlink_eswitch_inline_mode_get,
+ .eswitch_encap_mode_set = mlx5_devlink_eswitch_encap_mode_set,
+ .eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get,
#endif
};
--
2.11.0
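
The flow of the encap-mode set callback in the patch above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the driver code: struct fake_esw and every fake_* name are hypothetical, the SRIOV mode checks are omitted, and the fail_next_create flag exists only to exercise the error path. One deliberate difference: the sketch saves the previous mode and restores it on failure, whereas the patch restores with "!encap", which gives the same result while only the two modes NONE (0) and BASIC (1) exist.

```c
#include <errno.h>
#include <stdbool.h>

enum encap_mode { ENCAP_MODE_NONE = 0, ENCAP_MODE_BASIC = 1 };

/* Hypothetical stand-in for the e-switch state touched by the callback. */
struct fake_esw {
	enum encap_mode encap;
	unsigned long num_flows;
	bool hw_encap_cap;	/* models the encap && decap capability bits */
	bool fail_next_create;	/* test hook: make the next create fail once */
	bool table_present;
};

static int fake_create_table(struct fake_esw *esw)
{
	if (esw->fail_next_create) {
		esw->fail_next_create = false;
		return -ENOMEM;
	}
	esw->table_present = true;
	return 0;
}

static void fake_destroy_table(struct fake_esw *esw)
{
	esw->table_present = false;
}

static int fake_encap_mode_set(struct fake_esw *esw, enum encap_mode encap)
{
	enum encap_mode old = esw->encap;
	int err;

	if (encap != ENCAP_MODE_NONE && encap != ENCAP_MODE_BASIC)
		return -EOPNOTSUPP;	/* unknown mode */
	if (encap != ENCAP_MODE_NONE && !esw->hw_encap_cap)
		return -EOPNOTSUPP;	/* hardware cannot do it */
	if (esw->encap == encap)
		return 0;		/* nothing to change */
	if (esw->num_flows > 0)
		return -EOPNOTSUPP;	/* table is in use */

	/* Re-create the fast path table under the new setting. */
	fake_destroy_table(esw);
	esw->encap = encap;
	err = fake_create_table(esw);
	if (err) {
		/* Roll back: restore the old mode and try to rebuild. */
		esw->encap = old;
		(void)fake_create_table(esw);
	}
	return err;
}
```

As in the patch, the result of the rollback create is ignored, so a second failure would leave no fast path table in place.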
^ permalink raw reply related
* [net-next 1/5] net/devlink: Add E-Switch encapsulation control
From: Saeed Mahameed @ 2017-04-22 18:45 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Roi Dayan, Stephen Hemminger, Saeed Mahameed
In-Reply-To: <20170422184507.26569-1-saeedm@mellanox.com>
From: Roi Dayan <roid@mellanox.com>
This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.
The actual encap/decap is carried out (along with the matching and other actions)
per offloaded e-switch rules, e.g. as done when offloading the TC tunnel key action.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
include/net/devlink.h | 2 ++
include/uapi/linux/devlink.h | 7 +++++++
net/core/devlink.c | 26 +++++++++++++++++++++++---
3 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 24de13f8c94f..ed7687bbf5d0 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -268,6 +268,8 @@ struct devlink_ops {
int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
int (*eswitch_inline_mode_get)(struct devlink *devlink, u8 *p_inline_mode);
int (*eswitch_inline_mode_set)(struct devlink *devlink, u8 inline_mode);
+ int (*eswitch_encap_mode_get)(struct devlink *devlink, u8 *p_encap_mode);
+ int (*eswitch_encap_mode_set)(struct devlink *devlink, u8 encap_mode);
};
static inline void *devlink_priv(struct devlink *devlink)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b47bee277347..b0e807ac53bb 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -119,6 +119,11 @@ enum devlink_eswitch_inline_mode {
DEVLINK_ESWITCH_INLINE_MODE_TRANSPORT,
};
+enum devlink_eswitch_encap_mode {
+ DEVLINK_ESWITCH_ENCAP_MODE_NONE,
+ DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
+};
+
enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
@@ -195,6 +200,8 @@ enum devlink_attr {
DEVLINK_ATTR_PAD,
+ DEVLINK_ATTR_ESWITCH_ENCAP_MODE, /* u8 */
+
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 0afac5800b57..b0b87a292e7c 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1397,10 +1397,10 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
u32 seq, int flags)
{
const struct devlink_ops *ops = devlink->ops;
+ u8 inline_mode, encap_mode;
void *hdr;
int err = 0;
u16 mode;
- u8 inline_mode;
hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
if (!hdr)
@@ -1429,6 +1429,15 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
goto nla_put_failure;
}
+ if (ops->eswitch_encap_mode_get) {
+ err = ops->eswitch_encap_mode_get(devlink, &encap_mode);
+ if (err)
+ goto nla_put_failure;
+ err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_ENCAP_MODE, encap_mode);
+ if (err)
+ goto nla_put_failure;
+ }
+
genlmsg_end(msg, hdr);
return 0;
@@ -1468,9 +1477,9 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
{
struct devlink *devlink = info->user_ptr[0];
const struct devlink_ops *ops = devlink->ops;
- u16 mode;
- u8 inline_mode;
+ u8 inline_mode, encap_mode;
int err = 0;
+ u16 mode;
if (!ops)
return -EOPNOTSUPP;
@@ -1493,6 +1502,16 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
if (err)
return err;
}
+
+ if (info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]) {
+ if (!ops->eswitch_encap_mode_set)
+ return -EOPNOTSUPP;
+ encap_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_ENCAP_MODE]);
+ err = ops->eswitch_encap_mode_set(devlink, encap_mode);
+ if (err)
+ return err;
+ }
+
return 0;
}
@@ -2190,6 +2209,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
+ [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type = NLA_U8 },
};
--
2.11.0
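
The set path added to devlink above follows the usual optional-callback pattern: an attribute that is absent is skipped, and an attribute that is present but has no driver callback yields -EOPNOTSUPP. A minimal userspace sketch of that pattern follows. All names here (struct sketch_ops, sketch_set_doit(), demo_get(), demo_set()) are hypothetical, and a NULL attr pointer stands in for a missing netlink attribute; this is not the devlink API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table with optional encap-mode callbacks. */
struct sketch_ops {
	int (*encap_mode_get)(void *priv, uint8_t *mode);
	int (*encap_mode_set)(void *priv, uint8_t mode);
};

/* attr == NULL models "attribute not present in the request". */
static int sketch_set_doit(const struct sketch_ops *ops, void *priv,
			   const uint8_t *attr)
{
	if (!ops)
		return -EOPNOTSUPP;

	if (attr) {
		if (!ops->encap_mode_set)
			return -EOPNOTSUPP;	/* requested but unsupported */
		return ops->encap_mode_set(priv, *attr);
	}
	return 0;	/* nothing requested, nothing to do */
}

/* Demo driver callbacks: priv points at a single byte of state. */
static int demo_set(void *priv, uint8_t mode)
{
	*(uint8_t *)priv = mode;
	return 0;
}

static int demo_get(void *priv, uint8_t *mode)
{
	*mode = *(uint8_t *)priv;
	return 0;
}
```

Keeping both callbacks optional lets drivers without encapsulation support leave the fields NULL and still serve the other e-switch attributes.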
^ permalink raw reply related