* Re: [PATCH v2 iproute2-next 4/6] rdma: Add CQ resource tracking information
From: Leon Romanovsky @ 2018-03-13 19:05 UTC (permalink / raw)
To: Steve Wise; +Cc: dsahern, stephen, netdev, linux-rdma
In-Reply-To: <9bb3491baaa8c8253d166aced89813bd06c51b01.1520020530.git.swise@opengridcomputing.com>
[-- Attachment #1: Type: text/plain, Size: 752 bytes --]
On Tue, Feb 27, 2018 at 08:07:11AM -0800, Steve Wise wrote:
> Sample output:
>
> # rdma resource show cq
> link cxgb4_0/- cqe 46 users 2 pid 30503 comm rping
> link cxgb4_0/- cqe 46 users 2 pid 30498 comm rping
> link mlx4_0/- cqe 63 users 2 pid 30494 comm rping
> link mlx4_0/- cqe 63 users 2 pid 30489 comm rping
> link mlx4_0/- cqe 1023 users 2 poll_ctx WORKQUEUE pid 0 comm [ib_core]
>
> # rdma resource show cq pid 30489
> link mlx4_0/- cqe 63 users 2 pid 30489 comm rping
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> rdma/res.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> rdma/utils.c | 5 +++
> 2 files changed, 141 insertions(+)
>
Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* [PATCH v2 net] kcm: lock lower socket in kcm_attach
From: Tom Herbert @ 2018-03-13 19:01 UTC (permalink / raw)
To: davem; +Cc: netdev, ebiggers3, Tom Herbert
Need to lock lower socket in order to provide mutual exclusion
with kcm_unattach.
v2: Add Reported-by for syzbot
Fixes: ab7ac4eb9832e32a09f4e804 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com
Signed-off-by: Tom Herbert <tom@quantonium.net>
---
net/kcm/kcmsock.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index f297d53a11aa..34355fd19f27 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1381,24 +1381,32 @@ static int kcm_attach(struct socket *sock, struct socket *csock,
.parse_msg = kcm_parse_func_strparser,
.read_sock_done = kcm_read_sock_done,
};
- int err;
+ int err = 0;
csk = csock->sk;
if (!csk)
return -EINVAL;
+ lock_sock(csk);
+
/* Only allow TCP sockets to be attached for now */
if ((csk->sk_family != AF_INET && csk->sk_family != AF_INET6) ||
- csk->sk_protocol != IPPROTO_TCP)
- return -EOPNOTSUPP;
+ csk->sk_protocol != IPPROTO_TCP) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
/* Don't allow listeners or closed sockets */
- if (csk->sk_state == TCP_LISTEN || csk->sk_state == TCP_CLOSE)
- return -EOPNOTSUPP;
+ if (csk->sk_state == TCP_LISTEN || csk->sk_state == TCP_CLOSE) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
psock = kmem_cache_zalloc(kcm_psockp, GFP_KERNEL);
- if (!psock)
- return -ENOMEM;
+ if (!psock) {
+ err = -ENOMEM;
+ goto out;
+ }
psock->mux = mux;
psock->sk = csk;
@@ -1407,7 +1415,7 @@ static int kcm_attach(struct socket *sock, struct socket *csock,
err = strp_init(&psock->strp, csk, &cb);
if (err) {
kmem_cache_free(kcm_psockp, psock);
- return err;
+ goto out;
}
write_lock_bh(&csk->sk_callback_lock);
@@ -1419,7 +1427,8 @@ static int kcm_attach(struct socket *sock, struct socket *csock,
write_unlock_bh(&csk->sk_callback_lock);
strp_done(&psock->strp);
kmem_cache_free(kcm_psockp, psock);
- return -EALREADY;
+ err = -EALREADY;
+ goto out;
}
psock->save_data_ready = csk->sk_data_ready;
@@ -1455,7 +1464,10 @@ static int kcm_attach(struct socket *sock, struct socket *csock,
/* Schedule RX work in case there are already bytes queued */
strp_check_rcv(&psock->strp);
- return 0;
+out:
+ release_sock(csk);
+
+ return err;
}
static int kcm_attach_ioctl(struct socket *sock, struct kcm_attach *info)
@@ -1507,6 +1519,7 @@ static void kcm_unattach(struct kcm_psock *psock)
if (WARN_ON(psock->rx_kcm)) {
write_unlock_bh(&csk->sk_callback_lock);
+ release_sock(csk);
return;
}
--
2.11.0
^ permalink raw reply related
* Re: [bug, bisected] pfifo_fast causes packet reordering
From: Dave Taht @ 2018-03-13 18:35 UTC (permalink / raw)
To: Jakob Unterwurzacher
Cc: netdev, linux-kernel, John Fastabend, David S. Miller,
linux-can@vger.kernel.org, Martin Elshuber
In-Reply-To: <946dbe16-a2eb-eca8-8069-468859ccc78d@theobroma-systems.com>
On Tue, Mar 13, 2018 at 11:24 AM, Jakob Unterwurzacher
<jakob.unterwurzacher@theobroma-systems.com> wrote:
> During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on Linux
> v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of packets are
> delivered out-of-order.
>
> We have tracked the problem down to the driver interface level, and it seems
> that the driver's net_device_ops.ndo_start_xmit() function gets the packets
> handed over in the wrong order.
>
> This behavior was not observed on Linux v4.15 and I have bisected the
> problem down to this patch:
>
>> commit c5ad119fb6c09b0297446be05bd66602fa564758
>> Author: John Fastabend <john.fastabend@gmail.com>
>> Date: Thu Dec 7 09:58:19 2017 -0800
>>
>> net: sched: pfifo_fast use skb_array
>>
>> This converts the pfifo_fast qdisc to use the skb_array data structure
>> and set the lockless qdisc bit. pfifo_fast is the first qdisc to
>> support
>> the lockless bit that can be a child of a qdisc requiring locking. So
>> we add logic to clear the lock bit on initialization in these cases
>> when
>> the qdisc graft operation occurs.
>>
>> This also removes the logic used to pick the next band to dequeue from
>> and instead just checks a per priority array for packets from top
>> priority
>> to lowest. This might need to be a bit more clever but seems to work
>> for now.
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> The patch does not revert cleanly, but moving to one commit earlier makes
> the problem go away.
>
> Selecting the "fq" scheduler instead of "pfifo_fast" makes the problem go
> away as well.
I am of course, a fan of obsoleting pfifo_fast. There's no good reason
for it anymore.
>
> Is this an unintended side-effect of the patch or is there something the
> driver has to do to request in-order delivery?
>
> Thanks,
> Jakob
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
^ permalink raw reply
* [bug, bisected] pfifo_fast causes packet reordering
From: Jakob Unterwurzacher @ 2018-03-13 18:24 UTC (permalink / raw)
To: netdev, linux-kernel, John Fastabend, David S. Miller
Cc: linux-can@vger.kernel.org, Martin Elshuber
During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on
Linux v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of
packets are delivered out-of-order.
We have tracked the problem down to the driver interface level, and it
seems that the driver's net_device_ops.ndo_start_xmit() function gets
the packets handed over in the wrong order.
This behavior was not observed on Linux v4.15 and I have bisected the
problem down to this patch:
> commit c5ad119fb6c09b0297446be05bd66602fa564758
> Author: John Fastabend <john.fastabend@gmail.com>
> Date: Thu Dec 7 09:58:19 2017 -0800
>
> net: sched: pfifo_fast use skb_array
>
> This converts the pfifo_fast qdisc to use the skb_array data structure
> and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
> the lockless bit that can be a child of a qdisc requiring locking. So
> we add logic to clear the lock bit on initialization in these cases when
> the qdisc graft operation occurs.
>
> This also removes the logic used to pick the next band to dequeue from
> and instead just checks a per priority array for packets from top priority
> to lowest. This might need to be a bit more clever but seems to work
> for now.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
The patch does not revert cleanly, but moving to one commit earlier
makes the problem go away.
Selecting the "fq" scheduler instead of "pfifo_fast" makes the problem
go away as well.
Is this an unintended side-effect of the patch or is there something the
driver has to do to request in-order delivery?
Thanks,
Jakob
^ permalink raw reply
* Re: [PATCH net-next] net: fix sysctl_fb_tunnels_only_for_init_net link error
From: David Miller @ 2018-03-13 18:15 UTC (permalink / raw)
To: arnd; +Cc: edumazet, jiri, daniel, jakub.kicinski, netdev, linux-kernel
In-Reply-To: <20180313114515.2037554-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 13 Mar 2018 12:44:53 +0100
> The new variable is only available when CONFIG_SYSCTL is enabled,
> otherwise we get a link error:
>
> net/ipv4/ip_tunnel.o: In function `ip_tunnel_init_net':
> ip_tunnel.c:(.text+0x278b): undefined reference to `sysctl_fb_tunnels_only_for_init_net'
> net/ipv6/sit.o: In function `sit_init_net':
> sit.c:(.init.text+0x4c): undefined reference to `sysctl_fb_tunnels_only_for_init_net'
> net/ipv6/ip6_tunnel.o: In function `ip6_tnl_init_net':
> ip6_tunnel.c:(.init.text+0x39): undefined reference to `sysctl_fb_tunnels_only_for_init_net'
>
> This adds an extra condition, keeping the traditional behavior when
> CONFIG_SYSCTL is disabled.
>
> Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH v2 iproute2-next 2/6] rdma: initialize the rd struct
From: Leon Romanovsky @ 2018-03-13 18:11 UTC (permalink / raw)
To: Steve Wise; +Cc: dsahern, stephen, netdev, linux-rdma
In-Reply-To: <0ba0136867d1d296e4d1855cac322fdd88025ce3.1520020530.git.swise@opengridcomputing.com>
[-- Attachment #1: Type: text/plain, Size: 405 bytes --]
On Tue, Feb 27, 2018 at 08:07:00AM -0800, Steve Wise wrote:
> Initialize the rd struct so port_idx is 0 unless set otherwise.
> Otherwise, strict_port queries end up passing an uninitialized PORT
> nlattr.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> rdma/rdma.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v2 iproute2-next 3/6] rdma: Add CM_ID resource tracking information
From: Leon Romanovsky @ 2018-03-13 18:07 UTC (permalink / raw)
To: Steve Wise; +Cc: dsahern, stephen, netdev, linux-rdma
In-Reply-To: <743dc7a5306f9b3368fcd4c143cdd822250444a6.1520020530.git.swise@opengridcomputing.com>
[-- Attachment #1: Type: text/plain, Size: 1711 bytes --]
On Tue, Feb 27, 2018 at 08:07:05AM -0800, Steve Wise wrote:
> Sample output:
>
> # rdma resource
> 2: cxgb4_0: pd 5 cq 2 qp 2 cm_id 3 mr 7
> 3: mlx4_0: pd 7 cq 3 qp 3 cm_id 3 mr 7
>
> # rdma resource show cm_id
> link cxgb4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
> link cxgb4_0/2 lqpn 1048 qp-type RC state CONNECT ps TCP pid 30503 comm rping src-addr 172.16.2.1:7174 dst-addr 172.16.2.1:38246
> link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
> link mlx4_0/- lqpn 0 qp-type RC state LISTEN ps TCP pid 30485 comm rping src-addr 0.0.0.0:7174
> link mlx4_0/1 lqpn 539 qp-type RC state CONNECT ps TCP pid 30494 comm rping src-addr 172.16.99.1:7174 dst-addr 172.16.99.1:43670
> link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174
>
> # rdma resource show cm_id dst-port 7174
> link cxgb4_0/2 lqpn 1040 qp-type RC state CONNECT ps TCP pid 30498 comm rping src-addr 172.16.2.1:38246 dst-addr 172.16.2.1:7174
> link mlx4_0/1 lqpn 538 qp-type RC state CONNECT ps TCP pid 30492 comm rping src-addr 172.16.99.1:43670 dst-addr 172.16.99.1:7174
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
> rdma/rdma.h | 2 +
> rdma/res.c | 258 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> rdma/utils.c | 5 ++
> 3 files changed, 264 insertions(+), 1 deletion(-)
>
Steve, can you please add man pages update as a followup to this series?
Especially the fact that you are providing dst-port filter option should
be mentioned.
Thanks,
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [RFC PATCH v0 2/2] skbuff: Notify errors with sk_error_report()
From: Eric Dumazet @ 2018-03-13 17:27 UTC (permalink / raw)
To: Vinicius Costa Gomes, Eric Dumazet, netdev; +Cc: randy.e.witt, davem
In-Reply-To: <87d107yjry.fsf@intel.com>
On 03/13/2018 10:20 AM, Vinicius Costa Gomes wrote:
> Hi
> Cool. Will send a non-RFC version then.
>
> Would the changes to txtimestamp selftest be helpful?
Yes, since it could have spotted the bug earlier.
Thanks.
^ permalink raw reply
* Re: [RFC PATCH v0 2/2] skbuff: Notify errors with sk_error_report()
From: Vinicius Costa Gomes @ 2018-03-13 17:20 UTC (permalink / raw)
To: Eric Dumazet, netdev; +Cc: randy.e.witt, davem
In-Reply-To: <ee7a00a6-a2ce-ea0b-005f-eb3c46efdcd4@gmail.com>
Hi
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On 03/12/2018 04:10 PM, Vinicius Costa Gomes wrote:
>> When errors are enqueued to the error queue via sock_queue_err_skb()
>> function, it is possible that the correct application is not notified.
>
> Your patch makes sense, thanks.
Cool. Will send a non-RFC version then.
Would the changes to txtimestamp selftest be helpful?
Cheers,
--
Vinicius
^ permalink raw reply
* Re: net/wireless: fix spaces and grammar copy/paste in vendor Kconfig help text
From: Kalle Valo @ 2018-03-13 16:52 UTC (permalink / raw)
To: Randy Dunlap
Cc: linux-wireless, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1627081c-ef0c-1d55-ca2e-898f22b98d72-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> From: Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>
> Lots of the wireless driver vendor Kconfig symol help text says
> "questions about cards." (2 spaces between "about" and "cards")
>
> Besides dropping one of those spaces, it also needs some other word
> inserted there. Instead of putting each vendor's name there, I chose
> to say "these" cards in all of the Kconfig help text.
>
> Cc: Kalle Valo <kvalo-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> Signed-off-by: Randy Dunlap <rdunlap-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Patch applied to wireless-drivers-next.git, thanks.
a6cf02e6485a net/wireless: fix spaces and grammar copy/paste in vendor Kconfig help text
--
https://patchwork.kernel.org/patch/10255933/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [RESEND] rsi: Remove stack VLA usage
From: Kalle Valo @ 2018-03-13 16:52 UTC (permalink / raw)
To: Tobin C. Harding
Cc: Tobin C. Harding, kernel-hardening, linux-kernel, netdev,
linux-wireless, Tycho Andersen, Kees Cook
In-Reply-To: <1520819022-15238-1-git-send-email-me@tobin.cc>
"Tobin C. Harding" <me@tobin.cc> wrote:
> The kernel would like to have all stack VLA usage removed[1]. rsi uses
> a VLA based on 'blksize'. Elsewhere in the SDIO code maximum block size
> is defined using a magic number. We can use a pre-processor defined
> constant and declare the array to maximum size. We add a check before
> accessing the array in case of programmer error.
>
> [1]: https://lkml.org/lkml/2018/3/7/621
>
> Signed-off-by: Tobin C. Harding <me@tobin.cc>
There were conflicts. Can you rebase on top of wireless-drivers-next and
resend, please?
Recorded preimage for 'drivers/net/wireless/rsi/rsi_91x_sdio.c'
error: Failed to merge in the changes.
Applying: rsi: Remove stack VLA usage
Using index info to reconstruct a base tree...
M drivers/net/wireless/rsi/rsi_91x_hal.c
M drivers/net/wireless/rsi/rsi_91x_sdio.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/rsi/rsi_91x_sdio.c
CONFLICT (content): Merge conflict in drivers/net/wireless/rsi/rsi_91x_sdio.c
Auto-merging drivers/net/wireless/rsi/rsi_91x_hal.c
Patch failed at 0001 rsi: Remove stack VLA usage
The copy of the patch that failed is found in: .git/rebase-apply/patch
Patch set to Changes Requested.
--
https://patchwork.kernel.org/patch/10274983/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [RESEND] rsi: Remove stack VLA usage
From: Kalle Valo @ 2018-03-13 16:52 UTC (permalink / raw)
To: Tobin C. Harding
Cc: Tobin C. Harding, kernel-hardening, linux-kernel, netdev,
linux-wireless, Tycho Andersen, Kees Cook
In-Reply-To: <1520819022-15238-1-git-send-email-me@tobin.cc>
"Tobin C. Harding" <me@tobin.cc> wrote:
> The kernel would like to have all stack VLA usage removed[1]. rsi uses
> a VLA based on 'blksize'. Elsewhere in the SDIO code maximum block size
> is defined using a magic number. We can use a pre-processor defined
> constant and declare the array to maximum size. We add a check before
> accessing the array in case of programmer error.
>
> [1]: https://lkml.org/lkml/2018/3/7/621
>
> Signed-off-by: Tobin C. Harding <me@tobin.cc>
There were conflicts. Can you rebase on top of wireless-drivers-next and
resend, please?
Recorded preimage for 'drivers/net/wireless/rsi/rsi_91x_sdio.c'
error: Failed to merge in the changes.
Applying: rsi: Remove stack VLA usage
Using index info to reconstruct a base tree...
M drivers/net/wireless/rsi/rsi_91x_hal.c
M drivers/net/wireless/rsi/rsi_91x_sdio.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/rsi/rsi_91x_sdio.c
CONFLICT (content): Merge conflict in drivers/net/wireless/rsi/rsi_91x_sdio.c
Auto-merging drivers/net/wireless/rsi/rsi_91x_hal.c
Patch failed at 0001 rsi: Remove stack VLA usage
The copy of the patch that failed is found in: .git/rebase-apply/patch
Patch set to Changes Requested.
--
https://patchwork.kernel.org/patch/10274983/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [pci PATCH v5 3/4] ena: Migrate over to unmanaged SR-IOV support
From: Don Dutile @ 2018-03-13 16:17 UTC (permalink / raw)
To: David Woodhouse, Alexander Duyck
Cc: Bjorn Helgaas, Duyck, Alexander H, linux-pci, virtio-dev, kvm,
Netdev, Daly, Dan, LKML, linux-nvme, Keith Busch, netanel,
Maximilian Heyne, Wang, Liang-min, Rustad, Mark D,
Christoph Hellwig
In-Reply-To: <1520953463.20126.15.camel@infradead.org>
On 03/13/2018 11:04 AM, David Woodhouse wrote:
> On Tue, 2018-03-13 at 07:51 -0700, Alexander Duyck wrote:
>
>> Actually the suggestion I had from Don Dutile was that we should be
>> looking at creating a pci-stub like driver specifically for those type
>> of devices, but without the ability to arbitrarily assign devices.
>> Basically we have to white-list it in one device at a time for those
>> kind of things.
>
> It's still not clear what the point of that would be.
>
A PF-stub to do device-assignment(-like) ops preserves the current security model,
and allows one to add pci-quirks at a device-level as well -- when the usual ACS
structs aren't properly added for a device, which happens quite frequently -- which
retains that common workaround as well.
Yet-another-method for VF assignment w/o even a simple PF driver stub
created multiple failure cases/configs when we were hashing the multiple options
a few weeks ago.
It's just simpler to implement a PF stub w/VF enablement ... b/c it's simple...
>> If you have the device ID of the thing you wanted to have work with
>> pci-stub before I could look at putting together a quick driver and
>> adding it to this set.
>
> 1d0f:0053 would be an example.
>
^ permalink raw reply
* Re: [PATCH] netfilter: cttimeout: remove VLA usage
From: Joe Perches @ 2018-03-13 16:13 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Gustavo A. R. Silva, Jozsef Kadlecsik, Florian Westphal,
David S. Miller, netfilter-devel, coreteam, netdev, linux-kernel,
Kernel Hardening, Kees Cook, Gustavo A. R. Silva
In-Reply-To: <20180313145947.tpekwvyioaft5auc@salvia>
On Tue, 2018-03-13 at 15:59 +0100, Pablo Neira Ayuso wrote:
> On Mon, Mar 12, 2018 at 04:58:38PM -0700, Joe Perches wrote:
> > On Mon, 2018-03-12 at 18:14 -0500, Gustavo A. R. Silva wrote:
> > > In preparation to enabling -Wvla, remove VLA and replace it
> > > with dynamic memory allocation.
> > >
> > > From a security viewpoint, the use of Variable Length Arrays can be
> > > a vector for stack overflow attacks. Also, in general, as the code
> > > evolves it is easy to lose track of how big a VLA can get. Thus, we
> > > can end up having segfaults that are hard to debug.
> > >
> > > Also, fixed as part of the directive to remove all VLAs from
> >
> > []
> > > diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
> >
> > []
> > > @@ -51,19 +51,27 @@ ctnl_timeout_parse_policy(void *timeouts,
> > > const struct nf_conntrack_l4proto *l4proto,
> > > struct net *net, const struct nlattr *attr)
> > > {
> > > + struct nlattr **tb;
> > > int ret = 0;
> > >
> > > - if (likely(l4proto->ctnl_timeout.nlattr_to_obj)) {
> > > - struct nlattr *tb[l4proto->ctnl_timeout.nlattr_max+1];
> > > + if (!l4proto->ctnl_timeout.nlattr_to_obj)
> > > + return 0;
> >
> > Why not
> > if unlikely(!...)
>
> This is control plane code - not packet path - I think we should just
> let the compiler decide on this one, not really need to provide an
> explicit hint here.
I don't have an issue with that, but it should probably be
mentioned in the changelog as it's unrelated to VLA removal.
^ permalink raw reply
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Yuval Shaia @ 2018-03-13 16:13 UTC (permalink / raw)
To: Liran Alon; +Cc: davem, netdev, linux-kernel, idan.brown
In-Reply-To: <1520953642-8145-1-git-send-email-liran.alon@oracle.com>
On Tue, Mar 13, 2018 at 05:07:22PM +0200, Liran Alon wrote:
> Before this commit, dev_forward_skb() always cleared packet's
> per-network-namespace info. Even if the packet doesn't cross
> network namespaces.
>
> The comment above dev_forward_skb() describes that this is done
> because the receiving device may be in another network namespace.
> However, this case can easily be tested for and therefore we can
> scrub packet's per-network-namespace info only when receiving device
> is indeed in another network namespace.
>
> Therefore, this commit changes ____dev_forward_skb() to tell
> skb_scrub_packet() that skb has crossed network-namespace only in case
> transmitting device (skb->dev) network namespace is different then
> receiving device (dev) network namespace.
>
> An example of a netdev that use skb_forward_skb() is veth.
> Thus, before this commit a packet transmitted from one veth peer to
> another when both veth peers are on same network namespace will lose
> it's skb->mark. The bug could easily be demonstrated by the following:
>
> ip netns add test
> ip netns exec test bash
> ip link add veth-a type veth peer name veth-b
> ip link set veth-a up
> ip link set veth-b up
> ip addr add dev veth-a 12.0.0.1/24
> tc qdisc add dev veth-a root handle 1 prio
> tc qdisc add dev veth-b ingress
> tc filter add dev veth-a parent 1: u32 match u32 0 0 action skbedit mark 1337
> tc filter add dev veth-b parent ffff: basic match 'meta(nf_mark eq 1337)' action simple "skb->mark 1337!"
> dmesg -C
> ping 12.0.0.2
> dmesg
>
> Before this change, the above will print nothing to dmesg.
> After this change, "skb->mark 1337!" will be printed as necessary.
Hi Liran,
>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
I did not earned the credits for SOB, only r-b.
Yuval
> ---
> include/linux/netdevice.h | 2 +-
> net/core/dev.c | 6 +++---
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5eef6c8e2741..5908f1e31ee2 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3371,7 +3371,7 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
> return NET_RX_DROP;
> }
>
> - skb_scrub_packet(skb, true);
> + skb_scrub_packet(skb, !net_eq(dev_net(dev), dev_net(skb->dev)));
> skb->priority = 0;
> return 0;
> }
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 2cedf520cb28..087787dd0a50 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1877,9 +1877,9 @@ int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
> * start_xmit function of one device into the receive queue
> * of another device.
> *
> - * The receiving device may be in another namespace, so
> - * we have to clear all information in the skb that could
> - * impact namespace isolation.
> + * The receiving device may be in another namespace.
> + * In that case, we have to clear all information in the
> + * skb that could impact namespace isolation.
> */
> int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
> {
> --
> 1.9.1
>
^ permalink raw reply
* Re: [RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond
From: Or Gerlitz @ 2018-03-13 15:53 UTC (permalink / raw)
To: Jiri Pirko, Rabie Loulou, John Hurley
Cc: Jakub Kicinski, Simon Horman, Linux Netdev List, mlxsw,
Yevgeny Kliteynik, Paul Blakey
In-Reply-To: <CAJ3xEMicamC40mt6tkXyob8yHwmUT_T2SRoig7iTPu1ud8Arzw@mail.gmail.com>
On Tue, Mar 13, 2018 at 5:51 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
Sorry ppl, I added MLNX alias (ASAP_Direct_Dev@mellanox.com) which is
not open to outer posts,
please remove it from your replies, otherwise it will bump you back.. Or.
> On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hurley@netronome.com wrote:
>>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>>If a netdev has registered and is a slave of a given bond, then any tc
>>>rules offloaded to the bond will be relayed to it if both the bond and the
>>>slave permit hw offload.
>
>>>Because the bond itself is not offloaded, just the rules, we don't care
>>>about whether the bond ports are on the same device or whether some of
>>>slaves are representor ports and some are not.
>
> John, I think we must design here for the case where the bond IS offloaded.
> E.g some sort of HW LAG. For example, the mlxsw driver supports
> LAG offload and support tcflower offload, we need to see how these
> two live together, mlx5 supports tcflower offload and we are working on
> bond offload, etc.
>
>>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>>
>> Please, no "bond" specific calls from drivers. That would be wrong.
>> The idea behing block callbacks was that anyone who is interested could
>> register to receive those. In this case, slave device is interested.
>> So it should register to receive block callbacks in the same way as if
>> the block was directly on top of the slave device. The only thing you
>> need to handle is to propagate block bind/unbind from master down to the
>> slaves.
>
> Jiri,
>
> This sounds nice for the case where one install ingress tc rules on
> the bond (lets
> call them type 1, see next)
>
> One obstacle pointed by my colleague, Rabie, is that when the upper layer
> issues stat call on the filter, they will get two replies, this can confuse them
> and lead to wrong decisions (aging). I wonder if/how we can set a knob
> somewhere that unifies the stats (add packet/bytes, use the latest lastuse).
>
> Also, lets see what other rules have to be offloaded in that scheme
> (call them type 2/3/4)
> where one bonded two HW ports
>
> 2. bond being egress port of a rule
>
> TC rules for overlay networks scheme, e.g in NIC SRIOV
> scheme where one bonds the two uplink representors
>
> Starting with type 2, in our current NIC HW APIs we have to duplicate
> these rules
> into two rules set to HW:
>
> 2.1 VF rep --> uplink 0
> 2.2 VF rep --> uplink 1
>
> and we do that in the driver (add/del two HW rules, combine the stat
> results, etc)
>
> 3. ingress rule on VF rep port with shared tunnel device being the
> egress (encap)
> and where the routing of the underlay (tunnel) goes through LAG.
>
> in our case, this is like 2.1/2.2 above, offload two rules, combine stats
>
> 4. ingress rule shared tunnel device being the ingress and VF rep port
> being the egress (decap)
>
> this uses the egdev facility to be offloaded into the our driver, and
> then in the driver
> we will treat it like type 1, two rules need to be installed into HW,
> but now, we can't delegate them
> from the vxlan device b/c it has no direct connection with the bond.
>
> All to all, for the mlx5 use case, seems we have elegant solution only
> for type 1.
>
> I think we should do the elegant solution for the case where it applicable.
>
> In parallel if/when newer HW APIs are there such that type 2 and 3 can be set
> using one HW rule whose dest is the bond, we are good. As for type 4,
> need to see
> if/how it can be nicer.
>
> Or.
^ permalink raw reply
* Re: [RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond
From: Or Gerlitz @ 2018-03-13 15:51 UTC (permalink / raw)
To: Jiri Pirko, Rabie Loulou, John Hurley
Cc: Jakub Kicinski, Simon Horman, Linux Netdev List, ASAP_Direct_Dev,
mlxsw
On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hurley@netronome.com wrote:
>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>If a netdev has registered and is a slave of a given bond, then any tc
>>rules offloaded to the bond will be relayed to it if both the bond and the
>>slave permit hw offload.
>>Because the bond itself is not offloaded, just the rules, we don't care
>>about whether the bond ports are on the same device or whether some of
>>slaves are representor ports and some are not.
John, I think we must design here for the case where the bond IS offloaded.
E.g some sort of HW LAG. For example, the mlxsw driver supports
LAG offload and support tcflower offload, we need to see how these
two live together, mlx5 supports tcflower offload and we are working on
bond offload, etc.
>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>
> Please, no "bond" specific calls from drivers. That would be wrong.
> The idea behing block callbacks was that anyone who is interested could
> register to receive those. In this case, slave device is interested.
> So it should register to receive block callbacks in the same way as if
> the block was directly on top of the slave device. The only thing you
> need to handle is to propagate block bind/unbind from master down to the
> slaves.
Jiri,
This sounds nice for the case where one install ingress tc rules on
the bond (lets
call them type 1, see next)
One obstacle pointed by my colleague, Rabie, is that when the upper layer
issues stat call on the filter, they will get two replies, this can confuse them
and lead to wrong decisions (aging). I wonder if/how we can set a knob
somewhere that unifies the stats (add packet/bytes, use the latest lastuse).
Also, lets see what other rules have to be offloaded in that scheme
(call them type 2/3/4)
where one bonded two HW ports
2. bond being egress port of a rule
TC rules for overlay networks scheme, e.g in NIC SRIOV
scheme where one bonds the two uplink representors
Starting with type 2, in our current NIC HW APIs we have to duplicate
these rules
into two rules set to HW:
2.1 VF rep --> uplink 0
2.2 VF rep --> uplink 1
and we do that in the driver (add/del two HW rules, combine the stat
results, etc)
3. ingress rule on VF rep port with shared tunnel device being the
egress (encap)
and where the routing of the underlay (tunnel) goes through LAG.
in our case, this is like 2.1/2.2 above, offload two rules, combine stats
4. ingress rule shared tunnel device being the ingress and VF rep port
being the egress (decap)
this uses the egdev facility to be offloaded into the our driver, and
then in the driver
we will treat it like type 1, two rules need to be installed into HW,
but now, we can't delegate them
from the vxlan device b/c it has no direct connection with the bond.
All to all, for the mlx5 use case, seems we have elegant solution only
for type 1.
I think we should do the elegant solution for the case where it applicable.
In parallel if/when newer HW APIs are there such that type 2 and 3 can be set
using one HW rule whose dest is the bond, we are good. As for type 4,
need to see
if/how it can be nicer.
Or.
^ permalink raw reply
* [PATCH net 0/1] net/iucv: fix 2018-03-13
From: Julian Wiedmann @ 2018-03-13 15:50 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
Hi Dave,
please apply one patch for af_iucv, that fixes an error path during
initialization.
Thanks,
Julian
Arvind Yadav (1):
net/iucv: Free memory obtained by kzalloc
net/iucv/af_iucv.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--
2.13.5
^ permalink raw reply
* [PATCH net 1/1] net/iucv: Free memory obtained by kzalloc
From: Julian Wiedmann @ 2018-03-13 15:50 UTC (permalink / raw)
To: David Miller
Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180313155006.88162-1-jwi@linux.vnet.ibm.com>
From: Arvind Yadav <arvind.yadav.cs@gmail.com>
Free memory by calling put_device(), if afiucv_iucv_init is not
successful.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
net/iucv/af_iucv.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 1e8cc7bcbca3..9e2643ab4ccb 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -2433,9 +2433,11 @@ static int afiucv_iucv_init(void)
af_iucv_dev->driver = &af_iucv_driver;
err = device_register(af_iucv_dev);
if (err)
- goto out_driver;
+ goto out_iucv_dev;
return 0;
+out_iucv_dev:
+ put_device(af_iucv_dev);
out_driver:
driver_unregister(&af_iucv_driver);
out_iucv:
--
2.13.5
^ permalink raw reply related
* [PATCH net-next 2/2] selftests: Add multipath tests for onlink flag
From: David Ahern @ 2018-03-13 15:40 UTC (permalink / raw)
To: netdev; +Cc: David Ahern
In-Reply-To: <20180313154010.31623-1-dsahern@gmail.com>
Add multipath tests for onlink flag: one test with onlink added to
both nexthops, then tests with onlink added to only 1 nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
tools/testing/selftests/net/fib-onlink-tests.sh | 92 ++++++++++++++++++++++++-
1 file changed, 89 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/fib-onlink-tests.sh b/tools/testing/selftests/net/fib-onlink-tests.sh
index 06b1d7cc12cc..fcf0b589e652 100755
--- a/tools/testing/selftests/net/fib-onlink-tests.sh
+++ b/tools/testing/selftests/net/fib-onlink-tests.sh
@@ -56,7 +56,8 @@ TEST_NET6[2]=2001:db8:102
# connected gateway
CONGW[1]=169.254.1.254
-CONGW[2]=169.254.5.254
+CONGW[2]=169.254.3.254
+CONGW[3]=169.254.5.254
# recursive gateway
RECGW4[1]=169.254.11.254
@@ -232,6 +233,23 @@ run_ip()
log_test $? ${exp_rc} "${desc}"
}
+run_ip_mpath()
+{
+ local table="$1"
+ local prefix="$2"
+ local nh1="$3"
+ local nh2="$4"
+ local exp_rc="$5"
+ local desc="$6"
+
+ # dev arg may be empty
+ [ -n "${dev}" ] && dev="dev ${dev}"
+
+ run_cmd ip ro add table "${table}" "${prefix}"/32 \
+ nexthop via ${nh1} nexthop via ${nh2}
+ log_test $? ${exp_rc} "${desc}"
+}
+
valid_onlink_ipv4()
{
# - unicast connected, unicast recursive
@@ -243,13 +261,37 @@ valid_onlink_ipv4()
log_subsection "VRF ${VRF}"
- run_ip ${VRF_TABLE} ${TEST_NET4[2]}.1 ${CONGW[2]} ${NETIFS[p5]} 0 "unicast connected"
+ run_ip ${VRF_TABLE} ${TEST_NET4[2]}.1 ${CONGW[3]} ${NETIFS[p5]} 0 "unicast connected"
run_ip ${VRF_TABLE} ${TEST_NET4[2]}.2 ${RECGW4[2]} ${NETIFS[p5]} 0 "unicast recursive"
log_subsection "VRF device, PBR table"
- run_ip ${PBR_TABLE} ${TEST_NET4[2]}.3 ${CONGW[2]} ${NETIFS[p5]} 0 "unicast connected"
+ run_ip ${PBR_TABLE} ${TEST_NET4[2]}.3 ${CONGW[3]} ${NETIFS[p5]} 0 "unicast connected"
run_ip ${PBR_TABLE} ${TEST_NET4[2]}.4 ${RECGW4[2]} ${NETIFS[p5]} 0 "unicast recursive"
+
+ # multipath version
+ #
+ log_subsection "default VRF - main table - multipath"
+
+ run_ip_mpath 254 ${TEST_NET4[1]}.5 \
+ "${CONGW[1]} dev ${NETIFS[p1]} onlink" \
+ "${CONGW[2]} dev ${NETIFS[p3]} onlink" \
+ 0 "unicast connected - multipath"
+
+ run_ip_mpath 254 ${TEST_NET4[1]}.6 \
+ "${RECGW4[1]} dev ${NETIFS[p1]} onlink" \
+ "${RECGW4[2]} dev ${NETIFS[p3]} onlink" \
+ 0 "unicast recursive - multipath"
+
+ run_ip_mpath 254 ${TEST_NET4[1]}.7 \
+ "${CONGW[1]} dev ${NETIFS[p1]}" \
+ "${CONGW[2]} dev ${NETIFS[p3]} onlink" \
+ 0 "unicast connected - multipath onlink first only"
+
+ run_ip_mpath 254 ${TEST_NET4[1]}.8 \
+ "${CONGW[1]} dev ${NETIFS[p1]} onlink" \
+ "${CONGW[2]} dev ${NETIFS[p3]}" \
+ 0 "unicast connected - multipath onlink second only"
}
invalid_onlink_ipv4()
@@ -289,6 +331,20 @@ run_ip6()
log_test $? ${exp_rc} "${desc}"
}
+run_ip6_mpath()
+{
+ local table="$1"
+ local prefix="$2"
+ local nh1="$3"
+ local nh2="$4"
+ local exp_rc="$5"
+ local desc="$6"
+
+ run_cmd ip -6 ro add table "${table}" "${prefix}"/128 \
+ nexthop via ${nh1} nexthop via ${nh2}
+ log_test $? ${exp_rc} "${desc}"
+}
+
valid_onlink_ipv6()
{
# - unicast connected, unicast recursive, v4-mapped
@@ -310,6 +366,35 @@ valid_onlink_ipv6()
run_ip6 ${PBR_TABLE} ${TEST_NET6[2]}::4 ${V6ADDRS[p5]/::*}::64 ${NETIFS[p5]} 0 "unicast connected"
run_ip6 ${PBR_TABLE} ${TEST_NET6[2]}::5 ${RECGW6[2]} ${NETIFS[p5]} 0 "unicast recursive"
run_ip6 ${PBR_TABLE} ${TEST_NET6[2]}::6 ::ffff:${TEST_NET4IN6[2]} ${NETIFS[p5]} 0 "v4-mapped"
+
+ # multipath version
+ #
+ log_subsection "default VRF - main table - multipath"
+
+ run_ip6_mpath 254 ${TEST_NET6[1]}::4 \
+ "${V6ADDRS[p1]/::*}::64 dev ${NETIFS[p1]} onlink" \
+ "${V6ADDRS[p3]/::*}::64 dev ${NETIFS[p3]} onlink" \
+ 0 "unicast connected - multipath onlink"
+
+ run_ip6_mpath 254 ${TEST_NET6[1]}::5 \
+ "${RECGW6[1]} dev ${NETIFS[p1]} onlink" \
+ "${RECGW6[2]} dev ${NETIFS[p3]} onlink" \
+ 0 "unicast recursive - multipath onlink"
+
+ run_ip6_mpath 254 ${TEST_NET6[1]}::6 \
+ "::ffff:${TEST_NET4IN6[1]} dev ${NETIFS[p1]} onlink" \
+ "::ffff:${TEST_NET4IN6[2]} dev ${NETIFS[p3]} onlink" \
+ 0 "v4-mapped - multipath onlink"
+
+ run_ip6_mpath 254 ${TEST_NET6[1]}::7 \
+ "${V6ADDRS[p1]/::*}::64 dev ${NETIFS[p1]} onlink" \
+ "${V6ADDRS[p3]/::*}::64 dev ${NETIFS[p3]}" \
+ 0 "unicast connected - multipath onlink first only"
+
+ run_ip6_mpath 254 ${TEST_NET6[1]}::8 \
+ "${V6ADDRS[p1]/::*}::64 dev ${NETIFS[p1]}" \
+ "${V6ADDRS[p3]/::*}::64 dev ${NETIFS[p3]} onlink" \
+ 0 "unicast connected - multipath onlink second only"
}
invalid_onlink_ipv6()
@@ -355,6 +440,7 @@ run_onlink_tests()
log_section "IPv6 onlink"
log_subsection "Valid onlink commands"
valid_onlink_ipv6
+ log_subsection "Invalid onlink commands"
invalid_onlink_ipv6
}
--
2.11.0
^ permalink raw reply related
* [PATCH net-next 1/2] net/ipv6: Handle onlink flag with multipath routes
From: David Ahern @ 2018-03-13 15:40 UTC (permalink / raw)
To: netdev; +Cc: David Ahern
In-Reply-To: <20180313154010.31623-1-dsahern@gmail.com>
For multipath routes the ONLINK flag is specified per nexthop in rtnh_flags
(as opposed to rtm_flags for unicast routes). Update ip6_route_multipath_add
to set fc_flags based on rtnh_flags.
Fixes: fc1e64e1092f ("net/ipv6: Add support for onlink flag")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
net/ipv6/route.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f0ae58424c45..b223dffa8521 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4166,6 +4166,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
r_cfg.fc_encap_type = nla_get_u16(nla);
}
+ r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK);
rt = ip6_route_info_create(&r_cfg, extack);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
--
2.11.0
^ permalink raw reply related
* [PATCH net-next 0/2] net/ipv6: Handle onlink flag with multipath routes
From: David Ahern @ 2018-03-13 15:40 UTC (permalink / raw)
To: netdev; +Cc: David Ahern
ONLINK flag is per nexthop. For multipath routes it is specified in rtnh_flags
in struct rtnexthop versus rtm_flags for single path routes. Update
ip6_route_multipath_add to look at rtnh_flags before the call to
ip6_route_info_create.
Add multipath test cases as well.
David Ahern (2):
net/ipv6: Handle onlink flag with multipath routes
selftests: Add multipath tests for onlink flag
net/ipv6/route.c | 1 +
tools/testing/selftests/net/fib-onlink-tests.sh | 92 ++++++++++++++++++++++++-
2 files changed, 90 insertions(+), 3 deletions(-)
--
2.11.0
^ permalink raw reply
* Re: [PATCH 00/30] Netfilter/IPVS updates for net-next
From: Florian Westphal @ 2018-03-13 15:39 UTC (permalink / raw)
To: David Miller; +Cc: fw, nbd, pablo, netfilter-devel, netdev
In-Reply-To: <20180313.113434.1173466843045633114.davem@davemloft.net>
David Miller <davem@davemloft.net> wrote:
[ flow tables ]
> Ok, that seems to constrain the exposure.
>
> We should talk at some point about how exposed conntrack itself is.
Sure, we can do that.
If you have specific scenarios (synflood, peer that opens
100k (legitimate) connections, perpetual-fin, etc) in mind let me know,
i do think that we could still do better in some cases.
^ permalink raw reply
* Re: [PATCH 00/30] Netfilter/IPVS updates for net-next
From: David Miller @ 2018-03-13 15:34 UTC (permalink / raw)
To: fw; +Cc: nbd, pablo, netfilter-devel, netdev
In-Reply-To: <20180313134139.GD31828@breakpoint.cc>
From: Florian Westphal <fw@strlen.de>
Date: Tue, 13 Mar 2018 14:41:39 +0100
> David Miller <davem@davemloft.net> wrote:
>> From: Felix Fietkau <nbd@nbd.name>
>> Date: Mon, 12 Mar 2018 20:30:01 +0100
>>
>> > It's not dead and useless. In its current state, it has a software fast
>> > path that significantly improves nftables routing/NAT throughput,
>> > especially on embedded devices.
>> > On some devices, I've seen "only" 20% throughput improvement (along with
>> > CPU usage reduction), on others it's quite a bit lot more. This is
>> > without any extra drivers or patches aside from what's posted.
>>
>> I wonder if this software fast path has the exploitability problems that
>> things like the ipv4 routing cache and the per-cpu flow cache both had.
>
> No, entries in the flow table are backed by an entry in the conntrack
> table, and that has an upper ceiling.
>
> As decision of when an entry gets placed into the flow table is
> configureable via ruleset (nftables, iptables will be coming too), one
> can tie the 'fastpathing' to almost-arbitrary criterion, e.g.
>
> 'only flows from trusted internal network'
> 'only flows that saw two-way communication'
> 'only flows that sent more than 100kbyte'
>
> or any combination thereof.
>
> Do you see another problem that needs to be addressed?
Ok, that seems to constrain the exposure.
We should talk at some point about how exposed conntrack itself is.
^ permalink raw reply
* Re: linux-next: build warning after merge of the net-next tree
From: David Miller @ 2018-03-13 15:33 UTC (permalink / raw)
To: gustavo; +Cc: sfr, netdev, linux-next, linux-kernel
In-Reply-To: <67db5ac6-9816-4bf1-c594-72697c426466@embeddedor.com>
From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Tue, 13 Mar 2018 06:46:24 -0500
> Hi Stephen,
>
> On 03/13/2018 01:11 AM, Stephen Rothwell wrote:
>> Hi all,
>> After merging the net-next tree, today's linux-next build (sparc
>> defconfig) produced this warning:
>> net/core/pktgen.c: In function 'pktgen_if_write':
>> net/core/pktgen.c:1710:1: warning: the frame size of 1048 bytes is
>> larger than 1024 bytes [-Wframe-larger-than=]
>> }
>> ^
>> Introduced by commit
>> 35951393bbff ("pktgen: Remove VLA usage")
>>
>
> Thanks for the report.
>
> David:
>
> If this code is not going to be executed very often [1], then I think
> it is safe to use dynamic memory allocation instead, as this is not
> going to impact the performance.
>
> What do you think?
>
> [1] https://lkml.org/lkml/2018/3/9/630
Sure, that works.
It is only invoked when pktgen configuration changes are made.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox