Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 2/2] netfilter: nft_bitwise: Adjust parentheses to fix memcmp size argument
From: Pablo Neira Ayuso @ 2019-08-14 21:43 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814214347.4940-1-pablo@netfilter.org>

From: Nathan Chancellor <natechancellor@gmail.com>

clang warns:

net/netfilter/nft_bitwise.c:138:50: error: size argument in 'memcmp'
call is a comparison [-Werror,-Wmemsize-comparison]
        if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
                                      ~~~~~~~~~~~~~~~~~~^~
net/netfilter/nft_bitwise.c:138:6: note: did you mean to compare the
result of 'memcmp' instead?
        if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
            ^
                                                       )
net/netfilter/nft_bitwise.c:138:32: note: explicitly cast the argument
to size_t to silence this warning
        if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
                                      ^
                                      (size_t)(
1 error generated.

Adjust the parentheses so that the result of the sizeof is used for the
size argument in memcmp, rather than the result of the comparison (which
would always be true because sizeof is a non-zero number).

Fixes: bd8699e9e292 ("netfilter: nft_bitwise: add offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/638
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_bitwise.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c
index 1f04ed5c518c..974300178fa9 100644
--- a/net/netfilter/nft_bitwise.c
+++ b/net/netfilter/nft_bitwise.c
@@ -135,8 +135,8 @@ static int nft_bitwise_offload(struct nft_offload_ctx *ctx,
 {
 	const struct nft_bitwise *priv = nft_expr_priv(expr);
 
-	if (memcmp(&priv->xor, &zero, sizeof(priv->xor) ||
-	    priv->sreg != priv->dreg))
+	if (memcmp(&priv->xor, &zero, sizeof(priv->xor)) ||
+	    priv->sreg != priv->dreg)
 		return -EOPNOTSUPP;
 
 	memcpy(&ctx->regs[priv->dreg].mask, &priv->mask, sizeof(priv->mask));
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH net-next 1/5] RDS: Re-add pf/sol access via sysctl
From: Gerd Rausch @ 2019-08-14 21:45 UTC (permalink / raw)
  To: David Miller, santosh.shilimkar; +Cc: dledford, netdev, linux-rdma, rds-devel
In-Reply-To: <20190814.143141.178107876214573923.davem@davemloft.net>

Hi David,

On 14/08/2019 14.31, David Miller wrote:
> From: santosh.shilimkar@oracle.com
> Date: Wed, 14 Aug 2019 11:36:19 -0700
> 
>> On 8/14/19 11:21 AM, David Miller wrote:
>>> From: santosh.shilimkar@oracle.com
>>> Date: Wed, 14 Aug 2019 11:01:36 -0700
>>>
>>>> Some of the application software was released before 2009 and ended up
>>>> using these proc entries from downstream kernel. The newer lib/app
>>>> using RDS don't use these. Unfortunately lot of customer still use
>>>> Oracle 9, 10, 11 which were released before 2007 and run these apps
>>>> on modern kernels.
>>> So those apps are using proc entries that were never upstream...
>>>
>>> Sorry, this is completely and utterly inappropriate.
>>>
>> Agree. Unfortunately one of the legacy application library didn't
>> get upgraded even after the ports were registered with IANA.
>> Oracle 11 is still very active release and hence this patch.
>>
>> It is fine to drop $subject patch from this series.
> 
> The appropriate procedure is to resubmit the series with the patch
> removed.
> 

For my understanding:
Are you saying that...

a) It is utterly inappropriate to have Oracle applications
   rely on a /proc/sys API that was kept out of Upstream-Linux
   for this long

b) It is utterly inappropriate to include such a /proc/sys API
   that Oracle applications have depended on this late

c) ... something else ...

At first I read your comment as "a)", which would then imply
that this commit shall be included now (albeit late).

If your answer is "not a)", or implies that Oracle ought to continue
to carry this change in our own repository only, please let me know,
and I will re-submit the series without this patch, to follow
appropriate procedure.

Thanks,

  Gerd

^ permalink raw reply

* [PATCH] selftests: net: tcp_fastopen_backup_key.sh: fix shellcheck issue
From: Anders Roxell @ 2019-08-14 21:49 UTC (permalink / raw)
  To: davem, shuah; +Cc: netdev, linux-kselftest, linux-kernel, Anders Roxell

When running tcp_fastopen_backup_key.sh the following issue was seen in
a busybox environment.
./tcp_fastopen_backup_key.sh: line 33: [: -ne: unary operator expected

Shellcheck showed the following issue.
$ shellcheck tools/testing/selftests/net/tcp_fastopen_backup_key.sh

In tools/testing/selftests/net/tcp_fastopen_backup_key.sh line 33:
        if [ $val -ne 0 ]; then
             ^-- SC2086: Double quote to prevent globbing and word splitting.

Rework to add double quotes around the variable 'val' that shellcheck
recommends.

Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
 tools/testing/selftests/net/tcp_fastopen_backup_key.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/tcp_fastopen_backup_key.sh b/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
index 41476399e184..ba5ec3eb314e 100755
--- a/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
+++ b/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
@@ -30,7 +30,7 @@ do_test() {
 	ip netns exec "${NETNS}" ./tcp_fastopen_backup_key "$1"
 	val=$(ip netns exec "${NETNS}" nstat -az | \
 		grep TcpExtTCPFastOpenPassiveFail | awk '{print $2}')
-	if [ $val -ne 0 ]; then
+	if [ "$val" -ne 0 ]; then
 		echo "FAIL: TcpExtTCPFastOpenPassiveFail non-zero"
 		return 1
 	fi
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf-next v2] net: Don't call XDP_SETUP_PROG when nothing is changed
From: Jakub Kicinski @ 2019-08-14 22:01 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf@vger.kernel.org,
	netdev@vger.kernel.org, David S. Miller, Björn Töpel,
	Saeed Mahameed, Jesper Dangaard Brouer, John Fastabend,
	Martin KaFai Lau, Song Liu, Yonghong Song
In-Reply-To: <20190814143352.3759-1-maximmi@mellanox.com>

On Wed, 14 Aug 2019 14:34:06 +0000, Maxim Mikityanskiy wrote:
> Don't uninstall an XDP program when none is installed, and don't install
> an XDP program that has the same ID as the one already installed.
> 
> dev_change_xdp_fd doesn't perform any checks in case it uninstalls an
> XDP program. It means that the driver's ndo_bpf can be called with
> XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This
> case happens if the user runs `ip link set eth0 xdp off` when there is
> no XDP program attached.
> 
> The symmetrical case is possible when the user tries to set the program
> that is already set.
> 
> The drivers typically perform some heavy operations on XDP_SETUP_PROG,
> so they all have to handle these cases internally to return early if
> they happen. This patch puts this check into the kernel code, so that
> all drivers will benefit from it.
> 
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-14 22:05 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Colascione, Song Liu, Kees Cook, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <CALCETrUkqUprujww26VxHwkdXQ3DWJH8nnL2VBYpK2EU0oX_YA@mail.gmail.com>

On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
> 
> If eBPF is genuinely not usable by programs that are not fully trusted
> by the admin, then no kernel changes at all are needed.  Programs that
> want to reduce their own privileges can easily fork() a privileged
> subprocess or run a little helper to which they delegate BPF
> operations.  This is far more flexible than anything that will ever be
> in the kernel because it allows the helper to verify that the rest of
> the program is doing exactly what it's supposed to and restrict eBPF
> operations to exactly the subset that is needed.  So a container
> manager or network manager that drops some provilege could have a
> little bpf-helper that manages its BPF XDP, firewalling, etc
> configuration.  The two processes would talk over a socketpair.

there were three projects that tried to delegate bpf operations.
All of them failed.
bpf operational workflow is much more complex than you're imagining.
fork() also doesn't work for all cases.
I gave this example before: consider multiple systemd-like deamons
that need to do bpf operations that want to pass this 'bpf capability'
to other deamons written by other teams. Some of them will start
non-root, but still need to do bpf. They will be rpm installed
and live upgraded while running.
We considered to make systemd such centralized bpf delegation
authority too. It didn't work. bpf in kernel grows quickly.
libbpf part grows independently. llvm keeps evolving.
All of them are being changed while system overall has to stay
operational. Centralized approach breaks apart.

> The interesting cases you're talking about really *do* involved
> unprivileged or less privileged eBPF, though.  Let's see:
> 
> systemd --user: systemd --user *is not privileged at all*.  There's no
> issue of reducing privilege, since systemd --user doesn't have any
> privilege to begin with.  But systemd supports some eBPF features, and
> presumably it would like to support them in the systemd --user case.
> This is unprivileged eBPF.

Let's disambiguate the terminology.
This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
I think that was a mistake.
Let's call systemd-like deamon usage of bpf 'less privileged bpf'.
This is not unprivileged.
'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.

There is a huge difference between the two.
I'm against extending 'unprivileged bpf' even a bit more than what it is
today for many reasons mentioned earlier.
The /dev/bpf is about 'less privileged'.
Less privileged than root. We need to split part of full root capability
into bpf capability. So that most of the root can be dropped.
This is very similar to what cap_net_admin does.
cap_net_amdin can bring down eth0 which is just as bad as crashing the box.
cap_net_admin is very much privileged. Just 'less privileged' than root.
Same thing for cap_bpf.

May be we should do both cap_bpf and /dev/bpf to make it clear that
this is the same thing. Two interfaces to achieve the same result.

> Seccomp.  Seccomp already uses cBPF, which is a form of BPF although
> it doesn't involve the bpf() syscall.  There are some seccomp
> proposals in the works that will want some stuff from eBPF.  In

I'm afraid these proposals won't go anywhere.

> So it's a bit of a chicken-and-egg situation.  There aren't major
> unprivileged eBPF users because the kernel support isn't there.

As I said before there are zero known use cases of 'unprivileged bpf'.

If I understand you correctly you're refusing to accept that
'less privileged bpf' is a valid use case while pushing for extending
scope of 'unprivileged'.
Extending the scope is an orthogonal discussion. Currently I'm
opposed to that. Whereas 'less privileged' is what people require.

> It will remain extremely awkward for containers and
> especially nested containers to use eBPF.

I'm afraid we have to agree to disagree here and move on.

^ permalink raw reply

* Re: [PATCH v2 bpf-next] mm: mmap: increase sockets maximum memory size pgoff for 32bits
From: Andrew Morton @ 2019-08-14 22:18 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: bjorn.topel, linux-mm, xdp-newbies, netdev, bpf, linux-kernel,
	ast, magnus.karlsson
In-Reply-To: <20190814150934.GD4142@khorivan>

On Wed, 14 Aug 2019 18:09:36 +0300 Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:

> On Mon, Aug 12, 2019 at 02:19:24PM -0700, Andrew Morton wrote:
> 
> Hi, Andrew
> 
> >On Mon, 12 Aug 2019 15:43:26 +0300 Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> >
> >> The AF_XDP sockets umem mapping interface uses XDP_UMEM_PGOFF_FILL_RING
> >> and XDP_UMEM_PGOFF_COMPLETION_RING offsets. The offsets seems like are
> >> established already and are part of configuration interface.
> >>
> >> But for 32-bit systems, while AF_XDP socket configuration, the values
> >> are to large to pass maximum allowed file size verification.
> >> The offsets can be tuned ofc, but instead of changing existent
> >> interface - extend max allowed file size for sockets.
> >
> >
> >What are the implications of this?  That all code in the kernel which
> >handles mapped sockets needs to be audited (and tested) for correctly
> >handling mappings larger than 4G on 32-bit machines?  Has that been
> 
> That's to allow only offset to be passed, mapping length is less than 4Gb.
> I have verified all list of mmap for sockets and all of them contain dummy
> cb sock_no_mmap() except the following:
> 
> xsk_mmap()
> tcp_mmap()
> packet_mmap()
> 
> xsk_mmap() - it's what this fix is needed for.
> tcp_mmap() - doesn't have obvious issues with pgoff - no any references on it.
> packet_mmap() - return -EINVAL if it's even set.

Great, thanks.

> 
> >done?  Are we confident that we aren't introducing user-visible buggy
> >behaviour into unsuspecting legacy code?
> >
> >Also...  what are the user-visible runtime effects of this change?
> >Please send along a paragraph which explains this, for the changelog.
> >Does this patch fix some user-visible problem?  If so, should be code
> >be backported into -stable kernels?
> It should go to linux-next, no one has been using it till this patch
> with 32 bits as w/o this fix af_xdp sockets can't be used at all.
> It unblocks af_xdp socket usage for 32bit systems.
> 
> 
> That's example of potential next commit message:
> Subject: mm: mmap: increase sockets maximum memory size pgoff for 32bits
> 
> The AF_XDP sockets umem mapping interface uses XDP_UMEM_PGOFF_FILL_RING
> and XDP_UMEM_PGOFF_COMPLETION_RING offsets.  These offsets are established
> already and are part of the configuration interface.
> 
> But for 32-bit systems, using AF_XDP socket configuration, these values
> are too large to pass the maximum allowed file size verification.  The
> offsets can be tuned off, but instead of changing the existing interface,
> let's extend the max allowed file size for sockets.
> 
> No one has been using it till this patch with 32 bits as w/o this fix
> af_xdp sockets can't be used at all, so it unblocks af_xdp socket usage
> for 32bit systems.
> 
> All list of mmap cbs for sockets were verified on side effects and
> all of them contain dummy cb - sock_no_mmap() at this moment, except the
> following:
> 
> xsk_mmap() - it's what this fix is needed for.
> tcp_mmap() - doesn't have obvious issues with pgoff - no any references on it.
> packet_mmap() - return -EINVAL if it's even set.
>
> ...
>
> Is it ok to be replicated in PATCH v2 or this explanation is enough here
> to use v1?

I have replaced the changlog in my tree with the above, thanks.


^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Nick Desaulniers @ 2019-08-14 22:20 UTC (permalink / raw)
  To: Will Deacon, Miguel Ojeda
  Cc: Andrew Morton, Sedat Dilek, Josh Poimboeuf, Yonghong Song,
	clang-built-linux, Catalin Marinas, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrey Konovalov,
	Greg Kroah-Hartman, Enrico Weigelt, Suzuki K Poulose,
	Thomas Gleixner, Masayoshi Mizuma, Shaokun Zhang, Alexios Zavras,
	Allison Randal, Linux ARM, linux-kernel, Network Development, bpf
In-Reply-To: <20190813170829.c3lryb6va3eopxd7@willie-the-truck>

On Tue, Aug 13, 2019 at 10:08 AM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Aug 13, 2019 at 02:36:06PM +0200, Miguel Ojeda wrote:
> > On Tue, Aug 13, 2019 at 10:27 AM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Aug 12, 2019 at 02:50:45PM -0700, Nick Desaulniers wrote:
> > > > GCC unescapes escaped string section names while Clang does not. Because
> > > > __section uses the `#` stringification operator for the section name, it
> > > > doesn't need to be escaped.
> > > >
> > > > This antipattern was found with:
> > > > $ grep -e __section\(\" -e __section__\(\" -r
> > > >
> > > > Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
> > > > Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > > > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> > > > ---
> > > >  arch/arm64/include/asm/cache.h     | 2 +-
> > > >  arch/arm64/kernel/smp_spin_table.c | 2 +-
> > > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > Does this fix a build issue, or is it just cosmetic or do we end up with
> > > duplicate sections or something else?
> >
> > This should be cosmetic -- basically we are trying to move all users
> > of current available __attribute__s in compiler_attributes.h to the
> > __attr forms. I am also adding (slowly) new attributes that are
> > already used but we don't have them yet in __attr form.

This lone patch of the series is just cosmetic, but patch 14/16 fixes
a real boot issue:
https://github.com/ClangBuiltLinux/linux/issues/619
Miguel, I'd like to get that one landed ASAP; the rest are just for consistency.

> >
> > > Happy to route it via arm64, just having trouble working out whether it's
> > > 5.3 material!

Thanks!
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/core&id=80d838122643a09a9f99824adea4b4261e4451e6

> >
> > As you prefer! Those that are not taken by a maintainer I will pick up
> > and send via compiler-attributes.

Miguel, how do you want to take the rest of these patches? Will picked
up the arm64 one, I think the SuperH one got picked up.  There was
feedback to add more info to individual commits' commit messages.

I kept these tree wide changes separate to improve the likelihood that
they'd backport to stable cleanly, but could always squash if you'd
prefer to have 1 patch instead of a series.  Just let me know.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-14 22:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190814220545.co5pucyo5jk3weiv@ast-mbp.dhcp.thefacebook.com>



> On Aug 14, 2019, at 3:05 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
>> On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
>> 
>> If eBPF is genuinely not usable by programs that are not fully trusted
>> by the admin, then no kernel changes at all are needed.  Programs that
>> want to reduce their own privileges can easily fork() a privileged
>> subprocess or run a little helper to which they delegate BPF
>> operations.  This is far more flexible than anything that will ever be
>> in the kernel because it allows the helper to verify that the rest of
>> the program is doing exactly what it's supposed to and restrict eBPF
>> operations to exactly the subset that is needed.  So a container
>> manager or network manager that drops some provilege could have a
>> little bpf-helper that manages its BPF XDP, firewalling, etc
>> configuration.  The two processes would talk over a socketpair.
> 
> there were three projects that tried to delegate bpf operations.
> All of them failed.
> bpf operational workflow is much more complex than you're imagining.
> fork() also doesn't work for all cases.
> I gave this example before: consider multiple systemd-like deamons
> that need to do bpf operations that want to pass this 'bpf capability'
> to other deamons written by other teams. Some of them will start
> non-root, but still need to do bpf. They will be rpm installed
> and live upgraded while running.
> We considered to make systemd such centralized bpf delegation
> authority too. It didn't work. bpf in kernel grows quickly.
> libbpf part grows independently. llvm keeps evolving.
> All of them are being changed while system overall has to stay
> operational. Centralized approach breaks apart.
> 
>> The interesting cases you're talking about really *do* involved
>> unprivileged or less privileged eBPF, though.  Let's see:
>> 
>> systemd --user: systemd --user *is not privileged at all*.  There's no
>> issue of reducing privilege, since systemd --user doesn't have any
>> privilege to begin with.  But systemd supports some eBPF features, and
>> presumably it would like to support them in the systemd --user case.
>> This is unprivileged eBPF.
> 
> Let's disambiguate the terminology.
> This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
> I think that was a mistake.
> Let's call systemd-like deamon usage of bpf 'less privileged bpf'.
> This is not unprivileged.
> 'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.
> 
> There is a huge difference between the two.
> I'm against extending 'unprivileged bpf' even a bit more than what it is
> today for many reasons mentioned earlier.
> The /dev/bpf is about 'less privileged'.
> Less privileged than root. We need to split part of full root capability
> into bpf capability. So that most of the root can be dropped.
> This is very similar to what cap_net_admin does.
> cap_net_amdin can bring down eth0 which is just as bad as crashing the box.
> cap_net_admin is very much privileged. Just 'less privileged' than root.
> Same thing for cap_bpf.

The new pseudo-capability in this patch set is absurdly broad. I’ve proposed some finer-grained divisions in this thread. Do you have comments on them?

> 
> May be we should do both cap_bpf and /dev/bpf to make it clear that
> this is the same thing. Two interfaces to achieve the same result.

What for?  If there’s a CAP_BPF, then why do you want /dev/bpf? Especially if you define it to do the same thing.

> 
>> Seccomp.  Seccomp already uses cBPF, which is a form of BPF although
>> it doesn't involve the bpf() syscall.  There are some seccomp
>> proposals in the works that will want some stuff from eBPF.  In
> 
> I'm afraid these proposals won't go anywhere.

Can you explain why?

> 
>> So it's a bit of a chicken-and-egg situation.  There aren't major
>> unprivileged eBPF users because the kernel support isn't there.
> 
> As I said before there are zero known use cases of 'unprivileged bpf'.
> 
> If I understand you correctly you're refusing to accept that
> 'less privileged bpf' is a valid use case while pushing for extending
> scope of 'unprivileged'.

No, I’m not.  I have no objection at all if you try to come up with a clear definition of what the capability checks do and what it means to grant a new permission to a task.  Changing *all* of the capable checks is needlessly broad.

^ permalink raw reply

* Re: [PATCH net-next v2 5/9] net: phy: add MACsec ops in phy_device
From: Florian Fainelli @ 2019-08-14 23:15 UTC (permalink / raw)
  To: Antoine Tenart, davem, sd, andrew, hkallweit1
  Cc: netdev, linux-kernel, thomas.petazzoni, alexandre.belloni,
	allan.nielsen, camelia.groza, Simon.Edelhaus
In-Reply-To: <20190808140600.21477-6-antoine.tenart@bootlin.com>

On 8/8/19 7:05 AM, Antoine Tenart wrote:
> This patch adds a reference to MACsec ops in the phy_device, to allow
> PHYs to support offloading MACsec operations. The phydev lock will be
> held while calling those helpers.
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> ---
>  include/linux/phy.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 462b90b73f93..6947a19587e4 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -22,6 +22,10 @@
>  #include <linux/workqueue.h>
>  #include <linux/mod_devicetable.h>
>  
> +#ifdef CONFIG_MACSEC
> +#include <net/macsec.h>
> +#endif

#if IS_ENABLED(CONFIG_MACSEC)

> +
>  #include <linux/atomic.h>
>  
>  #define PHY_DEFAULT_FEATURES	(SUPPORTED_Autoneg | \
> @@ -345,6 +349,7 @@ struct phy_c45_device_ids {
>   * attached_dev: The attached enet driver's device instance ptr
>   * adjust_link: Callback for the enet controller to respond to
>   * changes in the link state.
> + * macsec_ops: MACsec offloading ops.
>   *
>   * speed, duplex, pause, supported, advertising, lp_advertising,
>   * and autoneg are used like in mii_if_info
> @@ -438,6 +443,11 @@ struct phy_device {
>  
>  	void (*phy_link_change)(struct phy_device *, bool up, bool do_carrier);
>  	void (*adjust_link)(struct net_device *dev);
> +
> +#if defined(CONFIG_MACSEC)
> +	/* MACsec management functions */
> +	const struct macsec_ops *macsec_ops;
> +#endif

#if IS_ENABLED(CONFIG_MACSEC)

likewise.
-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next v4 1/2] xsk: remove AF_XDP socket from map when the socket is released
From: Daniel Borkmann @ 2019-08-14 23:17 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Netdev, Björn Töpel,
	Karlsson, Magnus, Bruce Richardson, Song Liu, bpf
In-Reply-To: <CAJ+HfNhO+xSs25aPat9WjC75W6_Kgfq=GU+YCEcoZw-GCjZdEg@mail.gmail.com>

On 8/12/19 7:25 PM, Björn Töpel wrote:
> On Mon, 12 Aug 2019 at 14:28, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
> [...]
>>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>>> index 59b57d708697..c3447bad608a 100644
>>> --- a/net/xdp/xsk.c
>>> +++ b/net/xdp/xsk.c
>>> @@ -362,6 +362,50 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
>>>        dev_put(dev);
>>>    }
>>>
>>> +static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs,
>>> +                                           struct xdp_sock ***map_entry)
>>> +{
>>> +     struct xsk_map *map = NULL;
>>> +     struct xsk_map_node *node;
>>> +
>>> +     *map_entry = NULL;
>>> +
>>> +     spin_lock_bh(&xs->map_list_lock);
>>> +     node = list_first_entry_or_null(&xs->map_list, struct xsk_map_node,
>>> +                                     node);
>>> +     if (node) {
>>> +             WARN_ON(xsk_map_inc(node->map));
>>
>> Can you elaborate on the refcount usage here and against what scenario it is protecting?
> 
> Thanks for having a look!
> 
> First we access the map_list (under the lock) and pull out the map
> which we intend to clean. In order to clear the map entry, we need to
> a reference to the map. However, when the map_list_lock is released,
> there's a window where the map entry can be cleared and the map can be
> destroyed, and making the "map", which is used in
> xsk_delete_from_maps, stale. To guarantee existence the additional
> refinc is required. Makes sense?

Seems reasonable to me, and inc as opposed to inc_not_zero is also fine
here since at this point in time we're still holding one reference to
the map. But I think there's a catch with the current code that still
needs fixing:

Imagine you do a xsk_map_update_elem() where we have a situation where
xs == old_xs. There, we first do the xsk_map_sock_add() to add the new
xsk map node at the tail of the socket's xs->map_list. We do the xchg()
and then xsk_map_sock_delete() for old_xs which then walks xs->map_list
again and purges all entries including the just newly created one. This
means we'll end up with an xs socket at the given map slot, but the xs
socket has empty xs->map_list. This means we could release the xs sock
and the xsk_delete_from_maps() won't need to clean up anything anymore
but yet the xs is still in the map slot, so if you redirect to that
socket, it would be use-after-free, no?

>> Do we pretend it never fails on the bpf_map_inc() wrt the WARN_ON(),
>> why that (what makes it different from the xsk_map_node_alloc() inc
>> above where we do error out)?
> 
> Hmm, given that we're in a cleanup (socket release), we can't really
> return any error. What would be a more robust way? Retrying? AFAIK the
> release ops return an int, but it's not checked/used.
> 
>>> +             map = node->map;
>>> +             *map_entry = node->map_entry;
>>> +     }
>>> +     spin_unlock_bh(&xs->map_list_lock);
>>> +     return map;
>>> +}
>>> +
>>> +static void xsk_delete_from_maps(struct xdp_sock *xs)
>>> +{
>>> +     /* This function removes the current XDP socket from all the
>>> +      * maps it resides in. We need to take extra care here, due to
>>> +      * the two locks involved. Each map has a lock synchronizing
>>> +      * updates to the entries, and each socket has a lock that
>>> +      * synchronizes access to the list of maps (map_list). For
>>> +      * deadlock avoidance the locks need to be taken in the order
>>> +      * "map lock"->"socket map list lock". We start off by
>>> +      * accessing the socket map list, and take a reference to the
>>> +      * map to guarantee existence. Then we ask the map to remove
>>> +      * the socket, which tries to remove the socket from the
>>> +      * map. Note that there might be updates to the map between
>>> +      * xsk_get_map_list_entry() and xsk_map_try_sock_delete().
>>> +      */
> 
> I tried to clarify here, but I obviously need to do a better job. :-)
> 
> 
> Björn
> 


^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Florian Fainelli @ 2019-08-14 23:28 UTC (permalink / raw)
  To: Andrew Lunn, Igor Russkikh
  Cc: Antoine Tenart, davem@davemloft.net, sd@queasysnail.net,
	hkallweit1@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, thomas.petazzoni@bootlin.com,
	alexandre.belloni@bootlin.com, allan.nielsen@microchip.com,
	camelia.groza@nxp.com, Simon Edelhaus, Pavel Belous
In-Reply-To: <20190813162823.GH15047@lunn.ch>

On 8/13/19 9:28 AM, Andrew Lunn wrote:
>> 1) With current implementation it's impossible to install SW macsec engine onto
>> the device which supports HW offload. That could be a strong limitation in
>> cases when user sees HW macsec offload is broken or work differently, and he/she
>> wants to replace it with SW one.
>> MACSec is a complex feature, and it may happen something is missing in HW.
>> Trivial example is 256bit encryption, which is not always a musthave in HW
>> implementations.
> 
> Ideally, we want the driver to return EOPNOTSUPP if it does not
> support something and the software implement should be used.
> 
> If the offload is broken, we want a bug report! And if it works
> differently, it suggests there is also a bug we need to fix, or the
> standard is ambiguous.
> 
> It would also be nice to add extra information to the netlink API to
> indicate if HW or SW is being used. In other places where we offload
> to accelerators we have such additional information.

Igor's point is entirely valid in that you should allow both offload to
HW for what is possible and offload to a software implementation for
what is not supported in HW.

Let's an example that is hopefully more familiar to the people in this
thread. Consider a NIC that can do single VLAN tag offload on xmit (or
receive, does not matter), and you find yourself using a double VLAN tag
configuration. You would create a first VLAN stacked network device
which is fully offloaded onto the underlying NIC, and a second VLAN
stacked network device on top of the first once which is not offloaded.

The way I would imagine a MACsec offload is kind of similar here, in
that it would be a macsec virtual network device on top of an underlying
physical device. If no offload is possible, the virtual network device's
xmit/receive path is used. If the NIC driver can offload it, it does
not. How it does it, whether at the MAC/DMA level, or at the PHY level
can be a check added at the appropriate level.
-- 
Florian

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-14 23:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <AD211133-EA60-4B91-AB1B-201713F50AB2@amacapital.net>

On Wed, Aug 14, 2019 at 03:30:51PM -0700, Andy Lutomirski wrote:
> 
> 
> > On Aug 14, 2019, at 3:05 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> >> On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
> >> 
> >> If eBPF is genuinely not usable by programs that are not fully trusted
> >> by the admin, then no kernel changes at all are needed.  Programs that
> >> want to reduce their own privileges can easily fork() a privileged
> >> subprocess or run a little helper to which they delegate BPF
> >> operations.  This is far more flexible than anything that will ever be
> >> in the kernel because it allows the helper to verify that the rest of
> >> the program is doing exactly what it's supposed to and restrict eBPF
> >> operations to exactly the subset that is needed.  So a container
> >> manager or network manager that drops some provilege could have a
> >> little bpf-helper that manages its BPF XDP, firewalling, etc
> >> configuration.  The two processes would talk over a socketpair.
> > 
> > there were three projects that tried to delegate bpf operations.
> > All of them failed.
> > bpf operational workflow is much more complex than you're imagining.
> > fork() also doesn't work for all cases.
> > I gave this example before: consider multiple systemd-like deamons
> > that need to do bpf operations that want to pass this 'bpf capability'
> > to other deamons written by other teams. Some of them will start
> > non-root, but still need to do bpf. They will be rpm installed
> > and live upgraded while running.
> > We considered to make systemd such centralized bpf delegation
> > authority too. It didn't work. bpf in kernel grows quickly.
> > libbpf part grows independently. llvm keeps evolving.
> > All of them are being changed while system overall has to stay
> > operational. Centralized approach breaks apart.
> > 
> >> The interesting cases you're talking about really *do* involved
> >> unprivileged or less privileged eBPF, though.  Let's see:
> >> 
> >> systemd --user: systemd --user *is not privileged at all*.  There's no
> >> issue of reducing privilege, since systemd --user doesn't have any
> >> privilege to begin with.  But systemd supports some eBPF features, and
> >> presumably it would like to support them in the systemd --user case.
> >> This is unprivileged eBPF.
> > 
> > Let's disambiguate the terminology.
> > This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
> > I think that was a mistake.
> > Let's call systemd-like deamon usage of bpf 'less privileged bpf'.
> > This is not unprivileged.
> > 'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.
> > 
> > There is a huge difference between the two.
> > I'm against extending 'unprivileged bpf' even a bit more than what it is
> > today for many reasons mentioned earlier.
> > The /dev/bpf is about 'less privileged'.
> > Less privileged than root. We need to split part of full root capability
> > into bpf capability. So that most of the root can be dropped.
> > This is very similar to what cap_net_admin does.
> > cap_net_amdin can bring down eth0 which is just as bad as crashing the box.
> > cap_net_admin is very much privileged. Just 'less privileged' than root.
> > Same thing for cap_bpf.
> 
> The new pseudo-capability in this patch set is absurdly broad. I’ve proposed some finer-grained divisions in this thread. Do you have comments on them?

Initially I agreed that it's probably too broad, but then realized
that they're perfect as-is. There is no need to partition further.

> > May be we should do both cap_bpf and /dev/bpf to make it clear that
> > this is the same thing. Two interfaces to achieve the same result.
> 
> What for?  If there’s a CAP_BPF, then why do you want /dev/bpf? Especially if you define it to do the same thing.

Indeed, ambient capabilities should work for all cases.

> No, I’m not.  I have no objection at all if you try to come up with a clear definition of what the capability checks do and what it means to grant a new permission to a task.  Changing *all* of the capable checks is needlessly broad.

There are not that many bits left. I prefer to consume single CAP_BPF bit.
All capable(CAP_SYS_ADMIN) checks in kernel/bpf/ will become CAP_BPF.
This is no-brainer.

The only question is whether few cases of CAP_NET_ADMIN in kernel/bpf/
should be extended to CAP_BPF or not.
imo devmap and xskmap can stay CAP_NET_ADMIN,
but cgroup bpf attach/detach should be either CAP_NET_ADMIN or CAP_BPF.
Initially cgroup-bpf hooks were limited to networking.
It's no longer the case. Requiring NET_ADMIN there make little sense now.


^ permalink raw reply

* RE: [PATCH] selftests: net: tcp_fastopen_backup_key.sh: fix shellcheck issue
From: Tim.Bird @ 2019-08-14 23:34 UTC (permalink / raw)
  To: anders.roxell, davem, shuah; +Cc: netdev, linux-kselftest, linux-kernel
In-Reply-To: <20190814214948.5571-1-anders.roxell@linaro.org>



> -----Original Message-----
> From: Anders Roxell
> 
> When running tcp_fastopen_backup_key.sh the following issue was seen in
> a busybox environment.
> ./tcp_fastopen_backup_key.sh: line 33: [: -ne: unary operator expected
> 
> Shellcheck showed the following issue.
> $ shellcheck tools/testing/selftests/net/tcp_fastopen_backup_key.sh
> 
> In tools/testing/selftests/net/tcp_fastopen_backup_key.sh line 33:
>         if [ $val -ne 0 ]; then
>              ^-- SC2086: Double quote to prevent globbing and word splitting.
> 
> Rework to add double quotes around the variable 'val' that shellcheck
> recommends.
> 
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
>  tools/testing/selftests/net/tcp_fastopen_backup_key.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
> b/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
> index 41476399e184..ba5ec3eb314e 100755
> --- a/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
> +++ b/tools/testing/selftests/net/tcp_fastopen_backup_key.sh
> @@ -30,7 +30,7 @@ do_test() {
>  	ip netns exec "${NETNS}" ./tcp_fastopen_backup_key "$1"
>  	val=$(ip netns exec "${NETNS}" nstat -az | \
>  		grep TcpExtTCPFastOpenPassiveFail | awk '{print $2}')
> -	if [ $val -ne 0 ]; then
> +	if [ "$val" -ne 0 ]; then

Did you test this in the failing environment?

With a busybox shell, I get:
 $ [ "" -ne 0 ]
sh: bad number

You might need to explicitly check for empty string here, or switch to a string comparison instead:
if [ "$val" != 0 ]; then

   -- Tim

>  		echo "FAIL: TcpExtTCPFastOpenPassiveFail non-zero"
>  		return 1
>  	fi
> --
> 2.20.1


^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Andy Lutomirski @ 2019-08-14 23:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190814233335.37t4zfsiswrpd4d6@ast-mbp.dhcp.thefacebook.com>



> On Aug 14, 2019, at 4:33 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
>> On Wed, Aug 14, 2019 at 03:30:51PM -0700, Andy Lutomirski wrote:
>> 
>> 
>>>> On Aug 14, 2019, at 3:05 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>> 
>>>> On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
>>>> 
>>>> If eBPF is genuinely not usable by programs that are not fully trusted
>>>> by the admin, then no kernel changes at all are needed.  Programs that
>>>> want to reduce their own privileges can easily fork() a privileged
>>>> subprocess or run a little helper to which they delegate BPF
>>>> operations.  This is far more flexible than anything that will ever be
>>>> in the kernel because it allows the helper to verify that the rest of
>>>> the program is doing exactly what it's supposed to and restrict eBPF
>>>> operations to exactly the subset that is needed.  So a container
>>>> manager or network manager that drops some provilege could have a
>>>> little bpf-helper that manages its BPF XDP, firewalling, etc
>>>> configuration.  The two processes would talk over a socketpair.
>>> 
>>> there were three projects that tried to delegate bpf operations.
>>> All of them failed.
>>> bpf operational workflow is much more complex than you're imagining.
>>> fork() also doesn't work for all cases.
>>> I gave this example before: consider multiple systemd-like deamons
>>> that need to do bpf operations that want to pass this 'bpf capability'
>>> to other deamons written by other teams. Some of them will start
>>> non-root, but still need to do bpf. They will be rpm installed
>>> and live upgraded while running.
>>> We considered to make systemd such centralized bpf delegation
>>> authority too. It didn't work. bpf in kernel grows quickly.
>>> libbpf part grows independently. llvm keeps evolving.
>>> All of them are being changed while system overall has to stay
>>> operational. Centralized approach breaks apart.
>>> 
>>>> The interesting cases you're talking about really *do* involved
>>>> unprivileged or less privileged eBPF, though.  Let's see:
>>>> 
>>>> systemd --user: systemd --user *is not privileged at all*.  There's no
>>>> issue of reducing privilege, since systemd --user doesn't have any
>>>> privilege to begin with.  But systemd supports some eBPF features, and
>>>> presumably it would like to support them in the systemd --user case.
>>>> This is unprivileged eBPF.
>>> 
>>> Let's disambiguate the terminology.
>>> This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
>>> I think that was a mistake.
>>> Let's call systemd-like deamon usage of bpf 'less privileged bpf'.
>>> This is not unprivileged.
>>> 'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.
>>> 
>>> There is a huge difference between the two.
>>> I'm against extending 'unprivileged bpf' even a bit more than what it is
>>> today for many reasons mentioned earlier.
>>> The /dev/bpf is about 'less privileged'.
>>> Less privileged than root. We need to split part of full root capability
>>> into bpf capability. So that most of the root can be dropped.
>>> This is very similar to what cap_net_admin does.
>>> cap_net_amdin can bring down eth0 which is just as bad as crashing the box.
>>> cap_net_admin is very much privileged. Just 'less privileged' than root.
>>> Same thing for cap_bpf.
>> 
>> The new pseudo-capability in this patch set is absurdly broad. I’ve proposed some finer-grained divisions in this thread. Do you have comments on them?
> 
> Initially I agreed that it's probably too broad, but then realized
> that they're perfect as-is. There is no need to partition further.
> 
>>> May be we should do both cap_bpf and /dev/bpf to make it clear that
>>> this is the same thing. Two interfaces to achieve the same result.
>> 
>> What for?  If there’s a CAP_BPF, then why do you want /dev/bpf? Especially if you define it to do the same thing.
> 
> Indeed, ambient capabilities should work for all cases.
> 
>> No, I’m not.  I have no objection at all if you try to come up with a clear definition of what the capability checks do and what it means to grant a new permission to a task.  Changing *all* of the capable checks is needlessly broad.
> 
> There are not that many bits left. I prefer to consume single CAP_BPF bit.
> All capable(CAP_SYS_ADMIN) checks in kernel/bpf/ will become CAP_BPF.
> This is no-brainer.
> 
> The only question is whether few cases of CAP_NET_ADMIN in kernel/bpf/
> should be extended to CAP_BPF or not.
> imo devmap and xskmap can stay CAP_NET_ADMIN,
> but cgroup bpf attach/detach should be either CAP_NET_ADMIN or CAP_BPF.
> Initially cgroup-bpf hooks were limited to networking.
> It's no longer the case. Requiring NET_ADMIN there make little sense now.
> 

Cgroup bpf attach/detach, with the current API, gives very strong control over the whole system, and it will just get stronger as bpf gains features. Making it CAP_BPF means that you will never have the ability to make CAP_BPF safe to give to anything other than an extremely highly trusted process.  Unsafe pointers are similar.  The rest could plausibly be hardened in the future, although the by_id stuff may be tricky too.

Do new programs really need the by_id calls?  It could make sense to leave those unchanged and to have new programs use persistent maps instead.

^ permalink raw reply

* Re: [PATCH net-next v2 11/14] netdevsim: Add devlink-trap support
From: Jakub Kicinski @ 2019-08-14 23:59 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, nhorman, jiri, toke, dsahern, roopa, nikolay, andy,
	f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190813075400.11841-12-idosch@idosch.org>

On Tue, 13 Aug 2019 10:53:57 +0300, Ido Schimmel wrote:
> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index 08ca59fc189b..2758d95c8d18 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -17,11 +17,21 @@
>  
>  #include <linux/debugfs.h>
>  #include <linux/device.h>
> +#include <linux/etherdevice.h>
> +#include <linux/inet.h>
> +#include <linux/jiffies.h>
> +#include <linux/kernel.h>
>  #include <linux/list.h>
>  #include <linux/mutex.h>
>  #include <linux/random.h>
> +#include <linux/workqueue.h>
> +#include <linux/random.h>
>  #include <linux/rtnetlink.h>
>  #include <net/devlink.h>
> +#include <net/ip.h>
> +#include <uapi/linux/devlink.h>
> +#include <uapi/linux/ip.h>
> +#include <uapi/linux/udp.h>

Please keep includes ordered alphabetically. You're adding
linux/random.h second time.

>  #include "netdevsim.h"

> +static void nsim_dev_trap_report(struct nsim_dev_port *nsim_dev_port)
> +{
> +	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
> +	struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
> +	struct devlink *devlink = priv_to_devlink(nsim_dev);
> +	int i;

reverse christmas tree, please

^ permalink raw reply

* [PATCH bpf-next] tools: libbpf: update extended attributes version of bpf_object__open()
From: Anton Protopopov @ 2019-08-15  0:03 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, netdev, bpf,
	linux-kernel
  Cc: Anton Protopopov

Update the bpf_object_open_attr structure and corresponding code so that the
bpf_object__open_xattr function could be used to open objects from buffers as
well as from files.  The reason for this change is that the existing
bpf_object__open_buffer function doesn't provide a way to specify neither the
needs_kver nor flags parameters to the internal call to __bpf_object__open
which makes it inconvenient for loading BPF objects which doesn't require a
kernel version.

Two new fields, obj_buf and obj_buf_sz, were added to the structure, and the
file field was union'ed with obj_name so that one can open an object like this:

    struct bpf_object_open_attr attr = {
        .obj_name   = name,
        .obj_buf    = obj_buf,
        .obj_buf_sz = obj_buf_sz,
        .prog_type  = BPF_PROG_TYPE_UNSPEC,
    };
    return bpf_object__open_xattr(&attr);

while still being able to use the file semantics:

    struct bpf_object_open_attr attr = {
        .file       = path,
        .prog_type  = BPF_PROG_TYPE_UNSPEC,
    };
    return bpf_object__open_xattr(&attr);

Another thing to note is that since the commit c034a177d3c8 ("bpf: bpftool, add
flag to allow non-compat map definitions") which introduced a MAPS_RELAX_COMPAT
flag to load objects with non-compat map definitions, bpf_object__open_buffer
was called with this flag enabled (it was passed as the boolean true value in
flags argument to __bpf_object__open).  The default behaviour for all open
functions is to clear this flag and this patch changes bpf_object__open_buffer
to clears this flag.  It can be enabled, if needed, by opening an object from
buffer using __bpf_object__open_xattr.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
---
 tools/lib/bpf/libbpf.c | 45 ++++++++++++++++++++++++++----------------
 tools/lib/bpf/libbpf.h |  7 ++++++-
 2 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2233f919dd88..7c8054afd901 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3630,13 +3630,31 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
 struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
 					    int flags)
 {
+	char tmp_name[64];
+
 	/* param validation */
-	if (!attr->file)
+	if (!attr)
 		return NULL;
 
-	pr_debug("loading %s\n", attr->file);
+	if (attr->obj_buf) {
+		if (attr->obj_buf_sz <= 0)
+			return NULL;
+		if (!attr->file) {
+			snprintf(tmp_name, sizeof(tmp_name), "%lx-%lx",
+				 (unsigned long)attr->obj_buf,
+				 (unsigned long)attr->obj_buf_sz);
+			attr->obj_name = tmp_name;
+		}
+		pr_debug("loading object '%s' from buffer\n", attr->obj_name);
+	} else if (!attr->file) {
+		return NULL;
+	} else {
+		attr->obj_buf_sz = 0;
 
-	return __bpf_object__open(attr->file, NULL, 0,
+		pr_debug("loading object file '%s'\n", attr->file);
+	}
+
+	return __bpf_object__open(attr->file, attr->obj_buf, attr->obj_buf_sz,
 				  bpf_prog_type__needs_kver(attr->prog_type),
 				  flags);
 }
@@ -3660,21 +3678,14 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf,
 					   size_t obj_buf_sz,
 					   const char *name)
 {
-	char tmp_name[64];
-
-	/* param validation */
-	if (!obj_buf || obj_buf_sz <= 0)
-		return NULL;
-
-	if (!name) {
-		snprintf(tmp_name, sizeof(tmp_name), "%lx-%lx",
-			 (unsigned long)obj_buf,
-			 (unsigned long)obj_buf_sz);
-		name = tmp_name;
-	}
-	pr_debug("loading object '%s' from buffer\n", name);
+	struct bpf_object_open_attr attr = {
+		.obj_name	= name,
+		.obj_buf	= obj_buf,
+		.obj_buf_sz	= obj_buf_sz,
+		.prog_type	= BPF_PROG_TYPE_UNSPEC,
+	};
 
-	return __bpf_object__open(name, obj_buf, obj_buf_sz, true, true);
+	return bpf_object__open_xattr(&attr);
 }
 
 int bpf_object__unload(struct bpf_object *obj)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e8f70977d137..634f278578dd 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -63,8 +63,13 @@ LIBBPF_API libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn);
 struct bpf_object;
 
 struct bpf_object_open_attr {
-	const char *file;
+	union {
+		const char *file;
+		const char *obj_name;
+	};
 	enum bpf_prog_type prog_type;
+	void *obj_buf;
+	size_t obj_buf_sz;
 };
 
 LIBBPF_API struct bpf_object *bpf_object__open(const char *path);
-- 
2.19.1


^ permalink raw reply related

* [PATCH] ipvlan: set hw_enc_features like macvlan
From: Bill Sommerfeld @ 2019-08-15  0:10 UTC (permalink / raw)
  To: David S. Miller, Petr Machata
  Cc: Jiri Pirko, Ido Schimmel, Daniel Borkmann, YueHaibing,
	Thomas Gleixner, Miaohe Lin, Eric Dumazet, Mahesh Bandewar,
	netdev, linux-kernel, Bill Sommerfeld

Allow encapsulated packets sent to tunnels layered over ipvlan to use
offloads rather than forcing SW fallbacks.

Since commit f21e5077010acda73a60 ("macvlan: add offload features for
encapsulation"), macvlan has set dev->hw_enc_features to include
everything in dev->features; do likewise in ipvlan.

Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 1c96bed5a7c4..887bbba4631e 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -126,6 +126,7 @@ static int ipvlan_init(struct net_device *dev)
 		     (phy_dev->state & IPVLAN_STATE_MASK);
 	dev->features = phy_dev->features & IPVLAN_FEATURES;
 	dev->features |= NETIF_F_LLTX | NETIF_F_VLAN_CHALLENGED;
+	dev->hw_enc_features |= dev->features;
 	dev->gso_max_size = phy_dev->gso_max_size;
 	dev->gso_max_segs = phy_dev->gso_max_segs;
 	dev->hard_header_len = phy_dev->hard_header_len;
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-15  0:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <317422C3-ACE3-42A7-A287-7B8FEE12E33A@amacapital.net>

On Wed, Aug 14, 2019 at 04:59:18PM -0700, Andy Lutomirski wrote:
> 
> 
> > On Aug 14, 2019, at 4:33 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> >> On Wed, Aug 14, 2019 at 03:30:51PM -0700, Andy Lutomirski wrote:
> >> 
> >> 
> >>>> On Aug 14, 2019, at 3:05 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >>>> 
> >>>> On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
> >>>> 
> >>>> If eBPF is genuinely not usable by programs that are not fully trusted
> >>>> by the admin, then no kernel changes at all are needed.  Programs that
> >>>> want to reduce their own privileges can easily fork() a privileged
> >>>> subprocess or run a little helper to which they delegate BPF
> >>>> operations.  This is far more flexible than anything that will ever be
> >>>> in the kernel because it allows the helper to verify that the rest of
> >>>> the program is doing exactly what it's supposed to and restrict eBPF
> >>>> operations to exactly the subset that is needed.  So a container
> >>>> manager or network manager that drops some provilege could have a
> >>>> little bpf-helper that manages its BPF XDP, firewalling, etc
> >>>> configuration.  The two processes would talk over a socketpair.
> >>> 
> >>> there were three projects that tried to delegate bpf operations.
> >>> All of them failed.
> >>> bpf operational workflow is much more complex than you're imagining.
> >>> fork() also doesn't work for all cases.
> >>> I gave this example before: consider multiple systemd-like deamons
> >>> that need to do bpf operations that want to pass this 'bpf capability'
> >>> to other deamons written by other teams. Some of them will start
> >>> non-root, but still need to do bpf. They will be rpm installed
> >>> and live upgraded while running.
> >>> We considered to make systemd such centralized bpf delegation
> >>> authority too. It didn't work. bpf in kernel grows quickly.
> >>> libbpf part grows independently. llvm keeps evolving.
> >>> All of them are being changed while system overall has to stay
> >>> operational. Centralized approach breaks apart.
> >>> 
> >>>> The interesting cases you're talking about really *do* involved
> >>>> unprivileged or less privileged eBPF, though.  Let's see:
> >>>> 
> >>>> systemd --user: systemd --user *is not privileged at all*.  There's no
> >>>> issue of reducing privilege, since systemd --user doesn't have any
> >>>> privilege to begin with.  But systemd supports some eBPF features, and
> >>>> presumably it would like to support them in the systemd --user case.
> >>>> This is unprivileged eBPF.
> >>> 
> >>> Let's disambiguate the terminology.
> >>> This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
> >>> I think that was a mistake.
> >>> Let's call systemd-like deamon usage of bpf 'less privileged bpf'.
> >>> This is not unprivileged.
> >>> 'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.
> >>> 
> >>> There is a huge difference between the two.
> >>> I'm against extending 'unprivileged bpf' even a bit more than what it is
> >>> today for many reasons mentioned earlier.
> >>> The /dev/bpf is about 'less privileged'.
> >>> Less privileged than root. We need to split part of full root capability
> >>> into bpf capability. So that most of the root can be dropped.
> >>> This is very similar to what cap_net_admin does.
> >>> cap_net_amdin can bring down eth0 which is just as bad as crashing the box.
> >>> cap_net_admin is very much privileged. Just 'less privileged' than root.
> >>> Same thing for cap_bpf.
> >> 
> >> The new pseudo-capability in this patch set is absurdly broad. I’ve proposed some finer-grained divisions in this thread. Do you have comments on them?
> > 
> > Initially I agreed that it's probably too broad, but then realized
> > that they're perfect as-is. There is no need to partition further.
> > 
> >>> May be we should do both cap_bpf and /dev/bpf to make it clear that
> >>> this is the same thing. Two interfaces to achieve the same result.
> >> 
> >> What for?  If there’s a CAP_BPF, then why do you want /dev/bpf? Especially if you define it to do the same thing.
> > 
> > Indeed, ambient capabilities should work for all cases.
> > 
> >> No, I’m not.  I have no objection at all if you try to come up with a clear definition of what the capability checks do and what it means to grant a new permission to a task.  Changing *all* of the capable checks is needlessly broad.
> > 
> > There are not that many bits left. I prefer to consume single CAP_BPF bit.
> > All capable(CAP_SYS_ADMIN) checks in kernel/bpf/ will become CAP_BPF.
> > This is no-brainer.
> > 
> > The only question is whether few cases of CAP_NET_ADMIN in kernel/bpf/
> > should be extended to CAP_BPF or not.
> > imo devmap and xskmap can stay CAP_NET_ADMIN,
> > but cgroup bpf attach/detach should be either CAP_NET_ADMIN or CAP_BPF.
> > Initially cgroup-bpf hooks were limited to networking.
> > It's no longer the case. Requiring NET_ADMIN there make little sense now.
> > 
> 
> Cgroup bpf attach/detach, with the current API, gives very strong control over the whole system, and it will just get stronger as bpf gains features. Making it CAP_BPF means that you will never have the ability to make CAP_BPF safe to give to anything other than an extremely highly trusted process.  Unsafe pointers are similar. 

'never to less trusted process' ? why do you think so?
I don't see a problem adding /dev/bpf/foo in the future and make things
more granular. There is no such use case today. Hence I don't want to
spend time and design something without clear use case in mind.

> Do new programs really need the by_id calls? 

yes. Lorenz gave an example earlier. map-in-map returns map_id.
To operate on that map by_id is needed.


^ permalink raw reply

* Re: [PATCH net-next v2 13/14] selftests: devlink_trap: Add test cases for devlink-trap
From: Jakub Kicinski @ 2019-08-15  0:42 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, nhorman, jiri, toke, dsahern, roopa, nikolay, andy,
	f.fainelli, andrew, vivien.didelot, mlxsw, Ido Schimmel
In-Reply-To: <20190813075400.11841-14-idosch@idosch.org>

On Tue, 13 Aug 2019 10:53:59 +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> Add test cases for devlink-trap on top of the netdevsim implementation.
> 
> The tests focus on the devlink-trap core infrastructure and user space
> API. They test both good and bad flows and also dismantle of the netdev
> and devlink device used to report trapped packets.
> 
> This allows device drivers to focus their tests on device-specific
> functionality.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

Thanks for the test!

Should it perhaps live in:
tools/testing/selftests/drivers/net/netdevsim/
?

That's where Jiri puts his devlink tests..

Also the test seems to require netdevsim to be loaded, otherwise:
# ./devlink_trap.sh 
SKIP: No netdevsim support

Is that expected?

^ permalink raw reply

* Re: [patch net-next v2 2/2] selftests: netdevsim: add devlink params tests
From: Jakub Kicinski @ 2019-08-15  1:09 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190814152604.6385-3-jiri@resnulli.us>

On Wed, 14 Aug 2019 17:26:04 +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Test recently added netdevsim devlink param implementation.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
> v1->v2:
> -using cmd_jq helper

Still failing here :(

# ./devlink.sh 
TEST: fw flash test                                                 [ OK ]
TEST: params test                                                   [FAIL]
	Failed to get test1 param value
TEST: regions test                                                  [ OK ]

# jq --version
jq-1.5-1-a5b5cbe
# echo '{ "a" : false }' | jq -e -r '.[]'
false
# echo $?
1

On another machine:

$ echo '{ "a" : false }' | jq -e -r '.[]'
false
$ echo $?
1

Did you mean to drop the -e ?

^ permalink raw reply

* Re: [patch net-next v3 1/2] netdevsim: implement support for devlink region and snapshots
From: Jakub Kicinski @ 2019-08-15  1:14 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, mlxsw
In-Reply-To: <20190814153735.6923-2-jiri@resnulli.us>

On Wed, 14 Aug 2019 17:37:34 +0200, Jiri Pirko wrote:
> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index 08ca59fc189b..125a0358bc04 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -27,6 +27,41 @@
>  
>  static struct dentry *nsim_dev_ddir;
>  
> +#define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
> +
> +static ssize_t nsim_dev_take_snapshot_write(struct file *file,
> +					    const char __user *data,
> +					    size_t count, loff_t *ppos)
> +{
> +	struct nsim_dev *nsim_dev = file->private_data;
> +	void *dummy_data;
> +	u32 id;
> +	int err;

If you have to rebase on top of Ido's changes and repost, please do
reverse xmas tree.

^ permalink raw reply

* Re: [PATCH net-next 1/5] RDS: Re-add pf/sol access via sysctl
From: David Miller @ 2019-08-15  1:25 UTC (permalink / raw)
  To: gerd.rausch; +Cc: santosh.shilimkar, dledford, netdev, linux-rdma, rds-devel
In-Reply-To: <1c6d1f04-96d5-94e5-3140-d3da194e14f3@oracle.com>

From: Gerd Rausch <gerd.rausch@oracle.com>
Date: Wed, 14 Aug 2019 14:45:21 -0700

> a) It is utterly inappropriate to have Oracle applications
>    rely on a /proc/sys API that was kept out of Upstream-Linux
>    for this long

Yes.

> 
> b) It is utterly inappropriate to include such a /proc/sys API
>    that Oracle applications have depended on this late

Also yes.

^ permalink raw reply

* Re: WARNING in cgroup_rstat_updated
From: syzbot @ 2019-08-15  1:26 UTC (permalink / raw)
  To: ast, daniel, hdanton, john.fastabend, linux-kernel, linux-mm,
	netdev, syzkaller-bugs
In-Reply-To: <000000000000b851cb058f7e637f@google.com>

syzbot has bisected this bug to:

commit e9db4ef6bf4ca9894bb324c76e01b8f1a16b2650
Author: John Fastabend <john.fastabend@gmail.com>
Date:   Sat Jun 30 13:17:47 2018 +0000

     bpf: sockhash fix omitted bucket lock in sock_close

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=143286e2600000
start commit:   31cc088a Merge tag 'drm-next-2019-07-19' of git://anongit...
git tree:       net-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=163286e2600000
console output: https://syzkaller.appspot.com/x/log.txt?x=123286e2600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4dba67bf8b8c9ad7
dashboard link: https://syzkaller.appspot.com/bug?extid=370e4739fa489334a4ef
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dd57dc600000

Reported-by: syzbot+370e4739fa489334a4ef@syzkaller.appspotmail.com
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: [PATCH 0/2] Netfilter updates for net-next
From: David Miller @ 2019-08-15  2:59 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20190814214347.4940-1-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 14 Aug 2019 23:43:45 +0200

> The following patchset contains Netfilter updates for net-next.
> This round addresses fallout from previous pull request:
> 
> 1) Remove #warning from ipt_LOG.h and ip6t_LOG.h headers,
>    from Jeremy Sowden.
> 
> 2) Incorrect parens in memcmp() in nft_bitwise, from Nathan Chancellor.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH] virtio-net: lower min ring num_free for efficiency
From: Jason Wang @ 2019-08-15  3:01 UTC (permalink / raw)
  To: ? jiang, mst@redhat.com
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, kafai@fb.com, songliubraving@fb.com,
	yhs@fb.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org,
	jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB3205E4E194942B0A1A91A222A6AD0@BYAPR14MB3205.namprd14.prod.outlook.com>


On 2019/8/14 上午10:06, ? jiang wrote:
> This change lowers ring buffer reclaim threshold from 1/2*queue to budget
> for better performance. According to our test with qemu + dpdk, packet
> dropping happens when the guest is not able to provide free buffer in
> avail ring timely with default 1/2*queue. The value in the patch has been
> tested and does show better performance.


Please add your tests setup and result here.

Thanks


>
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>
> ---
>   drivers/net/virtio_net.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc08be7925eb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>   		}
>   	}
>   
> -	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> +	if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
>   		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>   			schedule_delayed_work(&vi->refill, 0);
>   	}

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox